Knowledge-Guided Large Language Models for Interdisciplinary Applications in Cultural Heritage Analysis and Conservation

Organizers

Jisheng Dang (dangjsh@mail2.sysu.edu.cn)
Juncheng Li (junchengli@zju.edu.cn)
Wenjie Wang (wenjiewang@ustc.edu.cn)

Abstract

The rapid advancement of large language models (LLMs) and multimodal foundation models is revolutionizing how we perceive, analyze, and preserve cultural heritage. These models can seamlessly integrate and interpret diverse data modalities, such as text, images, video, speech, and even 3D scans, enabling a wide range of transformative applications in digital preservation, semantic restoration, interactive narration, and immersive engagement with cultural artifacts. Key issues to be addressed include: (1) how to effectively align and fuse heterogeneous multimodal data (e.g., text, imagery, speech, 3D scans) into rich and coherent cultural representations; (2) the limitations of current models in semantic reasoning within complex cultural contexts, particularly in handling historical background, cultural symbolism, and ambiguous or polysemous expressions; (3) the need for robust cross-modal alignment and contextual grounding, and for ethical and explainable frameworks that support trustworthy human-AI collaboration in cultural domains; and (4) the exploration of symbolic-neural hybrid architectures and of efficient training and adaptation strategies for scalable, responsible deployment of multimodal LLMs in real-world cultural applications. We particularly welcome submissions exploring how LLMs and multimodal systems can advance the automatic understanding, storytelling, and knowledge dissemination of historical and cultural content.

Topics

• Multimodal fusion and representation learning for cultural artifacts
• Interpretability and explainability in LLMs applied to heritage understanding
• Cross-modal reasoning and knowledge transfer in historical and artistic domains
• Retrieval-augmented generation (RAG) for cultural documents and archives
• Few-shot and zero-shot learning in multimodal cultural analysis
• Vision-language grounding for archaeological site or artwork interpretation
• Virtual reconstruction and restoration using multimodal AI
• Human-centric evaluation metrics for multimodal cultural AI systems
• Bias, fairness, and ethical considerations in AI-generated cultural content
• Cross-lingual and cross-cultural adaptation of foundation models

© 2025 ACM Multimedia Asia Conference. All Rights Reserved.
