2025 Workshop at BIRS
🎤 Sheng Yu: Taming EHRs for Statistical Readiness through Large Language Models and Knowledge Graphs
📅 Wednesday, August 20, 2025 • 🕘 08:49 - 09:15
🏛️ Tsinghua University
Summary: LLMs and knowledge graphs can transform unstructured EHR narratives into standardized, statistically ready data, addressing the variability of medical terminology that makes these records hard to use in biomedical research.
Introduction: Biomedicine has long been one of the most important application areas of statistics. With the widespread adoption of electronic health records (EHRs) over the past decade, these records should, in theory, provide a vast amount of data for analysis. In practice, however, they remain underutilized: extracting information from EHRs is still a challenging and specialized natural language processing task because of the substantial medical knowledge required and the variability of medical terminology. In this talk, we will briefly review fundamental concepts for analyzing EHRs, explain the challenges that make EHR analysis difficult, and introduce how we developed large language models and knowledge graphs to convert EHR narratives into structured, standardized data ready for analysis, opening up new frontiers for statistical research and accelerating progress in biomedicine.