Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient-Friendly Language and Format
Abstract
Importance: By law, patients have immediate access to discharge notes in their medical records. Technical language and abbreviations make notes difficult to read and understand for a typical patient. Large language models (LLMs [eg, GPT-4]) have the potential to transform these notes into patient-friendly language and format.

Objective: To determine whether an LLM can transform discharge summaries into a format that is more readable and understandable.

Design, Setting, and Participants: This cross-sectional study evaluated a sample of the discharge summaries of adult patients discharged from the General Internal Medicine service at NYU (New York University) Langone Health from June 1 to 30, 2023. Patients discharged as deceased were excluded. All discharge summaries were processed by the LLM between July 26 and August 5, 2023.

Interventions: A secure Health Insurance Portability and Accountability Act–compliant platform, Microsoft Azure OpenAI, was used to transform these discharge summaries into a patient-friendly format between July 26 and August 5, 2023.

Main Outcomes and Measures: Outcomes included readability, measured by Flesch-Kincaid Grade Level, and understandability, measured by Patient Education Materials Assessment Tool (PEMAT) scores. Readability and understandability of the original discharge summaries were compared with those of the transformed, patient-friendly discharge summaries created by the LLM. As balancing metrics, accuracy and completeness of the patient-friendly versions were measured.

Results: Discharge summaries of 50 patients (31 female [62.0%] and 19 male [38.0%]) were included. The median patient age was 65.5 (IQR, 59.0-77.5) years. Mean (SD) Flesch-Kincaid Grade Level was significantly lower in the patient-friendly discharge summaries (6.2 [0.5] vs 11.0 [1.5]; P < .001). PEMAT understandability scores were significantly higher for the patient-friendly discharge summaries (81% vs 13%; P < .001). Accuracy was rated as top box (completely correct) in 54 of 100 reviews, and safety concerns were reported in 18 of 100 reviews. Most inaccuracies were attributed to omission of key information (52.1%) rather than hallucination (8.7%).

Conclusions and Relevance: In this cross-sectional study, an LLM successfully transformed discharge summaries into a more readable and understandable format. However, accuracy concerns highlight the need for human oversight and validation of LLM-generated patient-facing medical content before implementation.
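The article does not publish the prompt or API configuration that was used. As a minimal, hypothetical sketch only, the following Python snippet shows how a discharge summary could be sent to a GPT-4 deployment through the Azure OpenAI chat completions API; the endpoint variables, API version, deployment name, and system prompt are illustrative assumptions, not the study's actual settings.

    import os
    from openai import AzureOpenAI  # pip install openai>=1.0

    # Placeholder configuration: endpoint, key, API version, and deployment
    # name are assumptions for illustration, not the study's settings.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2023-05-15",
    )

    SYSTEM_PROMPT = (
        "Rewrite the following hospital discharge summary in plain language "
        "at about a 6th-grade reading level. Expand abbreviations, keep all "
        "diagnoses, medications, and follow-up instructions, and do not add "
        "information that is not in the original note."
    )

    def to_patient_friendly(discharge_summary: str) -> str:
        """Return a patient-friendly rewrite of one discharge summary."""
        response = client.chat.completions.create(
            model="gpt-4",  # name of the Azure OpenAI deployment (placeholder)
            temperature=0,  # favor deterministic, conservative rewrites
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": discharge_summary},
            ],
        )
        return response.choices[0].message.content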
Clinical implications
In this cross-sectional study at NYU Langone Health, 50 discharge summaries of General Internal Medicine patients discharged in June 2023 (median age, 65.5 years [IQR, 59.0-77.5]; 62% female, 38% male) were transformed into a patient-friendly format by an LLM on the Microsoft Azure OpenAI platform between July 26 and August 5, 2023.

Readability and understandability improved markedly. Mean Flesch-Kincaid Grade Level decreased from 11.0 to 6.2 (P < .001), corresponding to roughly a 6th- to 7th-grade reading level versus an 11th-grade level. Mean word count decreased from 1520 to 338 words (P < .001), mean Flesch Reading Ease score improved from 35.5 to 69.5 (P < .001), and PEMAT understandability scores increased from 13% to 81% (P < .001).

Accuracy concerns were identified, however: only 54 of 100 physician reviews rated the patient-friendly summaries as completely correct (top box), and 18 of 100 reviews noted safety concerns. Most inaccuracies were due to omission of key information (52.1%) rather than hallucination (8.7%). No review rated a summary as completely incorrect, and only 1 of 100 rated a summary as more incorrect than correct. Interrater reliability was good, with 97% agreement.

These findings indicate that LLMs can substantially improve the readability and understandability of discharge summaries but underscore the critical need for human oversight and validation before clinical implementation.
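For readers unfamiliar with the reported metrics, the sketch below shows how Flesch-Kincaid Grade Level, Flesch Reading Ease, and a PEMAT understandability percentage are computed from their standard published formulas. The syllable counter is a rough heuristic introduced here for illustration; it is not the validated instrument or software used in the study.

    import re

    def _count_syllables(word: str) -> int:
        """Rough vowel-group heuristic; published tools use more exact rules."""
        word = word.lower()
        count = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and count > 1:
            count -= 1
        return max(count, 1)

    def flesch_kincaid(text: str) -> tuple:
        """Return (grade_level, reading_ease) using the standard formulas."""
        sentences = max(len(re.findall(r"[.!?]+", text)), 1)
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(_count_syllables(w) for w in words)
        wps = len(words) / sentences          # words per sentence
        spw = syllables / max(len(words), 1)  # syllables per word
        grade = 0.39 * wps + 11.8 * spw - 15.59
        ease = 206.835 - 1.015 * wps - 84.6 * spw
        return grade, ease

    def pemat_understandability(agree: int, disagree: int) -> float:
        """PEMAT score: percentage of applicable items rated 'Agree';
        items marked not applicable are excluded from the denominator."""
        applicable = agree + disagree
        return 100.0 * agree / applicable if applicable else 0.0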