Transforming healthcare documentation: harnessing the potential of AI to generate discharge summaries

4/25/2024 • BJGP Open • License: CC BY 4.0
Reece Alexander James Clough (Institute of Naval Medicine, Gosport, UK); William Anthony Sparkes (Academic Department of Military General Practice, Research & Clinical Innovation, Defence Medical Services, ICT Centre, Birmingham, UK); Oliver Thomas Clough; Joshua Thomas Sykes; Alexander Thomas Steventon (Academic Department of Military General Practice, Research & Clinical Innovation, Defence Medical Services, ICT Centre, Birmingham, UK); Kate King (Academic Department of Military General Practice, Research & Clinical Innovation, Defence Medical Services, ICT Centre, Birmingham, UK)

Abstract

Background: Hospital discharge summaries play an essential role in informing GPs of recent admissions to ensure excellent continuity of care and prevent adverse events; however, they are notoriously poorly written, time-consuming, and can result in delayed discharge. Aim: To evaluate the potential of artificial intelligence (AI) to produce high-quality discharge summaries equivalent to the level of a doctor who has completed the UK Foundation Programme. Design & setting: Feasibility study using 25 mock patient vignettes. Method: Twenty-five mock patient vignettes were written by the authors. Five junior doctors wrote discharge summaries from the case vignettes (five each). The same case vignettes were input into ChatGPT. In total, 50 discharge summaries were generated; 25 by AI and 25 by junior doctors. Quality and suitability were determined through both independent GP evaluators and adherence to a minimum dataset. Results: Of the 25 AI-written discharge summaries, 100% were deemed by GPs to be of an acceptable quality, compared with 92% of the junior doctor summaries. Both groups showed a mean compliance of 97% with the minimum dataset. In addition, the ability of GPs to determine whether a summary was written by ChatGPT was poor, with only 60% detection accuracy. Similarly, when run through an AI-detection tool, all summaries were rated as very unlikely to have been written by AI. Conclusion: AI has been shown to produce discharge summaries of quality equivalent to those of a junior doctor who has completed the UK Foundation Programme; however, larger studies using real-world patient data and NHS-approved AI tools will need to be conducted.

Clinical implications

Feasibility study conducted in the UK using 25 mock patient vignettes, comparing ChatGPT-generated discharge summaries with those written by five junior doctors who had completed the UK Foundation Programme. Each doctor wrote five summaries (25 in total); ChatGPT generated 25 summaries using a standardised prompt based on the authors' experience. Five general practitioners (GPs) evaluated the summaries in a blinded fashion, each reviewing 10 summaries (a mix of AI- and human-written). Cases covered a range of medical specialties loosely based on scenarios encountered during the Foundation Programme.

Key findings: 100% of AI-written summaries (25/25) were deemed of acceptable quality by GPs versus 92% of junior doctor summaries (23/25), though the difference was not statistically significant (p=0.15). Both groups showed a mean of 97% compliance with the National Prescribing Centre (NPC) minimum dataset (19 criteria). All discharge summaries scored ≥15, with a median score of 19 in both groups and no significant difference between them (z=-0.28, p=0.78). GP accuracy in detecting AI-written summaries was only 60% (sensitivity 0.52, specificity 0.68, PPV 0.62), indicating the summaries were largely indistinguishable from human-written ones. The OpenAI Text Classifier rated all AI-generated summaries as 'very unlikely' to be AI-generated, meaning the GPs outperformed the AI-detection tool.

The study demonstrates that ChatGPT can produce discharge summaries of equivalent quality to those of junior doctors, meeting both GP satisfaction and minimum dataset standards. It addresses a critical issue: discharge summaries are essential for continuity of care, but they are notoriously poorly written, time-consuming, often lack important information (test results, treatments, medications, follow-up plans), and can result in delayed discharge. Errors in medication information pose a particular patient safety risk.

Study limitations: small sample size, a single GP rating per summary, and the use of mock vignettes rather than real patient data. Future implementation would require the AI to access and process actual medical records, extract relevant information, and maintain clinical reasoning transparency through an AI-human hybrid process.
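The reported detection statistics are mutually consistent. With 25 AI-written and 25 doctor-written summaries, the stated sensitivity and specificity imply a specific confusion matrix (13 true positives, 17 true negatives), from which accuracy and PPV follow. A minimal sketch, with the counts inferred from the reported rates rather than stated directly in the summary:

```python
# Reconstruct the GP detection confusion matrix from the reported rates.
# "Positive" = summary judged to be AI-written. Counts are inferred
# (not given verbatim in the summary): sensitivity 0.52 on 25 AI summaries
# implies 13 true positives; specificity 0.68 on 25 human summaries
# implies 17 true negatives.
tp = 13            # AI-written, correctly flagged as AI
fn = 25 - tp       # AI-written, judged human-written
tn = 17            # doctor-written, correctly judged human
fp = 25 - tn       # doctor-written, flagged as AI

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, PPV={ppv:.2f}")
# accuracy=0.60, sensitivity=0.52, specificity=0.68, PPV=0.62
```

The recovered values match the paper's figures, confirming that the 60% accuracy and 0.62 PPV follow directly from the sensitivity and specificity on this balanced 25/25 sample.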
