Leveraging Large Language Models and Patient Portal Messages for Early Identification of Depression

Jiyeong Kim
John Torous
Julia Adler-Milstein
Peter J. van Roessel
Fatima Rodriguez
Christopher Sharp
Michael A. Pfeffer
Jonathan H. Chen
Stephen P. Ma
Carolyn I. Rodriguez
Eleni Linos

Published on Aug 16, 2025


Abstract

Importance

A large language model (LLM)-assisted early warning system may help overcome existing barriers to timely depression diagnosis in patients with cardiovascular disease (CVD). This novel application of LLMs to screen patient messages could be applied to other chronic diseases, facilitating automated symptom-driven diagnoses and interventions.

Objective

To prospectively simulate the impact (change in time to diagnosis) of LLM-based population mental health screening of patient portal messages, and to measure the accuracy of LLMs in identifying patients with CVD who are at high risk for a depression diagnosis.

Design

Prospective cohort study

Setting

Electronic health records from an academic hospital (Stanford Health Care)

Participants

Individuals diagnosed with CVD between 2014 and 2024 who were subsequently diagnosed with depression

Intervention/Exposure

LLMs (Llama 3.1 8B, July 2024 version, Meta LLC, and MedGemma 4B, July 2025 version, Google DeepMind, LLC) to identify individuals with depression

Main outcome

LLM accuracy, measured as sensitivity (completeness in capturing positive cases) and positive predictive value (PPV; correctness of the cases flagged as positive), and change in time to depression diagnosis
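Both accuracy measures reduce to simple ratios over a confusion matrix: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The minimal Python sketch below assumes you supply the true-positive, false-negative, and false-positive counts yourself. The Wilson score interval is an assumption, since the abstract does not state how its 95% CIs were computed (a bootstrap is another common choice).

```python
from math import sqrt


def sensitivity(tp: int, fn: int) -> float:
    """Share of true depression cases that the model flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of model flags that are true depression cases: TP / (TP + FP)."""
    return tp / (tp + fp)


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion.

    Assumed CI method for illustration only; not confirmed by the source.
    """
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half
```

For sensitivity, pass TP as `successes` and TP + FN as `total`; for PPV, pass TP and TP + FP.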

Results

We identified 115,156 patients with CVD, of whom 23.1% (n = 26,578/115,156) had comorbid depression. We included the individuals (n = 2,314) who sent at least one patient portal message between their CVD and depression diagnoses. Participants were mostly aged 65 years or older (n = 1,718/2,314, 74.2%), non-Hispanic (n = 2,078/2,314, 89.9%), and White (n = 1,506/2,314, 66.1%); sex was balanced (female, n = 1,197/2,314, 51.7%).

Llama 3.1 8B achieved a PPV of 51.2% [95% CI: 47.5-54.5%] and a sensitivity of 83.6% [95% CI: 81.1-85.9%]; MedGemma 4B achieved a sensitivity of 71.0% [95% CI: 67.0-75.3%]. On average, Llama 3.1 8B detected depression 660 days earlier than the first charted diagnosis over a 1,746-day assessment period, which reflects the typical interval from CVD to depression diagnosis in our cohort.
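The "days earlier" outcome can be read as the gap between the first message an LLM flags and the first charted depression diagnosis, averaged over patients. The sketch below uses a hypothetical per-patient input of `(sent_date, flagged)` pairs, assumed to be already restricted to messages sent between the CVD and depression diagnoses, and it averages only over patients the model flagged. Both choices are assumptions for illustration, not the authors' documented method, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean


def first_flag_date(messages):
    """Earliest sent_date among flagged messages, or None if none were flagged.

    messages: iterable of (sent_date, flagged) pairs (hypothetical format).
    """
    flagged = [sent for sent, is_flagged in messages if is_flagged]
    return min(flagged) if flagged else None


def lead_time_days(messages, diagnosis_date):
    """Days from the first flagged message to the first charted diagnosis.

    Positive values mean the model flagged the patient before diagnosis.
    """
    first = first_flag_date(messages)
    if first is None:
        return None
    return (diagnosis_date - first).days


def mean_lead_time(patients):
    """Mean lead time over flagged patients (assumed aggregation).

    patients: iterable of (messages, diagnosis_date) pairs.
    Patients never flagged by the model are excluded from the average.
    """
    leads = [lead_time_days(msgs, dx) for msgs, dx in patients]
    leads = [days for days in leads if days is not None]
    return mean(leads) if leads else None
```

A different aggregation, such as the median or counting unflagged patients as zero lead time, would give a different summary number, so the choice should match the study protocol.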

Conclusion/Relevance

Among patients with CVD, LLMs identified individuals with depression significantly earlier than the official diagnosis, relying solely on longitudinal patient messages without additional medical information, with high sensitivity and a PPV comparable to that of the Patient Health Questionnaire-9 (PHQ-9). This novel approach could be applied to other diagnoses.

Key points

Questions

Among patients with cardiovascular disease, can large language models (LLMs) identify individuals with depression from patient messages earlier than the first charted diagnosis, and if so, how much sooner and how accurately? How frequently are depression symptoms detected before diagnosis?

Findings

On average, LLMs identified individuals with depression 660 days earlier than the official diagnosis, with promising sensitivity (83.6%) and a PPV (51.2%) comparable to that of the Patient Health Questionnaire-9.

Meaning

LLMs applied to patient messages showed strong potential as screening assistants for early diagnosis, a novel use of this data source that could be extended to other diagnoses.
