GenAI Exceeds Clinical Experts in Predicting Acute Kidney Injury following Paediatric Cardiopulmonary Bypass

Alireza Mahani
Mansour Sharabiani
Alex Bottle
Yadav Sriniva
Richard Issitt
Serban Stoica

Published on Apr 29, 2025


Abstract

The emergence of large language models (LLMs) opens new ways to exploit the often unused information in clinical text, and our study aims to capitalise on this potential. Specifically, we examine the utility of LLM-generated text embeddings for predicting postoperative acute kidney injury (AKI) in paediatric cardiopulmonary bypass (CPB) patients from electronic health record (EHR) text, and we propose methods for explaining their output. AKI can be a serious complication of paediatric CPB, and its accurate prediction can significantly improve patient outcomes by enabling timely interventions. We evaluate a range of text embedding algorithms, including Doc2Vec, top-performing sentence transformers on Hugging Face, and commercial LLMs from Google and OpenAI. We benchmark the cross-validated performance of these 'AI models' against a 'baseline model' and an established, clinically defined 'expert model'. The baseline model comprises structured features, namely patient gender, age, height, body mass index and length of operation. Most AI models surpass not only the baseline model but also the expert model. An ensemble of AI and clinical-expert models improves discriminative performance by 23% compared to the baseline model. The consistency of patient clusters formed from AI-generated embeddings with clinical-expert clusters, measured via the adjusted Rand index and adjusted mutual information, illustrates the medical validity of LLM embeddings. We create a reverse mapping from the numeric embedding space to the natural-language domain via the embedding-based clusters, generating medical labels for the clusters in the process. We also use text-generating LLMs to summarise the differences between AI and expert clusters. Such 'explainability' outputs can increase medical practitioners' trust in AI applications and help generate new hypotheses, e.g., by studying the association between cluster membership and outcomes of interest.
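
As an illustration of the cluster-consistency check mentioned in the abstract, the following is a minimal sketch of the adjusted Rand index computed from two cluster assignments of the same patients. It is not the authors' code: the function name `adjusted_rand_index` and the toy label lists are hypothetical, and the study's actual clustering pipeline is not described here. The adjusted mutual information applies the same chance-correction idea to mutual information and is available in libraries such as scikit-learn (`adjusted_mutual_info_score`).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same patients")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Pairs of patients placed together in BOTH partitions (contingency cells).
    together_in_both = sum(
        comb(count, 2) for count in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together within each partition separately.
    together_in_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    together_in_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_in_a * together_in_b / total_pairs
    maximum = (together_in_a + together_in_b) / 2

    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (together_in_both - expected) / (maximum - expected)


# Toy, made-up assignments for six patients (not study data).
ai_clusters = [0, 0, 1, 1, 2, 2]                   # from embedding-based clustering
expert_clusters = ["A", "A", "B", "B", "C", "C"]   # from clinical-expert grouping

print(adjusted_rand_index(ai_clusters, expert_clusters))
```

The index depends only on which patients are grouped together, not on the cluster names, so embedding-based cluster IDs can be compared directly with expert-defined categories. Values near 0 indicate agreement no better than chance.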
