Comparative Assessment of Large Language Models for Microbial Phenotype Annotation


Abstract

Large language models (LLMs) are increasingly used to extract knowledge from text, yet their coverage and reliability in biology remain unclear. Microbial phenotypes are an especially important case to assess: comprehensive phenotype data remain sparse outside well-studied organisms, yet these traits underpin our understanding of microbial characteristics, functional roles, and applications. Here, we systematically assessed the biological knowledge encoded in publicly available LLMs for structured phenotype annotation of microbial species, evaluating the performance of over 50 models, including state-of-the-art systems such as Claude Sonnet 4 and the GPT-5 family. Across phenotypes, LLMs produced accurate assignments for many species, but performance varied widely by model and trait, and no single model dominated. Model self-reported confidence was informative, with higher confidence aligning with higher accuracy, and can be used to prioritize phenotype assignments by distinguishing high- from low-confidence inferences. Overall, our study outlines the utility and limitations of text-based LLMs for phenotype characterization in microbiology.
