J Clin Immunol. 2026 Jun 5. doi: 10.1007/s10875-026-02035-9. Online ahead of print.
ABSTRACT
Early diagnosis of inborn errors of immunity (IEIs) can improve patient outcomes and reduce healthcare costs. However, clinical complexity, low awareness, and limited resources remain obstacles. Generative artificial intelligence has attracted considerable global attention in medicine, particularly when integrated into clinical decision support systems (CDSS), because it has the potential to facilitate data interpretation, clinical reasoning, and the optimal use of knowledge resources. Preliminary studies have explored the potential of large language models (LLMs) in various information retrieval tasks, but LLMs with and without retrieval mechanisms have not yet been systematically evaluated for IEI classification. We evaluated and compared the validity and reliability of the responses generated by four open-source and closed-source LLMs, in their baseline form and with augmented data, across 169 IEI patient records, using two input scenarios and four prompt templates. The models varied in both reliability and performance. The most reliable models were Gemini-1.5-Pro and Llama-3.1-8B-Instruct ([Formula: see text]), and the best-performing model without data augmentation was Gemini-1.5-Pro, with an F1 score of [Formula: see text]. Retrieval strategies improved average classification performance, raising the F1 score from [Formula: see text] to [Formula: see text] across all models. DeepSeek-R1, which reasoned over retrieved information through the integration of quality refinement and structured retrieval, achieved the best weighted F1 score of [Formula: see text]. The study highlights the effective use of generative AI and retrieval-augmented models as a decision support tool for IEI classification. However, incorporating retrieval systems into clinical decision-making processes requires adequate input, effective prompt engineering, and the adoption of retrieval strategies.
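For readers unfamiliar with the weighted F1 score reported in the abstract, the following is a minimal Python sketch. It assumes the common support-weighted definition, in which per-class F1 scores are averaged with weights proportional to each class's number of true instances. The abstract does not state how the authors computed the metric, and the function name and class labels here are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of records
    return score


# Hypothetical labels, not taken from the study's data.
y_true = ["class_a", "class_a", "class_b", "class_c"]
y_pred = ["class_a", "class_b", "class_b", "class_c"]
print(weighted_f1(y_true, y_pred))
```

Weighting by support means that frequent classes dominate the score, which matters when some categories have very few patient records.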
PMID:42246974 | DOI:10.1007/s10875-026-02035-9