Generative artificial intelligence tools

Emerging generative artificial intelligence (AI) models exhibit remarkable abilities to process and produce text and other media at scale, and perform well across a wide range of clinical tasks. Harnessing these abilities could improve the accuracy, efficiency, and accessibility of healthcare for patients around the world. For a primer on the technology behind state-of-the-art generative AI and its potential applications in clinical settings, see these reviews on large language models (LLMs) and on generative AI more generally, both in Nature Medicine. We also provide an overview of the systems, applications, and ethical concerns surrounding vision-language models in PLOS Digital Health.

Automating evidence synthesis

Systematic review is the foundation of evidence-based medicine, as well as the gold standard for evidence synthesis across academic disciplines. Many of the processes involved in systematic review are text-based, rules-based, and repetitive. This makes these tasks amenable to automation with large language models (LLMs).

Abstract screening is often one of the most time-consuming exercises involved in systematic review; large numbers of study records must be checked against explicit inclusion/exclusion criteria to whittle down the number of studies that require longer review. Led from Oxford, a multicentre team completed an extensive validation exercise for LLMs designed to automate abstract screening across a representative sample of gold-standard systematic reviews from The Cochrane Library. Our findings (Journal of the American Medical Informatics Association) indicate that LLMs have excellent potential to improve the efficiency of systematic review, and we demonstrate how best to combine LLM and human researcher decisions 'in series' and 'in parallel'. Our code is freely available so others can make use of our approach!

Sanghera, R. et al. High-performance automated abstract screening with large language model ensembles. Journal of the American Medical Informatics Association ocaf050 (2025).
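
The 'in series' and 'in parallel' combinations mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the published code: the function names, the ensemble voting rule, and the exact meaning given to each combination (parallel: a record advances if either the LLM or the human includes it; series: the human only sees records the LLM kept) are assumptions made for the example.

```python
from typing import Callable, Sequence


def ensemble_vote(llm_votes: Sequence[bool], min_votes: int = 1) -> bool:
    """Hypothetical ensemble rule: include the record if at least
    `min_votes` of the LLMs voted to include it."""
    return sum(llm_votes) >= min_votes


def combine_in_parallel(llm_include: bool, human_include: bool) -> bool:
    """Assumed 'parallel' rule: both screen the record independently and
    it advances to full-text review if either says include."""
    return llm_include or human_include


def combine_in_series(llm_include: bool, human_screen: Callable[[], bool]) -> bool:
    """Assumed 'series' rule: the human is consulted only for records the
    LLM stage kept; LLM exclusions are final."""
    if not llm_include:
        return False
    return human_screen()


# Illustrative record: one of three LLMs votes to include, the human excludes.
votes = [True, False, False]
llm_decision = ensemble_vote(votes, min_votes=1)
parallel_decision = combine_in_parallel(llm_decision, human_include=False)
series_decision = combine_in_series(llm_decision, human_screen=lambda: False)
```

Under these assumed rules, the parallel combination can only add inclusions, so it favours sensitivity, while the series combination reduces the number of records a human has to read.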

Vision-language model for CT scan reporting

Rising scan volumes are placing increasing pressure on radiologists, who are in short supply around the world, including in higher-income countries. With researchers from the University of Birmingham, Queen Mary University of London, Guangdong Institute of Technology, and Meta, we have developed μ²LLM: a small multimodal large language model that integrates guided questions to preserve critical details when interpreting 3-dimensional computed tomography (CT) scans. Model training is enhanced with direct preference optimisation (DPO), a preference-based fine-tuning technique related to the reinforcement learning methods used to develop early reasoning models such as DeepSeek-R1.
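
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. It illustrates the general objective only and is not the μ²LLM training code: the function name, argument names, and the value of `beta` are assumptions chosen for the example.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative strength of the pull towards the reference model
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the total log-probability that the policy or the
    frozen reference model assigns to the preferred or rejected response.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(margin)) written as softplus(-margin) for numerical stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy matches the reference, the margin is zero and the loss is log(2).
loss_at_reference = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Shifting probability towards the preferred response lowers the loss.
loss_after_update = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is what pushes the model towards preferred outputs.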

μ²LLM consistently outperforms larger baseline models across multiple CT datasets, and its small size makes it feasible to deploy locally without having to share sensitive imaging data with external server providers or model hosts. The model is multilingual, with consistent performance when reporting scans in English and in Mandarin Chinese.

We presented μ²LLM and our validation work at MICCAI 2025.

RETFound Global

AI models frequently exhibit performance drops when deployed with new populations, as certain demographics or devices may be underrepresented in, or absent from, training or fine-tuning data. RETFound Global is an ambitious project seeking to train a foundation model for retinal imaging by drawing on diverse data sources from collaborators around the world, purposefully as broad as possible in its inclusion criteria. An introduction to the project has been published in Nature Medicine.

A general purpose vision-language model with explainability features

The Explainable Vision-Language Foundation Model for Medicine (EVLF-FM) is a modality-agnostic system that can quickly be fine-tuned to achieve state-of-the-art performance in classification (e.g. diagnosis), visual grounding (e.g. highlighting pathological features), and visual question-answering (e.g. providing advice based on clinical imaging). Across computed tomography (CT), dermoscopy, fundus photography, histology, optical coherence tomography (OCT), ultrasound, and X-ray, EVLF-FM surpasses or matches purpose-built state-of-the-art models in a wide variety of clinical tasks.

Led from the Agency for Science, Technology and Research (A*STAR) and Duke-NUS Medical School, our full report detailing the development and validation of EVLF-FM is available on arXiv.
