publications
publications in reverse chronological order, * indicates equal contribution
2024
- A Multimodal Generative AI Copilot for Human Pathology. Ming Y Lu*, Bowen Chen*, Drew FK Williamson*, and 8 more authors. Nature, Jun 2024.
The field of computational pathology has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general purpose, multimodal AI assistants and copilots tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We build PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and finetuning the whole system on over 456,000 diverse visual language instructions consisting of 999,202 question-answer turns. We compare PathChat against several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general purpose AI assistant ChatGPT-4. PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases of diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.
- Analysis of 3D pathology samples using weakly supervised AI. Andrew H Song, Mane Williams, Drew FK Williamson, and 8 more authors. Cell, May 2024.
Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.
- A visual-language foundation model for computational pathology. Ming Y. Lu*, Bowen Chen*, Drew F. K. Williamson*, and 10 more authors. Nature Medicine, Mar 2024.
The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain, and a model’s usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text and, notably, over 1.17 million image–caption pairs through task-agnostic pretraining. Evaluated on a suite of 14 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving histopathology images and/or text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, and text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.
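For readers unfamiliar with contrastive image–caption pretraining, the core idea can be illustrated with a generic CLIP-style symmetric contrastive (InfoNCE) loss. This is a minimal sketch, not CONCH's training code: it omits any captioning objective, encoders, and other details of the published method, and the function names, toy embeddings, and temperature of 0.07 are illustrative assumptions. Matching image/caption pairs sit on the diagonal of the similarity matrix, and the loss rewards the diagonal over all mismatched pairs in both directions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_logits(images: Matrix, texts: Matrix, temperature: float) -> Matrix:
    """Cosine similarity of every image with every caption, divided by the temperature."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(t) for t in texts]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]


def diagonal_cross_entropy(logits: Matrix) -> float:
    """Mean cross-entropy where row i's correct column is i (its paired item)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(images: Matrix, texts: Matrix, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: average of image-to-text and text-to-image losses."""
    logits = similarity_logits(images, texts, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]  # captions aligned with their images
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # captions paired with the wrong images
    print(contrastive_loss(images, matched) < contrastive_loss(images, swapped))
```

Aligned pairs yield a much lower loss than swapped pairs; in real pretraining the embeddings come from trainable image and text encoders, and the gradient of this loss pulls paired embeddings together.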
- Towards a general-purpose foundation model for computational pathology. Richard J. Chen*, Tong Ding*, Ming Y. Lu*, and 17 more authors. Nature Medicine, Mar 2024.
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.
- Medieval DNA from Soqotra points to Eurasian origins of an isolated population at the crossroads of Africa and Arabia. Kendra Sirak, Julian Jansen Van Rensburg, Esther Brielle, and 8 more authors. Nature Ecology & Evolution, Mar 2024.
Soqotra, an island situated at the mouth of the Gulf of Aden in the northwest Indian Ocean between Africa and Arabia, is home to 60,000 people subsisting through fishing and semi-nomadic pastoralism who speak a Modern South Arabian language. Most of what is known about Soqotri history derives from writings of foreign travellers who provided little detail about local people, and the geographic origins and genetic affinities of early Soqotri people have not yet been investigated directly. Here we report genome-wide data from 39 individuals who lived between 650 and 1750 CE at six locations across the island and document strong genetic connections between Soqotra and the similarly isolated Hadramawt region of coastal South Arabia that likely reflects a source for the peopling of Soqotra. Medieval Soqotri can be modelled as deriving 86% of their ancestry from a population such as that found in the Hadramawt today, with the remaining 14% best proxied by an Iranian-related source with up to 2% ancestry from the Indian sub-continent, possibly reflecting genetic exchanges that occurred along with archaeologically documented trade from these regions. In contrast to all other genotyped populations of the Arabian Peninsula, genome-level analysis of the medieval Soqotri is consistent with no sub-Saharan African admixture dating to the Holocene. The deep ancestry of people from medieval Soqotra and the Hadramawt is also unique in deriving less from early Holocene Levantine farmers and more from groups such as Late Pleistocene hunter–gatherers from the Levant (Natufians) than other mainland Arabians. This attests to migrations by early farmers having less impact in southernmost Arabia and Soqotra and provides compelling evidence that there has not been complete population replacement between the Pleistocene and Holocene throughout the Arabian Peninsula.
Medieval Soqotra harboured a small population that showed qualitatively different marriage practices from modern Soqotri, with first-cousin unions occurring significantly less frequently than today.
2023
- Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images. Ming Y Lu*, Bowen Chen*, Andrew Zhang, and 6 more authors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Mar 2023.
Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small- to medium-sized images, neither of which are applicable to the emerging field of computational pathology where there are limited publicly available paired image-text datasets and each image can span up to 100,000 × 100,000 pixels. In this paper we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models to gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model pretrained on over 33k histopathology image-caption pairs achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks. Our code is available at: https://github.com/mahmoodlab/MI-Zero.
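The multiple instance reformulation of zero-shot transfer described above can be sketched in a few lines: a slide is treated as a bag of tiles, each tile embedding is scored against a text-prompt embedding for every class, and tile scores are pooled into one slide-level score per class. This is a minimal sketch under stated assumptions, not the MI-Zero code from the repository: the function names, the toy 2-D embeddings, and the choice of top-k mean pooling (rather than mean or max pooling) are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_slide(
    tile_embeddings: List[Vector],
    class_prompts: Dict[str, Vector],
    k: int = 2,
) -> Tuple[str, Dict[str, float]]:
    """Score each tile against each class prompt, then pool tiles into one slide score.

    Pooling is the mean of the top-k tile similarities per class (an assumed
    choice; mean or max pooling would also fit the MIL framing).
    """
    slide_scores: Dict[str, float] = {}
    for label, prompt in class_prompts.items():
        sims = sorted((cosine(t, prompt) for t in tile_embeddings), reverse=True)
        top = sims[:k]
        slide_scores[label] = sum(top) / len(top)
    prediction = max(slide_scores, key=slide_scores.get)
    return prediction, slide_scores


if __name__ == "__main__":
    tiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy tile embeddings (hypothetical)
    prompts = {"subtype_A": [1.0, 0.0], "subtype_B": [0.0, 1.0]}  # toy prompt embeddings
    label, scores = zero_shot_slide(tiles, prompts, k=2)
    print(label, scores)
```

Because only per-tile similarities and a pooled score per class are computed, no label or slide-level training is needed, which matches the label-free transfer the abstract describes; in practice the embeddings would come from the aligned image and text encoders.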
2022
- Artificial intelligence for multimodal data integration in oncology. Jana Lipkova, Richard J Chen, Bowen Chen, and 8 more authors. Cancer Cell, Mar 2022.
In oncology, the patient state is characterized by a whole spectrum of modalities, ranging from radiology, histology, and genomics to electronic health records. Current artificial intelligence (AI) models operate mainly in the realm of a single modality, neglecting the broader clinical context, which inevitably diminishes their potential. Integration of different data modalities provides opportunities to increase robustness and accuracy of diagnostic and prognostic models, bringing AI closer to clinical practice. AI models are also capable of discovering novel patterns within and across modalities suitable for explaining differences in patient outcomes or treatment resistance. The insights gleaned from such models can guide exploration studies and contribute to the discovery of novel biomarkers and therapeutic targets. To support these advances, here we present a synopsis of AI methods and strategies for multimodal data fusion and association discovery. We outline approaches for AI interpretability and directions for AI-driven exploration through multimodal data interconnections. We examine challenges in clinical adoption and discuss emerging solutions.
2021
- Abstract PR-01: Real-time, point-of-care pathology diagnosis via embedded deep learning. Bowen Chen, Max Lu, Jana Lipkova, and 1 more author. Clinical Cancer Research, Mar 2021.
There is an urgent need for widespread cancer diagnosis in low-resource settings, especially in contrast to areas with developed healthcare systems. According to a study in The Lancet, in the U.S. there is one pathologist for every 20,000 individuals, while in sub-Saharan Africa there is only one for every million. In addition, current telepathology systems for cancer diagnosis mostly rely on pathologists diagnosing slides remotely, which is low-throughput and requires more time and resources. With the growth of telepathology, remote diagnosis becomes a viable solution to address the lack of skilled pathologists in developing regions. Here, we present a cost-efficient device that incorporates embedded deep learning to achieve real-time, point-of-care diagnosis of whole pathology slides. We achieve this with a low-cost, 3D-printable microscope that uses the Raspberry Pi and camera module to capture high-resolution images of slides. Then, using a weakly supervised deep-learning model run on the NVIDIA Jetson Nano, the device is able to accurately classify the whole slide without any pixel-level annotations. Furthermore, the model's attention-based approach to diagnosis allows us to generate human-interpretable heatmaps displaying the regions most influential to the model's diagnosis. Our device also incorporates a touch screen and batteries to increase accessibility as an easy-to-use, low-maintenance device while still maintaining an efficient runtime given the available resources. Overall, we demonstrate that the device is capable of achieving accurate, high-throughput, and interpretable cancer diagnoses in low-resource settings.
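The attention-based, weakly supervised slide classification and heatmap generation mentioned above follow the general pattern of attention-based multiple instance learning: each patch gets a learned attention score, the slide representation is the attention-weighted average of patch features, and the same attention weights can be painted back onto the slide as a heatmap. The sketch below is a minimal illustration under assumptions, not the model deployed on the Jetson Nano: the tanh scoring form `w · tanh(V h)`, the function names, and the hand-set (untrained) weights `V`, `w`, and `classifier` are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil(
    patches: Matrix, V: Matrix, w: Vector, classifier: Matrix
) -> Tuple[Vector, Vector]:
    """Return (class probabilities, per-patch attention weights) for one slide."""
    # Raw attention score per patch: w . tanh(V h)
    raw = [sum(wi * math.tanh(z) for wi, z in zip(w, matvec(V, h))) for h in patches]
    attention = softmax(raw)  # sums to 1 across patches; usable as a heatmap
    dim = len(patches[0])
    # Slide embedding = attention-weighted average of patch features
    slide = [sum(a * h[d] for a, h in zip(attention, patches)) for d in range(dim)]
    probs = softmax(matvec(classifier, slide))
    return probs, attention


if __name__ == "__main__":
    patches = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # toy patch features
    identity = [[1.0, 0.0], [0.0, 1.0]]
    probs, attn = attention_mil(patches, V=identity, w=[1.0, 0.0], classifier=identity)
    print(probs, attn)
```

Only a slide-level label is needed to train such a model, since the loss is computed on the pooled prediction; the attention weights fall out of the forward pass, which is why heatmaps come at no extra annotation cost.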