Bowen Chen
I am a PhD student at Stanford DBDS. Previously, I studied Computer Science and Statistics at Harvard (2018–2022).
Currently, I am working on AI for healthcare.
selected publications
- A Multimodal Generative AI Copilot for Human Pathology. Ming Y. Lu*, Bowen Chen*, Drew F. K. Williamson*, and 8 more authors. Nature, Jun 2024.
The field of computational pathology has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general-purpose, multimodal AI assistants and copilots tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We build PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model, and finetuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question-answer turns. We compare PathChat against several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4. PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases of diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.
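For intuition, here is a minimal sketch of the general pattern this paper builds on: projecting a pretrained vision encoder's patch features into the token space of a pretrained language model before instruction finetuning. All class names, dimensions, and the Hugging Face-style `inputs_embeds` interface are illustrative assumptions, not PathChat's actual components.

```python
# Minimal sketch of a vision-language assistant: a vision encoder's patch
# features are projected into the LLM's token embedding space. Names and
# dimensions are placeholders, not the actual PathChat modules.
import torch
import torch.nn as nn

class VisionLanguageAssistant(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g., a pathology ViT
        self.projector = nn.Sequential(       # maps image features to LLM token space
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm = llm  # pretrained causal LM (HF-style inputs_embeds assumed)

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor):
        img_feats = self.vision_encoder(image)   # (B, n_patches, vision_dim)
        img_tokens = self.projector(img_feats)   # (B, n_patches, llm_dim)
        # Prepend projected image tokens to the text embeddings and decode.
        inputs = torch.cat([img_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```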
- Analysis of 3D pathology samples using weakly supervised AI. Andrew H. Song, Mane Williams, Drew F. K. Williamson, and 8 more authors. Cell, May 2024.
Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.
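As a hedged illustration of the weakly supervised recipe described above, the sketch below shows gated attention-based multiple instance learning over a bag of 3D patch embeddings; names and dimensions are placeholders, not TriPath's implementation.

```python
# Gated attention-MIL over 3D patch embeddings: each patch in a tissue
# volume is embedded, attention weights pool the bag into one volume-level
# feature, and a linear head predicts the clinical outcome.
import torch
import torch.nn as nn

class AttentionMIL3D(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (n_patches, feat_dim), one bag per tissue volume
        scores = self.attn_w(self.attn_v(patch_feats) * self.attn_u(patch_feats))
        weights = torch.softmax(scores, dim=0)        # attention over 3D patches
        volume_feat = (weights * patch_feats).sum(0)  # weighted bag embedding
        return self.classifier(volume_feat)           # volume-level logits
```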
- A visual-language foundation model for computational pathology. Ming Y. Lu*, Bowen Chen*, Drew F. K. Williamson*, and 10 more authors. Nature Medicine, Mar 2024.
The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain, and a model’s usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text and, notably, over 1.17 million image–caption pairs through task-agnostic pretraining. Evaluated on a suite of 14 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving histopathology images and/or text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, and text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.
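For readers unfamiliar with contrastive image-caption pretraining, the sketch below shows the standard CLIP-style symmetric contrastive loss over a batch of paired embeddings; CONCH's full training objective differs in detail, so treat this only as the underlying idea.

```python
# CLIP-style symmetric contrastive loss: normalized image and caption
# embeddings from matched pairs are pulled together, all other pairs in the
# batch act as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)        # (B, D)
    txt_emb = F.normalize(txt_emb, dim=-1)        # (B, D)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Each image should match its own caption, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```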
- Towards a general-purpose foundation model for computational pathology. Richard J. Chen*, Tong Ding*, Ming Y. Lu*, and 17 more authors. Nature Medicine, Mar 2024.
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.
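The few-shot class-prototype evaluation mentioned above has a simple core: represent each class by the mean embedding of a few labeled examples from a strong frozen encoder, then assign queries to the nearest prototype. A minimal sketch, with placeholder names:

```python
# Nearest-prototype few-shot classification on frozen embeddings.
import torch
import torch.nn.functional as F

def build_prototypes(support_embs: torch.Tensor,
                     support_labels: torch.Tensor) -> torch.Tensor:
    # support_embs: (n_shots, D); support_labels: (n_shots,) integer class ids
    classes = support_labels.unique(sorted=True)
    protos = torch.stack([support_embs[support_labels == c].mean(0)
                          for c in classes])
    return F.normalize(protos, dim=-1)  # (n_classes, D)

def classify(query_embs: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    query_embs = F.normalize(query_embs, dim=-1)
    # Assign each query to the most similar class prototype.
    return (query_embs @ prototypes.t()).argmax(dim=-1)
```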
- Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images. Ming Y. Lu*, Bowen Chen*, Andrew Zhang, and 6 more authors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Mar 2023.
Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small- to medium-sized images, neither of which is applicable to the emerging field of computational pathology, where there are limited publicly available paired image-text datasets and each image can span up to 100,000 × 100,000 pixels. In this paper, we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models to gigapixel histopathology whole-slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model pretrained on over 33k histopathology image-caption pairs achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks. Our code is available at: https://github.com/mahmoodlab/MI-Zero.
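A minimal sketch of the aggregation idea: score each patch embedding against text embeddings of the candidate class prompts, then pool patch-level scores into a slide-level prediction (top-k mean pooling shown here as one plausible pooling operator). Function and variable names are illustrative, not the paper's code.

```python
# Zero-shot slide classification via multiple instance learning: the slide
# is tiled into patches, each patch is scored against class prompt
# embeddings, and per-class patch scores are pooled with a top-k mean.
import torch
import torch.nn.functional as F

def mi_zero_predict(patch_embs: torch.Tensor, class_text_embs: torch.Tensor,
                    k: int = 5) -> int:
    # patch_embs: (n_patches, D) from the image encoder;
    # class_text_embs: (n_classes, D) from prompts like "an H&E image of <subtype>"
    patch_embs = F.normalize(patch_embs, dim=-1)
    class_text_embs = F.normalize(class_text_embs, dim=-1)
    sims = patch_embs @ class_text_embs.t()  # (n_patches, n_classes)
    topk = sims.topk(k=min(k, sims.size(0)), dim=0).values
    slide_scores = topk.mean(dim=0)          # pool patch scores per class
    return int(slide_scores.argmax())        # predicted class index
```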
- Artificial intelligence for multimodal data integration in oncology. Jana Lipkova, Richard J. Chen, Bowen Chen, and 8 more authors. Cancer Cell, Mar 2022.
In oncology, the patient state is characterized by a whole spectrum of modalities, ranging from radiology, histology, and genomics to electronic health records. Current artificial intelligence (AI) models operate mainly in the realm of a single modality, neglecting the broader clinical context, which inevitably diminishes their potential. Integration of different data modalities provides opportunities to increase robustness and accuracy of diagnostic and prognostic models, bringing AI closer to clinical practice. AI models are also capable of discovering novel patterns within and across modalities suitable for explaining differences in patient outcomes or treatment resistance. The insights gleaned from such models can guide exploration studies and contribute to the discovery of novel biomarkers and therapeutic targets. To support these advances, here we present a synopsis of AI methods and strategies for multimodal data fusion and association discovery. We outline approaches for AI interpretability and directions for AI-driven exploration through multimodal data interconnections. We examine challenges in clinical adoption and discuss emerging solutions.
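As a toy illustration of one integration strategy surveyed in this review (late fusion by concatenation of per-modality embeddings before a shared head), here is a sketch with placeholder encoders and dimensions; it is not taken from the paper.

```python
# Late fusion: each modality (e.g., histology, genomics, EHR) is encoded
# independently, and the embeddings are concatenated for a shared predictor.
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    def __init__(self, encoders: dict[str, nn.Module],
                 emb_dim: int, n_classes: int):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)  # one encoder per modality
        self.head = nn.Linear(emb_dim * len(encoders), n_classes)

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # Encode each modality separately, then fuse by concatenation.
        feats = [self.encoders[name](x) for name, x in inputs.items()]
        return self.head(torch.cat(feats, dim=-1))
```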