Universal Behavior Analysis Agents, Reinforcement Learning for Antibiotic Discovery, MedBrowseComp
Health Intelligence (HINT)
2025-05-26
BehaveAgent: An Autonomous AI Agent for Universal Behavior Analysis
Researchers at Harvard have introduced BehaveAgent, a fully autonomous AI agent designed to perform behavior analysis from video without the need for manual labeling, retraining, or task-specific programming. By orchestrating large language models, vision-language models, and visual grounding systems, BehaveAgent can generalize across experimental paradigms and species, from mice and fruit flies to plants and humans.
This agent leverages zero-shot visual reasoning to identify behavioral paradigms, track relevant features, interpret behavior, and generate research reports that integrate peer-reviewed literature.
Achieved zero-shot behavior paradigm detection using AI-generated videos and self-generated analysis plans, adapting its analysis approach across rodent maze tasks, corvid problem-solving, and primate object manipulation.
Used prompt-to-pixel visual grounding (Molmo) and natural language reasoning to track task-relevant features such as a rodent's nose or object centers, achieving tracking accuracy comparable to human annotators in open-field tests.
Enabled temporal behavior segmentation with explicit reasoning and VLM validation, grouping fine-grained behavior sequences into interpretable categories like “object investigation” and “general exploration” (a simplified version of this step is sketched at the end of this section).
Generated structured scientific reports integrating analysis results with literature, using tools like SerpApi and Google Scholar to contextualize behavioral findings and suggest future directions.
BehaveAgent marks a shift toward scalable, explainable, and generalizable behavioral research, offering human-in-the-loop interactivity and eliminating barriers that have long limited cross-species behavioral analysis.
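To make the tracking-to-segmentation step concrete, here is a minimal, self-contained sketch of how tracked keypoints could be turned into labeled behavior segments. It is not BehaveAgent's implementation: the distance threshold, labels, and function name are made up, and the real system reasons over tracks with language models rather than a fixed rule.

```python
import numpy as np

def segment_behavior(nose_xy, object_xy, fps=30.0, near_px=40.0):
    """Toy segmentation: a frame counts as 'object investigation' when the tracked
    nose point lies within near_px pixels of the object center, otherwise 'general
    exploration'; consecutive frames with the same label are merged into segments."""
    nose_xy = np.asarray(nose_xy, dtype=float)  # (T, 2) keypoints from the tracker
    dist = np.linalg.norm(nose_xy - np.asarray(object_xy, dtype=float), axis=1)
    labels = np.where(dist < near_px, "object investigation", "general exploration")

    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((str(labels[start]), start / fps, t / fps))  # (label, start_s, end_s)
            start = t
    return segments

# Example: a tracked nose approaching an object at (100, 100) halfway through the clip.
track = [(200, 200)] * 45 + [(105, 98)] * 45
print(segment_behavior(track, object_xy=(100, 100)))
# [('general exploration', 0.0, 1.5), ('object investigation', 1.5, 3.0)]
```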
MedBrowseComp: A Benchmark for Deep Medical Search and Agentic Reasoning
A team of researchers from Harvard, MIT, and partner institutions has released MedBrowseComp, the first benchmark designed to rigorously evaluate AI agents on real-time, multi-hop medical fact-finding tasks. Unlike static medical QA datasets, MedBrowseComp reflects the complexity of actual clinical reasoning by requiring agents to retrieve and synthesize verified information from structured sources like HemOnc.org and ClinicalTrials.gov.
Covering over 1,000 curated questions, the benchmark tests whether models can retrieve the right study, interpret regulatory documents, and even find financial market data relevant to drug approvals.
Compiled 1,089 challenging, fact-seeking medical questions (605 for deep research agents and 484 for GUI-based agents), built from curated clinical trials, FDA drug data, and oncology guidelines.
Benchmarked leading systems like GPT-4.1, Gemini 2.5 Pro, Claude Sonnet 3.7 CUA, and Perplexity, revealing steep performance drop-offs beyond 2-hop tasks and <10% accuracy on complex 4- to 5-hop chains.
Demonstrated that agents relying solely on internal knowledge or single-hop search performed poorly; deep research agents with iterative web browsing saw up to 75% performance boosts on difficult tasks.
Found GUI agents like Claude's Computer Use to be less reliable due to tool-call latency and semantic confusion, but accuracy improved significantly when initialized from domain-specific sources like HemOnc.org.
MedBrowseComp sets a new bar for evaluating whether AI agents can conduct complex medical research autonomously, and highlights how far current systems are from being clinically trustworthy without human oversight.
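As a rough illustration of how hop-stratified accuracy can be computed for a benchmark like this, here is a short sketch. The field names and scoring (simple exact match) are assumptions; MedBrowseComp's released evaluation may normalize answers and count hops differently.

```python
from collections import defaultdict

def accuracy_by_hops(examples, predictions):
    """Exact-match accuracy grouped by the number of retrieval hops per question.
    examples: dicts with 'id', 'answer', 'hops'; predictions: question id -> answer."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        pred = predictions.get(ex["id"], "").strip().lower()
        gold = ex["answer"].strip().lower()
        total[ex["hops"]] += 1
        correct[ex["hops"]] += int(pred == gold)
    return {h: correct[h] / total[h] for h in sorted(total)}

# Dummy data: one 2-hop and one 4-hop question with made-up answers.
examples = [
    {"id": "q1", "hops": 2, "answer": "drug A"},
    {"id": "q2", "hops": 4, "answer": "NCT00000000"},
]
print(accuracy_by_hops(examples, {"q1": "Drug A", "q2": "NCT01111111"}))
# {2: 1.0, 4: 0.0} -- the kind of drop-off beyond two hops the paper reports at scale.
```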
YCDL: A Real-Time Multimodal Data Framework for Oncology Decision Support
A team at Yonsei Cancer Center has developed a clinical decision support system (CDSS) powered by the Yonsei Cancer Data Library (YCDL), an automated, multimodal data supply chain integrating clinical, genomic, and imaging data. Built from 171,128 cancer cases across 11 types, the system supports real-time data updates and decision-making with over 800 features per case.
YCDL enables comprehensive patient monitoring and rapid hypothesis generation, offering a scalable foundation for AI-driven oncology research and care.
Collected and continuously updated multimodal data (structured and unstructured) from EMRs using a customized ETL pipeline with natural language processing and daily quality control, achieving 92.6% and 98.7% median accuracy for surgical and molecular pathology extraction.
Generated tumor-stage-stratified survival analyses across 11 cancers and supported a rapid clinical hypothesis study on rectal cancer, with full data processing completed within one month of the request.
Developed an interactive CDSS dashboard combining 3D imaging, longitudinal tumor tracking, survival prediction placeholders, and timeline-based visualizations of patient treatment history.
Evaluated by 33 oncology professionals across five cancer types, the system received satisfaction scores exceeding 4.0/5 in ease of use, information reliability, and clinical usefulness, despite some interface limitations noted by long-term EMR users.
By integrating a dynamic data infrastructure with clinical visualization tools, YCDL demonstrates how hospitals can internalize and automate oncology data for precision care and scalable research.
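For a flavor of the ETL-plus-quality-control idea, the toy sketch below extracts one structured field (a pathologic T stage) from free-text pathology notes and scores it against manually reviewed labels. YCDL's actual NLP pipeline and QC workflow are far more extensive; the regex and function names here are illustrative only.

```python
import re

def extract_t_stage(pathology_note: str):
    """Toy extraction of a pathologic T stage (e.g., 'pT3') from free-text pathology."""
    m = re.search(r"\bp?T([0-4][a-c]?|is)\b", pathology_note, flags=re.IGNORECASE)
    return f"T{m.group(1)}" if m else None

def qc_accuracy(notes, reviewed_labels):
    """Daily-QC-style check: share of notes where the automatic extraction
    matches a manually reviewed label."""
    preds = [extract_t_stage(n) for n in notes]
    return sum(p == g for p, g in zip(preds, reviewed_labels)) / len(reviewed_labels)

notes = ["Adenocarcinoma, pT3 N1 M0.", "Invasive carcinoma, pT2a, margins clear."]
print([extract_t_stage(n) for n in notes])   # ['T3', 'T2a']
print(qc_accuracy(notes, ["T3", "T2a"]))     # 1.0
```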
PRS-Med: Position Reasoning Segmentation with Vision-Language Models in Medical Imaging
A new study introduces PRS-Med, a multimodal framework that integrates segmentation and spatial reasoning to interpret tumors in medical images using natural language prompts. Unlike existing models, PRS-Med explains not only what the tumor is but also where it is, both visually and textually.
This is made possible by the MMRS dataset, designed specifically for training AI to reason about spatial anatomy in clinical settings.
Integrated a lightweight TinySAM image encoder with the medical vision-language model LLaVA-Med to jointly predict tumor masks and spatial descriptions from a single image-text prompt.
Generated a large-scale pseudo-labeled dataset (MMRS) with over 33,000 image-question-answer pairs across six imaging modalities, using segmentation masks to guide spatial question generation.
Outperformed prior segmentation models like SAM-Med2D, BiomedParse, and LISA on radiology and endoscopy datasets, achieving mDice scores up to 0.968 and reasoning accuracy up to 0.533.
Demonstrated that training segmentation and position reasoning jointly improves spatial understanding, offering a more interactive and diagnostically useful AI assistant for clinical workflows.
By combining image understanding and spatial dialogue in one model, PRS-Med opens a path toward more intuitive AI-driven diagnostics in radiology, ultrasound, and beyond.
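The mDice figures above are averages of the standard Dice coefficient between predicted and ground-truth masks. A minimal reference implementation of that metric is sketched below; this is the generic formula, not PRS-Med's code.

```python
import numpy as np

def dice_score(pred_mask: np.ndarray, true_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2*|A ∩ B| / (|A| + |B|)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    return float((2.0 * inter + eps) / (pred.sum() + true.sum() + eps))

# Tiny example: a predicted tumor mask covering most of the ground-truth region.
true = np.zeros((8, 8), dtype=bool)
true[2:6, 2:6] = True                    # 16 ground-truth pixels
pred = np.zeros((8, 8), dtype=bool)
pred[3:6, 2:6] = True                    # 12 predicted pixels, all inside the truth
print(round(dice_score(pred, true), 3))  # 2*12 / (12 + 16) ≈ 0.857
```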
RareFold: Structure Prediction and Peptide Design with Noncanonical Amino Acids
A new model called RareFold expands deep learning-based protein structure prediction to include 29 noncanonical amino acids (NCAAs) alongside the 20 standard ones.
By treating each residue as a unique token, RareFold accurately predicts protein folds and enables inverse design workflows that incorporate chemically diverse amino acids. This capability culminated in EvoBindRare, a design framework that created functional linear and cyclic peptide binders with experimental validation.
Extended the EvoFormer architecture to handle 49 amino acid types, learning residue-specific structural patterns while using only 40 GB of memory per prediction, enabling efficient modeling of diverse sequences.
Achieved comparable accuracy to AlphaFold3 across global structure prediction metrics while outperforming it on local side chain accuracy for difficult residues like MSE and SAH, avoiding intra-residue atomic clashes seen in diffusion-based models.
Demonstrated robust confidence estimation via predicted lDDT scores, with a strong Spearman correlation (R = 0.87) to actual structure quality, allowing accurate identification of reliable predictions.
Designed and experimentally validated peptide binders against a ribonuclease target, with linear and cyclic peptides containing NCAAs showing micromolar binding affinities (Kd = 2.13 μM and 8.77 μM), comparable to a known wild-type binder.
RareFold enables protein modeling and design beyond the canonical amino acid space, opening up new possibilities for developing stable, specific, and immune-evasive therapeutics.
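The confidence-calibration result above amounts to rank-correlating predicted lDDT with actual structure quality across models. The sketch below shows that check with synthetic numbers; the values are made up purely for illustration.

```python
from scipy.stats import spearmanr

# Synthetic example: predicted per-model confidence roughly tracks true quality.
predicted_plddt = [92, 85, 70, 64, 55, 88, 40]
true_lddt = [0.91, 0.83, 0.72, 0.60, 0.58, 0.80, 0.45]

rho, pvalue = spearmanr(predicted_plddt, true_lddt)   # rank correlation, as in the paper's check
print(f"Spearman rho = {rho:.2f} (p = {pvalue:.3g})")  # ~0.96 on these made-up numbers
```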
SyntheMol-RL: Reinforcement Learning for Antibiotic Discovery in Massive Chemical Spaces
To address the escalating threat of antibiotic-resistant pathogens like MRSA, researchers introduced SyntheMol-RL, a generative AI framework built to design novel, synthesizable antibiotics. Unlike prior models constrained by single-objective optimization or slow screening, SyntheMol-RL uses reinforcement learning to rapidly explore a 46-billion-compound space for molecules with both antibacterial activity and aqueous solubility.
The model successfully identified and validated a new antibiotic, synthecin, demonstrating efficacy in a mouse MRSA infection model.
Replaced Monte Carlo tree search with a reinforcement learning algorithm that generalizes across building blocks, accelerating compound generation while optimizing for multiple drug-like properties.
Outperformed both traditional virtual screening and its predecessor, SyntheMol-MCTS, by generating more potent, soluble, and structurally novel antibiotic candidates against S. aureus.
Synthesized and tested 79 compounds, with 13 showing potent in vitro activity and seven confirmed as structurally novel through detailed literature review.
Validated one lead compound, synthecin, which significantly reduced bacterial load and inflammation in a murine wound model infected with MRSA, supporting its therapeutic potential.
SyntheMol-RL offers a scalable and flexible pipeline for AI-driven drug discovery, demonstrating how reinforcement learning can bridge computational design and real-world antibiotic development.
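To show the multi-objective reinforcement-learning idea at toy scale, the sketch below combines two property scores into a single reward and nudges preferences over hypothetical building blocks toward higher-reward choices. This is a simplified bandit-style stand-in, not SyntheMol-RL's algorithm: the building blocks, property scores, and update rule are all invented.

```python
import random

def combined_reward(activity: float, solubility: float) -> float:
    """Toy multi-property reward: geometric mean of predicted antibacterial
    activity and aqueous solubility scores, each assumed to lie in [0, 1]."""
    return (max(activity, 0.0) * max(solubility, 0.0)) ** 0.5

# Bandit-style loop over three invented building blocks: blocks whose products
# score above the running-average reward get sampled more often later on.
prefs = {"block_A": 1.0, "block_B": 1.0, "block_C": 1.0}
scores = {"block_A": (0.9, 0.8), "block_B": (0.7, 0.2), "block_C": (0.3, 0.9)}
baseline, lr = 0.0, 0.1
for step in range(1, 201):
    block = random.choices(list(prefs), weights=[max(p, 1e-3) for p in prefs.values()])[0]
    r = combined_reward(*scores[block])   # stand-in for learned property predictors
    baseline += (r - baseline) / step     # running mean reward as a baseline
    prefs[block] = max(prefs[block] + lr * (r - baseline), 1e-3)

print({b: round(p, 2) for b, p in prefs.items()})  # block_A ends up with the highest preference
```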
Love Health Intelligence (HINT)? Share it with your friends using this link: Health Intelligence.
Want to contact Health Intelligence (HINT)? Contact us today @ lukeyunmedia@gmail.com!
Thanks for reading!
By Luke Yun