An LLM clinic for your training

Project description

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

Demonstration of the flow of AgentClinic

Release

  • [09/13/2024] 🍓 We release new results and support for o1!

  • [08/17/2024] 🎆 Major updates 🎇

    • 🏥 A new suite of cases (AgentClinic-MIMIC-IV), based on real clinical cases from MIMIC-IV (requires approval from https://physionet.org/content/mimiciv/2.2/)!
    • More AgentClinic-MedQA cases [107] → [215]
    • More AgentClinic-NEJM cases [15] → [120]
    • 💼 Tutorials on building your own AgentClinic cases!
    • Support for three new models: ☀️ Anthropic's Claude 3.5 Sonnet, 📗 OpenAI's GPT 4o-mini, and 🦙 Llama 3 70B
  • [06/28/2024] 🩻 We added support for vision models and the NEJM case questions

  • [05/18/2024] 🤗 We added support for HuggingFace models!

  • [05/17/2024] We release new results and support for GPT-4o!

  • [05/13/2024] 🔥 We release AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments. We propose a multimodal benchmark based on language agents that simulate the clinical environment. Check out the paper and the website for this code.

Install

  1. This library has few dependencies, so you can simply install them from requirements.txt:
pip install -r requirements.txt

Evaluation

All of the models from the paper are available (GPT-4/4o/3.5, Mixtral-8x7B, Llama-70B-chat). You can use any of them for any of the agents; just make sure you have either an OpenAI or a Replicate key ready for evaluation. HuggingFace wrappers are also implemented if you don't want to use API keys.

Just modify the following parameters in the CLI:

parser.add_argument('--openai_api_key', type=str, required=True, help='OpenAI API Key')
parser.add_argument('--replicate_api_key', type=str, required=False, help='Replicate API Key')
parser.add_argument('--inf_type', type=str, choices=['llm', 'human_doctor', 'human_patient'], default='llm')
parser.add_argument('--doctor_bias', type=str, help='Doctor bias type', default='None', choices=["recency", "frequency", "false_consensus", "confirmation", "status_quo", "gender", "race", "sexual_orientation", "cultural", "education", "religion", "socioeconomic"])
parser.add_argument('--patient_bias', type=str, help='Patient bias type', default='None', choices=["recency", "frequency", "false_consensus", "self_diagnosis", "gender", "race", "sexual_orientation", "cultural", "education", "religion", "socioeconomic"])
parser.add_argument('--doctor_llm', type=str, default='gpt4', choices=['gpt4', 'gpt3.5', 'llama-2-70b-chat', 'mixtral-8x7b', 'gpt4o'])
parser.add_argument('--patient_llm', type=str, default='gpt4', choices=['gpt4', 'gpt3.5', 'mixtral-8x7b', 'gpt4o'])
parser.add_argument('--measurement_llm', type=str, default='gpt4', choices=['gpt4'])
parser.add_argument('--moderator_llm', type=str, default='gpt4', choices=['gpt4'])
parser.add_argument('--num_scenarios', type=int, default=1, required=False, help='Number of scenarios to simulate')
parser.add_argument('--agent_dataset', type=str, default='MedQA')
parser.add_argument('--doctor_image_request', type=bool, default=False)
parser.add_argument('--total_inferences', type=int, default=20, required=False, help='Number of inferences between patient and doctor')
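
One detail of the listing above is worth knowing before passing boolean flags: argparse applies `type=bool` by calling `bool()` on the raw string, so any non-empty value, including "False", parses as True. The sketch below demonstrates that behavior and shows one common workaround. `str2bool` is a hypothetical helper introduced here for illustration; it is not part of AgentClinic.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper (not part of AgentClinic): map common spellings to a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Same declaration as in the listing above: bool() is applied to the raw string.
as_listed = argparse.ArgumentParser()
as_listed.add_argument('--doctor_image_request', type=bool, default=False)

# Alternative declaration that uses the hypothetical helper instead.
with_helper = argparse.ArgumentParser()
with_helper.add_argument('--doctor_image_request', type=str2bool, default=False)

listed_false = as_listed.parse_args(['--doctor_image_request', 'False']).doctor_image_request
helper_false = with_helper.parse_args(['--doctor_image_request', 'False']).doctor_image_request
helper_true = with_helper.parse_args(['--doctor_image_request', 'True']).doctor_image_request

print(listed_false)  # True, because bool('False') is True for any non-empty string
print(helper_false)  # False
print(helper_true)   # True
```

With the declaration as listed, the reliable way to leave image requests disabled is to omit `--doctor_image_request` entirely so the default of False applies.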

Code Examples

🎆 And then run it!

python3 agentclinic.py --openai_api_key "YOUR_OPENAIAPI_KEY" --inf_type "llm"

🤗 You can also try ANY custom HuggingFace language model very simply! All you have to do is pass "HF_{hf_path}", where hf_path is the HuggingFace path (e.g. mistralai/Mixtral-8x7B-v0.1). The last command in this section runs Mixtral-8x7B locally with AgentClinic for the doctor, patient, measurement, and moderator agents. 🤗
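
To make the "HF_{hf_path}" naming convention concrete, here is a minimal sketch of how such a name could be split into a backend label and a HuggingFace path. The function `parse_model_name` and the labels "huggingface" and "builtin" are assumptions made for illustration; AgentClinic's own handling of the prefix may differ.

```python
from typing import Tuple

HF_PREFIX = "HF_"


def parse_model_name(name: str) -> Tuple[str, str]:
    """Hypothetical helper: separate the HF_ prefix from the HuggingFace path."""
    if name.startswith(HF_PREFIX):
        # Everything after the prefix is treated as the HuggingFace path.
        return "huggingface", name[len(HF_PREFIX):]
    # Names without the prefix are assumed to be built-in choices such as gpt4o.
    return "builtin", name


print(parse_model_name("HF_mistralai/Mixtral-8x7B-v0.1"))
print(parse_model_name("gpt4o"))
```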

🔥 Here is an example with gpt-4o!

python3 agentclinic.py --openai_api_key "YOUR_OPENAIAPI_KEY" --doctor_llm gpt4o --patient_llm gpt4o --inf_type llm

⚖️ Here is an example with doctor and patient bias with gpt-3.5!

python3 agentclinic.py --openai_api_key "YOUR_OPENAIAPI_KEY" --doctor_llm gpt3.5 --patient_llm gpt4 --patient_bias self_diagnosis --doctor_bias recency --inf_type llm

🩻 Here is an example with gpt-4o on the NEJM reports

python3 agentclinic.py --openai_api_key "YOUR_OPENAIAPI_KEY" --doctor_llm gpt4o --patient_llm gpt4o --inf_type llm --agent_dataset NEJM --doctor_image_request True

⚠️ Running HuggingFace models locally can be quite slow ⚠️

python3 agentclinic.py --inf_type "llm" --patient_llm "HF_mistralai/Mixtral-8x7B-v0.1" --moderator_llm "HF_mistralai/Mixtral-8x7B-v0.1" --doctor_llm "HF_mistralai/Mixtral-8x7B-v0.1" --measurement_llm "HF_mistralai/Mixtral-8x7B-v0.1"
  • The extended MedQA and NEJM datasets are available by passing --agent_dataset MedQA_Ext or --agent_dataset NEJM_Ext
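
If you want to run several datasets in a row, the commands above can be assembled programmatically. The following is a rough sketch that uses only flags shown in this README; `build_command` is a hypothetical helper, and the defaults it picks (gpt4o for both agents, one scenario) are illustrative assumptions rather than recommended settings.

```python
import shlex
from typing import List


def build_command(api_key: str, agent_dataset: str, doctor_llm: str = "gpt4o",
                  patient_llm: str = "gpt4o", num_scenarios: int = 1) -> List[str]:
    """Hypothetical helper: assemble an agentclinic.py invocation as an argument list."""
    return [
        "python3", "agentclinic.py",
        "--openai_api_key", api_key,
        "--inf_type", "llm",
        "--doctor_llm", doctor_llm,
        "--patient_llm", patient_llm,
        "--agent_dataset", agent_dataset,
        "--num_scenarios", str(num_scenarios),
    ]


# Print one shell-quoted command per extended dataset; nothing is executed here.
for dataset in ("MedQA_Ext", "NEJM_Ext"):
    print(shlex.join(build_command("YOUR_OPENAIAPI_KEY", dataset)))
```

Each returned list can be handed to `subprocess.run` as-is, which avoids shell quoting issues in the API key or model names.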

BIBTEX Citation

@misc{schmidgall2024agentclinic,
      title={AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments}, 
      author={Samuel Schmidgall and Rojin Ziaei and Carl Harris and Eduardo Reis and Jeffrey Jopling and Michael Moor},
      year={2024},
      eprint={2405.07960},
      archivePrefix={arXiv},
      primaryClass={cs.HC}
}

Download files

Source Distribution

agentclinic-0.1.0.tar.gz (4.3 kB view details)

Uploaded Source

Built Distribution

agentclinic-0.1.0-py3-none-any.whl (4.2 kB view details)

Uploaded Python 3

File details

Details for the file agentclinic-0.1.0.tar.gz.

File metadata

  • Download URL: agentclinic-0.1.0.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.23

File hashes

Hashes for agentclinic-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1a9621323807517e0b216edb618fb141bcca55c5bd6e70470fc37fdf75ec23b0
MD5 318393159fb43e5c93a4a703640b661a
BLAKE2b-256 bb3fa9c4cbc395b699d67e5efb0ca60c242e002268fbf7a12d6f5f63c123bf42

File details

Details for the file agentclinic-0.1.0-py3-none-any.whl.

File hashes

Hashes for agentclinic-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cb0ec8596b1ef806674101f2c88ebd08731560de454124199d88535219f88403
MD5 58f80695ce59882456d49364a33462e4
BLAKE2b-256 37ffc61041be3d68e6479a7fa6743f24371ab19ee9f6796328d3e0d63333c551
