Evaluating the Medical Knowledge of Open LLMs - Part 1
In this MedARC blog post, we compare generalist and medical domain-specific Large Language Models (LLMs), such as GPT-4, Mistral, and Llama, and evaluate their performance on the MultiMedQA tasks for medical knowledge and reasoning.