Fine-tuning Stable Diffusion for Chest X-ray Generation
Project Lead: Dr Akshay Chaudhari
Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images (see: DALL-E 2, CLIP, and of course, Stable Diffusion). However, medical imaging data fundamentally differs from natural images. The language used to capture relevant details in medical data is succinct, drawing on a narrow but semantically rich, domain-specific vocabulary. Not surprisingly, multimodal models trained on natural image-text pairs tend not to generalize well to the medical domain. Developing generative imaging models that faithfully represent medical concepts while providing compositional diversity could mitigate the lack of high-quality, annotated medical imaging datasets.
For this reason, we are adapting a pre-trained latent diffusion model to a corpus of publicly available chest X-rays and their corresponding radiology (text) reports. We investigate such models' ability to generate high-fidelity, diverse synthetic X-rays conditioned on text prompts; the fine-tuning step is sketched below. Our preliminary results are shared in our first manuscript here, and we are currently extending this work.
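To make the approach concrete, here is a minimal sketch of one training step for text-conditioned fine-tuning of a Stable Diffusion-style latent diffusion model, written against the Hugging Face diffusers and transformers libraries. The base checkpoint, dataset class, batch size, learning rate, and the choice to freeze the VAE and text encoder are all illustrative assumptions, not the project's exact recipe.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

MODEL_ID = "CompVis/stable-diffusion-v1-4"  # assumed base checkpoint

class CxrReportDataset(Dataset):
    """Placeholder dataset of (image, report) pairs; in practice this would
    wrap a public chest X-ray corpus and its radiology report text."""
    def __init__(self, pairs):
        self.pairs = pairs  # list of (3x512x512 float tensor in [-1, 1], str)
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        return self.pairs[idx]

# Load the frozen pieces (VAE, text encoder) and the trainable denoising UNet.
tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(MODEL_ID, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(MODEL_ID, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(MODEL_ID, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(MODEL_ID, subfolder="scheduler")
vae.requires_grad_(False)
text_encoder.requires_grad_(False)

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)  # assumed LR
loader = DataLoader(CxrReportDataset([]), batch_size=4, shuffle=True)

for images, reports in loader:
    # Encode X-rays into the VAE latent space (scaled per the SD convention).
    latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor

    # Sample Gaussian noise and a random diffusion timestep per example.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],),
        device=latents.device,
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # Condition on the radiology report text via the CLIP text encoder.
    tokens = tokenizer(
        list(reports), padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    text_embeds = text_encoder(tokens.input_ids)[0]

    # Standard denoising objective: predict the noise that was added.
    noise_pred = unet(noisy_latents, timesteps, text_embeds).sample
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Freezing the autoencoder and text encoder and updating only the denoising UNet is one common adaptation strategy; other choices, such as also fine-tuning the text encoder to better capture radiology vocabulary, are equally plausible and are not implied here.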