CheXchoNet: A Chest Radiograph Dataset with Gold Standard Echocardiography Labels

What is CheXchoNet?

CheXchoNet is a large dataset of chest X-rays paired with gold-standard disease annotations derived from echocardiograms on the same patients. The dataset contains 71,589 chest X-rays from 24,689 unique patients.


Why is CheXchoNet different from other datasets?

Many large datasets containing chest X-rays (e.g., CheXpert, ChestX-ray14) have been made publicly available and have been instrumental in the development of improved machine learning methods to diagnose pathologies identified by radiologists in regular clinical practice. These datasets take the approach of extracting diagnostic assessments from radiology reports and constructing supervised models to match human expert performance on various diagnostic tasks.
With CheXchoNet, we propose an alternate paradigm that goes beyond replicating human-level performance: pair an existing diagnostic test with labels from a more accurate, higher-fidelity diagnostic test. We pair chest X-rays with gold-standard annotations of structural heart disease derived from echocardiograms conducted on the same patients. In doing so, we define a new task: the diagnosis of cardiac structural abnormalities related to heart failure using only a chest X-ray. If accurate models are built for this task, there is potential for chest X-rays to be used as part of a screening tool for structural heart disease, leading to earlier detection and improved outcomes.

How can I access the data?

The dataset is hosted on PhysioNet. Please see the link for more detailed information about the data and how to access it.

Is there an associated publication?

We used this data in a study on detecting cardiac structural abnormalities with deep learning models. We recently published this work in the European Heart Journal. Please see the study for additional details.

Is code available?

A public repository is available, which contains a Jupyter notebook for exploring the data and scripts for training and evaluating models.

Please reach out or follow us for more updates!

Email: sab2323@cumc.columbia.edu
Twitter: @sabhave, @PierreEliasMD