EMBED
EMBED is a racially diverse mammography dataset containing 3.4M screening and diagnostic images from 110,000 patients collected from 2013-2020, with an equal representation of black and white women. The dataset is comprised of 2D, synthetic 2D (C-view), and 3D (digital breast tomosynthesis, i.e. DBT) images. It contains 60,000 annotated lesions linked to structured imaging descriptors and ground truth pathologic outcomes grouped into six severity classes. 20% of the total 2D and C-view dataset is released through the AWS Open Data Program and is available for free for research use (https://registry.opendata.aws/emory-breast-imaging-dataset-embed/).
Reference: Jiwoong J. Jeong* , Brianna L. Vey*, Ananth Bhimireddy, Thomas Kim, Thiago Santos, Ramon Correa, Raman Dutt, Marina Mosunjac, Gabriela Oprea-Ilies, Geoffrey Smith, Minjae Woo, Christopher R. McAdams, Mary S. Newell, Imon Banerjee, Judy Gichoya, Hari Trivedi. The EMory BrEast imaging Dataset (EMBED): A Racially Diverse, Granular Dataset of 3.4 Million Screening and Diagnostic Mammographic Images. Radiology: Artificial Intelligence. Jan 4, 2023.