Mammography dataset creation

This project aims to build deep learning algorithms that improve breast cancer screening, for example by decreasing recall rates, reducing biopsies, or automatically detecting negative studies. To accomplish this, we must first build a dataset of mammography images with ground-truth labels. The project requires NLP expertise to extract those labels from free-text radiology and pathology reports; the labels will then be used to annotate the mammographic images. Once the dataset is in place, we will require computer vision and deep learning expertise to predict outcomes from the imaging. Students will have access to raw clinical and imaging data from Emory.


Automatic Labeling of Special Diagnostic Mammography Views from Images and DICOM Headers

Dmytro S. Lituiev, Hari Trivedi, Maryam Panahiazar, Beau Norgeot, Youngho Seo, Benjamin L. Franc, Roy Harnish, Michael Kawczynski, Dexter Hadley

Journal of Digital Imaging, vol. 32(2), 2019, pp. 228-233

Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning

Hari M. Trivedi, Maryam Panahiazar, April Liang, Dmytro Lituiev, Peter Chang, Jae Ho Sohn, Yunn-Yi Chen, Benjamin L. Franc, Bonnie Joe, Dexter Hadley

Journal of Digital Imaging, vol. 32(1), 2019, pp. 30-37