Prediction of clinical events using the National Trauma Data Bank (NTDB)

The National Trauma Data Bank (NTDB) is a large repository covering a wide variety of traumatic injuries, interventions, and outcomes, and it can be leveraged to build machine learning models. The NTDB is compiled annually by the American College of Surgeons (ACS) from standardized data contributed by trauma hospitals across the U.S. More than 200 studies have used the NTDB, and the majority have relied on multivariate-adjusted analysis with a focus on clinical outcomes, public health policy, injury prevention, quality, disparities, and scoring systems. The three most frequently controlled covariates were age (95%), Injury Severity Score (85%), and gender (78%). However, nearly half of the studies did not control for these basic covariates in order to produce a risk-adjusted analysis of trauma mortality. Several studies have also applied machine learning techniques to the NTDB to investigate clinical problems such as traumatic brain injury and overall trauma severity.

A major challenge in applying machine learning to the NTDB is its large feature space and data sparsity. The dataset contains over 30,000 unique features, including 14,000 unique International Classification of Diseases, Ninth Revision (ICD-9) codes alone. Managing this heterogeneity requires special care during feature selection and reduction in order to produce clinically meaningful and generalizable predictive models.
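
To make the idea of feature reduction concrete, the following is a minimal sketch of one common approach, not necessarily the procedure used in this work: each ICD-9 code is collapsed to its category (the part before the decimal point), and categories seen in too few patients are dropped. The function names, the `min_patients` threshold, and the toy records are all hypothetical, and the sketch assumes that codes are stored as strings with a decimal point.

```python
from collections import Counter


def icd9_category(code: str) -> str:
    """Collapse an ICD-9 code to its category, e.g. '805.4' -> '805'.

    Assumes codes are strings with a decimal point; E- and V-codes keep
    their letter prefix ('E812.0' -> 'E812').
    """
    return code.strip().upper().split(".")[0]


def select_features(records, min_patients=2):
    """Build a reduced, sparse binary representation of ICD-9 codes.

    records: one list of ICD-9 code strings per patient.
    Returns (vocab, rows): vocab is the sorted list of retained categories,
    and rows[i] is the sorted list of column indices present for patient i.
    """
    # Each patient contributes a category at most once.
    per_patient = [{icd9_category(c) for c in codes} for codes in records]

    # Count how many patients have each category.
    counts = Counter(cat for cats in per_patient for cat in cats)

    # Keep only categories observed in at least `min_patients` patients.
    vocab = sorted(cat for cat, n in counts.items() if n >= min_patients)
    index = {cat: i for i, cat in enumerate(vocab)}

    # Store only the non-zero columns for each patient (sparse rows).
    rows = [sorted(index[c] for c in cats if c in index) for cats in per_patient]
    return vocab, rows


# Toy records for illustration only; these are not NTDB data.
records = [
    ["805.4", "860.0", "E812.0"],
    ["805.2", "E812.1"],
    ["851.80", "E880.9"],
]
vocab, rows = select_features(records, min_patients=2)
```

In practice the frequency threshold and the level of code grouping are modeling choices, and overly aggressive grouping can merge clinically distinct injuries, so they would need to be tuned against the clinical question at hand.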

We develop machine learning models on this dataset to predict clinical outcomes. The techniques we use are applicable to most types of predictive modeling that can be done with the NTDB, regardless of the clinical question.