Instability in Clinical Risk Stratification Models Using Deep Learning

Date:

I presented a poster at the Machine Learning for Health (ML4H) Symposium 2022 in New Orleans, based on research conducted at Google Health. The work investigates how randomness in training deep learning models can lead to meaningfully different patient-level predictions in clinical risk stratification tasks, even when data, architecture, and hyperparameters are held identical.

The poster summarized our paper "Instability in Clinical Risk Stratification Models Using Deep Learning", which analyzed model nondeterminism on electronic health records across several outpatient deterioration prediction tasks. We showed that while aggregate metrics remain stable, deep learning models exhibit substantial run-to-run variability in ranked patient lists, top-K selections, calibrated risk scores, and subgroup representation. We also proposed stability metrics and demonstrated how simple ensembling can mitigate these effects.
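To make the top-K instability concrete, here is a small illustrative sketch, not the paper's actual code or data: it simulates several "runs" as a shared risk signal plus seed-dependent noise, measures the Jaccard overlap between the top-K patient lists of individual runs, and then compares the top-K lists of two score-averaging ensembles. The noise scale, patient count, and K are arbitrary choices for illustration.

```python
import numpy as np

def topk_jaccard(scores_a, scores_b, k):
    """Jaccard overlap between the top-k patients selected by two runs."""
    top_a = set(np.argsort(scores_a)[-k:])
    top_b = set(np.argsort(scores_b)[-k:])
    return len(top_a & top_b) / len(top_a | top_b)

rng = np.random.default_rng(0)
n_patients, n_runs, k = 1000, 6, 100

# Hypothetical setup: each run's scores are a shared "true" risk signal
# plus run-specific noise standing in for training nondeterminism.
true_risk = rng.normal(size=n_patients)
runs = [true_risk + rng.normal(scale=0.5, size=n_patients)
        for _ in range(n_runs)]

# Pairwise top-k overlap between individual runs.
single = [topk_jaccard(runs[i], runs[j], k)
          for i in range(n_runs) for j in range(i + 1, n_runs)]

# Simple ensembling: average scores within two disjoint groups of runs,
# then compare the two ensembles' top-k lists.
ens_a = np.mean(runs[: n_runs // 2], axis=0)
ens_b = np.mean(runs[n_runs // 2:], axis=0)

print(f"mean single-run top-{k} Jaccard:  {np.mean(single):.2f}")
print(f"ensemble-vs-ensemble top-{k} Jaccard: {topk_jaccard(ens_a, ens_b, k):.2f}")
```

Averaging scores shrinks the run-specific noise, so the ensembles' top-K lists should agree more than any pair of single runs, which is the intuition behind ensembling as a mitigation.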

While in New Orleans, I also had the opportunity to attend NeurIPS 2022, which took place that same week.