Dementia risk predictions from German claims data using methods of machine learning.

Alzheimer's & Dementia: The Journal of the Alzheimer's Association

Authors: Constantin Reinke, Gabriele Doblhammer, Matthias Schmid, Thomas Welchowski

INTRODUCTION: We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares with classical regression, and which predictors of dementia risk are most important.

METHODS: We analyzed data from the largest German health insurance company, covering 117,895 dementia-free people aged 65 or older, with 10 years of follow-up. Predictors comprised 23 age-related diseases, 212 medical prescriptions, 87 surgery codes, age, and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RFs).

RESULTS: Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. Important predictors included antipsychotic medications and cerebrovascular disease, but also a less established one: a specific antibacterial medical prescription.
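The C-statistic compared above is the probability that a randomly chosen person who developed dementia was assigned a higher predicted risk than a randomly chosen person who did not (0.5 is chance, 1.0 is perfect ranking). Calibration asks a different question: whether predicted risks match observed event rates. The sketch below illustrates both ideas in plain Python. It assumes predicted risks and 0/1 outcomes supplied as lists. The function names `c_statistic` and `calibration_table` are hypothetical, and this is not the authors' code, nor the method they used for confidence intervals or calibration assessment.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Every (case, non-case) pair is compared: the pair is concordant when
    the case has the higher predicted risk; tied risks count as one half.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc, rn in product(cases, controls):
        if rc > rn:
            score += 1.0
        elif rc == rn:
            score += 0.5
    return score / (len(cases) * len(controls))


def calibration_table(risks, outcomes, n_bins=5):
    """Mean predicted risk vs. observed event rate in risk-sorted bins.

    A well-calibrated model gives rows where both numbers are close.
    """
    pairs = sorted(zip(risks, outcomes))
    size = len(pairs) / n_bins
    rows = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if chunk:
            mean_pred = sum(r for r, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            rows.append((mean_pred, observed))
    return rows


# Toy example: 3 of the 4 case/non-case pairs are ranked correctly.
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for illustration. For a cohort of roughly 118,000 people, a rank-based formula (equivalent to the Mann-Whitney U statistic) would be the practical choice.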

DISCUSSION: Our models from German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.

© 2022 The Authors. Alzheimer's & Dementia published by Wiley Periodicals LLC on behalf of Alzheimer's Association.

PMID: 35451562

Participating cluster members