Machine Learning Based Prediction of Non-communicable Diseases to Improving Intervention Program in Bangladesh

Author(s): Min Hu, Yasunobu Nohara, Yoshifumi Wakata, Ashir Ahmed, Naoki Nakashima and Masafumi Nakamura

Background: The prevalence of noncommunicable diseases (NCDs) is increasing throughout the world, including in developing countries. An NCD prevention program using information and communication technology was implemented for 2 years in Bangladesh. Health checkup data were collected from 16,741 study subjects. However, the effectiveness of the prevention strategy used has not yet been evaluated, and some subjects at risk of NCD have gone undetected.

Objective: This study aimed to improve intervention strategies by analyzing the collected data and proposing a cost-effective, personalized predictive model to identify subjects at future risk of NCD.

Methods: We selected 2,110 subjects who participated in both years of the program and used a machine learning algorithm, the gradient boosting decision tree, to build models that identify subjects at risk of future high blood pressure, blood sugar (BS), or body mass index (BMI). We used the area under the curve (AUC) of receiver operating characteristic curves and cumulative accuracy profile (CAP) curves to evaluate the performance of our models.
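As an illustration of the AUC metric named in the Methods, the following minimal sketch computes it directly from its pairwise definition: the probability that a randomly chosen at-risk subject receives a higher model score than a randomly chosen not-at-risk subject. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study. In practice the scores would come from a trained gradient boosting model, which is not shown here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for at-risk, 0 for not at-risk (illustrative encoding).
    scores: model outputs, where higher means more likely at risk.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch. A rank-based formulation would be preferable for large cohorts.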

Results: The models showed fairly good performance: the BMI model (AUC=0.910) yielded the highest AUC, whereas the BS model (AUC=0.730) yielded the lowest. CAP curves indicated that the BMI model could correctly identify 98.0% of at-risk subjects at only 50% of the total time cost.
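A CAP-curve reading such as "98.0% of at-risk subjects at 50% of the total time cost" can be sketched as follows. This is a minimal illustration, assuming that time cost is proportional to the number of subjects examined and that subjects are examined in descending order of predicted risk. The function names `cap_curve` and `capture_rate` and the toy data are hypothetical and do not come from the study.

```python
def cap_curve(labels, scores):
    """Return CAP points as (fraction examined, fraction of at-risk found)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one at-risk subject")
    # Examine subjects from highest to lowest predicted risk.
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    found = 0
    for k, i in enumerate(order, start=1):
        found += labels[i]
        points.append((k / len(labels), found / total_pos))
    return points


def capture_rate(labels, scores, fraction):
    """Share of at-risk subjects found after examining the top `fraction`."""
    k = int(fraction * len(labels))  # number of subjects examined
    return cap_curve(labels, scores)[k][1]


# Toy data (assumed): 8 subjects, 4 of them at risk.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.6, 0.7]
print(capture_rate(toy_labels, toy_scores, 0.5))  # top 4 contain 3 of 4 -> 0.75
```

With a random ordering, examining half of the subjects would find about half of those at risk. The gap between that baseline and the value returned here is what the CAP curve visualizes.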

Conclusions: Our models represent powerful tools for improving both the effect of health intervention programs and the effectiveness with which they are carried out when medical resources are limited.