Analysis Unit Data Model for Statistical and Machine Learning Analysis using the Health Insurance Claims Database

Tomohide Iwao

doi:10.24105/ejbi.2022.18.8.80-87

Author(s): Tomohide Iwao*

Background: Administrative databases of health insurance claims are becoming increasingly popular. However, because they generally contain only the data necessary to assess claims, they are insufficient for research purposes, and the data are normalized in such a way that patient-care data are dispersed across multiple tables. Creating a dataset that is appropriate for analysis therefore requires a great deal of effort and involves techniques that are difficult for clinicians.

Objectives: The aim of the present study was to create a data warehouse (DW) that provides easy access to the data required for epidemiological research.

Methods: First, epidemiological studies that used the National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB) as a data source were surveyed to identify the attributes (variables) most commonly analyzed. These attributes were then extracted from the NDB to construct a data model in which the unit of analysis is the single patient.

Results: A DW was constructed that contains the attributes frequently used in epidemiological research, integrated at the per-patient level. The DW was then used in two studies: one concerning postpartum hemorrhage and one concerning patients after cardiopulmonary resuscitation. In these studies, four of the six types (approximately 67%) and four of the seven types (approximately 57%) of the required attributes, respectively, were available through the DW.

Conclusion: This study constructed a DW that provides the attributes frequently used in epidemiological analyses. This represents the first step in building a common infrastructure based on the NDB.
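The per-patient integration described in the Methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table layouts, the field names, the sample records, and the helper functions `build_patient_rows` and `coverage` are hypothetical and do not reflect the actual NDB schema or the DW built in this study. The sketch only shows the general idea of collapsing normalized claim tables into one record per patient, and how a coverage figure such as "four of six attribute types" can be computed.

```python
from collections import defaultdict

# Hypothetical normalized tables; the real NDB schema is not shown here.
patients = [
    {"patient_id": "P1", "sex": "F", "birth_year": 1985},
    {"patient_id": "P2", "sex": "M", "birth_year": 1950},
]
diagnoses = [
    {"patient_id": "P1", "code": "O72"},
    {"patient_id": "P2", "code": "I46"},
    {"patient_id": "P2", "code": "I10"},
]
procedures = [
    {"patient_id": "P1", "name": "transfusion"},
    {"patient_id": "P2", "name": "cpr"},
]


def build_patient_rows(patients, diagnoses, procedures):
    """Collapse the normalized tables into one record per patient."""
    dx = defaultdict(set)
    for d in diagnoses:
        dx[d["patient_id"]].add(d["code"])
    px = defaultdict(set)
    for p in procedures:
        px[p["patient_id"]].add(p["name"])

    rows = {}
    for p in patients:
        pid = p["patient_id"]
        rows[pid] = {
            "sex": p["sex"],
            "birth_year": p["birth_year"],
            "diagnosis_codes": sorted(dx[pid]),
            "procedures": sorted(px[pid]),
        }
    return rows


def coverage(required, available):
    """Fraction of the required attribute types that are available."""
    required = set(required)
    return len(required & set(available)) / len(required)


rows = build_patient_rows(patients, diagnoses, procedures)
```

With the rows keyed by patient, an analyst can read all attributes of one patient from a single record instead of joining several tables, and `coverage` gives the share of a study's required attribute types that such a record supplies.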
