journal of biomedical informatics
Kazuya Okamoto*1, Takashi Yamamoto2, Luciano H.O. Santos1, Shusuke Hiragi1, Osamu Sugiyama3, Goshiro Yamamoto1, Masahiro Hirose4 and Tomohiro Kuroda1
 
1 Division of Medical Information Technology and Administration Planning, Kyoto University Hospital, Japan, Email: kuhp.kyoto-u@ac.jp
2 Patient Safety Unit, Kyoto University Hospital, Japan, Email: ti-yo@kyo.ac.jp
3 Preemptive Medicine & Lifestyle-Related Disease Research Center, Kyoto University Hospital, Japan, Email: kuhp.kyoto-u@ac.jp
4 Faculty of Medicine, Shimane University, Japan, Email: mhorse.kyoto-u@ac.jp
 
*Correspondence: Prof. Kazuya Okamoto, PhD, Kyoto University Hospital, 54 Kawahara-Cho, Shogoin Sakyoku, Kyoto, 606-8507, Japan, Email: kazuya@kuhp.kyoto-u.ac.j

Citation: Okamoto K, et al. (2020). Detecting Severe Incidents from Electronic Medical Records Using Machine Learning Methods. EJBI. 16(1): 26-28

This open-access article is distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC) (http://creativecommons.org/licenses/by-nc/4.0/), which permits reuse, distribution and reproduction of the article, provided that the original work is properly cited and the reuse is restricted to noncommercial purposes. For commercial reuse, contact submissions@ejbi.org

Abstract

The goal of this research was to design a solution for detecting non-reported incidents, especially severe ones. To achieve this goal, we proposed a method that processes electronic medical records and automatically extracts clinical notes describing severe incidents. To evaluate the proposed method, we implemented a system based on it and applied the system to clinical notes. The system successfully detected an incident that had not been reported to the safety management department.

Keywords

Safety management; Supervised machine learning; Medical records

Introduction

To prevent medical accidents in hospitals, it is important to identify, at an early stage, events that can lead to severe medical accidents and to take appropriate action based on them. These events, which are defined as incidents, are usually reported by medical staff to the safety management department, which is responsible for the prevention of medical accidents in the hospital. The department analyzes the reports in order to decide which measures the clinical staff should implement. This mechanism is called an incident reporting system [1].

One of the main issues related to incidents is a long delay in reporting, or even a complete lack of reporting [2,3]. It is especially important to detect such cases, since they affect the safety management department’s ability to respond properly.

Previous studies have partially addressed this problem, either for specific types of incidents, such as those in neonatal intensive care [4], or by using keyword search [5], but few works have focused on natural language processing for general incident detection.

The goal of this research is to design a solution to detect these non-reported incidents, especially severe incidents. To achieve this goal, we develop methods to process electronic medical records and automatically extract clinical notes describing severe incidents. The extracted notes are treated as incident candidates which are shown to the safety management department for further analysis.

Methods

We develop methods to process electronic medical records and automatically extract clinical notes describing incidents of injection by using a Support Vector Machine (SVM) based technique [6].

First, we manually label a training set of clinical notes into two categories based on whether they include a severe incident report or not. Next, by morphological analysis, the training set is separated into words and arranged in a vector space using single words as the axes. Then, the SVM builds a machine learning model from the labeled, vectorized training set.

Finally, using the trained model, the SVM extracts the clinical notes it classifies as positive, that is, notes estimated to contain incident reports. The extracted notes are treated as incident candidates, which are shown to the safety management department for further analysis.

Clinical Notes

Clinical notes are written by physicians, nurses and co-medical staff to record patients’ situations and conditions, the assessments given by medical staff, and the treatments performed by medical staff. In hospitals, an enormous number of clinical notes is stored continually [7], and the importance of analyzing clinical notes is increasing. Since clinical notes are written as free text, natural language processing is necessary to analyze them. In this study, we classify clinical notes into two classes and label each clinical note according to whether it includes a description of a severe incident or not. This labeling is performed by experts of the patient safety unit in Kyoto University Hospital.

Making Document Vectors

To convert clinical notes into a form that machine learning methods can handle, we make document vectors from the clinical notes. First, we separate the free-text clinical notes into sets of words. Japanese sentences in particular have no explicit word separators such as spaces. Therefore, to find the words in a sentence, we need to use morphological analysis. Additionally, Japanese morphological analysis can identify the part of speech of each word, such as ’noun’ and ’verb’. In this method, we use nouns, adjectives, verbs, adverbs and auxiliary verbs.
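The part-of-speech filtering step can be illustrated with a minimal sketch. It assumes that a morphological analyzer has already produced (surface form, part of speech) pairs; the tag names follow the style of common Japanese dictionaries, and the example sentence, the function name `content_words` and the choice of surface forms rather than base forms are all hypothetical, not taken from the study.

```python
# Parts of speech kept by the method: nouns, adjectives, verbs, adverbs
# and auxiliary verbs. The Japanese tag names are an assumption about the
# analyzer's output format.
KEPT_POS = {"名詞", "形容詞", "動詞", "副詞", "助動詞"}


def content_words(tokens):
    """Return the surface forms whose part of speech is in KEPT_POS.

    `tokens` is a list of (surface, part_of_speech) pairs, i.e. the kind of
    output a morphological analyzer produces for one clinical note.
    """
    return [surface for surface, pos in tokens if pos in KEPT_POS]


# Hypothetical, hand-written analysis of "患者が転倒した" ("the patient fell").
example_tokens = [
    ("患者", "名詞"),    # noun: kept
    ("が", "助詞"),      # particle: dropped
    ("転倒", "名詞"),    # noun: kept
    ("し", "動詞"),      # verb: kept
    ("た", "助動詞"),    # auxiliary verb: kept
]

words = content_words(example_tokens)
print(words)
```

Only the particle is discarded here; the remaining words are what would be counted when the document vector is built.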

After separating the sentences of the clinical notes into words, we make document vectors from the sets of words. In each document vector, every distinct word corresponds to one axis, and the value on that axis is the number of times the word occurs in the document.

The resulting document vectors, together with their incident labels, are used as input data for a machine learning method.
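As an illustration of the word-count representation described above, the following sketch builds document vectors from already-separated words. The English toy words and the helper names `build_vocabulary` and `to_count_vector` are hypothetical and stand in for the output of the morphological analysis.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct word to an axis index, in order of first appearance."""
    vocabulary = {}
    for words in documents:
        for word in words:
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def to_count_vector(words, vocabulary):
    """Return a dense count vector; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(words).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


# Two toy "notes", already split into words.
docs = [["fall", "bed", "fall"], ["bed", "pain"]]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

With several thousand distinct words, most entries of each vector are zero, so a sparse representation would be the more practical choice at full scale.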

Normalization of Document Vectors

Since clinical notes are written as free text, their lengths vary widely. Fujita et al. pointed out that the clinical notes of one particular patient exceeded eight million characters in a single year [7]. Normalization is effective in reducing the influence of note length on the results of machine learning methods. In this study, we normalize document vectors by dividing by the square root of lengths of the document vectors [8].
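The wording of the normalization step can be read in more than one way. The sketch below assumes one common reading, namely scaling each document vector to unit Euclidean length (dividing by the square root of the sum of squared counts); this interpretation and the function name `l2_normalize` are assumptions, not something the text confirms.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]


# A short note and a note ten times longer with the same word proportions.
short_note = [3, 4, 0]
long_note = [30, 40, 0]

normalized_short = l2_normalize(short_note)
normalized_long = l2_normalize(long_note)
print(normalized_short, normalized_long)
```

Under this reading, two notes with the same word proportions map to the same vector regardless of their length, which is the effect the normalization is meant to achieve.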

Machine Learning Method

There are many machine learning methods, such as Naive Bayes, decision trees, random forests, neural networks and deep learning, and SVM. In this study, we use the linear kernel SVM because it tends to avoid overfitting even when document vectors have a very large number of axes [9]. This characteristic is very important because document vectors usually have several thousand axes.

The linear kernel SVM is trained on the normalized document vectors with their incident labels and produces a model that classifies an input normalized document vector into one of two classes, according to whether the original clinical note includes the description of an incident.
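To make the training and classification steps concrete, the sketch below fits a linear soft-margin SVM by plain subgradient descent on the hinge loss. This is a simplified stand-in written for illustration, not the solver used in the study; the two-axis toy vectors, the hyperparameter values and the function names are all hypothetical.

```python
def train_linear_svm(vectors, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Fit weights and bias by subgradient descent on the regularized hinge loss.

    Labels must be +1 (note describes an incident) or -1 (it does not).
    """
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # The L2 penalty always shrinks the weights slightly.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if y * score < 1.0:
                # Margin violated: move the hyperplane toward classifying x as y.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0.0 else -1


# Toy data with two axes: the first dominates in positive notes,
# the second in negative notes.
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = [1, 1, -1, -1]

weights, bias = train_linear_svm(train_vectors, train_labels)
print([predict(weights, bias, x) for x in train_vectors])
```

The learned model is just one weight per word axis plus a bias, which is why a linear kernel remains manageable with several thousand axes.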

Experiments

Using the proposed method based on the linear kernel SVM, we implemented an incident candidate reporting system. To evaluate the system, we asked a staff member of the safety management department to judge whether the extracted incident candidates were incidents or not.

System Implementation

To build a training set, we used the severe incidents that occurred at Kyoto University Hospital from 1 April 2017 to 31 March 2018. There were 127 severe incidents during this period. We extracted the clinical notes of the patients involved in the incidents, from the days on which the incidents happened. The number of extracted clinical notes was 2,842. A staff member of the safety management department labeled the extracted clinical notes to create the training set. The numbers of positively and negatively labeled clinical notes were 212 and 2,630, respectively.

The clinical notes of the training set were separated into sets of words by using MeCab [10], a Japanese morphological analyzer. The sets of words were converted into document vectors, which were then normalized. The normalized document vectors were used to build a machine learning model with LIBSVM [11], a reliable SVM package.

Preliminary System Evaluation

To confirm the performance of the implemented system, we first carried out a preliminary evaluation, applying 10-fold cross-validation to the document vectors described in the previous subsection.
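The fold structure of a 10-fold cross-validation over the 2,842 training notes can be sketched as follows. The text does not state whether the notes were shuffled or stratified before splitting, so this sketch simply assumes contiguous folds over an already-ordered index range; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(2842, k=10))
print([len(test) for _, test in folds])
```

Each note is used for testing exactly once and for training in the other nine rounds, so the confusion counts summed over all folds cover the whole training set.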

As a result, the numbers of true positives, false positives, false negatives and true negatives were 69, 29, 143 and 2,601, respectively. Consequently, the accuracy, precision and recall of the implemented system were 93.9%, 70.4% and 32.5%, respectively.
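The reported figures follow directly from the four confusion counts, as the short check below shows. The function name `classification_metrics` is hypothetical; the formulas are the standard definitions of accuracy, precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all notes classified correctly
        "precision": tp / (tp + fp),     # share of flagged notes that are positive
        "recall": tp / (tp + fn),        # share of positive notes that are flagged
    }


# Counts from the preliminary 10-fold cross-validation.
metrics = classification_metrics(tp=69, fp=29, fn=143, tn=2601)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The counts are also consistent with the training set: 69 + 143 = 212 positive notes and 29 + 2,601 = 2,630 negative notes.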

System Evaluation

To find incident candidates, we used inpatients’ clinical notes written from 24 October 2018 to 18 November 2018. The number of clinical notes was 294,731.

Finally, we asked a staff member of the medical safety department to check whether the extracted incident candidates included severe incident reports. Figure 1 shows the interface of the system that the staff member used for this check.


Figure 1: The interface of the implemented system.

Results

The clinical notes used as training data contained 6,148 distinct words, which means that the document vector space had 6,148 dimensions. This number was large enough for the linear kernel SVM to work effectively.

The system extracted 121 incident candidates from the 294,731 clinical notes. The staff member of the medical safety department judged 34 of them to include severe incident reports, which corresponds to a precision of 28.1%. Of these 34, 31 were related to sudden critical changes in patients’ conditions, two were related to incidents during surgery, and one was related to a fall from a bed. Moreover, several of the detected sudden critical changes had not actually been reported.

Discussion

The precision in this evaluation (28.1%) is lower than in the preliminary evaluation (70.4%). However, considering that the proportion of negative instances was much higher than in the preliminary evaluation, the linear kernel SVM appears to have avoided overfitting and to have kept the number of false negatives down effectively.

Unlike similar studies, our proposed method used natural language processing to detect unreported incidents of any kind, without being limited to specific topics.

This study was limited to a specific timeframe at a single hospital. To evaluate the generalizability of this result, we need to apply our proposed method at multiple hospitals over a longer timeframe.

Conclusion

In this research, we aimed to establish a method for extracting incident candidates from clinical notes in order to detect non-reported severe incidents. In addition, we implemented a reporting system that presents the incident candidates extracted by the proposed method. The system successfully detected an incident that had not been reported to the safety management department, and thus our goal was achieved.

Conflict of Interest

The authors declare no conflict of interest.

References