AI algorithms and the automation of clinical coding: an example

Clinical coding is an administrative process in which the data obtained during diagnosis are translated (today, manually, using reference books and manuals) into the corresponding codes. Sources of clinical data include:

  • Admission records.
  • Discharge records.
  • Pathology reports.
  • Radiology reports.
  • Prescriptions.

Coding errors are common and have unpleasant consequences, from staff rework and reduced hospital funding to loss of epidemiological control; more on this below.

The coding process is shown schematically in Fig. 1.



Fig. 1 - Clinical coding process

ICD-10 is a unified coding standard used in many countries of the world. The abbreviation stands for the International Classification of Diseases and Related Health Problems, 10th Revision, maintained by the World Health Organization. The document assigns codes to diseases, their symptoms and signs, abnormal findings, complaints, social circumstances, and external causes of injury and disease.

As a rule, a code consists of up to 7 characters: characters 1-3 indicate the disease category, characters 4-6 specify the location and severity, and the 7th character is an extension. The exact format varies somewhat between countries. In the near future, a transition to the new ICD-11 standard with longer codes is expected; it will contain over 55,000 codes and adds designations for some new clinical conditions and mental illnesses. Understanding the new codes and classifications is extremely important for countries, regions, and health organizations seeking to develop the industry further and attract adequate funding.
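To make the character layout described above concrete, here is a minimal sketch that splits an ICD-10-CM-style code into its parts. The function name and the dictionary keys are illustrative choices, not part of any standard API; real codes also contain a dot after the category, which the sketch strips.

```python
# Illustrative sketch: splitting an ICD-10-CM-style code into its parts.
# Layout assumed: characters 1-3 = category, 4-6 = detail (site/severity),
# 7 = extension, matching the description in the text.

def parse_icd10(code: str) -> dict:
    """Split a code like 'S52.521A' into category, detail and extension."""
    compact = code.replace(".", "").upper()
    return {
        "category": compact[:3],            # characters 1-3
        "detail": compact[3:6] or None,     # characters 4-6, if present
        "extension": compact[6:7] or None,  # character 7, if present
    }

print(parse_icd10("S52.521A"))
# category S52, detail 521, extension A
print(parse_icd10("J18.9"))
# shorter codes simply leave the later parts empty
```

Short codes such as "J18.9" show why the optional parts default to `None`: not every code uses all seven characters.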

Two important applications of clinical coding:

  1. Billing (local and state government, health and insurance).
  2. Reporting (epidemiological studies, public policy, epidemiological surveillance).


Clinical coders carefully review all medical records from an episode of care to determine:

  • The principal diagnosis.
  • Procedures performed (if any).
  • Other diseases identified.
  • Complications that arose.

All of the above is mapped to the corresponding codes of the ICD-10 standard.

Clinical Coding Issues


Performing the coding process manually involves various difficulties and, in general, causes considerable trouble for the staff of medical institutions:

  • Long processing time: coding a single case can take from 8 to 24 minutes.
  • Limited accuracy: average coding accuracy is around 70-75% (according to AHIMA estimates).
  • A trade-off between speed and coding accuracy: the two are interrelated; the higher the speed, the lower the quality, and vice versa.
  • Staff shortages: only about 52% of clinical coders work on a permanent basis, and many organizations resort to offshoring to reduce case backlogs.




Table 1: II National Clinical Coding Competition ICD-10

The consequences of clinical coding errors


Errors in classification and clinical coding are very common. They affect many aspects of a medical institution's work, including reimbursement for the care provided. Consider appendectomy (removal of the appendix), the most common emergency surgery: an incomplete or incorrect coding of the procedure significantly affects financing.

Example: a patient was admitted with a diagnosis of acute appendicitis. A wound infection developed in the postoperative period, and the patient was prescribed intravenous antibiotics.




Table 2. The effect of coding errors in the case of acute purulent appendicitis on financing.

The example shows that a clinical coding error can lead to rework and reduced funding. Another serious consequence of incorrect clinical coding is the loss of control over developing epidemics.

How practical is it to use AI algorithms for clinical coding?


If AI can drive like a human, can it handle clinical coding?

Over the past few years, significant progress has been made in applying AI across many fields. A brief digression into the subject:

AI is a vast area of knowledge about computers that can mimic human capabilities. It allows machines to learn from data, eliminating the need to hard-code specific tasks, and lets computers learn from their own experience. Computers can process large amounts of data and notice deeper patterns, ultimately providing higher accuracy than humans. All this underpins more accurate results and, in turn, better-informed decisions.


Despite the many difficulties that AI faces in the healthcare industry, it can play a key role in clinical coding, providing some undeniable advantages:

  • Lower financial costs.
  • Better consistency.
  • Elimination of staff shortages.
  • Implementation of pre-clinical coding.
  • Speeding up the process, which in turn will lead to faster financing.
  • Improving the accuracy and scope of audits.

The problem of medical data complexity


Many health facilities and organizations lack a conceptual approach to organizing and managing data quality, especially in the long run. The value of medical records, and of the data derived from them, grows over time. Even the introduction of electronic medical records (EMR) has not adequately simplified real-time data processing, because the functionality of the software in use is very limited.

Here are the main problems with processing medical data:

  • Different quality levels of electronic medical records.
  • Lack of compatibility, as well as the complexity of clinical systems.
  • The complexity of the process of collecting, searching and analyzing data.
  • The need to process incomplete or missing data.
  • Coverage and data sampling.
  • Regulatory requirements and bureaucratic processes.

Now let us look at a concrete case.

Case study: Maharaj Nakhon Hospital in Chiang Mai


This is a teaching hospital of Chiang Mai University, located in Mueang Chiang Mai district of Chiang Mai Province; opened in 1941, it was the first such Thai hospital outside Bangkok. It is fairly large: 1,400 beds, 69 intensive-care beds and 92 additional beds, as well as 28 operating rooms. Each year it handles over 45,000 inpatient cases, including over 1,000 open-heart surgeries and over 40 kidney transplants, and more than 1.3 million patients are registered in its outpatient clinics.

Data complexity


We use clinical data from the repositories of the Chiang Mai hospital, recorded between 2006 and 2019. Table 3 gives some statistics that demonstrate the complexity of the data being processed.



Table 3. Statistics of the data set of the Maharaj Nakhon Chiang Mai hospital.

In this article we will not go into specific details and will note only the most significant points:

  • In 42.5% of care episodes, a unique set of codes was used (only a few cases share identical code sets).
  • Inpatient cases are significantly more complex.
  • Outpatient cases are also quite complicated (no medical history is available).
  • Complex code sets (100 or more) are used in more than 70% of cases, as shown in Fig. 2.




Fig. 2. Frequency of the 30 most common ICD-10 codes in the inpatient data set

Fig. 2 illustrates the so-called "long tail" problem even among the 30 most common ICD-10 codes: as you can see, the vast majority of codes are quite rare. This feature complicates machine learning, since less frequent cases are harder to model.
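The long-tail effect is easy to reproduce on synthetic data. The sketch below uses made-up case records (not hospital data) where two codes dominate and twenty codes each appear only once:

```python
# A minimal illustration of the "long tail": a few codes dominate while
# most appear rarely. The case records here are synthetic, not hospital data.
from collections import Counter

# synthetic case records, each a list of assigned ICD-10 codes
cases = (
    [["J18.9"]] * 50 +                     # common: pneumonia
    [["I10"]] * 30 +                       # common: hypertension
    [[f"Q{i:02d}.0"] for i in range(20)]   # 20 rare codes, once each
)
freq = Counter(code for case in cases for code in case)

top = freq.most_common(2)                  # the head of the distribution
rare = sum(1 for c in freq.values() if c == 1)  # codes seen only once
print(top)
print(rare)
```

A classifier trained on such data sees abundant examples of the head codes and almost none of the tail, which is exactly why the rarer codes are modeled less reliably.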

Ways to process data sources


Each data source has its own format, type, and level of complexity. This makes pre-processing difficult and complicates the extraction of meaningful predictive signals. As will become clear below, the data-processing and modeling stages involve a set of equally challenging tasks.



Table 4. Characteristics of data sources and the complexity of their processing

Data pre-processing was carried out per source: unstructured text (radiology reports and the like), semi-structured laboratory data (in various formats, including mixed text and numeric data), structured prescriptions, and tabular patient-admission data.
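The per-source preprocessing described above can be sketched as follows. This is a hedged toy example: the vocabularies, field names, and feature layout are invented for illustration, while the idea of turning text into counts, keeping labs numeric, and encoding prescriptions as indicators follows the text:

```python
# Hedged sketch of per-source preprocessing: free-text reports become
# bag-of-words counts, labs stay numeric, prescriptions become indicator
# features; the vocabularies and field names here are made up.

TEXT_VOCAB = ["infiltrate", "fracture", "normal"]
DRUG_VOCAB = ["amoxicillin", "metformin"]

def featurize(report: str, labs: dict, drugs: list) -> list:
    words = report.lower().split()
    text_feats = [words.count(w) for w in TEXT_VOCAB]   # text -> counts
    lab_feats = [labs.get("wbc", 0.0), labs.get("crp", 0.0)]  # labs -> numbers
    drug_feats = [1.0 if d in drugs else 0.0 for d in DRUG_VOCAB]  # 0/1 flags
    return text_feats + lab_feats + drug_feats

x = featurize("right lower lobe infiltrate", {"wbc": 13.2}, ["amoxicillin"])
print(x)  # [1, 0, 0, 13.2, 0.0, 1.0, 0.0]
```

The concatenated vector is what a downstream network would consume; a real pipeline would of course use far larger vocabularies and handle missing values explicitly.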

Automation Tasks


Due to the complexity of data processing, as shown above, automation of the clinical coding process faces a number of different problems:
  • A huge number of unique classes (over 12,000 codes).
  • Lack of benchmark or gold standard.
  • Lack of publicly available datasets.
  • Unbalanced data (many rare cases).
  • The difficulty of finding ways to combine data from several different sources.


The case for deep learning (AI) algorithms


Deep Learning is one of the most justified approaches for automating clinical coding processes.

Another brief digression: deep learning is a family of machine learning methods based on neural networks with strong representation-learning capabilities. These algorithms loosely mimic how the human brain works, passing inputs through hierarchies of concepts and related questions to arrive at a solution. Deep learning has already been applied successfully in many fields: image processing and computer vision, natural language processing (NLP), machine translation, autonomous driving, fraud detection, and others.

The appropriateness of using machine learning algorithms follows from the difficulties described above: the huge space of possible codes, the imbalance of the data, the variety of heterogeneous sources, and the impracticality of covering all of this with hand-written rules.



This section discusses some architectures used to build ICD-10 predictive coding models. First, we formulate the task as a multi-label classification problem over ICD-10 codes. To predict the probability of each ICD-10 code, we use a feed-forward neural network; the predicted codes are then taken to be those with the highest probabilities.
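The multi-label setup above can be sketched in a few lines of numpy. This is a minimal stand-in, not the authors' model: the weights are random and untrained (a real model would be trained with a per-label binary cross-entropy loss), and the layer sizes are tiny placeholders:

```python
# Minimal numpy sketch of the multi-label setup: a feed-forward network
# emits an independent sigmoid probability per ICD-10 code; codes above
# a threshold become the predicted set. Weights are random, untrained.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_codes = 16, 5          # tiny stand-ins for real sizes

W1 = rng.normal(size=(n_features, 8))
W2 = rng.normal(size=(8, n_codes))

def predict_proba(x):
    h = np.maximum(x @ W1, 0.0)              # hidden layer with ReLU
    return 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid per label

x = rng.normal(size=n_features)
p = predict_proba(x)
predicted = np.flatnonzero(p > 0.5)  # indices of predicted codes
print(p.round(2), predicted)
```

The key difference from multiclass classification is visible here: each code gets its own independent probability, so a case can (and usually does) receive several codes at once.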

The most intuitive modeling architecture is to collect all available data from the various sources and train a single network. This should capture the interactions between different types of data and their relationship to the final diagnosis. We call this the combined model; it is used in the results section.

Fig. 3 shows the structure of the combined model. Since several data sources are fed in at once, this architecture is not ideal: because the sources differ in complexity, it leads to an overly complex network whose hyperparameters must be fine-tuned over many iterations, with experiments on different numbers of layers and loss functions. As a result, the individual data modalities are not learned well enough.



Fig. 3. The structure of the combined model

The second architecture trains a separate network per data source, as shown in Fig. 4; the predictions of the individual networks are then aggregated by averaging or weighted averaging. This prevents larger or more expressive sources from dominating smaller ones in the feature space during training. However, such late fusion hurts decision quality: combining opinions only after each source has produced its own prediction is less informative than modeling the interactions between sources directly.



Fig. 4. The structure of the averaging model
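The averaging step in Fig. 4 reduces to a few lines. The per-source probabilities below are made-up outputs, and the 2:1 weighting is an arbitrary illustration of the weighted variant:

```python
# Sketch of the late-fusion ("averaging") architecture: each source's
# network produces per-code probabilities, combined by a plain or
# weighted mean. The probabilities below are made-up outputs.
import numpy as np

# per-code probabilities from three source-specific networks
p_text  = np.array([0.9, 0.1, 0.4])
p_labs  = np.array([0.6, 0.2, 0.7])
p_drugs = np.array([0.8, 0.3, 0.1])

stack = np.stack([p_text, p_labs, p_drugs])
mean_fused = stack.mean(axis=0)

# weighted variant: trust the text network twice as much
w = np.array([2.0, 1.0, 1.0])
weighted_fused = (stack * w[:, None]).sum(axis=0) / w.sum()

print(mean_fused.round(3))      # approximately [0.767, 0.2, 0.4]
print(weighted_fused.round(3))  # approximately [0.8, 0.175, 0.4]
```

Note what is lost: the fused score for a code depends only on the per-source scores for that same code, so any cross-source, cross-code interaction is invisible to this scheme.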

We therefore turn to the ensemble modeling architecture shown in Fig. 5. The model structure should reliably handle the different data modalities, with their different levels of complexity, and thoroughly learn the relationships between them. Our network, built on top of the individually trained models, is called the "ensemble" or "expert" network. It imitates the work of clinical coders: it uses all types of clinical data and makes decisions regarding the final diagnosis.

In effect, the network receives expert knowledge from the already-trained networks, which is more effective than studying each source in isolation. Over many iterations, the ensemble network draws on the experience of each specialist (pathologist, radiologist, pharmacist, and so on), accumulating the knowledge needed to make a diagnosis. It can also form new diagnoses from the predictions of the individual networks, rather than simply taking the highest-weighted prediction from any single source.



Fig. 5. The structure of the ensemble model
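The structural difference from averaging can be shown in a short sketch: instead of a fixed mean, a small meta-network takes all per-source predictions as input and learns how to combine them. The meta-network weights here are random placeholders, not a trained model:

```python
# Sketch of the ensemble ("expert") idea: a meta-network takes the
# per-source predictions as input and learns how to combine them.
# Weights here are random placeholders, untrained.
import numpy as np

rng = np.random.default_rng(1)
n_codes, n_sources = 3, 3

# stacked predictions from the source networks (rows: text, labs, drugs)
source_preds = np.array([
    [0.9, 0.1, 0.4],
    [0.6, 0.2, 0.7],
    [0.8, 0.3, 0.1],
])

# meta-network input: all source predictions concatenated
meta_in = source_preds.ravel()                  # shape (n_sources * n_codes,)
W = rng.normal(size=(n_sources * n_codes, n_codes))
b = np.zeros(n_codes)

final = 1.0 / (1.0 + np.exp(-(meta_in @ W + b)))  # per-code sigmoid
print(final.round(2))
```

Because every output code sees every source's score for every code, the meta-network can, once trained, express combinations that plain averaging cannot, which is the advantage the text describes.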

Preliminary results


This section presents the measures used to quantify the accuracy of the models described above, as well as experimental results.

Evaluation Measures


Unlike binary and multiclass classification, evaluating multi-label classification depends on which subset of the labels is predicted correctly. To see how the model behaves in different situations, several evaluation approaches are used, so as to catch errors caused by both under- and over-coding. Accordingly, the following measures are used:

  • Average precision – the weighted mean of precision at each threshold, obtained by summing along the precision-recall curve.
  • Coverage error – how far, on average, we must go down the ranked list of labels to cover all true labels.
  • Ranking loss – the proportion of incorrectly ordered label pairs, computed from the predicted scores (y_score).
  • F1 score – the harmonic mean of precision and recall.
  • Precision – the proportion of predicted codes that are correct.
  • Recall – the proportion of true codes that are predicted.



Table 5 shows a gradual improvement in overall model performance across all key indicators. Quantitatively, this translates into a 4-5% improvement on the inpatient dataset and a 2-3% improvement on outpatient data. Different sources contribute differently to model accuracy; for example, prescription data are the most informative. Each source requires a model of a certain complexity, and a different amount of time and number of iterations to learn well. Deep networks converge to a good minimum faster on some data modalities than on others; to improve accuracy, each modality is therefore trained separately, fully capturing the varying levels of data complexity.

On the other hand, the presented model can reach human-level accuracy on the primary diagnosis, especially on inpatient data. This matters for various applications of clinical coding, such as billing, which depends primarily on a correct diagnosis.



Table 5. Automated coding accuracy

Table 6 presents the 5 major disease groups sorted by accuracy. Accuracy for the first three categories of inpatient data is over 90%. For cases involving neoplasms (about 30% of the data), a very encouraging accuracy of about 80% was obtained. Although the model performs less well on outpatient data, accuracy still exceeded 60% (about 65% on average), which is itself a big step forward.



Table 6. Model accuracy for the 5 most common high-level diagnostic cases

Model performance self-awareness


Machine learning models are built and evaluated during a training/evaluation process, with evaluation performed on randomly held-out data. Assessing the accuracy of live predictions in real time, however, is much harder. To address this, we introduce a measure of how confident the model is in its own prediction. For example, it is useful to know that the model's accuracy is adequate for simple cases of care but insufficient for complex ones; that can serve as a signal to have a person manually recheck a particular case.

We propose a confidence-estimation model used alongside the ICD-10 code prediction model; Fig. 6 shows the confidence-estimation network. We train it to detect inconsistencies between the predicted and actual codes, taking all input data into account. The model can thus assess the reliability of a prediction given the input data, the complexity of the particular case, and the likelihood of "good" and "bad" predictions.



Fig. 6. The structure of the model for assessing the degree of reliability
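Operationally, the confidence score drives a simple routing rule: auto-accept confident cases and escalate the rest to a human coder. The sketch below shows only that rule; the threshold, case IDs, codes, and confidence values are hand-picked placeholders, not real model output:

```python
# Hedged sketch of the routing idea: cases whose prediction confidence
# falls below a threshold are sent for manual review. All values below
# are placeholders, not real model output.
REVIEW_THRESHOLD = 0.85

cases = [
    {"id": "A", "predicted_codes": ["J18.9"], "confidence": 0.97},
    {"id": "B", "predicted_codes": ["I10", "E11.9"], "confidence": 0.62},
    {"id": "C", "predicted_codes": ["K35.80"], "confidence": 0.91},
]

auto_coded = [c["id"] for c in cases if c["confidence"] >= REVIEW_THRESHOLD]
needs_review = [c["id"] for c in cases if c["confidence"] < REVIEW_THRESHOLD]

print(auto_coded)    # ['A', 'C']
print(needs_review)  # ['B']
```

Choosing the threshold is a trade-off between automation rate and error rate, which is exactly what the per-bucket accuracies in Table 7 quantify.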

Table 7 contains the results of testing the confidence-estimation network on various subsets of the data; each prediction comes with a reliability estimate. For example, prediction accuracy over 97% is observed in 3% of cases, and over 85% in 50% of cases. Reliability estimates make it possible to automatically bring in human assistance when it is needed. The presented model is self-aware in this sense, and users can easily run and evaluate it in real time.



Table 7. Reliability of an estimation of various data sets

Key findings:

  • Ensemble modeling, combined with an expert network that selects the best prediction, outperforms the other modeling approaches.
  • Training each data modality separately captures the varying complexity of the individual sources.
  • Quantitatively, this yields roughly a 4-5% accuracy improvement on inpatient data and 2-3% on outpatient data.
  • Prescription data are the most informative single source.
  • The model reaches human-level accuracy on the primary diagnosis for inpatient data.
  • Confidence estimation allows low-confidence cases to be routed automatically to human coders.


Possible applications


These results can form the basis for a number of applications that advance the healthcare sector. There are already many programs surrounding the automation of clinical coding: real-time analytics, cost forecasting, logistics and staff planning, and others. We propose specialized software solutions built on clinical coding prediction:

Decision Support System

Applications automating the clinical coding process include decision support systems built on top of the predictive models, with capabilities such as:

  • Software tools supporting the work of clinical coders.
  • QA tools for checking the quality and consistency of assigned codes.



Clinical audit

A clinical audit verifies that coding is correct and complies with established criteria. Audit results are used to analyze the work of healthcare institutions, compile reports, and develop strategies to increase their effectiveness. Developing accurate, high-quality audit strategies receives particular attention both locally and internationally. At the moment, however, this process is performed mostly manually, which makes it prone to a large number of common errors. Coding automation can be effective in this area, helping with:

  • Conducting scheduled and periodic audits.
  • Improving accuracy and throughput.
  • Identifying suspicious patterns and trends.
  • Developing a more precise understanding of the coding process and of coder competence.



This article has outlined the features of clinical coding in healthcare and shown the effectiveness of automating this process. Among the architectures presented, the ensemble deep-learning model is best suited to the task. It successfully uses data from various sources and has good prospects for further development and improved accuracy as new datasets are added for analysis. It ingests, processes, and models data of various kinds, including unstructured, semi-structured, and structured tabular data. Since clinical coding is very sensitive to errors, an additional system automatically evaluates the reliability of predictions in real time.

We evaluated the models quantitatively on the database of Maharaj Nakhon Hospital (Chiang Mai), demonstrating their great potential for real clinical coding practice. The models were trained without knowing the final outcomes, which is a further advantage: they can predict ICD-10 codes consistently and continuously from new clinical data as it arrives, right up to patient discharge. This makes it possible to report the current diagnostic picture in real time, and the models can learn on the fly as new medical records arrive.

Further perspectives


We are only at the initial stage of developing clinical coding automation systems, and are opening new horizons for bringing this service to a huge number of healthcare institutions. We can help build decision support systems and demonstrate their benefits, as well as integrate these solutions into existing processes and systems.
