Artificial Intelligence Watches Over Complaints


Any financial organization is a living organism, and its processes are imperfect. Imperfect processes breed customer dissatisfaction, which can escalate into formal complaints. In this article, we describe our contribution to automating complaint handling by implementing a small Machine Learning project.

Even a difficult task can often be tackled with simple methods, and Machine Learning is no exception.

Feedback is the most valuable information, so every bit of it should be studied as thoroughly as possible. By analyzing customer complaints, we can see objectively which business processes give rise to problems. Since processes are often interconnected, they can be grouped and considered together. This brings us to a standard Machine Learning (ML) task: multiclass classification. Solving it yields consolidated analytics for the organization.

Classification is a task in which a set of objects is partitioned into classes in some way. In multiclass classification, the number of classes is greater than two and can reach many thousands.
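As a minimal illustration (using scikit-learn, which the later snippets also rely on; the data here is invented), a multiclass problem is simply one where the label column takes more than two distinct values:

```python
from sklearn.svm import LinearSVC

# Toy data: 6 objects with 2 features each, labeled with 3 classes
X = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.4, 0.6]]
y = [0, 0, 1, 1, 2, 2]  # more than 2 distinct classes -> multiclass

clf = LinearSVC().fit(X, y)
print(list(clf.classes_))  # the model has learned to distinguish 3 classes
```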

Complaints and the responses to them are stored on a server; the export and pre-processing are carried out in the standard way through a database query, and the output is a data frame containing the data we will work with. A complaint and its response are fairly lengthy documents: a response alone can run to hundreds or even thousands of words. Processing such text directly is computationally very costly, which is why text preprocessing is necessary.
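A minimal sketch of this export step, with an in-memory SQLite database standing in for the complaints server (the table name and sample rows are invented for illustration; the column names follow the snippets below):

```python
import sqlite3
import pandas as pd

# Hypothetical local database standing in for the complaints server
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE complaints (id INTEGER, "
             "Consumer_complaint_narrative TEXT, Num_bp INTEGER)")
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?)",
                 [(1, 'Card payment failed twice', 3),
                  (2, 'Branch queue was too long', 7)])
conn.commit()

# Standard pattern: query the database, get a data frame to work with
df = pd.read_sql_query("SELECT * FROM complaints", conn)
print(df.shape)  # (2, 3)
```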

import re
import pymorphy2

morph = pymorphy2.MorphAnalyzer()

def review_to_wordlist(review):
    # Keep letters only (the character class in the original listing was garbled)
    review_text = re.sub('[^а-яА-ЯёЁa-zA-Z]', ' ', review)
    words = review_text.strip().lower().split()
    words = [w for w in words if w not in stop_words]
    # Reduce each word to its normal (dictionary) form
    words = [morph.parse(w)[0].normal_form for w in words]
    new_stop_words = find_names(words)  # treat personal names as stop words
    words = [w for w in words if w not in new_stop_words]
    return words
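The effect of this pipeline can be seen on a simplified, self-contained variant (a tiny hard-coded English stop list for illustration only; the real code uses a morphological analyzer and a full Russian stop-word dictionary):

```python
import re

# Tiny illustrative stop list; the production list is far larger
stop_words = {'the', 'to', 'was', 'and'}

def simple_wordlist(review):
    # Keep only letters, lower-case, split, drop stop words
    text = re.sub('[^a-zA-Z]', ' ', review)
    words = text.strip().lower().split()
    return [w for w in words if w not in stop_words]

print(simple_wordlist('The payment was declined, and the card blocked!'))
# ['payment', 'declined', 'card', 'blocked']
```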

Model accuracy depends substantially on how distinctive the text is, so words that carry little meaning, "stop words", must be removed from it. Stop words typically include prepositions, conjunctions, and other insignificant parts of speech. We also supplemented the stop-word dictionary with first and middle names.

def find_names(words, prob_thresh=0.4):
    # Keep only string tokens
    words = [w for w in words if isinstance(w, str)]
    # A token counts as a name if any of its parses carries the 'Name'
    # grammeme with sufficient confidence
    add_stop_words = [w for w in words for p in morph.parse(w)
                      if 'Name' in p.tag and p.score >= prob_thresh]
    stop_words.update(add_stop_words)
    return stop_words

Before this project, classification was carried out manually, so we have data labeled by experts, which makes this a classic supervised learning task. The preprocessed text must be converted to a form the model can process. To do this, we transform the complaint responses into feature vectors (in the code, the independent variable is features and the dependent variable is labels).

from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF over unigrams and bigrams; terms seen in fewer than 5 documents are dropped
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='utf8',
                        ngram_range=(1, 2), stop_words=stop_words)
features = tfidf.fit_transform(df_temp['Consumer_complaint_narrative'])
labels = df_temp['Num_bp']

Linear Support Vector Classification (LinearSVC) was selected as the classifier, for the following reasons:

  • it is efficient in high-dimensional feature spaces;
  • it remains stable when the number of features exceeds the number of samples.
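Putting the vectorization and classification steps together, the training stage might look like the following sketch (a synthetic toy corpus stands in for the complaint responses; `min_df` is omitted because the corpus is tiny):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Toy corpus standing in for the complaint responses;
# labels encode the business-process group, as in Num_bp
texts = ['card blocked without warning', 'card payment declined',
         'loan rate changed suddenly', 'loan payment schedule wrong',
         'mobile app keeps crashing', 'mobile app login fails'] * 5
labels = [0, 0, 1, 1, 2, 2] * 5

tfidf = TfidfVectorizer(sublinear_tf=True, norm='l2', ngram_range=(1, 2))
features = tfidf.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels)

model = LinearSVC().fit(X_train, y_train)
print(model.score(X_test, y_test))
```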

This project has been prepared for deployment to production. Every day, the model will classify the data entered during the working day. At the initial stage, the model's output will additionally be verified manually by an expert. Once a month, the model will be retrained. Implementing this project has brought us one step closer to the future!
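One way to support this daily-classification / monthly-retraining cycle is to persist the fitted model so the two jobs can share it. A minimal sketch using joblib (the file name and the toy model are invented for illustration; the source does not specify the persistence mechanism):

```python
import joblib
from sklearn.svm import LinearSVC

# Toy model standing in for the trained complaint classifier
model = LinearSVC().fit([[0, 1], [1, 0], [0.5, 0.5], [0.4, 0.6]], [0, 1, 2, 2])

# The monthly retraining job overwrites this file;
# the daily classification job loads it
joblib.dump(model, 'complaint_model.joblib')
restored = joblib.load('complaint_model.joblib')
print(restored.predict([[1, 0]]) == model.predict([[1, 0]]))
```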
