Building a virtual assistant that handles a real business task well is, unfortunately, still not as simple as we would like. For one thing, it is often far from obvious why a chatbot makes mistakes and, more importantly, how to minimize those errors in the short time allotted for developing and launching it.

Through continuous product development, the IBM Watson Assistant team is trying to make creating and launching a virtual assistant as simple as possible. Today we will talk about the Dialog Skill Analysis Notebook, a Python framework that helps you quickly develop a high-quality AI assistant in IBM Watson. Whether you are creating your first chatbot or are already an expert in building virtual assistants, this framework will help you if you have questions such as:

How effective is my chatbot?
How can I measure the effectiveness of an assistant?
Why does the bot answer questions incorrectly?
How can I improve the assistant's understanding of questions?

How does it work?

Below we show a few examples of tasks that the framework can solve. You can try its features yourself by downloading it from the GitHub repository. The examples in this article are in English, but you can also use Russian to train and test the chatbot.

Note: this material is intended for those who have a basic understanding of creating chat bots on the IBM Watson Assistant platform. If you are unfamiliar with our platform, or would like to learn how to create high-quality virtual assistants based on IBM Watson, we invite you to free training seminars that will be held in Moscow and St. Petersburg in March 2020, including a two-day practical workshop on creating virtual assistants.

Part 1: Training Data Analysis

We will use the “Customer Care” sample available in Watson Assistant, in which the chatbot is trained to recognize questions about the store, for example “Where is your store located?” or “What time does it open?”, and to assign them to the intents Customer_Care_Store_Location and Customer_Care_Store_Hours.

Once the script is loaded, you can start analyzing the training utterances. This lets you detect and fix critical problems such as a single word or phrase being correlated with several intents at once, which is bound to lead to errors when the assistant is in use.
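
To make the idea of a word being tied to several intents more concrete, here is a minimal sketch in plain Python. It is not the notebook's own API: the function find_shared_terms, the stop-word list and the toy training examples are all assumptions made for illustration. It simply reports the words that occur in the training examples of more than one intent.

```python
from collections import defaultdict


def find_shared_terms(training_data, stop_words=frozenset()):
    """Return words that appear in the examples of more than one intent.

    training_data is a list of (utterance, intent) pairs. This is a
    hypothetical helper for illustration, not part of the framework.
    """
    intents_by_word = defaultdict(set)
    for utterance, intent in training_data:
        for raw_word in utterance.lower().split():
            word = raw_word.strip("?.,!\"'")
            if word and word not in stop_words:
                intents_by_word[word].add(intent)
    # Keep only the words shared by at least two intents
    return {
        word: sorted(intents)
        for word, intents in intents_by_word.items()
        if len(intents) > 1
    }


# Toy training examples, made up for this sketch
training_data = [
    ("Where is your store located?", "Customer_Care_Store_Location"),
    ("What time does the store open?", "Customer_Care_Store_Hours"),
    ("Are you open on Sunday?", "Customer_Care_Store_Hours"),
]

shared = find_shared_terms(
    training_data, stop_words={"is", "the", "your", "you"}
)
print(shared)
```

In this toy data the word "store" shows up under both intents, so it is a candidate for review. A real analysis would also weigh how strongly each term is associated with each intent rather than only checking for presence.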

Part 2: Conversational Skills Analysis

When you first create a conversational skill, you can test its work using the Try it out panel in Watson Assistant to evaluate the assistant’s ability to predict whether a text belongs to a specific intent.

This is certainly convenient for checking whether your chatbot works at all, or for demonstrating it to a customer. For assessing the quality of the assistant, however, this approach is unsuitable because it cannot be automated. Users can ask the same question in dozens of different ways, and even if you could anticipate every variant, checking and analyzing them manually would take too much time.

Instead, we suggest using the second part of the framework, which helps you analyze a conversational skill with a test set: additional examples for each intent that you have to write yourself. The test examples must be unique, that is, they must not overlap with the examples the chatbot was trained on; otherwise it already knows the correct answers and the check is meaningless.
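
The uniqueness requirement is easy to check before running an evaluation. The sketch below is an assumption-laden illustration, not a function from the framework: normalize and find_overlap are hypothetical helpers, and the example utterances are invented. It flags test utterances that also occur in the training data after lowercasing and whitespace cleanup.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_overlap(train_examples, test_examples):
    """Return the test utterances that also appear in the training examples."""
    seen = {normalize(example) for example in train_examples}
    return [example for example in test_examples if normalize(example) in seen]


# Invented examples for illustration
train_examples = ["Where is your store located?", "What time does it open?"]
test_examples = ["where is your store located?", "How late are you open today?"]

leaked = find_overlap(train_examples, test_examples)
print(leaked)
```

Any utterance reported here should be replaced in the test set with a new phrasing before the metrics are computed.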

Evaluation is carried out according to the following metrics: Accuracy, Precision, Recall and F1-measure.

Consider the Help intent:

A recall of 100% means that every test sentence belonging to this intent was recognized correctly.
A precision of 66.67% shows that some test sentences belonging to other intents were classified by the model as Help. This deserves attention: adjust the training set to achieve a higher result.
The F1 measure of 80% is a combined metric, the harmonic mean of precision and recall, that reflects the overall quality of the model for this intent.
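
The three values above follow directly from the standard definitions of the metrics. As a small worked example, the sketch below computes them for one intent from lists of true and predicted labels. The function intent_metrics and the four test labels are assumptions chosen so that the result matches the figures quoted for Help (two correct Help predictions, one sentence from another intent wrongly labeled Help, no Help sentence missed); the actual test set behind those figures is not given in this article.

```python
def intent_metrics(true_intents, predicted_intents, target):
    """Compute precision, recall and F1 for a single target intent."""
    pairs = list(zip(true_intents, predicted_intents))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: one "Thanks" sentence is misclassified as "Help"
true_intents = ["Help", "Help", "Thanks", "Cancel"]
predicted_intents = ["Help", "Help", "Help", "Cancel"]

precision, recall, f1 = intent_metrics(true_intents, predicted_intents, "Help")
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
```

With these labels precision is 2/3, recall is 1 and F1 is 2 * (2/3) * 1 / (2/3 + 1) = 0.8, which is why a perfect recall can still coexist with a noticeably lower F1.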

Part 3: Advanced Analysis

The third part of the framework opens up opportunities for an extended analysis of your dialog solution. Using the functions implemented in it, you can find out why a particular sentence was mistakenly recognized.

Consider an example of visualizing the relative importance of words in a sentence.

Note: the intents shown are Customer_Care_Store_Location, Cancel, Customer_Care_Appointments, General_Connect_to_Agent, Thanks, Customer_Care_Store_Hours, General_Greetings and Help.

Ideally, the assistant should assign the sentence "If you are closed on Sunday, can you slot me in for tomorrow afternoon?" to the Customer_Care_Appointments intent, since the user is asking for an appointment tomorrow afternoon. At the moment, however, the assistant assigns this sentence to the Customer_Care_Store_Hours intent.

Looking at the diagram, it becomes clear that this answer is explained by the presence of the words “closed” and “afternoon”, which are associated with the Customer_Care_Store_Hours intent, and by the absence of words that would point to the intended intent.
The framework lets you identify the words in a sentence that the assistant “perceives” as most important, so you can quickly find the cause of an error and correct it.
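
This article does not describe how the framework computes word importance, so the following is only a generic leave-one-out sketch of the idea: remove one word at a time and see how much the score for an intent drops. Everything in it is an assumption for illustration, including the toy_score stand-in for a real classifier and the made-up keyword list for Customer_Care_Store_Hours.

```python
def toy_score(words, keywords):
    """Stand-in for a classifier confidence: grows with the keyword count."""
    hits = sum(1 for word in words if word in keywords)
    return hits / (hits + 1)


def word_importance(sentence, score):
    """Importance of each word = drop in score when that word is removed."""
    words = [w.strip("?.,!") for w in sentence.lower().split()]
    base = score(words)
    return [
        (word, base - score(words[:i] + words[i + 1:]))
        for i, word in enumerate(words)
    ]


# Made-up keywords standing in for what a model learned about store hours
hours_keywords = {"closed", "open", "hours", "afternoon"}
sentence = "If you are closed on Sunday, can you slot me in for tomorrow afternoon?"

importance = word_importance(sentence, lambda ws: toy_score(ws, hours_keywords))
top = sorted(importance, key=lambda pair: pair[1], reverse=True)[:2]
print(top)
```

With this toy scorer the two highest-ranked words are "closed" and "afternoon", mirroring the situation in the diagram. With a real assistant the score function would be replaced by a call that returns the model's confidence for the intent in question.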

Conclusion

The examples shown in the article are just a small part of all the features of our new framework. We hope that it will help you speed up and simplify the process of creating a smart assistant.

How do I access the framework?

You can download it from the GitHub repository here.

For those who do not want to or cannot download the framework or run an IPython notebook, we have created an online version, available in the IBM Gallery via the link. This online version can be launched in the IBM Cloud as part of the Watson Studio service.

Analysis of the quality of the chatbot in the IBM Watson Assistant