COVID-19 Telegram bot // Answering FAQ questions automatically

Amid the universal hype around the coronavirus, I decided to do something at least a little useful (but no less hyped). In this article I'll describe how to create and deploy a Telegram bot that answers FAQ-style questions using rule-based NLP methods in 2.5 hours (that's how long it took me), with COVID-19 as the example.

Along the way we will use good old Python, the Telegram API, a couple of standard NLP libraries, and Docker.



Brief Preface


This article describes the process of creating a simple Telegram bot that answers FAQ questions about COVID-19. The technique is extremely simple and versatile and can be reused for any other case. I emphasize once again that I am not claiming state of the art here, just offering a simple and effective solution that can be reused.

Since I assume the reader already has some experience with Python, I will take it for granted that you have Python 3.x installed along with your development tools of choice (PyCharm, VS Code), and that you know how to create a bot in Telegram via BotFather, so I will skip those steps.

1. Configure API


The first thing to install is the wrapper library for the Telegram API, python-telegram-bot. The standard command for this is:

pip install python-telegram-bot --upgrade

Next, we’ll build the framework of our small program by defining “handlers” for the following Bot events:

  • start - the bot's launch command;
  • help - the help command;
  • message - processing of incoming text messages;
  • error - error handling.

The signature of the handlers will look like this:

def start(update, context):
    # Handle the /start command
    pass


def help(update, context):
    # Handle the /help command
    pass


def message(update, context):
    # Handle an incoming text message
    pass


def error(update, context):
    # Handle errors
    pass

Next, by analogy with the example from the library documentation, we define the main function in which we register all these handlers and start the bot:

def get_answer():
    """Start the bot."""
    # Create the Updater and pass it your bot's token.
    # Make sure to set use_context=True to use the new context based callbacks
    # Post version 12 this will no longer be necessary
    updater = Updater("Token", use_context=True)

    # Get the dispatcher to register handlers
    dp = updater.dispatcher

    # on different commands - answer in Telegram
    dp.add_handler(CommandHandler("start", start))
    dp.add_handler(CommandHandler("help", help))

    # on noncommand i.e message - echo the message on Telegram
    dp.add_handler(MessageHandler(Filters.text, message))

    # log all errors
    dp.add_error_handler(error)

    # Start the Bot
    updater.start_polling()

    # Run the bot until you press Ctrl-C or the process receives SIGINT,
    # SIGTERM or SIGABRT. This should be used most of the time, since
    # start_polling() is non-blocking and will stop the bot gracefully.
    updater.idle()


if __name__ == "__main__":
    get_answer()

Note that there are two mechanisms for receiving updates:

  • Polling - the bot periodically polls the Telegram API for new events (updater.start_polling());
  • Webhook - we run our own server with an endpoint to which Telegram pushes events; this requires HTTPS.

As you may have noticed, for simplicity we use standard polling.
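For reference, a minimal webhook-based startup with python-telegram-bot 12.x might look like the sketch below; the domain https://example.com is a placeholder, and a valid HTTPS certificate would be required:

# Webhook variant (sketch): Telegram pushes updates to our HTTPS endpoint
updater = Updater("Token", use_context=True)
# ... register handlers on updater.dispatcher as above ...
updater.start_webhook(listen="0.0.0.0", port=8443, url_path="Token")
updater.bot.set_webhook("https://example.com/Token")
updater.idle()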

2. Fill in the standard handlers with logic


Let's start with the simple part and fill in the start and help handlers with standard replies; we get something like this:

def start(update, context):
    """Send a message when the command /start is issued."""
    update.message.reply_text("""
Hi!
I answer frequently asked questions about COVID-19.
For example:
- *What are the symptoms?*
- *How is it transmitted?*
- *How can I protect myself?*
etc.
Ask your question!
    """, parse_mode=telegram.ParseMode.MARKDOWN)


def help(update, context):
    """Send a message when the command /help is issued."""
    update.message.reply_text("""
Just ask me a question (about COVID-19).
For example:
- *What are the symptoms?*
- *How is it transmitted?*
- *How can I protect myself?*
etc.
Ask your question!
    """, parse_mode=telegram.ParseMode.MARKDOWN)

Now, when the user sends the /start or /help command, they receive the reply we have defined. Note that the text is formatted as Markdown:

parse_mode=telegram.ParseMode.MARKDOWN

Next, add error logging to the error handler:

def error(update, context):
    """Log Errors caused by Updates."""
    logger.warning('Update "%s" caused error "%s"', update, context.error)

Now, let's check whether our bot works. Put all the code written so far into a single file, for example app.py, and add the necessary imports.
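A minimal set of imports and logger setup for app.py might look like this (a sketch; adjust it to your final code):

import logging

import telegram
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters

# Logger used by the error handler
logging.basicConfig(format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
                    level=logging.INFO)
logger = logging.getLogger(__name__)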

Run the file and open Telegram (do not forget to insert your token into the code). Send the /start and /help commands and rejoice:



3. Process the message and generate a response


The first thing we need in order to answer questions is a knowledge base. The simplest option is a plain JSON file of key-value pairs, where the key is the text of an expected question and the value is the answer to it. An example knowledge base (illustrative entries):

{
  "What is the incubation period of COVID-19?": "The incubation period is usually from 1 to 14 days, most often around five days. A person may already be contagious during this time, so it is important to self-isolate after possible contact with an infected person.",
  "What are the symptoms of COVID-19?": "The most common symptoms are:\n- fever\n- dry cough\n- fatigue\n- shortness of breath\n\nIf you have these symptoms and have been in contact with an infected person, seek medical advice.",
  "How can I protect myself?": "Basic measures:\n- wash your hands regularly (or use an alcohol-based sanitizer)\n- avoid crowded places (keep your distance)"
}

The algorithm for answering the question will be as follows:

  1. We get the text of the question from the user;
  2. Lemmatize all the words in the user's text;
  3. We fuzzily compare the resulting text with all the lemmatized questions from the knowledge base ( Levenshtein distance );
  4. We select the most “similar” question from the knowledge base;
  5. We send the answer to the selected question to the user.

To implement this plan, we need two libraries: fuzzywuzzy (for fuzzy comparison) and pymorphy2 (for lemmatization).
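Both can be installed with pip; the optional speedup extra also pulls in python-Levenshtein for faster matching:

pip install fuzzywuzzy[speedup] pymorphy2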

Create a new file (for example reply_generator.py, which the Dockerfile below copies) and implement the algorithm described above:

import json
from fuzzywuzzy import fuzz
import pymorphy2

# Morphological analyzer used for lemmatization
morph = pymorphy2.MorphAnalyzer()
# Load the knowledge base
with open("faq.json") as json_file:
    faq = json.load(json_file)


def classify_question(text):
    # Lemmatize the user's text
    text = ' '.join(morph.parse(word)[0].normal_form for word in text.split())
    questions = list(faq.keys())
    scores = list()
    # Iterate over all questions in the knowledge base
    for question in questions:
        # Lemmatize the question from the knowledge base
        norm_question = ' '.join(morph.parse(word)[0].normal_form for word in question.split())
        # Compute the fuzzy similarity score between the normalized texts
        scores.append(fuzz.token_sort_ratio(norm_question.lower(), text.lower()))
    # Pick the answer to the most similar question
    answer = faq[questions[scores.index(max(scores))]]

    return answer
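A quick way to sanity-check this function before wiring it into the bot is to call it directly; the question below is just an example and assumes the knowledge base shown earlier:

from reply_generator import classify_question

# Should print the answer to the closest matching question from faq.json
print(classify_question("What are the symptoms?"))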

Before writing the message handler, let's add a function that saves the conversation history to a TSV file:

def dump_data(user, question, answer):
    username = user.username
    full_name = user.full_name
    id = user.id

    line = """{username}\t{full_name}\t{id}\t{question}\t{answer}\n""".format(username=username,
                                                                              full_name=full_name,
                                                                              id=id,
                                                                              question=question,
                                                                              answer=answer)

    with open("/data/dump.tsv", "a") as myfile:
        myfile.write(line)

Now let's use the function we just wrote in the message handler:

def message(update, context):
    """Answer the user message."""
    # Generate an answer
    answer = classify_question(update.message.text)
    # Save the conversation history
    dump_data(update.message.from_user, update.message.text, answer)
    # Send the answer to the user
    update.message.reply_text(answer)

Voila, now go to Telegram and enjoy the conversation:



4. Configure Docker and deploy the application


As the classic saying goes, "if you're going to do something, do it properly." So, to do things properly, let's set up containerization using Docker Compose.

For this we need:

  1. Create a Dockerfile - it defines the container image and the entry point;
  2. Create docker-compose.yml - it launches our containers built from the Dockerfile (not strictly necessary in our case, but useful if you have many services);
  3. Create boot.sh (the script directly responsible for launching the bot).

So, the contents of the Dockerfile:

# Base image
FROM python:3.6.6-slim

# Set the working directory
WORKDIR /home/alex/covid-bot

# Copy requirements.txt
COPY requirements.txt ./

# Install required libs
RUN pip install --upgrade pip -r requirements.txt; exit 0

# Copy the directory where conversation dumps will be stored
COPY data data

# Copy the application files
COPY app.py faq.json reply_generator.py boot.sh  ./

# Clean up caches and temporary files
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Make the startup script executable
RUN chmod +x boot.sh

# Entry point
ENTRYPOINT ["./boot.sh"]
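The Dockerfile copies a requirements.txt that is not shown in the article; for this project it would presumably contain something like:

python-telegram-bot
fuzzywuzzy
python-Levenshtein
pymorphy2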

The content of docker-compose.yml:

# docker-compose file version
version: '2'
# Services to run
services:
  bot:
    restart: unless-stopped
    image: covid19_rus_bot:latest
    container_name: covid19_rus_bot
    # Environment variable checked in boot.sh
    environment:
      - SERVICE_TYPE=covid19_rus_bot
    # Volume for persisting the conversation history outside the container
    volumes: 
        - ./data:/data

The contents of boot.sh:

#!/bin/bash
if [ -n "$SERVICE_TYPE" ]
then
  if [ "$SERVICE_TYPE" == "covid19_rus_bot" ]
  then
    exec python app.py
    exit
  fi
else
  echo -e "SERVICE_TYPE not set\n"
fi

So, everything is ready; to start it all, execute the following commands in the project folder:

sudo docker build -t covid19_rus_bot:latest .
sudo docker-compose up
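Once the containers are up, you can follow the bot's output with the standard Compose command:

sudo docker-compose logs -f bot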

That's it, our bot is ready.

Instead of a conclusion


As is customary, all the code is available in the repository.

The approach shown here can be applied to answering FAQ questions in any domain: just adapt the knowledge base. The knowledge base itself can also be improved by turning each key and value into arrays, so that every entry holds an array of potential questions on one topic and an array of potential answers to them (picking an answer at random adds variety); a possible structure is sketched below. Naturally, a rule-based approach is not very flexible to scale, but I am confident it will hold up for a knowledge base of about 500 questions.
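For example, one possible (purely illustrative) structure for such an extended knowledge base entry:

{
  "symptoms": {
    "questions": [
      "What are the symptoms?",
      "How does the disease manifest itself?"
    ],
    "answers": [
      "The most common symptoms are fever, dry cough and fatigue.",
      "Typical signs include fever, a dry cough and tiredness."
    ]
  }
}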

If you have read this far, I invite you to try my bot here.
