Save time, nerves, and man-hours

Our projects are usually regional, and our customers are usually ministries. Besides the public sector, though, private organizations use our systems too, and with them there are practically no problems.

So the main projects are regional, and they sometimes cause problems. Performance, for example: when a new release is being rolled out to the production servers, 20,000 or more of our precious users per region are online. It's a pain...

My name is Ruslan; I maintain BARS Group information systems and develop a killer bot for brutal serial DBAs. This post is not for the faint of heart: lots of text and pictures.



/awr


Some of our applications run on the Oracle DBMS; there are also projects on PostgreSQL. Oracle has a wonderful facility for collecting load statistics that highlights existing problems and even recommends fixes: the Automatic Workload Repository (AWR). At one point (namely, at the moment of pain) the developers were constantly asking us to collect AWR reports for performance analysis. We dutifully went to the database server, generated the reports, dragged them over to ourselves and sent them off for analysis. Somewhere after the fifth time it began to wear on us... after the tenth, to irritate...

One day a colleague voiced the idea that anything you do more than once should be automated. To be honest, until the irritation set in I hadn't given it much thought; I did try to automate whatever could be automated, but often nobody needed it, and it was more research than practical work.

And then I thought: "You don't need an admin to generate a report..." After all, collecting a report just means running the SQL script @$ORACLE_HOME/rdbms/admin/awrrpt.sql and fetching the result from the server... Oh, right: we don't let developers onto production.

Then I googled the necessary information, created the function from an article on a test database, ran the script, and, a miracle: the report was generated and could be saved locally. I created the function on the databases where AWR reports were requested most often and showed the developers how to use it.
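The bot's own code is in PHP and isn't shown in the article, so here is an illustrative Python sketch of the same idea. The `DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML` pipelined function and the `DBA_HIST_SNAPSHOT` view are standard Oracle; everything around them (driver, cursor, helper names) is assumed.

```python
# Illustrative sketch only: the article's bot is written in PHP.
# Assumes a connected cursor (e.g. python-oracledb) for a user with
# EXECUTE on DBMS_WORKLOAD_REPOSITORY and SELECT on the DBA_HIST_*/V$ views.

SNAP_RANGE_SQL = """
SELECT MIN(snap_id), MAX(snap_id)
  FROM dba_hist_snapshot
 WHERE end_interval_time >= SYSTIMESTAMP - NUMTODSINTERVAL(:hours, 'HOUR')
"""

# AWR_REPORT_HTML is a pipelined function: each row is one line of HTML.
AWR_HTML_SQL = """
SELECT output
  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.awr_report_html(
         (SELECT dbid FROM v$database),
         (SELECT instance_number FROM v$instance),
         :begin_snap, :end_snap))
"""

def collect_awr_html(cursor, hours: int = 1) -> str:
    """Return the AWR report for the last `hours` hours as one HTML string."""
    cursor.execute(SNAP_RANGE_SQL, hours=hours)
    begin_snap, end_snap = cursor.fetchone()
    cursor.execute(AWR_HTML_SQL, begin_snap=begin_snap, end_snap=end_snap)
    return "\n".join(row[0] for row in cursor.fetchall() if row[0])
```

The bot then only has to save the returned HTML into a web-served directory and reply with the link.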

Around the same time, in my spare hours, after a chat with @BotFather I created a Telegram bot for myself, just for fun. I bolted on some simple functionality: show the current time, exchange rates, the weather; and I taught it to send scheduled compliments to my wife (then girlfriend). At the time, sending compliments was probably the bot's most sought-after feature; my wife appreciated it.

So: the developers write to us in Telegram, and we send them the report in Telegram... What if they wrote not to us, but to the bot? That would be better for everyone: they would get the report faster, and, most importantly, without involving us. Thus the idea of the first genuinely demanded feature for my bot was born.

I set about implementing it. I did it as best I could in PHP (our application itself is in PHP, so I understand it better than Python). I'm a so-so coder, so I won't show my code :)

The bot lives in our corporate network and has access to certain projects, including the target databases. To avoid fussing with parameters in the command or a menu, I tied this functionality to the group chats that receive monitoring notifications, so the bot immediately knows which database to build the report for.

Having received a command of the form /awr N, where N is the number of whole hours the report should cover (default: 1 hour; up to a week, as long as the database hasn't been restarted in between), the bot gets to work straight away: it collects the report, publishes it as a web page and immediately (well, almost immediately) posts a link to the much-needed report.
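The command handling can be sketched like this (the real bot is PHP and resolves the target database from the chat; the parser below is an illustration of the rules just described):

```python
# Hypothetical sketch of the /awr command parser (the real bot is in PHP).
# The target database is implied by the group chat, so only the hour count
# needs parsing; 1 hour is the documented default.

def parse_awr_command(text: str, default_hours: int = 1) -> int:
    """Return the number of whole hours requested by '/awr N'."""
    parts = text.strip().split()
    if not parts or parts[0] != "/awr":
        raise ValueError("not an /awr command")
    if len(parts) == 1:
        return default_hours          # bare /awr -> last hour
    hours = int(parts[1])             # non-numeric input raises ValueError
    if hours < 1:
        raise ValueError("hours must be a positive integer")
    return hours
```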

We follow the link and here it is, the AWR report:



As expected, the developers coped with generating reports this way; some even said thanks.

Project managers from other regions, who catch the most flak from customers worried about system performance and availability, appreciated the team's new convenience and wanted the same. I added the bot to their chats. They still use it, and I'm glad.

Later, colleagues from CIT found out how we collect reports and wanted in too. I didn't add them to our chats; instead I created a separate chat where reports are generated on a schedule and on request.

/pgBadger


We also have PHP applications paired with PostgreSQL. For those who needed it, I implemented pgBadger report collection on the same principle, in group chats. At first people used it, but then they stopped, and the functionality was cut out as unnecessary.

/duty


Our department has night shifts and, accordingly, a duty schedule. It lives in Google Sheets. It's not always convenient to hunt for the link, open the schedule, look yourself up... A former colleague also tinkered with his own Telegram bot and added shift-start notifications for our department's staff to the department chat. The bot parses the schedule, determines the person on duty by the current date and, on schedule or on request, reports who is on duty today. It turned out great and convenient. What I didn't really like was the message format. Also, staff of another department (say, the "Medicine" business unit) don't really need duty information for other areas, but they do need to know who is on duty in "Medicine" in case of problems. So I decided to "borrow" the functionality and change what I disliked: I made the message format convenient for myself and others, removing the redundant information.
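The lookup itself is trivial; a minimal sketch, assuming the schedule is exported from the Google Sheet as (ISO date, department, person) rows. The row format and all names are invented for illustration:

```python
import datetime

def on_duty(rows, department, today=None):
    """Return who is on duty in `department` on `today` (ISO date string)."""
    today = today or datetime.date.today().isoformat()
    for date, dept, person in rows:
        if date == today and dept == department:
            return person
    return None  # no entry for that date/department

# Example rows as they might come out of the sheet export (invented):
schedule = [
    ("2019-07-01", "Medicine", "Ivanov"),
    ("2019-07-01", "Education", "Petrov"),
]
```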

/tnls


After this "pen test" of automating things through a Telegram bot, lots of different ideas appeared, but I wanted to build only what was strictly necessary, so I decided to keep statistics on requests. For access to our customers' projects we run a so-called "hop server", a forwarding server: VPN connections are established on it, and then application ports, databases and other auxiliary endpoints are forwarded over ssh into our local network, so that employees can reach the projects without fiddling with VPN connections. A single VPN connection to our corporate network is enough.

The request statistics showed that people often ask to have access to a project restored after one of the tunnels drops (because of network problems, a timeout, and so on). In most cases simply restarting the connection is enough and everything is fine again. Let them do it themselves, then. Here is the command:


You "drill down" to the desired menu item, select your project, wait a minute, and everyone is happy and satisfied...

When the command arrives, with a slight movement of bits and bytes the bot connects to the forwarding server, already knowing which forward needs restarting, and does its job: it restores the connection to the project. I wrote instructions so people could solve such issues on their own, and now we are contacted only when the tool doesn't work...
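That restart step can be sketched as follows. The project-to-forward map, host names and ports are placeholders, and the real bot does this in PHP over ssh to the hop server:

```python
import shlex

# Placeholder map: which ssh port-forward belongs to which project.
FORWARDS = {
    "project-a": "ssh -f -N -L 8080:10.0.0.5:8080 hop-server",
}

def restart_command(project: str) -> list:
    """Build the argv that re-establishes the project's forward."""
    if project not in FORWARDS:
        raise KeyError("unknown project: " + project)
    return shlex.split(FORWARDS[project])

# The bot would kill the stale forward first, then run e.g.:
#   subprocess.run(restart_command("project-a"), check=True)
```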

/ecp_to_pem


Further statistics showed that we often need to convert CryptoPro digital-signature containers to pem format (Base64) for various integrations, and we have a lot of those. The procedure: take the container, copy it to a Windows machine with the P12FromGostCSP utility installed (which is paid, by the way), convert it to pfx, then convert the pfx to pem with OpenSSL (built with GOST cipher support). Not very convenient, and I wanted it at the snap of a finger.

Google came to the rescue again. I found a utility by some kind person, built it as described in the README, and it worked. I taught the bot to use the utility and got nearly instant conversion.


By the time it was finally rolled out, an order had come to switch to the new encryption standard, GOST 2012. As far as I remember, the utility at that point only handled the old GOST 2001; or maybe it was another similar utility by another good person, I don't remember exactly.

After the switch to the new GOST, the functionality was removed from the bot for security reasons. I implemented it in a Docker container instead.

The Dockerfile, in case anyone needs it:
FROM ubuntu:16.04
# Note: RUN uses /bin/sh, which has no {in,out} brace expansion,
# so the two directories are spelled out explicitly.
RUN apt update && apt -y install git sudo wget unzip gcc g++ make && \
    cd /srv/ && git clone https://github.com/kov-serg/get-cpcert.git && \
    cd get-cpcert && chmod +x *.sh && ./prepare.sh && ./build.sh && \
    mkdir -p /srv/in /srv/out && \
    echo '#!/bin/bash' > /srv/getpem.sh && \
    echo 'cd /srv/get-cpcert' >> /srv/getpem.sh && \
    echo './get-cpcert /srv/in/$CONT.000 $PASS > /srv/out/$CONT.pem' >> /srv/getpem.sh && \
    chmod +x /srv/getpem.sh
ENTRYPOINT /srv/getpem.sh


To convert, place the source container (a directory named like xxx.000) into /srv/in and pick up the finished pem from /srv/out.

To convert:

 docker run -t -i -e CONT='<container name without ".000">' -e PASS='<container password>' -v /srv/in:/srv/in -v /srv/out:/srv/out --name ecptopem <registry>/med/ecptopem:latest

/emstop and /emstart


Once our company hired a very cool Oracle DBA with long experience in administering and developing on the DBMS. But things didn't click right away with ssh connections to the database servers: either he didn't know where and how to connect, or the access details weren't clearly written down, or he couldn't forward something to his machine. We were happy to help: we explained how to connect and forwarded Enterprise Manager for him. But ssh still didn't work out. A colleague explained it simply: a purebred DBA :) We decided that if something on the server needed tweaking, we would do it ourselves.

EM crashes under heavy load, and restarting it means connecting over ssh and restarting it from the terminal. "Admins are good at that," our new colleague decided. Heavy loads on the DBMS are not rare for us, and requests to restart EM became frequent too. Then the same scenario as before: tension, irritation, and a search for a solution. So the same group chats gained two commands: /emstop and /emstart.
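Under the hood this amounts to running emctl on the database host over ssh. `emctl stop dbconsole` / `emctl start dbconsole` are the stock EM Database Console commands; the host and OS user below are placeholders, and the real bot does this in PHP:

```python
# Sketch of /emstop and /emstart (illustrative; host and user are invented).
EMCTL = {
    "/emstop": "emctl stop dbconsole",
    "/emstart": "emctl start dbconsole",
}

def em_command(cmd: str, host: str = "db-server") -> list:
    """Build the ssh argv that restarts Enterprise Manager remotely."""
    if cmd not in EMCTL:
        raise KeyError("unknown command: " + cmd)
    return ["ssh", "oracle@" + host, EMCTL[cmd]]
```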



/kill


When there is heavy contention on the database, and that happens sometimes, you need to offload it quickly. The fastest way is to kill the problem process... To do that, connect over ssh and kill -9... The bot to the rescue!



Alexey appreciated the command and gave it the affectionate name "Kilyalka", the gun.
Once, after watching Alexey toil and suffer, entering /kill xxx over and over for each process, I decided to teach our gun to fire bursts:



That's better! Everything for you, Alexey; just work, dear!

Naturally, access to such an important command was restricted by user_id: fool-proofing. Watching how deftly Alexey nailed processes on the database server, several people tried to enter the command with a random process number, but you can't fool my clever bot: it refused them straight away.
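The guard logic might look like this (the allowlisted id is invented, the real bot's checks are in PHP, and the multi-pid form reflects the "burst" mode above):

```python
# Sketch of the /kill guard: only whitelisted Telegram user_ids may shoot,
# and every argument must be a numeric pid. The id below is a placeholder.
ALLOWED_USERS = {111111}

def kill_args(user_id: int, pid_args) -> list:
    """Validate a /kill request and build the kill(1) argv (multi-pid)."""
    if user_id not in ALLOWED_USERS:
        raise PermissionError("user is not allowed to /kill")
    pids = [int(p) for p in pid_args]   # non-numeric input raises ValueError
    return ["kill", "-9"] + [str(p) for p in pids]
```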

/alertlog


Well, just in case, I added the command:
/alertlog <number of lines> - get the specified number of lines of the alert log
The bot pulls the alert log, uploads it to our pastebin-like service (pyste), and sends a link to the paste back to the chat that asked.
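Tailing the log is the simple part; a sketch (the upload to the pyste-like service is site-specific, so it is only indicated in a comment):

```python
from collections import deque

def tail_lines(lines, n: int) -> list:
    """Keep only the last n items of an iterable of lines."""
    return list(deque(lines, maxlen=n))

def alertlog_tail(path: str, n: int) -> str:
    """Return the last n lines of the alert log as one string."""
    with open(path, errors="replace") as f:
        return "".join(tail_lines(f, n))

# The bot would then POST the result to the internal pastebin service
# and reply with the returned paste URL.
```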

/checks


Then came a request to monitor the application's real performance. Until then the project's technical support had been collecting this data by hand. Not our way! Our valiant testers developed test cases for it. But the resulting test log is not easy to read; an inexperienced user would take a long time to dig out the relevant information, if he managed at all. And we don't like doing by hand what we can avoid doing by hand... A new task for the bot!



The /checks command displays a simple, unambiguous menu; this time our colleagues figured out how to use it without any instructions!

When you select an item, the menu is replaced by a notification that the test has started, so that impatient users don't launch our test 100500 times:



Depending on the menu item, a specific test is launched either from our network, namely from the machine the bot lives on (jmeter is preconfigured there with the necessary test plans), or directly from the data center (from a prepared machine next to the application), to eliminate, or at least minimize, network delays during testing.

After the test finishes and the log is collected, the bot parses it and presents the result in "human-readable" form:
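The parsing can be sketched like this, assuming JMeter's default CSV log header; the summary shape is invented (the real bot formats its output for Telegram):

```python
import csv
from io import StringIO
from statistics import mean

def summarize_jtl(jtl_text: str) -> dict:
    """Average elapsed time (ms) and error count per sampler label."""
    stats = {}
    for row in csv.DictReader(StringIO(jtl_text)):
        s = stats.setdefault(row["label"], {"elapsed": [], "errors": 0})
        s["elapsed"].append(int(row["elapsed"]))
        if row["success"] != "true":   # JMeter writes true/false strings
            s["errors"] += 1
    return {label: {"avg_ms": round(mean(s["elapsed"])), "errors": s["errors"]}
            for label, s in stats.items()}
```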



Metrics collection


The functionality caught on, and interested project managers got it for their regions. Then one caring project manager said: "I want statistics over time!" Someone from CIT suggested it would be convenient to watch all this in Zabbix. Zabbix it is, then...

Realizing I would need to be able to duplicate the solution, I packaged the idea in a Docker container. Inside the container, jmeter runs on a schedule (every 10 minutes) and writes its log to a fixed location; PHP parses the log and presents the necessary data as a web page. Zabbix fetches that page with the web.page.get key, regularly extracts the needed values into dependent items, and draws a graph.
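The page itself can be as plain as possible: simple `name value` lines are easy to slice into Zabbix dependent items with regex preprocessing after a single `web.page.get` master item. A sketch (the container actually renders this in PHP, and the metric names are invented):

```python
def render_metrics_page(metrics: dict) -> str:
    """Render parsed jmeter metrics as plain 'name value' lines for Zabbix."""
    return "\n".join("{} {}".format(name, value)
                     for name, value in sorted(metrics.items()))

# e.g. the container's page might output:
#   checks_login_ms 200
#   checks_search_ms 850
```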



It turned out rather well, I think. Watching the graph, we see, first, the approximate speed of the application, and if peaks appear on it, we know roughly where the bottleneck is. Simple. So far it has proven in demand in only one region, but I'm ready to roll it out for anyone interested.

Application development


Not long ago, statistics on same-type tasks suggested more ideas for simplifying the work. On some projects, the application servers need CryptoPro key containers installed; there are many of them, and the digital signatures expire over time. Sometimes two such tickets arrive in a single day. But I considered using the bot for this purpose unsafe, and decided to build the functionality directly into the application, naturally with authorization and access control. With the right privileges, an extra menu item becomes available for working with digital signatures: installation, removal, viewing information, and so on. The functionality is currently in development. It turned out not to be that hard: read the docs a bit, look at code examples, ask the more experienced colleagues in development, and off you go.

While researching this, more ideas for the application came up. I won't make Napoleonic plans: we have a development team, let everyone do their own thing. But for now it's interesting, so I'm doing it myself.

Plans


As I said, many ideas arose for using our bot and beyond, call them "automation points", and many were forgotten because I didn't write them down in time. Now I try to write down everything that comes to mind, and I recommend others do the same.

And Alexey doesn't forget to send in his wishlist. Among the latest:
/kill_sql SQL_ID - kill all sessions running the query with that SQL_ID
/kill_block - kill the root blocking session
/show_em - show the EM performance picture
In short, he wants to do DBA work straight from his phone =)
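The first wish is straightforward to sketch: find the sessions running the given SQL_ID in v$session and build a kill statement for each. The query and the ALTER SYSTEM syntax are standard Oracle; this is an illustration of the idea, not an implemented feature of the bot:

```python
# Sketch of the wished-for /kill_sql (not implemented in the article's bot).
SESSIONS_BY_SQL_ID = """
SELECT sid, serial# FROM v$session WHERE sql_id = :sql_id
"""

def kill_session_statements(rows) -> list:
    """Build one ALTER SYSTEM KILL SESSION statement per (sid, serial#) row.

    Executing these would require the ALTER SYSTEM privilege.
    """
    return ["ALTER SYSTEM KILL SESSION '{},{}' IMMEDIATE".format(sid, serial)
            for sid, serial in rows]
```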

That's how we work for the good of the Motherland!

And how do you save yourself from routine and uninteresting tasks?

I hope the read turned out interesting, maybe even useful to someone, and that I didn't manage to tire the reader... Good luck to us all.
