Automated incident response platform for information security

Picture a typical information security operations center at a large company. In an ideal world, software detects suspicious activity, and a team of white hat hackers starts hammering away at their keyboards. And this happens about once a month.

In the real world, it is hundreds of false positives and a tired support staff. They have to handle every incident: a user forgot a password, or could not download a game from a torrent or yet another porn movie in *.exe format. They also have to watch for network failures and investigate many other situations.

SIEM systems help organize and correlate events from different sources. They also generate alerts, each of which has to be dealt with, and most of those alerts are false. You can approach the problem from the other side by having scripts for processing alarms. Every time something fires, it would be nice to get not just the cause of the alarm, after which you have to dig through four or five systems for different data, but the complete diagnosis, collected automatically right away.



We built such an add-on, and it greatly reduced the load on operators, because the information-gathering scripts launch immediately, and if there are standard actions, they are taken right away. That is, if you configure the system with rules like “in such a situation, we do this and that”, the operator opens a card with the situation already worked through.
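The idea of "scripts that launch immediately and fill the card" can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the collector functions, the alarm type names and the `IncidentCard` structure are hypothetical and do not reflect the actual platform or the IBM Resilient API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical collectors: each one stands in for a query to a real system
# (asset inventory, antivirus console and so on).
def collect_asset(host: str) -> dict:
    return {"asset": f"inventory record for {host}"}


def collect_av_status(host: str) -> dict:
    return {"antivirus": f"agent status on {host}"}


@dataclass
class IncidentCard:
    alarm_type: str
    host: str
    context: dict = field(default_factory=dict)


# Playbook registry: "in such a situation, we collect this and that".
# The alarm type names are made up for this example.
PLAYBOOKS: Dict[str, List[Callable[[str], dict]]] = {
    "malware_detected": [collect_asset, collect_av_status],
    "auth_failure": [collect_asset],
}


def build_card(alarm_type: str, host: str) -> IncidentCard:
    """Run every collector registered for the alarm type and fill the card."""
    card = IncidentCard(alarm_type, host)
    # Alarm types without a playbook get an empty card for manual analysis.
    for collector in PLAYBOOKS.get(alarm_type, []):
        card.context.update(collector(host))
    return card
```

With this layout, adding a new typical case means registering one more list of collectors rather than changing the card-building logic.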

What is wrong with SIEM systems?


The list of offenses is heavily overloaded with raw data. Our platform instead receives a specific incident card to be filled in.

Typical cases come with template response flowcharts.

Here, for example, are data analytics reports on user authentication errors:



There are criteria for false positives: for example, the user failed twice and got in on the third attempt. The user simply receives an email saying that the password should be typed carefully, and the ticket is closed. If there were five or six attempts, detailed information starts being collected: what happened before, what happened after, and so on. If the user logged in on the 10th attempt, then went to the knowledge base and started downloading 10 files, there may be a rule to “block access to the knowledge base until the investigation is over and notify the operator”. If the user is most likely not malicious, the IT department automatically receives an email with the details. Perhaps they will teach the user to enter the password correctly or help change it.
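The thresholds described above can be written down as a small decision function. This is only a sketch: the function name, the action names and the exact cut-off values are assumptions chosen to match the examples in the text (two failures, five or six failures, login on the 10th attempt followed by 10 downloads). Cases the text does not mention, such as three or four failures, are treated here as ordinary typos.

```python
from enum import Enum


class Action(Enum):
    REMIND_AND_CLOSE = "email the user a reminder and close the ticket"
    COLLECT_CONTEXT = "collect events before and after the login"
    BLOCK_AND_NOTIFY = "block knowledge base access and notify the operator"


def triage_login(failed_attempts: int, files_downloaded_after: int = 0) -> Action:
    """Pick a reaction for a successful login preceded by failed attempts.

    All thresholds are illustrative and would be tuned per organization.
    """
    # Logged in on the 10th try (9 failures), then bulk-downloaded files.
    if failed_attempts >= 9 and files_downloaded_after >= 10:
        return Action.BLOCK_AND_NOTIFY
    # Five or more failures: start gathering detailed context.
    if failed_attempts >= 5:
        return Action.COLLECT_CONTEXT
    # A couple of typos: treated as a false positive.
    return Action.REMIND_AND_CLOSE
```

For example, `triage_login(2)` yields the reminder action, while `triage_login(9, 10)` yields the blocking action.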

If the activity is more dangerous, along the lines of “opened an executable file from an email, and then something started crawling around the network”, an entire segment or subnet can be blocked automatically. Yes, a SIEM can do this on its own, but without fine-tuning such measures are probably the limit of its automation.

Again, in an ideal world the operator has access to all systems and immediately knows what to do. In the real world, the operator often needs to find the person responsible in order to clarify something, and that person is on vacation or in a meeting. So another important part is that the response flowchart should immediately list the people responsible for specific systems and departments. That is, you should not have to look up an employee's cell phone, the name of their manager and the manager's phone number; you should see them right away in the open card.
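One way to picture "the responsible people are already in the card" is an escalation chain attached to each system owner. The sketch below is hypothetical: the `Contact` structure, the `available` flag and the helper names are assumptions for illustration, not fields of the real platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Contact:
    name: str
    phone: str
    manager: Optional["Contact"] = None
    available: bool = True  # False when on vacation or otherwise unreachable


def escalation_chain(owner: Contact) -> List[Contact]:
    """Return the owner followed by each manager up the hierarchy."""
    chain = []
    person: Optional[Contact] = owner
    while person is not None:
        chain.append(person)
        person = person.manager
    return chain


def first_available(owner: Contact) -> Optional[Contact]:
    """Return the first reachable person in the chain, or None if nobody is."""
    return next((c for c in escalation_chain(owner) if c.available), None)
```

The card would then display the whole chain, with the first reachable person highlighted, so the operator does not have to search for phone numbers by hand.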

What we have done


  1. , (), - .
  2. , ( ), , .
  3. , -. : , , .
  4. , .
  5. .
  6. ( , , ).
  7. .
  8. GUI -.

One of our major customers uses the QRadar SIEM. It is a good threat detection system with actions and steps for each incident, but you cannot give a human operator a list of tasks in it. For a top-class professional this is not necessary. For a first-line operator, it is very important to provide instructions on what to do and how, and then that operator can cover most typical incidents at the level of a strong specialist.

That is, we moved all the obviously boring events to the first line and added criteria to the scripts that separate the boring ones from the non-boring ones. Everything atypical, as before, goes to the pros.

For a company with several tens of thousands of workstations and its own server capacity in several data centers, working out and writing down the cases took about a year (relationships between departments were complicated, which made integration with the various systems difficult). But now every subtask in a card has a specific person in charge, and that information is always up to date.

How simple it is for operators can be judged from the rollout: the system was first deployed in the regions, and the official documentation started going out only a couple of weeks later. By that time people were already closing incidents with confidence.

How did it start?


There is a SIEM, but it is never quite clear what is going on. More precisely, QRadar generates a lot of events, they land in the information security department, and there are simply not enough hands to sort through everything properly and in detail. As a result, reports are only skimmed. With this approach the SIEM does not deliver much value.

There is an asset management system.

There are network scanning servers, very well configured.

The resulting reports were excellent, but people looked at them wearily and set them aside.

The customer wanted what they bought to start producing results.

We put a service desk for the security team on top (essentially a ticket system, as in regular support), visualized the data analytics, and built the automation platform described here on top of IBM Resilient, adding typical responses. Resilient comes bare; it is just a framework. We took the correlation rules from QRadar as a basis and fleshed out response plans for the use cases.



We spent several months on Russian localization of everything and wired up the right integrations via API. As soon as we finished, the vendor released its own Russian localization, and we were a little sad.

For about a month the staff were trained and studied the documentation (in particular, how to draw new flowcharts for cards). The further the training went, the simpler the cases became: at first huge action scripts were written, and then it turned out that they formed a kind of library of typical cases that could be referenced in almost any response.

Reaction comparison


Incident: "Repeated virus infection with the same malware in a short period of time." That is, the virus is detected on workstations, but staff are needed to figure out where it is coming from. The source of infection is still active.

Classic:

  1. The host 192.168.10.5 is repeatedly infected with the same malware within a short period of time; the events are sent to the SIEM, and the corresponding rule fires.
  2. .
  3. .
  4. .
  5. .
  6. /CMDB-.
  7. .
  8. - , .
  9. / .
  10. Service Desk .
  11. Files a Service Desk request, based on the results of the investigation, to eliminate the vulnerability through which the host was infected.
  12. Waits for the Service Desk requests to be closed, then verifies that they were carried out.
  13. Fills out the incident card and closes the incident.
  14. Reports to management on the results of the work.
  15. The analyst collects incident statistics to analyze the effectiveness of the response process.

On our platform:

  1. The host 192.168.10.5 is repeatedly infected with the same malware within a short period of time; the events are sent to the SIEM, and the corresponding rule fires.
  2. The operator looks at the incident card, which has already been populated with information about the host, the status of the antivirus tool and its logs, vulnerabilities on the host, related incidents and the people responsible for the host.
  3. , : , , Service Desk , - .
  4. Service Desk , .
  5. .
  6. .




It became a little faster. But the main thing is not speed; it is that tasks can now be sorted into “a first-line operator can handle this” and “this needs a specialist”. That is, on average, resolving each ticket has become significantly cheaper, and the system scales better.



In addition to the many false positives, there were many duplicates, which turned out to be easy for the system to detect.
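A common way to detect such duplicates is to group alerts by a fingerprint built from a few stable fields. The sketch below assumes that approach; the field names `rule`, `host` and `user` are hypothetical, and the text does not say which fields the platform actually compares.

```python
from typing import Dict, Iterable, List, Tuple

Fingerprint = Tuple[str, str, str]


def group_duplicates(alerts: Iterable[dict]) -> Dict[Fingerprint, List[dict]]:
    """Group alerts that share the same rule, host and user.

    Each group can then be shown to the operator as a single card
    with a counter instead of many separate tickets.
    """
    groups: Dict[Fingerprint, List[dict]] = {}
    for alert in alerts:
        key = (alert["rule"], alert["host"], alert["user"])
        groups.setdefault(key, []).append(alert)
    return groups
```

In practice a time window would probably be added to the fingerprint so that an old incident is not merged with a new one; that refinement is left out here to keep the example short.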



Cards no longer look like a set of obscure data from a report, but like “Vasya did this and that on a host. This is bad. Petya is responsible for the host. Here is exactly what happened. We need to go to Petya and tell him that Vasya’s computer from the work zone cannot be used to show presentations at conferences.”

Another important point is that, based on the primary data collected, it has become possible to prioritize tickets. That is, the main potential threats surface and get attention immediately rather than waiting in first-come, first-served order.
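Prioritizing instead of processing in arrival order can be sketched with a priority queue. The scoring formula below (severity multiplied by asset criticality) is purely an assumption for illustration; the text does not describe how the platform computes priority.

```python
import heapq
import itertools


class TicketQueue:
    """Serve the highest-scoring ticket first; ties keep arrival order."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()

    def push(self, ticket_id: str, severity: int, asset_criticality: int) -> None:
        # Hypothetical score: both inputs are small integers, higher is worse.
        score = severity * asset_criticality
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._arrival), ticket_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The arrival counter keeps the old first-come, first-served behavior among tickets with equal scores, so only genuinely more dangerous tickets jump the queue.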

Automation at the boundary with IT ticketing made it possible not only to collect all the information about an incident, but also to file tickets with the IT department right away. For example, if some settings on a router need to be changed, a ticket for IT is now generated automatically. Surprisingly, cases like “someone forgot to change the account in a service, and it has been trying to connect for a month” began to surface. IT either does not see such situations in the infrastructure or ignores them. Now information security reports that the service cannot log in and files a ticket.

Thanks to standardized response cards, incidents began to be resolved by standard methods. Previously, each one was handled creatively: different people took different actions.

The result is a solid workflow, much like in a modern CRM. The incident passes through a funnel. Another problem was solved at the last stage: previously, people sometimes closed a ticket simply because they were tired of it, so the outcome was poorly documented. Now you have to prove to the system that it is a false positive. You can still close the ticket as before, but it is clear who did it and under what conditions, and it is much easier to uncover mistakes. Not just “the user could not run the file”, but “brought a game on a USB flash drive, wanted to install it, and had the rules explained once again”. And it is already clear what happened.
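The requirement to justify a closure can be sketched as a validation step that also records who closed the ticket and when. The resolution codes, the minimum justification length and the `Closure` record are all assumptions made for this example, not the platform's real rules.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical resolution codes.
ALLOWED_RESOLUTIONS = {"false_positive", "resolved", "duplicate"}


@dataclass(frozen=True)
class Closure:
    ticket_id: str
    closed_by: str
    resolution: str
    justification: str
    closed_at: datetime


def close_ticket(ticket_id: str, closed_by: str, resolution: str,
                 justification: str, min_length: int = 20) -> Closure:
    """Refuse to close a ticket without a resolution code and an explanation."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if len(justification.strip()) < min_length:
        raise ValueError("justification is too short to audit later")
    # The returned record is the audit trail: who, why and when.
    return Closure(ticket_id, closed_by, resolution,
                   justification.strip(), datetime.now(timezone.utc))
```

A length check is a crude proxy for a meaningful explanation, but it already blocks the "closed because I was tired of it" case and leaves a record that can be reviewed later.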

Versatility


There are now a couple of integrations in production (one very large, with QRadar and an asset management system, and another smaller one). The platform can be connected to any SIEM through standard APIs, but of course integration takes time for building connectors, fine-tuning, and writing response rules for people. Nevertheless, it really helps in responding to security incidents, and in doing so relatively quickly and cheaply. It is likely that in 10 years SIEM systems will be able to do this themselves, but so far our add-on has proven itself well.

If you want to try it out with us or discuss how it might look in your environment, here is my email: AAMatveev@technoserv.com.
