Making support cheaper, trying not to lose quality

Emergency mode (also called IPKVM) lets you connect to a VPS without RDP, directly at the hypervisor level, and saves 15–20 minutes per week.

The first and most important thing is not to enrage people. Everywhere in the world, support is divided into lines, and the first-line employee is the one who tries the typical solutions first. If the task is beyond their remit, it goes to the second line. Now, among VDS administrators there are quite often people who know how to think. Unlike the clientele of many other support desks. Well, at least noticeably more often. They structure a ticket well and describe everything that is needed up front. If the first line's eyes have glazed over and they answer such a ticket with a request to turn it off and on again, that is a fiasco.

The task is very simple: make the support of our VDS hosting adequate at minimum cost. We are the fast food of the hosting world: no special pampering, low prices, normal quality. An earlier story described how, with the arrival of Instagram cuties trying to automate account management, small business owners with remote accounting, and other people who are not very strong in technology, talking "admin to admin" stopped working. We had to change the language we communicate in.

Now I’ll say a little more about the processes, and about the inevitable blunders that come with them.

Do not enrage people number 1

Any support is conveyor production. A ticket arrives, and the first-line employee immediately tries to recognize a typical situation that has happened a thousand times and will happen a thousand more. There is a 90% chance that the ticket is typical, and it can be answered by pressing a couple of buttons to insert a template. Usually you need to type a few words into the template and you're done. Or go into the control interface and click a couple of buttons there. In more complex cases (transfers from zone to zone, for example), you need to follow a defined procedure.

What infuriates people most, regardless of the other qualities of support, is a typical reaction to an atypical ticket. A ticket arrives where everything is described in detail, with enough data to cover the next three questions, and the client expects a dialogue ... And, having read only the first words, the support employee on autopilot hits the shortcut that inserts the template “try to reboot, it should help”.

This is what really makes people's heads explode, and it is after such situations that the most negative reviews and angry comments appear. Obviously we have made this mistake ourselves, which is how we know the statistics. In general, we have been wrong in different ways, but such cases are always simply wild. For us too. Of course, we would like this never to happen at all. But in practice that is not really possible: every few weeks an employee worn down by the monotony will, now and then, press the wrong buttons.
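One way to make the "template on autopilot" mistake harder is to let the tooling refuse a canned reply when a ticket looks atypical. The following is only a minimal sketch of that idea: the template keys, the word-count threshold, and the `triage` function are hypothetical and are not taken from our actual support desk.

```python
from dataclasses import dataclass

# Hypothetical templates and threshold, for illustration only.
TEMPLATES = {
    "reboot": "Please try rebooting the server from your personal account.",
    "password": "You can reset the password in your personal account.",
}
MAX_TEMPLATE_WORDS = 40  # assumed cut-off: longer tickets are read by a human


@dataclass
class Ticket:
    subject: str
    body: str


def triage(ticket: Ticket) -> str:
    """Return a template key, or "manual" when the ticket looks atypical."""
    words = ticket.body.split()
    # A long, detailed ticket is exactly the case where a canned reply hurts.
    if len(words) > MAX_TEMPLATE_WORDS:
        return "manual"
    text = f"{ticket.subject} {ticket.body}".lower()
    matches = [key for key in TEMPLATES if key in text]
    # Only answer with a template when exactly one typical case matches.
    return matches[0] if len(matches) == 1 else "manual"
```

A short "I forgot my password" ticket would get the `password` template, while a long, detailed ticket that merely mentions a reboot would be routed to a human.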

Do not enrage people number 2

The second thing that makes people's heads explode just as reliably is when nobody answers a ticket for a long time. In Europe this kind of support behavior is normal: three days before an incident is even picked up is more than normal. Even if it is very urgent and something is on fire: no social networks, no phone, no messenger, just email and wait your turn. In Russia this is much less common, but some tickets still get “forgotten”. From the very beginning we set an SLA of 15 minutes for the first response. And that is with honest 24/7 coverage. Naturally, once a VDS host becomes large, such an SLA appears. Dubious service providers do not have one. And at the start we were exactly that, dubious, and only later became more or less large. Okay, more or less medium-sized.

The first line is the operators who were given scripts and taught to respond to typical situations. They quickly sort the problems and try, within 15 minutes, either to respond with a typical action or to report that the ticket is being worked on and pass it to the second line.

The second line is the hosting administrators, who can do almost everything by hand. There is also a support manager who can do everything and a little more. The third line is the developers; they get tickets like "fix this in the interface" or "such and such parameter is calculated incorrectly".

Reduce the number of applications

For obvious reasons, if you want to provide support cheaply, you should not grow the first line so that people work through their scripts faster; you should increase automation. So that instead of people with scripts there are actual scripts. Therefore, one of the first things we did was automate bringing up a virtual machine, scaling its resources (including growing and shrinking the disk, but not the processor frequency), and other similar things. The more the user can do from the interface, the easier life is for the first line, and the smaller it can be. When a user asks for something that is already available in their personal account, you do it and explain how they can do it themselves.

Support is doing well when you don’t need it.

The second thing that saves a lot of time is patiently filling the knowledge base over a long period. If a user has a problem that is not on the list of supported actions (most often these are questions such as “how to install the Minecraft server” or “Where to configure VPS in Win Server”), an article is written for the knowledge base. The same kind of detailed article is written for every strange request. For example, if a user asks support to remove the built-in Windows Server firewall, we send them an article about what will happen if it really is disabled, and how to adjust permissions only for the software in question. Because the problem is usually that something cannot connect because of the settings, not the firewall itself. Explaining this in a dialogue every time is very hard. And we would rather not disable the firewall, because pretty soon we would lose either the virtual machine or the client.

If an article about some application software becomes very popular in the knowledge base, the distribution can be added to the marketplace, so that a “bring up a server with this already installed” service appears. That is what happened with Docker, and that is what happened with the Minecraft server. Again, one “just make it work” button in the interface saves up to hundreds of tickets per year.

Emergency mode

After these steps, the biggest remaining source of manual work is the user who has, for some reason, lost remote access to the guest OS running in the hypervisor. The most common case is a banal firewall misconfiguration; the second most frequent is some bug that prevents Windows from starting normally and forces a reboot into Safe Mode. And in Safe Mode, RDP is not available by default.

We made an emergency mode for this case. Normally, to access a VDS machine you need some kind of remote-access client. Most often that means console access, RDP, VNC or something similar. The disadvantage of these methods is that they do not work without the OS. But at the hypervisor level we can capture the screen image and send keystrokes to the machine! True, this puts a noticeable load on the processor (because video is actually being streamed), but it gets the desired result.

Therefore, we gave all users access to emergency mode, but limited the duration of continuous use. Fortunately, as practice shows, that time is enough to reboot and fix something.

The result is even fewer support tickets. And where the administrator can fix the problem themselves, support does not need to get involved and dig into it.
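The time limit on continuous use can be illustrated with a small session tracker. This is a sketch under stated assumptions: the 30-minute value and the `EmergencySession` class are hypothetical, since the actual duration and implementation are not given here.

```python
from datetime import datetime, timedelta

# Assumed limit; the real duration is not stated in the text.
MAX_SESSION = timedelta(minutes=30)


class EmergencySession:
    """Tracks one continuous emergency-console session for a VM."""

    def __init__(self, vm_id: str, started_at: datetime):
        self.vm_id = vm_id
        self.started_at = started_at

    def remaining(self, now: datetime) -> timedelta:
        """Time left before the session must be closed (never negative)."""
        left = self.started_at + MAX_SESSION - now
        return max(left, timedelta(0))

    def is_expired(self, now: datetime) -> bool:
        """True once the continuous-use limit has been reached."""
        return self.remaining(now) == timedelta(0)
```

Passing `now` explicitly instead of reading the clock inside the class keeps the logic easy to test.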

Remaining problems

Very often, users think that support has something against them. Alas, nothing can be done about this (or at least we have not figured out what). The two most common examples are resource limits and DDoS protection.

Each virtual machine has limits on disk load, memory and allowable traffic. The right to set limits is spelled out in the offer, and the limits themselves are chosen so that most users work in peace without even knowing about them. But if you suddenly start hammering the network channel and the disk very heavily, the algorithms automatically warn the user. Since April last year we have removed the automatic block. Instead, soft limits are applied for a variable period.

It used to work like this: a warning, and then, if the user did not heed it, an automatic block. And at that moment people took offense: “What are you doing, it's your system that is buggy, nothing happened!” And then you can either try to dig into the application software or suggest upgrading the tariff plan. We have no way of digging into application software, because that is outside the scope of support. Although the first few cases we did investigate together with the users. I especially remember the one where a YouTube view booster had a built-in trojan, and the trojan had a memory leak. In the end we concluded that these are not Heisenbugs but problems on the users' side, otherwise we would be flooded with similar tickets. But not a single person has yet admitted that they could have exceeded the plan's limits themselves.

The story with DDoS is similar: we write that you, dear user, are under attack. Please enable protection. And the user replies: “It’s you who are attacking me!” Of course, we DDoS exactly one user in order to swindle them out of 300 rubles. A profitable business. Yes, I know that many large, more expensive hosting companies include this protection in the tariff, but we can’t do that: the fast food economy dictates different minimum prices.
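The "warn first, then apply a soft limit instead of blocking" policy can be sketched roughly as follows. The thresholds, the `sustained` parameter and the `decide` function are assumptions made for illustration; the real algorithms are not described in this text.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    WARN = "warn"
    SOFT_LIMIT = "soft_limit"


def decide(samples: list[float], limit: float, sustained: int = 3) -> Action:
    """Pick an action from recent usage samples (newest last).

    A short burst over the limit only triggers a warning; `sustained`
    consecutive trailing samples over the limit trigger a temporary
    soft limit. The machine is never blocked outright.
    """
    over = 0
    # Count how many of the most recent samples in a row exceed the limit.
    for value in reversed(samples):
        if value <= limit:
            break
        over += 1
    if over >= sustained:
        return Action.SOFT_LIMIT
    if over > 0:
        return Action.WARN
    return Action.NONE
```

With a limit of 100, a single recent spike yields a warning, while three spikes in a row yield a soft limit.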

No less often, the people dissatisfied with support are those whose data we deleted. Deleted legitimately, that is, after the end of the paid term. If someone does not renew their VDS rental, they receive several notifications explaining what will happen next. When the paid period ends, the virtual machine is stopped, but its image is kept. Another notification arrives, and then a couple more. The image is stored for seven more days and only then deleted for good. And there is a category of people who are very unhappy with this. It starts with "the administrator quit, the notifications went to his mailbox, please restore" and ends with accusations of fraud and threats of physical violence. The reason is, once again, prices for all the other users. If we kept images for a month, we would need more storage. That would mean higher prices for every individual customer. And the economy of fast food ... Well, you understand. And as a result we get reviews on forums in the spirit of "took the money, deleted the data, scammers."

I should note that we have a line of premium tariffs. There, of course, the situation is different, because we take the client's wishes into account and flexibly adjust both the limits and the deletion in case of non-payment (we let the balance go negative rather than block the machine). There it is economically justified, because all sorts of things really do happen, and retaining a large regular customer is worth a lot.

Sometimes users are malicious. Several times our system had failures in which hundreds of virtual machines were blocked because of some clearly illegitimate actions by clients. In fact, it is precisely because of such situations that we needed our own network drivers, to monitor network activity and see that a user is not running an attack from their server. This kind of monitoring is important so that rowdy neighbors do not trample the boundaries of adjacent virtual machines.

There are those who simply send spam, mine, or otherwise violate the offer. Then such a user knocks on support's door and asks what went wrong and why the machine is blocked. If the process in the screenshot attached to the ticket is called "spam remover.exe", then something is probably going wrong. About once every two weeks we also receive complaints from Sony or Lucasfilm (now Disney) that someone is distributing a pirated movie from a virtual machine in our IP address range. For that, the response is an immediate block and a refund of the money remaining on the account, as the offer provides (a reminder: billing is per second, so the balance is always exact). And to get the money back, the law requires showing a passport: this is an anti-money-laundering measure. For some reason, instead of showing a passport, the pirates write that we pocketed their money, forgetting to mention part of the circumstances.
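The lifecycle described above (running while paid, then stopped with the image kept, then deleted after seven more days) fits in a few lines. This is a minimal sketch: the seven-day figure comes from the text, while the `vm_state` function and the state names are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # from the text: the image is kept seven extra days


def vm_state(paid_until: datetime, now: datetime) -> str:
    """Return "running", "stopped" (image still kept) or "deleted"."""
    if now < paid_until:
        return "running"
    if now < paid_until + RETENTION:
        return "stopped"
    return "deleted"
```

The same function could drive the notification schedule, since each state change is a natural moment to send another reminder.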
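Per-second billing is what makes the refundable balance exact. A rough sketch of the arithmetic, assuming a 30-day billing month and rounding to kopecks (both are assumptions, as are the function and parameter names):

```python
from decimal import Decimal, ROUND_HALF_UP

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day billing month


def remaining_balance(balance: Decimal, monthly_price: Decimal, seconds_used: int) -> Decimal:
    """Balance left after charging for `seconds_used` seconds of rental."""
    charged = monthly_price * seconds_used / SECONDS_PER_MONTH
    left = balance - charged
    # Round to kopecks (assumed rounding rule).
    return left.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a machine at 300 rubles per month that is blocked after exactly 15 days out of a 300-ruble balance would leave 150 rubles to return.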

Oh yes. We have the best request of the year: “Can I test a virtual machine for a few days at a rate of 30 rubles per month before purchase?”

Summary

The first line sorts tickets and responds with typical actions. Most of the discontent originates here. It cannot be fully fixed anyway, because the real fix lies in hosting automation, that is, in a huge backlog. Yes, we have more automation than many on the market, but it is still not enough. Therefore, the best that can be done is to set up monitoring of the first line. Support desk monitoring means implementing KPIs for the first line. SLA delays are visible in real time: who is slipping and, often, why. Thanks to such alerts, tickets never get lost. Yes, someone can still answer a ticket with an off-topic template, but we will learn about that from the feedback.
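Monitoring of the first-response SLA can be sketched as a function that groups overdue tickets by operator. The 15-minute figure is the SLA mentioned earlier; the `Ticket` fields and the `sla_breaches` function are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FIRST_RESPONSE_SLA = timedelta(minutes=15)  # the 15-minute figure is from the text


@dataclass
class Ticket:
    ticket_id: int
    operator: str
    created_at: datetime
    first_response_at: Optional[datetime] = None


def sla_breaches(tickets: list[Ticket], now: datetime) -> dict[str, list[int]]:
    """Group ids of tickets that missed the first-response SLA by operator."""
    breaches: dict[str, list[int]] = {}
    for t in tickets:
        # Unanswered tickets are measured against the current time.
        answered = t.first_response_at or now
        if answered - t.created_at > FIRST_RESPONSE_SLA:
            breaches.setdefault(t.operator, []).append(t.ticket_id)
    return breaches
```

Run periodically, a check like this shows both who is slipping and which tickets are still waiting, which is the real-time view described above.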

If the client asks for it, a second-line specialist can log in to the server and do what the client needs there (on condition of a confirmation email in which the client provides the login details for the server).

We do this very rarely and trust this work only to the best, because we want guarantees that user data will not be damaged. The best are the second line of support.

The first line has a knowledge base to which people can be sent for anything complicated.

A rich personal account plus a knowledge base, and we were able to reduce the number of support requests to 1–1.5 per client per year on average.

The second line usually handles complex tickets that require manual work. Characteristically, the more expensive the tariff plan, the fewer such tickets per virtual machine. Usually this is because those who can afford an expensive tariff either have specialists on staff, or half of the problems simply do not arise because the configuration is sufficient for everything. I still remember the hero who tried to install a far-from-oldest Windows Server on a configuration with 256 MB of RAM.

The second line has a set of distributions and a set of automation scripts. Both can be updated as needed.

The second line and the personal managers of VIP tariffs can add notes to the client profile. If the client is a Linux admin, we write that down. This serves as a hint for the first line: the user knows what they are doing, so this will not be a shot in the foot but a controlled demolition.

The third line fixes the strangest things. For example, we had a bug report that one of the functions of the personal account could not be reached in Firefox. The user resorted to outright blackmail: “If you don’t fix it within 12 hours, I will write about it on every hosting review site.” As it turned out, the problem was a custom ad blocker. On the user's side, oddly enough. Complex errors often arrive without details, and then the user can no longer reproduce them. There are whole detective stories with screenshots: “Why have you been fixing it for a month?” - “We’ve been looking for your bug all this time.” - “Well, I ran into it again today, but I couldn’t reproduce it again” ...

In general, you never know where a screenshot of a dialogue with support will turn up, and if a person contacts support, it means they have a problem. The attitude can be improved. At least you can try.

Yes, we know that our support is not perfect, but, I would like to believe, it combines sufficient speed with sufficient quality. And it does not raise tariff prices for those who can do without it.