Hardware or optimization? Badoo, Avito and Mamba on PHP performance

PHP performance is one of the most important topics for Badoo. The quality of the PHP backend directly affects the amount of resources we spend on development and operations, the speed of the service, and the impression it makes on users.

That is why we made backend performance the topic of the third meetup of the PHP developer community in our office and invited colleagues from Avito and Mamba to join the discussion.



Below the cut is a transcript of the discussion, which I had the luck to moderate: how the infrastructure of the three companies is set up, how we measure performance and which metrics we focus on, what tools we use, and how we choose between hardware and optimization.

And on February 15, come to the next Badoo PHP Meetup: we will be discussing legacy.



We have transcribed only the part of the discussion that seemed most interesting to us. The full version is available on video.

Experts:

  • Semyon Kataev, Head of a Development Unit in Core Services at Avito
  • Pavel Murzakov pmurzakov, PHP Team Lead at Badoo
  • Mikhail Buylov mipxtx, IT Director at Mamba


Tell us an optimization story from your own practice: a great success or a great failure, whatever is interesting to share.

Mikhail Buylov, Mamba


I have a parable.

About once every six months we look at the metrics and search for what is slow, what works poorly, what needs to be optimized. Once we noticed our Symfony dependency container, which had grown to 52,000 lines. We decided that it was the culprit behind everything: 20 ms of overhead on every request. We cut it down. We shrank it. We tried to split it up somehow, but nothing helped.

And then it turned out that we have an anti-spam system that has to go to 20 databases to fulfil all the necessary requests.

The solutions that come to mind first are not always the right ones. Look at the traces, logs and benchmarks of your requests instead of charging straight ahead. That's the story.

Pavel Murzakov, Badoo


We have a fairly large PHP ecosystem, so we optimize periodically. We grow, we reach a certain CPU level, and we realize that we need to either buy hardware or optimize. We weigh the arguments for and against each option and decide. Most often we decide in favor of optimization, because we would need a lot of hardware.

At one of those points we put together a whole group that looked for various suboptimal things in PHP scripts and optimized them. It went literally "drop by drop": a percent found here, a percent found there; several people gained a few percent over a month. At some point our C developer einstein_man got involved. He decided to see what he could do: he sat down in the evening, ran perf, found a couple of problems in the PHP extensions, and in a couple of evenings sped everything up by 13%!

Semyon Kataev, Avito


I have two stories: one about a fail, the other about a superstar developer.

The fail. We have many microservices, including PHP ones, and everything runs on Kubernetes. I worked on one of these microservices. It had significant CPU utilization: we spent a week optimizing and looking for problems. It turned out that one of the developers had added Xdebug to the base images to measure test code coverage in his (other) service, after which all microservices in production ran with Xdebug enabled for an entire quarter! After a quarter we discovered this, fixed the base images, and started receiving unexpected gifts: I redeployed my service and it started working faster. The service image is rebuilt on every deployment, and now without Xdebug.

The success story. We have many microservices, and there were more and more of them. In that situation the number of RPC calls becomes a problem. For example, the ad card - one of the most frequently visited pages on Avito - involves about 30 microservices in rendering the page. Moreover, none of this was very explicit: it seems like you are calling some abstraction, and underneath it five RPC calls to other services are executed one after another.

Over the years the ad card had degraded significantly. One strong developer fought with it for a quarter, optimized it, and pulled all the RPC calls out into the open. Once he managed that, he parallelized them via a Guzzle multi request - and instead of 30 sequential synchronous requests he got the same 30 requests running in parallel, which sped things up dramatically. After this refactoring, the card's response time equals the maximum response time of any single service. But he needed a whole quarter to optimize and rewrite the ad card rendering code.
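
To illustrate the kind of refactoring Semyon describes, here is a minimal sketch of running formerly sequential HTTP/RPC calls concurrently with Guzzle promises. The service names and URLs are invented, and depending on the Guzzle version the settle helper is either GuzzleHttp\Promise\Utils::settle() or the older GuzzleHttp\Promise\settle() function.

```php
<?php
// Hypothetical sketch: fire ~30 formerly sequential calls concurrently.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$client = new Client(['timeout' => 0.5]);

$promises = [
    'price'    => $client->getAsync('http://price-service.local/item/42'),
    'seller'   => $client->getAsync('http://seller-service.local/user/7'),
    'delivery' => $client->getAsync('http://delivery-service.local/options/42'),
    // ... the rest of the calls ...
];

// Total latency is now roughly the slowest call, not the sum of all calls.
// settle() does not throw on individual failures, so one slow or broken
// service does not take the whole page down.
$results = Utils::settle($promises)->wait();

foreach ($results as $name => $result) {
    if ($result['state'] === 'fulfilled') {
        $response = $result['value']; // PSR-7 response
    }
}
```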

Tell us about the size of your PHP clusters and how they are configured - PHP-FPM at the very least, or maybe Apache has crept in somewhere?

Mikhail Buylov, Mamba


We have about 15,000 RPS and a cluster of 80 FPM servers bought several years ago. On each server, FPM runs in static mode with up to 50 children; at peak, in prime time, around ten of them are busy. The average response time is 100 ms, and we try to hold it there (when it goes above 100 ms, we start looking for whatever is slowing things down).

We have our own performance monitoring system. We have scattered a lot of counters throughout the code, about 120 per request, and we monitor a lot of events that happen inside the PHP code.

Pavel Murzakov, Badoo


Everything is standard with us: nginx and PHP-FPM. About 600 servers with FPM. If we are talking about PHP in general, there are probably about 300 more servers for various purposes: script machines, back office, and so on.

There are two notable configuration features. First, we have the BMA proxy - a proxy for mobile clients. Before a request reaches nginx, it goes through a special proxy that holds persistent connections and forwards requests to nginx. The second feature: sometimes you need to remember to enable the opcache for CLI (we have it enabled on our scripting machines). Once we did not enable it and lost 30% of the CPU on that. After we realized the mistake, we were surprised how much you can save with a single setting.
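
The setting in question is opcache.enable_cli. Purely as a hypothetical illustration, a long-running script machine could report its state at startup so a misconfigured box is noticed before it burns CPU recompiling files on every run:

```php
<?php
// Hypothetical sanity check for scripting machines: report whether the
// opcache is active for CLI.
$cliOpcache = extension_loaded('Zend OPcache')
    && filter_var(ini_get('opcache.enable_cli'), FILTER_VALIDATE_BOOLEAN);

fwrite(STDERR, 'opcache for CLI: ' . ($cliOpcache ? 'enabled' : 'disabled') . PHP_EOL);
```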

We have PHP patches, but they are mostly unrelated to performance.

There is one issue with APCu lock contention, when you need to write a lot to the same key. APCu's internal architecture is built so that there is a global lock on the key, and under intensive writes things start to slow down. So we have a custom extension there that solves this problem. This relates to performance only partly, because it affects response time rather than CPU consumption.
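
Badoo's fix is a custom C extension, so the following is not their code - just a userland sketch of one common workaround for the same contention problem: shard a hot counter across several APCu keys so that concurrent writers do not all fight over one lock.

```php
<?php
// Userland sketch (NOT the Badoo extension): spread writes to a hot counter
// across N shard keys, then sum the shards on read.
const APCU_SHARDS = 16;

function hot_inc(string $key): void
{
    $slot = $key . ':' . random_int(0, APCU_SHARDS - 1);
    if (apcu_inc($slot) === false) {
        apcu_store($slot, 1); // first write to this shard
    }
}

function hot_get(string $key): int
{
    $total = 0;
    for ($i = 0; $i < APCU_SHARDS; $i++) {
        $total += (int) apcu_fetch($key . ':' . $i);
    }
    return $total;
}
```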

Semyon Kataev, Avito


We handle about 2 million requests per minute (~33 kRPS). The monolithic PHP application has been developed for over 11 years and is in a phase of active growth. When the company started, there were 65 LXC containers on 65 physical servers. Each container runs PHP-FPM, nginx and auxiliary software for metrics and logging. Nothing special.

Over all these years we have never added hardware for the monolith. Traffic, the number of ads and the number of user transactions keep growing, and we constantly optimize and improve the code and tune the software. CPU and memory consumption have been falling over recent years: 65 containers are still enough for the monolith.

How do you measure performance? What do you use to measure client response time?

Mikhail Buylov, Mamba


We have a log collection system. It records two numbers: the time from FPM start to the shutdown function, and the time to the end of the script. The second metric is needed to see what happens after the shutdown function.
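
A rough sketch of those two measurement points (this is not Mamba's actual collector; error_log stands in for their log pipeline):

```php
<?php
// Two timings per request: from FPM start to the shutdown handler, and the
// total time including whatever runs after the response has been flushed.
$start = $_SERVER['REQUEST_TIME_FLOAT']; // set by PHP when the request begins

register_shutdown_function(static function () use ($start): void {
    $toShutdown = microtime(true) - $start;

    // Flush the response so everything below happens "after the shutdown
    // function" from the user's point of view.
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
    }

    // ... deferred work: stats, queues, cleanup ...

    $total = microtime(true) - $start;
    error_log(sprintf('to_shutdown=%.3fs total=%.3fs', $toShutdown, $total));
});
```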

We also measure JS. Frankly, it is a so-so metric: network links fail very often, and as a result page loading somewhere in the Russian back country starts to crawl. So we look at it like this: "Oh, it jumped - that means something fell off somewhere." On top of that, third-party advertising distorts the metric a lot. And, most importantly, spammers come along, and that is pure randomness.

Semyon Kataev, Avito


By the way, we used to use Pinba from Badoo very actively. I still like it. It collected most of our metrics, but then we switched to the StatsD protocol. Now we take measurements at different points: from the front end, from the servers in front of the application, from nginx, and from the PHP application itself. We have a dedicated performance team. It started with front-end performance but then moved on to the backend. From the front end it collects not only JS, CSS and other static assets, but also the server response time. First of all, we focus on the application response time.

Pavel Murzakov, Badoo


Everything is similar to what the guys described. We use the classic Pinba for PHP to measure the running time of a PHP script from PHP's point of view. But we also have, for example, Pinba for nginx, which measures the response time from nginx's point of view. We collect metrics on the client as well.

What do we look at? On the one hand, response time. It is not related to resource planning: if it is bad, it has to be improved, because that is, in essence, quality of service. The other side is that you still need to plan hardware somehow. Our ITOps and monitoring teams watch all the hardware. There are thresholds for the network and for the disks, values past which an alert fires and we do something about it. As practice has shown, we usually optimize for CPU: that is what we run into.

Semyon Kataev, Avito


Our PHP application measures itself and sends out metrics from register_shutdown_function(). Each LXC has a StatsD server that collects these metrics and sends them through collectors to the Graphite cluster (which includes ClickHouse), where the data is stored. This is self-diagnostics.
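
A minimal sketch of that kind of self-diagnostics (the metric name is made up and 8125 is simply the default StatsD port; the real Avito client is certainly richer):

```php
<?php
// Send the request wall time as a StatsD timing over UDP to the local agent.
register_shutdown_function(static function (): void {
    $ms = (microtime(true) - $_SERVER['REQUEST_TIME_FLOAT']) * 1000;

    // Plain-text StatsD protocol: "<metric>:<value>|ms"
    $packet = sprintf('app.request_time:%.1f|ms', $ms);

    $socket = @fsockopen('udp://127.0.0.1', 8125, $errno, $errstr, 0.01);
    if ($socket !== false) {
        fwrite($socket, $packet);
        fclose($socket);
    }
});
```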

Each container also runs nginx, i.e. nginx + PHP-FPM. With nginx we collect external metrics about the PHP application's running time. In front of them there are also separate servers (we call them avi-http) running nginx, which do basic routing and also collect higher-level metrics: response time, the number of 500 response codes, and so on.

What tools do you use for performance tracing? What do you reach for most often?

Mikhail Buylov, Mamba


We wrote our own tool. When Pinba first came out - back in 2012, a very long time ago - it was a MySQL module that received data like this over UDP. It was hard to get graphs out of it, and it was not well optimized for performance. So we could not think of anything better than writing our own thing called Better Than Pinba. It is simply a counter server that accepts counters from a PHP client.

We have scattered a lot of timers through the code: every time we want to measure something, we start and stop a timer in the code. The client module calculates the counter's running time, aggregates the accumulated counters into a packet and sends them to the daemon. The interface then pulls out whatever you need and builds linked graphs for the chosen counter.
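
The pattern looks roughly like this (the class and counter names are invented; the real Better Than Pinba client packs the totals into a UDP packet for the daemon):

```php
<?php
// Sketch of the start/stop timer pattern: accumulate per-counter time during
// the request, then flush everything at the end in one go.
final class Timers
{
    /** @var array<string, float> accumulated seconds per counter */
    private static array $totals = [];

    /** @var array<string, float> start timestamps of running timers */
    private static array $running = [];

    public static function start(string $name): void
    {
        self::$running[$name] = microtime(true);
    }

    public static function stop(string $name): void
    {
        if (isset(self::$running[$name])) {
            self::$totals[$name] = (self::$totals[$name] ?? 0.0)
                + microtime(true) - self::$running[$name];
            unset(self::$running[$name]);
        }
    }

    /** Aggregated counters, ready to be packed and sent to the daemon. */
    public static function flush(): array
    {
        $totals = self::$totals;
        self::$totals = [];
        return $totals;
    }
}

// Usage around a suspicious piece of code:
Timers::start('db.profiles.select');
// ... the query ...
Timers::stop('db.profiles.select');
```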

One of Pinba's problems was the lack of its own interface - you had to export the data to RRD (dark times back then). So we wrote our own interface. Every time we see something jump, we can drill into the script: it shows us all the aggregated counters sent from it. We can see which counter has grown, where a counter's response time has increased, or where the number of counter calls has gone up.

You can see when performance drops, and that is where we start digging. Before PHP 7 we used XHProf, then it stopped building for us, so we switched to Xdebug. We only reach for Xdebug when a problem is already visible.

Pavel Murzakov, Badoo


There is a common belief that XHProf does not build on PHP 7. That is true, but only in part. If you take XHProf from master, it really will not build. But if you switch to the GitHub branch called Experimental (or something like that), then everything works fine on PHP 7, production ready. Verified.
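
For reference, the XHProf API itself is unchanged on that branch; a minimal profiling wrapper looks something like this (the dump location is arbitrary, and handle_request() is a placeholder for the code under investigation):

```php
<?php
// Minimal XHProf usage: profile a block of code and dump the raw call graph.
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

handle_request(); // placeholder for the code being profiled

// Result: ["parent==>child" => ['ct' => calls, 'wt' => wall time, ...], ...]
$profile = xhprof_disable();

file_put_contents(
    sys_get_temp_dir() . '/' . uniqid('profile_', true) . '.xhprof',
    serialize($profile)
);
```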

Mikhail Buylov, Mamba


No, I switched. It didn’t work for me.

Pavel Murzakov, Badoo


I want to add something about Pinba. You turned out to be prophets to some extent: at some point we also ran out of performance. So we made Pinba2, which is very fast. It can be used as a drop-in replacement for Pinba.

Semyon Kataev, Avito


Things are modest with us. We have only just taken performance on as a dedicated effort: we collect metrics such as response time. We use StatsD. We do not use any profilers on a regular basis yet, but I know for sure that some teams use them in their PHP microservices. I believe someone even uses New Relic. But for the main monolithic application we are only approaching this.

Pavel Murzakov, Badoo


The hardware side is monitored in Grafana and Zabbix. As for the PHP part, we have Pinba and a bunch of timers, and we build convenient graphs on top of them. We use XHProf; in production we run it for a share of requests, so we always have fresh XHProf profiles available. We also have liveprof: it is our own tool, and you can read about it, including in my article. It all happens automatically, you just have to look at the results. We also use phpspy. It is not running all the time: when someone wants to look into something, they log into the machine and take a profile - essentially the same way as with perf.

Semyon Kataev, Avito


With XHProf it is the same story for us. We used it once, a long time ago; it was a personal initiative of a couple of developers and, frankly, it never took off. We stopped collecting it. We gather a pile of metrics from router calls, controllers and various models. UDP packets with metrics take up about 60-70% of the data center's internal network. For the moment this is enough for us. Now we will be looking for new places to optimize.

Since we have gotten onto the subject of hardware: does anyone in your company do capacity planning systematically? How is that process organized?

Semyon Kataev, Avito


The monolithic application has been running on the same 65 LXC containers for at least five years. We optimize and improve the code: it has enough resources. Our main capacity planning goes into Kubernetes, where there are about 400 more or less live microservices written in PHP and Go. We are slowly cutting pieces off the monolith, but it is still growing. We cannot stop it.

In general, PHP is a cool language. It lets you implement business logic quickly.

Pavel Murzakov, Badoo


First of all, the ITOps and monitoring teams make sure there are enough resources. If we start approaching a threshold, colleagues notice it. They are probably the ones primarily responsible for global capacity planning. For the PHP part the main resource is CPU, so we watch it ourselves.

We set ourselves a bar: we should not "eat" more than 60% of the cluster. 60% and not 95%, because we have hyper-threading, which squeezes more out of the processor than you could get without it. The price we pay is that beyond 50% CPU consumption we can grow in unpredictable ways, because hyper-threaded cores are not entirely honest cores.

Mikhail Buylov, Mamba


We deploy and see that something has broken - that is our capacity planning. By eye! We have a certain performance margin that lets us get away with it, and we try to maintain that margin.

In addition, we optimize after the fact. When we see that something has fallen off, we roll back if things are really bad. But that almost never happens.

Or we simply see that "this bit is not optimal, we'll quickly fix it now and everything will work as it should."

We don't agonize over it: it is very hard, and the payoff would not be large.

You are talking about microservices. How do they interact? Is it REST or are you using binary protocols?

Semyon Kataev, Avito


Kafka is used to send events between services; for service-to-service interaction we use JSON-RPC - not a full implementation, but a simplified version of it that we cannot get rid of. There are faster options: protobuf, gRPC. They are in our plans, but certainly not a priority. With over 400 microservices it is hard to move them all to a new protocol, and there are plenty of other places to optimize. We are definitely not getting to it now.

Pavel Murzakov, Badoo


We do not have microservices as such. There are services, there is also Kafka, and our own protocol on top of Google protobuf. We would probably use gRPC, because it is cool, it supports all languages and makes it very easy to glue different pieces together. But when we needed it, gRPC did not exist yet. Protobuf did, so we took it and built various things on top so that it is not just serialization but a complete protocol.

Mikhail Buylov, Mamba


We do not have microservices either. There are services, mostly written in C. We use JSON-RPC because it is convenient: while debugging your code you just open a socket, quickly type what you want, and something comes back. Protobuf is more complicated because you need extra tooling. There is a small overhead, but we believe convenience is worth paying for, and the price is not high.

You have huge databases. When you need to change the schema in one of them, how do you do it? Some kind of migrations? If those migrations take several days, how does that affect performance?

Mikhail Buylov, Mamba


We have large, monolithic tables, and there is a shard. The shard is altered fairly quickly, because many ALTERs run in parallel at once. A large table with profiles took about three hours to alter. We use the Percona tools, which do not block reads and writes. In addition, we deploy before the ALTER so that the code supports both states, and then deploy again afterwards: deploying is faster than applying schema changes.

Pavel Murzakov, Badoo


Our largest storage (we call it "spots") is a huge sharded database. If you take the User table, it has many shards. I will not say exactly how many tables end up on one server; the idea is that there are many small ones. When we change the schema, we basically just run an ALTER, and on each small table it finishes quickly. There are other storages where the approaches are different: if there is one huge database, there are the Percona tools.

In general, we use different things as needed. The most frequent change is to that huge sharded spot database, and for it we have a well-established process. It all works very simply.

Semyon Kataev, Avito


The same monolithic application that serves most of the traffic is deployed five to six times a day - almost every two hours.

Working with the database is a separate matter. There are migrations, and they are applied automatically after a DBA review. There is an option to skip a migration and apply it by hand. A migration rolls into staging automatically when the code is being tested, but in production, if there is some dubious migration that consumes a lot of resources, a DBA will run it manually.

The code has to work with both the old and the new database structure. We often roll a feature out in several passes; the desired state is reached in two or three rollouts. The same goes for the huge and sharded databases. If we count across all microservices, there are easily 100-150 deployments per day.
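
A tiny, made-up example of what "works with both structures" means during such a multi-step rollout (the column names are invented):

```php
<?php
// Step 1: ship code that tolerates both schemas.
// Step 2: the migration is applied (automatically, or manually by a DBA).
// Step 3: ship code that relies only on the new column, then drop the old one.
function extractContactEmail(array $row): ?string
{
    // Prefer the new column if the migration has already run,
    // otherwise fall back to the old one.
    return $row['contact_email'] ?? $row['email'] ?? null;
}
```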

I would like to know: what backend response time do you consider standard and aim for? At what point do you decide you need to optimize further, and when is it time to stop? Is there an "average temperature across the hospital" value?

Pavel Murzakov, Badoo


No, it depends on the endpoint. We look at each one separately and try to understand how critical it is. Some endpoints are requested entirely in the background; they do not affect the user in any way, even with a response time of 20 s - it happens in the background, so it makes no difference. The main thing is that the important things are fast. For them, maybe 200 ms is still OK, but even a small increase beyond that already matters.

Mikhail Buylov, Mamba


We are moving from HTML rendering to API requests. The API responds much faster than a large, heavy HTML response, so it is hard to settle on a single value like 100 ms. We used to aim for 200 ms; then PHP 7 happened, and we started aiming for 100 ms. That is the "average across the hospital" - a very rough metric that only tells you it is time to at least take a look. Mostly, though, we watch deployments and whether the response time jumps after them.

Semyon Kataev, Avito


One of our performance teams did a study. Colleagues measured how much more the company earns by speeding up page loading under various scenarios: how many more buyers, transactions, calls, click-throughs and so on we get. From that data you can see that at some point acceleration stops making sense. For example, speeding one of the pages up from 90 ms to 70 ms gave +2% of buyers, while speeding it up from 70 ms to 60 ms gave +0.1% of buyers, which is within the margin of error. Just as with the guys, a lot depends on the page we are working on. Overall, Avito's 75th percentile is about 75 ms. In my opinion, that is slow. We are now looking for places to optimize: before the move to microservices everything was much faster, and we are trying to win that performance back.



And the eternal question: hardware or optimization? How do you decide whether to buy hardware or invest in optimization? Where is the line?

Semyon Kataev, Avito


My opinion: a real programmer is for optimization. It seems to me that in large companies like Badoo, ours or Yandex there are many developers of different levels, from juniors and interns to lead developers, so there are always more places to optimize and rework. Hardware is the last resort. For the monolith on 65 LXC containers we have not added hardware for a very long time; CPU utilization is 20%. We are already thinking of moving them into the Kubernetes cluster.

Pavel Murzakov, Badoo


I really like Semyon's position, but I hold the exact opposite view. First of all, I would look at the hardware: can we just throw hardware at it, and would that be cheaper? If so, it is easier to solve the problem with hardware, and the developer can do something else useful in the meantime. Developer time costs money; hardware also costs money, so you have to compare.

Which of the two matters more is not obvious. If both cost the same, the hardware wins, because the developer can do something else with that time. Specifically for the PHP backend, though, we cannot do that: optimization costs us far less than buying hardware.

As for when to stop: in terms of planning we have a certain bar, and if we bring CPU consumption below it, we stop. On the other hand, there is also quality of service: if we see that the response time somewhere does not suit us, then we need to optimize.

Mikhail Buylov, Mamba


It seems to me that it all depends on the size of the team and the project. The smaller the project, the easier it is to buy hardware, because a developer's salary is constant while the code and the number of users working with it vary enormously. If there are few users, it is simpler to buy hardware and let the developer work on growing the product. If there are a lot of users, a single developer can offset a large share of the server costs.

Semyon Kataev, Avito


It really does all depend on scale. If you have a thousand servers and optimization means you do not need another thousand, then it is clearly better to optimize. If you have one server, you can safely buy two or three more and forget about optimization. While the team is small, you just buy servers. If the team is large and you have two or three data centers, then buying six data centers is no longer so cheap or so fast.

Since this is a PHP meetup, the obligatory question has to be asked: why do we even need PHP if we have these problems all the time? Let's rewrite everything in Go, C, C#, Rust, Node.js!
Do you think a rewrite is ever justified? Are there problems worth doing it and investing the time for?

Semyon Kataev, Avito


In general, PHP is a very good language. It really lets you solve business problems, and it is fast enough. All the performance problems are problems of mistakes, bugs and legacy code - half-removed code and suboptimal things that would hurt just as much in any other language. Port it as-is to Golang, Java or Python and you get the same performance. The whole problem is that there is a lot of legacy.

Introducing a new language, in my opinion, makes sense in order to broaden the stack and the hiring pool. It is quite hard to find good PHP developers now. If we add Golang to the tech radar, we can hire gophers. There are few PHP developers and few gophers on the market, but together there are plenty. For example, we ran an experiment: we took C# developers who were ready to learn new languages - we simply broadened the hiring stack. If we tell people we will teach them to write in PHP, they say they would rather not. But if we offer to teach them PHP and Go, and promise the chance to write in Python as well, people respond much more willingly. For me this is about broadening employment opportunities. Other languages have some things PHP really lacks, but on the whole PHP is more than enough for solving business problems and building large projects.

Pavel Murzakov, Badoo


I probably agree with Semyon completely. Rewriting for its own sake makes no sense. PHP is a reasonably fast language; compared with other scripted, non-compiled languages it is probably close to the fastest. Rewrite into languages like Go and the rest? They are different languages with different problems. They are also harder to write in: not as quick, and with a lot of nuances.

Nevertheless, there are things that are either difficult or inconvenient to write in PHP. Some multi-process, multi-threaded things are better written in another language. An example of a task where not using PHP is justified is basically any storage. If it is a service that keeps a lot of data in memory, PHP is probably not the best language, because its memory overhead is very large due to dynamic typing. You store a 4-byte int and 50 bytes get consumed. I am exaggerating, of course, but that overhead is still very large. If you have some kind of storage, it is better to write it in a compiled language with static typing. The same goes for anything that needs multi-threaded execution.
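
The order of magnitude is easy to check yourself; a quick sketch that measures the per-element cost of a plain PHP array of integers:

```php
<?php
// Rough illustration of the overhead of dynamic typing: a million integers
// in a PHP array cost tens of bytes each, not 4 or 8.
$before = memory_get_usage();
$ints = range(1, 1000000);
$after = memory_get_usage();

// On PHP 7+ this prints on the order of a few dozen bytes per element
// (zval + hash-table bucket), versus 8 bytes for a raw C int64 array.
printf("%.1f bytes per int\n", ($after - $before) / count($ints));
```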

Mikhail Buylov, Mamba


Why is PHP considered not very fast? Because it is dynamic. Moving to Go solves the problem of going from dynamic typing to static typing, and that makes everything faster. As for Go specifically, what I have planned for it are tasks around particular data streams. For example, if there is a data stream that needs to be converted into more convenient formats, that is an ideal fit: a daemon takes one data stream as input and produces another, and it uses little memory. PHP eats a lot of memory in this case - you have to watch carefully that it gets cleaned up.

Moving to Go means moving to microservices, because you are not going to carve out the whole codebase; you are not going to take it and rewrite it in its entirety. Moving to microservices is a deeper task than moving from one language to another. It has to be solved first, and only then can you think about which language to write in. The hardest part is learning how to do microservices.

Semyon Kataev, Avito


You do not have to use Go for microservices. Many services are written in PHP and have an acceptable response time. The teams that maintain them decided for themselves that they need to write business logic and ship features very quickly - and sometimes they actually move faster than the gophers.

We do have a tendency to hire gophers and move things to Go, but most of our code is still written in PHP, and it will stay that way for at least a few more years. There is no real race; we never decided that Go is better or worse. Six releases roll out to the monolith daily. Our "Avito Deploy" chat lists the tasks being deployed: at least 20 tasks in each release, at least five or six releases a day - roughly 80 tasks a day that people have completed. All of that is done in PHP.

How do you handle removing unneeded features and dead code? How hard is it to push that through?

Semyon Kataev, Avito


It is very difficult. There is a psychological aspect: you launched a new feature - well done; you launched a gigantic new feature - you are a superstar! But when a manager or a developer says they removed (decommissioned) a feature, they do not get the same recognition; their work is invisible. For example, a team may get a bonus for successfully launching new features, but I have never seen awards for removing dead or experimental functionality. And yet the payoff from that can be truly enormous.

A pile of legacy features that nobody uses anymore really slows down development. There are hundreds of cases where a new person joins the company and refactors dead code that is never called, simply because they do not know it is dead code.

We try to negotiate with the managers, figure out from the logs what kind of feature it was, work with analysts to estimate how much money it brings in and who needs it, and only after that do we try to cut it out. It is a complicated process.

Mikhail Buylov, Mamba


We cut out dead code when we get to it, and it is far from always possible to cut it out quickly. First you write one piece of code, then another, then a third on top, and then it turns out that the feature underneath it all is not needed. It pulls in a wild number of dependencies that also have to be dealt with. That is not always possible, and you have to report it to the managers.

The manager sets a task and you estimate it. You say: "You know, this will take me six months. Are you ready to wait half a year?" He says: "No, let's think about what to cut and what to keep - what is fundamentally necessary and what is just nice to have." That is how it goes.

Pavel Murzakov, Badoo


When a developer receives a feature, he estimates how difficult it is in terms of development and in terms of performance. If he sees a problem with either, he checks with the product manager whether it is really necessary and whether we can drop something somewhere. Sometimes the changes are not critical, and managers readily make concessions once they find out that the thing is complicated or eats resources. It simply happens: you say "Let's drop it" - and that is that.

It also happens that a feature ships and only afterwards do we notice that it does not perform as well as we would like. Then, again, you can talk to the manager. Often it all ends well.

There is no built-in process for removing features; it happens sporadically. We see some feature, go and propose switching it off, switch it off, watch what that gives us, and then cut it out. Dead code is a different matter. We have a special extension, a whole infrastructure even, for dead code detection. We can see the dead code, and we cut it out bit by bit. Then again, if it is really dead and no request ever reaches it, it does not affect performance at all - it affects maintainability. People keep reading it even though they would not have to. There is no particular connection with performance.

On February 15 we are holding the next Badoo PHP Meetup, dedicated to legacy, with speakers from SuperJob, ManyChat, Badoo and FunCorp.

If you cannot make it in person, there will be a broadcast on YouTube.

Come join us.

Source: https://habr.com/ru/post/undefined/

