How to combine two platforms into one without offending users. The experience of the Yandex.Kew developers



Last year, the TheQuestion service joined Yandex. At that time, Yandex already had a similar question-and-answer service, Yandex.Znatoki. Znatoki had a large audience and many interesting questions, but not enough experts who could give high-quality answers. TheQuestion, on the contrary, had a strong community of experts but lacked interesting questions. The logical step was to combine the two services and take the best from each. But how do you do that when each service has its own technology stack, content, and users?

Today I will talk about how our team solved this problem from a technical point of view. You will learn which options for merging we considered and which one we chose in the end. I will tell you about the “swap API”, database migration, merging profiles, and backend testing. And also about the night of the move, with no room for error. As you will see, we were never bored.

The task of merging two services into one is not new, but that does not make it any easier. History knows many successful (and not so successful) examples of integration, but unfortunately there is no “silver bullet” and no clear instruction of the form “do this and everything will work out”. A great deal depends on the specifics of the services being combined and on the desired result.

In our case, the goal was this: all the content ever written on either site had to be available on the unified service, and its authors had to be able to manage it.

So how do you combine two Q&A services that seem so similar but are so different in essence? Transferring content and users from one service to another is a lot like moving from an old apartment to a new one.

Only in our case the user may live in two apartments at once (Znatoki and TheQuestion) and needs to be carefully moved to a third. You have to move all the furniture, the plants, the cat, and even the wallpaper (that is, questions, answers, comments, likes) to the new apartment, and only then invite the user to move in.

How to do this? Several options immediately come to mind.

Option 1. Very bad.
Let's just take one of the services, transfer all its content to the other (although even this is not easy) and close the original service.

This option is very bad from the point of view of the first service's users. We have just demolished their old house and forced them to move to a new one. Nobody likes being treated this way, and instead of moving, the user may simply walk off into the sunset. For us, the main value is the user community, so we had no intention of offending anyone, and we boldly moved on to the other options.

Option 2. Bad
Let's not change either of the services; instead, we launch a new, unified one and periodically (for example, once a day) copy content into it from the other two.

In this case we do not seem to make things worse for the user, but we do not make them better either. The old apartment remains unchanged, but there is no point in moving to the new one. All the neighbors still live in the old house, and a newly bought flower reaches the new apartment only a day later. Such a unified service has no chance of becoming a new home.

Option 3. Good, but complex
Let's not close either of the services; we instantly duplicate content and profiles on the unified service, and people move over time.

All the user's neighbors (and even the cat) live in the old house and the new one at the same time. A newly bought flower instantly appears in the new apartment. Exactly what is needed! This option is the most comfortable from the user's point of view, so we chose it.

Start moving


What we eventually did can be described in a few sentences. We fully reimplemented the TheQuestion backend API on top of the Znatoki backend, thus obtaining a single backend that can serve two (and even three) sites at once. The TheQuestion frontend remained almost unchanged, which means that from the users' point of view the site itself hardly changed. Internally, this project was named the “swap API”. But first things first.

What we started with: two completely independent sites. Znatoki lives in Yandex's internal clouds, and its backend is written in Python. TheQuestion lives in the Microsoft Azure cloud, and its backend is written in Go. The services have completely different data storage schemas in their databases. In addition, TheQuestion has two mobile applications (for Android and iOS), which also had to be supported. In short, you would not wish merging such a zoo on your worst enemy.



Stage 0. Moving into the Yandex cloud


Strictly speaking, this step is not required for the “swap API”, but it significantly simplifies the next steps. At this stage, we completely abandoned external storage and infrastructure. TheQuestion started using Yandex DNS servers. The service runtime was moved to Yandex.Cloud. The database was transferred to Yandex Managed Databases. During the move, we also found and fixed several bugs in TheQuestion, for example, unclosed Redis connections in code from 2015. As a bonus, TheQuestion also gained additional capacity.

Stage 1. Data Migration


Whichever option for combining the services we chose, we would have had to combine the data in any case. For the single database we decided to use PostgreSQL, since this DBMS was already used in both Znatoki and TheQuestion. To avoid overcomplicating the project, we did not create a third database for the unified service; we simply took the Znatoki database and extended it so that it could accept all TheQuestion data. This was the first major technical challenge.

Every record in every table of the TheQuestion database had to be converted and placed into the Znatoki database. To do that, we had to map each column of one database to the other. Many fields required nontrivial conversion from one format to another. For example, a large separate subtask was converting the text storage format (the format in which the body of a question or answer is stored) from QML (TheQuestion) to Markdown (Znatoki).
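
As an illustration of this column-by-column conversion, here is a minimal sketch in Python (the language of the Znatoki backend). Everything in it is an assumption made for the example: the column names, the `COLUMN_MAP` table, and the converters are hypothetical, and `qml_to_markdown` is only a stand-in, because the QML format is not described in this article.

```python
from datetime import datetime, timezone


def to_utc_iso(unix_seconds: int) -> str:
    """Hypothetical converter: Unix timestamp -> ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


def qml_to_markdown(qml: str) -> str:
    """Stand-in only: the real QML -> Markdown conversion is not shown here."""
    return qml


# Hypothetical mapping: source column -> (target column, converter).
COLUMN_MAP = {
    "id": ("legacy_id", int),
    "body_qml": ("body_markdown", qml_to_markdown),
    "created": ("created_at", to_utc_iso),
}


def convert_row(source_row: dict) -> dict:
    """Convert one source row; a missing source column raises KeyError."""
    target_row = {}
    for source_col, (target_col, convert) in COLUMN_MAP.items():
        target_row[target_col] = convert(source_row[source_col])
    return target_row
```

Keeping the mapping in one table makes it easy to see at a glance which columns are covered, and a missing source column fails loudly instead of silently producing an incomplete row.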

We set up a regular process (running several times a day) that transferred new data from one database to the other, but made sure that data from TheQuestion was not displayed anywhere until the next stage was complete. After all, “several times a day” is far from the promised “instantly”, and the copied data could be inconsistent with the same data on TheQuestion, which would mislead users. So why did we start with data migration if the backend was not ready yet?

First, this let us stabilize the process. Second, it reduced the amount of new data that would have to be transferred later, which matters because all imported content had to be run through labeling for quality, inappropriate content, spam, and fraud.
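
The idea of a periodic transfer whose results stay hidden can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual migration code: it assumes rows carry a monotonically increasing numeric `id`, it uses plain Python dictionaries instead of PostgreSQL, and the `visible` flag and both function names are hypothetical.

```python
def sync_batch(source_rows, target, last_synced_id):
    """Copy rows newer than the watermark into target, marked as hidden.

    Returns the new watermark (the highest id copied so far).
    """
    newest = last_synced_id
    for row in source_rows:
        if row["id"] <= last_synced_id:
            continue  # already copied by an earlier run
        # Imported rows are stored but not shown to anyone yet.
        target[row["id"]] = {**row, "visible": False}
        newest = max(newest, row["id"])
    return newest


def visible_rows(target):
    """Rows that may be displayed to users."""
    return [row for row in target.values() if row["visible"]]
```

Each run picks up only the rows created since the previous one, so the last batch before the switchover stays small.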



Stage 2. “Swap API”


So, we solved the first problem: we learned to take content from the TheQuestion database and even display it if we wished. Now we had to make this content reach the unified database instantly, not several times a day.

To do this, we had to rewrite the entire TheQuestion backend with all the necessary logic. Strictly speaking, the project name “swap API” does not fully reflect the essence; “swap backend” would be more accurate. The point is that, besides directly implementing all the endpoints the TheQuestion frontend needs, we had to implement other functionality as well. We faced several major tasks.

Authorization. Yandex has a centralized user authorization system, Yandex.Passport, and Znatoki naturally used it. To log in through Passport, you must have a Yandex account. That was the problem: not all TheQuestion users logged in to the site through Yandex (although that option existed). Many users had no Yandex login at all and signed in through social networks (VKontakte, Facebook ...). Naturally, we had to keep this functionality when moving, so we implemented “non-Passport” authorization.

Site search. TheQuestion had search over questions, answers, users, and topics, built on a third-party solution, Sphinx. Obviously, a single service needs a single search; it cannot run on two systems at once. So Sphinx was dropped in favor of an internal search engine that supports the required functionality and indexes all TheQuestion content.

Delivering pages to Zen and Turbo. At the time it joined, TheQuestion already used Yandex technologies: Turbo pages were supported, and interesting content made its way into the Zen feed. All of this also had to be supported in the “swap API”.

Notifications in the service and applications, mailing lists. Everything related to notifying users: subscriptions, newsletters with interesting content, push notifications about likes and comments, and much more. All of this had to be carefully carried over without forgetting anything.

Site administration system. This covers everything related to the internal management of the service: moderation, analytics, and so on.

Unified user rating system. This task was more logical than technical. Formally, the “swap API” does not require a unified rating system, but the future unified service does. On both sites, users were rated on the quantity and quality of the content they created. The details of the rating were not disclosed, but the more often and the better you answer questions, the higher your rating. The rating principles were the same on both services, but the formulas and factors differed greatly. We had not only to compare Znatoki and TheQuestion users with each other correctly and fairly, but also to learn to compute a single rating for experts who wrote on both services.

And rewriting all the APIs. Like it or not, this task was the most important and the most difficult. Many processes on the two services were similar, so we took them from Znatoki rather than writing them from scratch. But there was also a lot that was new, for example, user “direct lines” or draft answers. As a result, we rewrote more than 100 endpoints in the “swap API” and implemented more than 50 REST resources.

Once we had implemented all the functionality described above, we could start moving. But first we pulled one trick.

Clearly, before switching over and rolling the “swap API” out to production, it had to be tested very thoroughly. First, functionally, that is, by directly checking that the entire site works on the new API. Second, under load. We wanted to be 100% sure that our setup would not go down under load. Naturally, we ran load tests regularly, and they showed that we had a good performance margin. But when it comes to service performance, it is always better to play it safe. Any synthetic load test, even the best one, differs in some way from production load. So before switching the API, we decided to send production load to our test stand.

To do this, in the TheQuestion frontend we duplicated all GET requests (that is, requests that read data rather than modify it) to two APIs at once: the “old” TheQuestion API, which was still the primary one, and the secondary “swap API”. The frontend did not wait for the secondary API's response and did not handle its errors, but this way we were able to test the backend on real users.
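
The duplication trick can be sketched as follows. The real frontend is written in Node.js; Python is used here only for illustration, and `make_shadowing_get`, `primary`, and `shadow` are hypothetical names. The sketch assumes each API is represented by a function that takes a request path and returns a response body.

```python
import threading
from typing import Callable

Fetch = Callable[[str], str]  # takes a request path, returns a response body


def make_shadowing_get(primary: Fetch, shadow: Fetch) -> Fetch:
    """Build a GET function that mirrors every request to a secondary API."""

    def shadow_quietly(path: str) -> None:
        try:
            shadow(path)
        except Exception:
            pass  # failures of the secondary API must never reach the user

    def get(path: str) -> str:
        # Fire the duplicate request in the background and do not wait for it.
        threading.Thread(target=shadow_quietly, args=(path,), daemon=True).start()
        # The user only ever sees the primary API's answer.
        return primary(path)

    return get
```

Only read requests are mirrored in this way: duplicating a modifying request would create the same content twice.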



Stage 3. And then we remembered about applications


No, of course we had remembered them all along, but we ran into a problem. Anyone who has worked with mobile applications knows that they are much more of a hassle than a website. This is primarily due to the distribution of new versions.

First, you have to work with external services, the App Store and Google Play, and wait for new versions to pass review (which can sometimes take considerable time). Second, even if your application has passed review and appeared in the store, that does not mean users will update.

With a website frontend, the developers themselves control when a new version is released, and they know for certain that afterwards all users will get the updated version of the site. With applications there is no such guarantee. To obtain one, developers often resort to a “forced update” of the application. Few people like this method, and of course you should maintain backward compatibility between the application and the backend whenever possible. So we set out to make the changes on the backend side, with minimal changes to the frontend and the applications. But, as often happens, the plan collided with harsh reality.

Some changes were much easier to make on the frontend side than on the backend, so while developing the “swap API” the frontend did change, albeit slightly. In particular, the old TheQuestion database used numeric 64-bit IDs, while the Znatoki database, and therefore the combined database and the new API for TheQuestion, used 128-bit string IDs. For a frontend written in Node.js, this difference is generally insignificant. But for the strongly typed applications it proved fatal. We lost backward compatibility, and older versions of the applications could not work with the “swap API”.

At some point there was even a project called “swap API for the swap API”, the idea being to write a thin layer between the new backend and the applications that would convert all data into the old format. However, we quickly abandoned this idea. Such a layer would have been a very rigid “crutch” that would certainly have caused us many problems in the future. For example, you cannot simply convert 128-bit IDs into 64-bit ones. We would have had to either convert with loss of information and therefore possible ID collisions, or maintain an intermediate table mapping old IDs to new ones (for every entity in the database). Neither is a good architectural solution.
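
To make the two rejected options concrete, here is a small sketch. It assumes, purely for illustration, that the 128-bit string IDs are UUID-formatted (the article does not say what format they use); `truncate_to_64` and `IdMapping` are hypothetical names.

```python
import uuid

MASK_64 = (1 << 64) - 1


def truncate_to_64(new_id: str) -> int:
    """Lossy option: keep only the low 64 bits of a 128-bit ID."""
    return uuid.UUID(new_id).int & MASK_64


class IdMapping:
    """Lossless option: a correspondence table between old and new IDs."""

    def __init__(self) -> None:
        self._old_by_new: dict[str, int] = {}
        self._new_by_old: dict[int, str] = {}
        self._next_old = 1

    def old_id_for(self, new_id: str) -> int:
        # Allocate a fresh 64-bit-sized ID the first time a new ID is seen.
        if new_id not in self._old_by_new:
            self._old_by_new[new_id] = self._next_old
            self._new_by_old[self._next_old] = new_id
            self._next_old += 1
        return self._old_by_new[new_id]

    def new_id_for(self, old_id: int) -> str:
        return self._new_by_old[old_id]


# Two different 128-bit IDs that share their low 64 bits collide after truncation.
a = "00000000-0000-0001-0000-000000000abc"
b = "00000000-0000-0002-0000-000000000abc"
assert a != b and truncate_to_64(a) == truncate_to_64(b)
```

The mapping table avoids collisions, but it has to hold an entry for every entity and be consulted on every request in both directions, which is the maintenance burden that made the layer unattractive.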

Besides the IDs, there were a number of other changes that were also much easier to support on the frontend and application side. As a result, we decided to make the changes in the applications and use a forced update after all. We quickly developed new versions of the applications compatible with the “swap API”, since the changes on the client side were few and not very serious. We submitted them to the App Store and Google Play, passed moderation successfully, and began to wait.

Stage X. Let's go!


So, all the code is written. The stand with the “swap API” has been tested and load-tested. New versions of the applications have passed review in the stores and are ready for publication. Now all of this had to be rolled out to production.

Because copying new data from the old database to the new one happens asynchronously and takes some time, you cannot switch the backend (and the database under it) on a live site. That could lead to loss or inconsistency of user data. So we picked a date, warned users, and prepared a “Technical work in progress” placeholder page.

And then the hour came, or rather night X. The roll-out plan looked like this:

  1. On the TheQuestion website, we put up a “Work is in progress” placeholder page.
  2. We switch the applications to read-only mode. Users can read content from the old database but cannot create new content.
  3. TheQuestion [...] Readonly [...]
  4-10. [The text of the remaining steps did not survive in this copy.]

Well, that doesn't look so scary. In reality, everything went quite smoothly. Perhaps the only surprise was that publishing the application in the App Store took too long (the application had been reviewed in advance; this was only about it appearing in the store). In the end it took several hours, which delayed the whole operation somewhat.

In addition, the switchover had one key feature that complicated everything many times over and raised the stakes: the process could not be reversed.

Although the process of copying and converting data from the old TheQuestion database to the new, unified one was set up and debugged, there was no reverse copying process (from the new database to the old one). This means that as soon as we opened the site on the “swap API” and let user traffic in, newly created questions, answers, comments, and likes could no longer easily get into the old TheQuestion database. If something went wrong after opening, for example, if the unified database could not cope with the load, it would not be possible to roll everything back quickly.

Of course, I am exaggerating a little. In any case, we would not have lost user data. We had a plan B and a way to manually copy data from the new database back to the old one. But that would still have taken time, and the rollback would not have been entirely painless for users.

Fortunately, Plan A worked, and nothing had to be rolled back.



Final stage


So, the backend was replaced, the databases were merged, and the mobile applications were not forgotten. For users nothing had changed, because the Yandex.Kew site, which was supposed to combine data from both sites, had not yet been launched. And to launch it, we had to solve one more problem.

At the very beginning, I wrote that we had to bring together in one new service not only the questions and answers of the two services, but also their users. Users had to be able not only to see their content on Kew, but also to manage it there. Technically, combining data and transferring management rights is not difficult. It is much harder to make sure the rights go to the right person.

When moving from Znatoki to Kew, everything is simple: both use the same Yandex account. But TheQuestion has its own accounts, which cannot be used to log in to Kew. Fortunately, we had thought about this in advance. Long before the events described above, we gave TheQuestion users the ability to link their Yandex profiles, and by the time the services were physically merged, more than 90% of active users had done so. This allowed us to start migrating content and users painlessly.

Summary


When we moved, we wanted to keep every user, so we deliberately chose the most labor-intensive and risky option for combining the platforms. We created a unified technology base and learned how to transfer content and profiles instantly. Instead of an abrupt closure and forced relocation, we kept the old services working, launched a new one, and explained its advantages.



We launched Yandex.Kew last year. By now, more than 80% of the active authors of TheQuestion and Znatoki have voluntarily moved to the new home.
