A program for finding like-minded people on VKontakte [open source]

VK provides very good tools for advertising targeting, which let you find people of the right gender, age, social and marital status, subscribed to specific groups, and so on. That is only the tip of the iceberg: if you dig into the big data available on social networks, you can learn almost more about a person than they know about themselves.

At the same time, there is practically no mechanism for finding new friends, which is rather ironic for a social network. On the other hand, it is understandable: if something does not generate income, it will most likely not be developed. VK recently launched a dating application, but as far as I understand it is essentially a Tinder clone, and it pulls no data at all from profiles, not even something as basic as attitude to smoking or alcohol. The only thing it takes from the social network is authorization.

We became curious how realistic it is to fix this situation using the VK API, and here is what came of it:

A feed of like-minded people on your home screen

How it was


To start with, it is worth describing the initial state. The only workable strategy that comes to mind is to find a community that is as close as possible to you in worldview, aesthetics, or hobbies, and open a search over its subscribers. You set some filters, for example age, and then simply look through everyone in a row.

But this option has the following disadvantages:

  • Many abandoned, closed, or fake profiles
  • Search by the "life position" fields is broken
  • The search has no multiple choice: you cannot select, for example, both a negative and a sharply negative attitude to alcohol
  • You have to constantly switch between browser tabs
  • You have to judge manually how high the public page in question sits in the user's list, that is, whether they are really interested in it
  • You have to check manually what else the user is subscribed to: whether there are other public pages of interest to us, or something unacceptable
  • You cannot mark profiles as already viewed if you have not gone through all of them

Fortunately, almost all of this is solvable, and with 100 million monthly active users, even introverts with very specific tastes have a chance of finding like-minded people.



How it is now


We decided to try building a project based on subscription analysis, with a convenient interface and automation tools. In fully automatic mode, the algorithm goes roughly like this:

  1. The user logs in via VK
  2. The program downloads the list of the user's subscriptions (groups with fewer than 1M subscribers)
  3. Each subscription is graded according to its position in the list
  4. N subscribers are downloaded from each of the user's groups (N depends on how much time the user has allotted)
  5. Users subscribed to several of these groups are found in the database, and their rating is calculated
  6. For the people with the highest ratings, their subscription lists are downloaded, to make sure that the matched groups are not sitting in 2048th place and to build their top interests
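
Steps 3 to 5 can be illustrated with a small sketch. This is not the project's actual code: the names `positionWeight`, `Candidate`, and `rankCandidates` are hypothetical, and the 1 / (1 + position) weighting is only an assumption about how a grade might follow from a group's position in the list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using UserId = std::int64_t;
using GroupId = std::int64_t;

// Hypothetical grade: the first group in the list weighs 1.0,
// later ones decay as 1 / (1 + position). The real formula may differ.
double positionWeight(std::size_t position) {
    return 1.0 / (1.0 + static_cast<double>(position));
}

struct Candidate {
    UserId id;
    double rating;      // sum of the grades of the shared groups
    int sharedGroups;   // how many of my groups this user was found in
};

// myGroups: my subscriptions in list order (step 3).
// members:  the N subscribers downloaded for each group (step 4).
// Returns users found in at least minSharedGroups groups, best first (step 5).
std::vector<Candidate> rankCandidates(
    const std::vector<GroupId>& myGroups,
    const std::unordered_map<GroupId, std::vector<UserId>>& members,
    int minSharedGroups = 2)
{
    std::unordered_map<UserId, Candidate> acc;
    for (std::size_t pos = 0; pos < myGroups.size(); ++pos) {
        auto groupIt = members.find(myGroups[pos]);
        if (groupIt == members.end()) continue;  // nothing downloaded for this group
        const double weight = positionWeight(pos);
        for (UserId user : groupIt->second) {
            auto it = acc.find(user);
            if (it == acc.end())
                it = acc.emplace(user, Candidate{user, 0.0, 0}).first;
            it->second.rating += weight;
            it->second.sharedGroups += 1;
        }
    }

    std::vector<Candidate> result;
    for (const auto& entry : acc)
        if (entry.second.sharedGroups >= minSharedGroups)
            result.push_back(entry.second);

    // Highest rating first; ties are broken by id to keep the order stable.
    std::sort(result.begin(), result.end(),
              [](const Candidate& a, const Candidate& b) {
                  if (a.rating != b.rating) return a.rating > b.rating;
                  return a.id < b.id;
              });
    return result;
}
```

With three groups in the list, a user found in the first and second group outranks a user found in the first and third, and a user found in only one group is dropped.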

Next, the program switches to manual mode, where you can additionally specify unwanted groups, change grades, and add groups that you are not subscribed to but whose members also interest you. Everything is then recalculated and the top is rebuilt. There is also a full mode that can be used to analyze individual small groups. In it, the database is built exclusively from data fetched for each user, so there is no need to download all the groups in the list.
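
A recalculation of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the program's code: `Grades`, `Score`, and `scoreCandidate` are hypothetical names, and treating an unwanted group as a hard rejection (rather than, say, a negative grade) is a guess.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using GroupId = std::int64_t;

// Hypothetical container for what manual mode lets you edit.
struct Grades {
    std::unordered_map<GroupId, double> grade;  // grade per group, editable by hand
    std::unordered_set<GroupId> unwanted;       // groups marked as unwanted
};

struct Score {
    bool rejected;  // true if the candidate follows an unwanted group
    double rating;  // sum of grades over the candidate's graded groups
};

// Scores one candidate from their full subscription list.
// Assumption: a single unwanted group removes the candidate from the top.
Score scoreCandidate(const std::vector<GroupId>& candidateGroups,
                     const Grades& grades)
{
    Score score{false, 0.0};
    for (GroupId id : candidateGroups) {
        if (grades.unwanted.count(id) != 0) {
            score.rejected = true;
            score.rating = 0.0;
            return score;
        }
        auto it = grades.grade.find(id);
        if (it != grades.grade.end())
            score.rating += it->second;  // ungraded groups contribute nothing
    }
    return score;
}
```

Changing a value in `grade` or adding an id to `unwanted` and calling `scoreCandidate` again for every candidate is all that "rebuilding the top" amounts to in this sketch.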




About the source code


We decided to open the source so that everyone (who knows C++, heh) can experiment even with parameters that are not exposed in the settings. And so that nobody has to worry that their page will be enslaved by bot herders and their data sold on the darknet.

Some developers open their code to show off: look what I can do. This is not that case. The project was developed without a clear technical specification and with constantly changing requirements, so there can be no good architecture here by definition: even the most flexible one usually fails to bend exactly where it was supposed to. Once a project takes its final shape and the requirements become clear, we usually have to do a very large-scale refactoring, but in this case we decided to postpone it.

First, the program has become especially relevant in light of recent events, and delaying the release by another month would be foolish. If someone is feeling especially lonely in quarantine, they can find friends on the Internet.

Second, it is unclear whether anyone is interested in this program at all, and if not, there is no point in spending time on improving the code, because no support is planned. So please do not kick us too hard for the technical debt and cut corners: we are aware of them.

Other improvements had to be postponed as well, for example speeding up work through VK procedures, or taking into account a group's position in a user's list in order to reduce the contribution of groups that sit low. An earlier release and greater stability were more important.
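
To give an idea of the first postponed improvement: assuming "VK procedures" refers to the VK API `execute` method, which runs a VKScript snippet on the server, several `groups.getMembers` pages could be requested in one round trip. The sketch below only builds the script text. The function name `buildMembersBatch` is hypothetical, and the limits of 25 API calls per `execute` and 1000 ids per page are assumptions taken from the public VK API documentation that should be checked before use.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Builds VKScript source that fetches several consecutive pages of a
// group's members in a single `execute` request.
// Assumed limits: 25 API calls per execute, 1000 members per page.
std::string buildMembersBatch(std::int64_t groupId,
                              std::size_t startOffset,
                              std::size_t calls)
{
    const std::size_t kMaxCalls = 25;
    const std::size_t kPageSize = 1000;
    if (calls > kMaxCalls) calls = kMaxCalls;  // clamp to the assumed limit

    std::ostringstream code;
    code << "return [";
    for (std::size_t i = 0; i < calls; ++i) {
        if (i != 0) code << ",";
        code << "API.groups.getMembers({\"group_id\":" << groupId
             << ",\"offset\":" << (startOffset + i * kPageSize)
             << ",\"count\":" << kPageSize << "})";
    }
    code << "];";
    return code.str();
}
```

The returned string would be sent as the `code` parameter of `execute`. Under the assumed limits, one request covers up to 25,000 subscribers instead of 1000.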

System requirements


A monitor of at least 1366 x 768; Full HD is recommended. An SSD will not hurt either.

Databases of more than 5 million have not been tested; past 10 million, heavy slowdowns are sure to begin. Switching to a more powerful DBMS can be done fairly quickly (Qt abstractions allow it), but so far this seems impractical, because public pages with millions of subscribers say little about their subscribers: that many people cannot be niche.

Sources here. Binaries for Windows and Linux.

P.S. I have thoughts on how VK could improve the situation on its side, but that is a topic for a separate article.

P.P.S. Many IT public pages (including Habr's) have their subscriber lists closed, so you will not be able to take them into account.
