"Leak" of a base of specialists Habr Careers

First in Telegram channels, and then on Habr, reports appeared about leaked user data from the Habr Career site. We consider it necessary to give a more detailed comment and to explain how the privacy settings on the service work.


The leak

First, some reassurance: we found no traces of intrusion into the service's database.

What happened, then? We by no means want to shift the blame onto the users themselves, but the information that appeared in the “leak” was already accessible on the network. Someone simply decided to collect it, and that is very difficult for a site to counter.

So we decided to explain to our current and future users how privacy works on Habr Career. First, so that everyone is clear about what kind of information was parsed. And second, so that, knowing this, everyone can manage their online privacy more deliberately.

How does privacy work at Habr Career?


There are two main privacy settings on the service: for the entire profile as a whole and for contact information. 

Users can choose who sees their profile:

  • Everyone, including guests and robots (the default value during registration)
  • Only authorized users
  • Friends and curators of vacancies with my response
  • Do not show anyone

A user's contacts are part of their profile, and they have additional privacy settings of their own. Users can choose who sees their contact information:

  • Only authorized users
  • Friends and curators of vacancies with my response (the default value during registration)
  • Do not show anyone

For contact privacy, we deliberately removed the “Everyone, including guests and robots” option so that contact information would not be indexed by search engines. If it were indexed, the user could not quickly remove their contact details from the network: they could hide them on Career, but the details would remain in search engine indexes for quite some time.

In addition, users cannot set more lenient privacy for their contact information than for their profile as a whole. For example, if the profile is set to “Friends and curators”, the contacts cannot be set to “Only authorized users”.
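The two rules above (no “Everyone” option for contacts, and contacts never more visible than the profile) can be sketched as a small validation routine. This is only an illustration under our own assumptions: the names `Visibility` and `validate_settings` are hypothetical and do not describe how the service is actually implemented.

```python
from enum import IntEnum


class Visibility(IntEnum):
    """Audience levels, ordered from most restrictive to most permissive.

    The names are hypothetical; only the four levels come from the text.
    """
    NOBODY = 0
    FRIENDS_AND_CURATORS = 1
    AUTHORIZED = 2
    EVERYONE = 3


# Registration defaults as described in the text.
DEFAULT_PROFILE = Visibility.EVERYONE
DEFAULT_CONTACTS = Visibility.FRIENDS_AND_CURATORS


def validate_settings(profile: Visibility, contacts: Visibility) -> None:
    """Raise ValueError if the pair of settings breaks either rule."""
    # Rule 1: contacts are never shown to guests and robots.
    if contacts == Visibility.EVERYONE:
        raise ValueError("contacts cannot be visible to everyone")
    # Rule 2: contacts cannot be more visible than the profile itself.
    if contacts > profile:
        raise ValueError("contacts cannot be more visible than the profile")


def is_valid(profile: Visibility, contacts: Visibility) -> bool:
    """Convenience wrapper that returns a boolean instead of raising."""
    try:
        validate_settings(profile, contacts)
    except ValueError:
        return False
    return True
```

With this ordering, the example from the text (`is_valid(Visibility.FRIENDS_AND_CURATORS, Visibility.AUTHORIZED)`) is rejected, while the registration defaults pass.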


Users can see their current profile and contact privacy settings in the left column of their main profile page. The description of each setting is clickable: the link leads to a page where the corresponding setting can be changed.


Currently, we have the following statistics on user privacy:

Profile privacy:

  • 90% visible to everyone (default value during registration)
  • 6% are visible to authorized users
  • 1% are visible to friends and curators of vacancies with a response
  • 3% are hidden from everyone

Contact privacy:

  • 25% are visible to authorized users
  • 75% are visible to friends and curators of vacancies with a response (the default value during registration)
  • <1% hidden from everyone

As you can see, after registering, 10% of users choose stricter privacy settings for their profile as a whole, while 25% choose more lenient settings for their contact details.

Thus, any logged-in user can view (and save) the profiles of almost all users and the contact details of a quarter of them. Which is exactly what happened.
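The figures behind “almost all” and “a quarter” follow directly from the statistics above. A small sketch of the arithmetic, using the rounded percentages from the lists (the dictionary keys and the function name are our own labels, and the “<1%” share is rounded down to 0):

```python
# Rounded percentages quoted in the statistics above.
profile_privacy = {
    "everyone": 90,
    "authorized": 6,
    "friends_and_curators": 1,
    "nobody": 3,
}
contact_privacy = {
    "authorized": 25,
    "friends_and_curators": 75,
    "nobody": 0,  # reported as "<1%", rounded down here
}


def visible_to_logged_in(shares: dict) -> int:
    """Share of users whose data any logged-in user can see."""
    return shares.get("everyone", 0) + shares.get("authorized", 0)


profiles_visible = visible_to_logged_in(profile_privacy)
contacts_visible = visible_to_logged_in(contact_privacy)
# Users who moved away from the default "everyone" profile setting.
stricter_profiles = 100 - profile_privacy["everyone"]

print(f"profiles visible to a logged-in user: {profiles_visible}%")
print(f"contacts visible to a logged-in user: {contacts_visible}%")
print(f"users with stricter-than-default profiles: {stricter_profiles}%")
```

So roughly 96% of profiles and 25% of contact details are visible to any logged-in account.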

What's in the archive


We were unable to get hold of the contents of the archive posted on one of the forums. But judging by the accompanying description, it contains exactly the information that user profiles expose to other registered users of the service. We do have a private API for working with our vacancies and responses on third-party sites, but the data in the posted archive did not come from this API: a bot simply walked through the pages, parsed them, and wrote the results to a file. Judging by the number of records, the database was compiled gradually over a long period (so as not to attract attention).

Specific example:


And here is this profile on the site:


What we will do


It was clear from the start that protecting against parsing is technically very difficult (and sometimes simply impractical). Articles on Habr only confirmed this.


Nevertheless, we consulted several people who do this not as a hobby but on an industrial scale. Maxim makasin4ik from xmldatafeed.com said that these days absolutely everything gets parsed, but together we came up with several points that we will work on. Here are some of them:

  • A limit on the number of requests per authorized user. On HH.ru, for example, this limit is currently 500 profiles per day; ours will be lower.
  • Advice of the “if you can't beat it, lead it” kind: offer a paid API for legitimate parsing of the database.
  • Additionally, send regular reminders to users who have made a lot of their contact information public.
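The first measure, a per-user daily limit on profile views, could look roughly like the sketch below. It is a minimal in-memory illustration under our own assumptions: the class name, the choice to count distinct profiles rather than raw requests, and the limit value are all hypothetical, and a real service would keep the counters in shared storage.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyProfileViewLimiter:
    """Caps how many distinct profiles one user may open per calendar day."""

    def __init__(self, daily_limit: int) -> None:
        self.daily_limit = daily_limit
        # (user_id, day) -> set of profile ids already counted that day
        self._seen = defaultdict(set)

    def allow(self, user_id: int, profile_id: int,
              today: Optional[date] = None) -> bool:
        """Return True if the view is allowed, and record it if so."""
        day = today or date.today()
        seen = self._seen[(user_id, day)]
        if profile_id in seen:
            # Re-opening an already counted profile does not use up quota.
            return True
        if len(seen) >= self.daily_limit:
            return False
        seen.add(profile_id)
        return True


# Usage with an illustrative limit of 3 profiles per day.
limiter = DailyProfileViewLimiter(daily_limit=3)
day = date(2020, 1, 1)
results = [limiter.allow(user_id=1, profile_id=p, today=day) for p in (10, 11, 12, 13)]
print(results)
```

Counting distinct profiles per day keeps ordinary browsing unaffected while slowing down a bot that walks through profile pages one by one; it does not stop a scraper that spreads requests across many accounts.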



We would very much like to keep our services safe from real technical threats, so we are also planning to launch a bug bounty program soon. In the meantime, if you find a vulnerability, let us know via the feedback form: we will find common ground and fix everything.

Thank you for your attention!
