Analyzing lawyers' resumes on hh.ru without the API

hh.ru is a well-known site that needs no introduction. Searching for vacancies on it is convenient and straightforward. Sometimes, however, it is more interesting to look at it from the employer's side:

  • what a resume looks like for a targeted query;
  • how your own resume appears in the search results;
  • how a resume “sags” in the results over time.

It is also useful to collect the resumes of fellow lawyers to build some mini-statistics.

Although hh has its own API and it is well documented, access to it is carefully guarded.

As with the APIs of many social networks, access is granted by first registering an application in the account's web dashboard, in this case the employer's account at hh.ru/employer:



To get there you need to register as an employer, confirm that you belong to the organization (they will call you), and then follow the link: dev.hh.ru
Even at this stage, however, the API is not yet fully available, because the request to register an application on hh.ru can take up to 20 business days to be reviewed. That is a long time.

Therefore, we will work without the API, using Python and the Selenium framework.

We give Selenium a search URL, opened on behalf of the employer, that encodes the following filters:

  • Keywords: lawyer;
  • Professional area: Any;
  • Region: Moscow;
  • Salary: Do not show resumes without a salary;
  • Education: It does not matter;
  • Citizenship: Any;
  • Work permit: Any;
  • Age and photo: Only with photo;
  • Gender: It does not matter;
  • Sort: By date modified;
  • Show: For the last month;
  • Show on page: 100 resumes.

Although the query returns many results, only 5,000 resumes are available. This is the limit for an employer who uses the site for free.
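The filters above end up as query parameters of a single search URL, and a cap of 5,000 resumes at 100 per page means 50 result pages. Here is a minimal sketch of that arithmetic and of assembling such a URL. The base path and the parameter names (`text`, `area`, `items_on_page`) are assumptions made for illustration, not taken from hh.ru documentation. In practice it is safer to set the filters by hand and copy the real URL from the browser's address bar.

```python
import math
from urllib.parse import urlencode

# Hypothetical base path and parameter names, for illustration only.
BASE = 'https://hh.ru/search/resume'

def build_search_url(base, params):
    """Join a base address and a dict of filters into one URL."""
    return base + '?' + urlencode(params)

def page_count(max_resumes, per_page):
    """Number of result pages needed to cover max_resumes."""
    return math.ceil(max_resumes / per_page)

url = build_search_url(BASE, {'text': 'lawyer', 'area': 1, 'items_on_page': 100})
pages = page_count(5000, 100)
```

`urlencode` also escapes non-ASCII keywords, which matters if the search term is typed in Russian.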

In code, this is:

Import modules and enter the site
from selenium import webdriver
from selenium.webdriver.common.by import By  # locator types for find_element
import time, csv

browser = webdriver.Firefox()
time.sleep(5)  # give the browser time to start
browser.get('https://hh.ru/employer')
time.sleep(5)


Authorization on the site
# Click the close (cancel) icon, then open the login form
a = browser.find_element(By.CSS_SELECTOR, '.bloko-icon_cancel')
a.click()
time.sleep(2)
a = browser.find_element(By.CSS_SELECTOR, 'div.supernova-navi-item:nth-child(6) > a:nth-child(1)')
a.click()
time.sleep(3)
# Fill in the email and password, then submit the form
emailElem = browser.find_element(By.CSS_SELECTOR, '.HH-AuthForm-Login')
emailElem.click()
time.sleep(1)
emailElem.send_keys('example@yandex.ru')
time.sleep(1)
passElem = browser.find_element(By.CSS_SELECTOR, '.HH-AuthForm-Password')
passElem.click()
time.sleep(1)
passElem.send_keys('password')
passElem.submit()
time.sleep(3)


Replace example@yandex.ru with the employer account's email and password with its password.
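If the script is going to be shared, one option is to keep the credentials out of the source and read them from environment variables. The sketch below assumes two variable names, `HH_EMAIL` and `HH_PASSWORD`, which are hypothetical and chosen only for this example.

```python
import os

def load_credentials(env=os.environ):
    """Return (email, password) read from environment variables.

    HH_EMAIL and HH_PASSWORD are hypothetical names chosen for this sketch.
    """
    try:
        return env['HH_EMAIL'], env['HH_PASSWORD']
    except KeyError as missing:
        raise RuntimeError(
            'Set the %s environment variable' % missing.args[0]
        ) from None
```

The returned pair can then be passed to `send_keys` in place of the literal strings.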

CSV writing block
def write_csv(data):
    # newline='' stops the csv module from adding blank lines between rows on Windows
    with open('hh.csv', 'a', encoding='utf8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow((data['name'],
                         data['age'],
                         data['salary'],
                         data['stag'],
                         #data['post_job_place'],
                         data['resume_link'],
                         data['photo_big']
                         #data['job_places'],
                         #data['education'],
                         #data['address'],
                         #data['update']
                         ))
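Because the file is opened in append mode, it never gets a header row, which makes the columns harder to read in Excel. A possible variant, sketched here with `csv.DictWriter`, writes the header once when the file is new or empty. The function name `write_csv_with_header` and the `FIELDS` list are introduced for this example only.

```python
import csv
import os

# Column order; matches the fields that are not commented out above.
FIELDS = ['name', 'age', 'salary', 'stag', 'resume_link', 'photo_big']

def write_csv_with_header(data, path='hh.csv'):
    """Append one resume to path, writing the header row if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', encoding='utf8', newline='') as f:
        # extrasaction='ignore' skips keys of data that are not listed in FIELDS
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction='ignore')
        if is_new:
            writer.writeheader()
        writer.writerow(data)
```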


Parsing block
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

def resume_get():
    # All resume cards on the current results page (up to 100)
    a = browser.find_elements(By.CLASS_NAME, 'resume-search-item__content-wrapper')
    for i in a:
        b = i.find_element(By.CLASS_NAME, 'resume-search-item__header')
        name = b.find_element(By.CLASS_NAME, 'resume-search-item__name').text  # text of the title link
        age = b.find_element(By.CLASS_NAME, 'resume-search-item__fullname').text  # age text, e.g. 52
        salary = b.find_element(By.CLASS_NAME, 'resume-search-item__compensation').text  # salary text, e.g. 40000
        stag = i.find_elements(By.CLASS_NAME, 'resume-search-item__description-content')[0].text  # work experience in years and months
        resume_link = i.find_element(By.CLASS_NAME, 'resume-search-item__name').get_attribute('href')  # link to the resume
        #post_job_place=i.find_elements(By.CLASS_NAME, 'resume-search-item__description-content')[1].text
        #job_places=b.find_elements(By.CLASS_NAME, 'resume-search-item__description-content')[1:3]
        #education=i.find_elements(By.CLASS_NAME, 'resume-search-item__description-content')[-1].text
        #photo_small=browser.find_element(By.CLASS_NAME, 'resume-userpic').find_element(By.CLASS_NAME, 'resume-userpic__photo').get_attribute('src')
        try:
            photo_big = i.find_element(By.CLASS_NAME, 'bloko-modal-content').find_element(By.TAG_NAME, 'img').get_attribute('src')  # full-size photo
        except NoSuchElementException:
            photo_big = ''  # this card has no full-size photo

        #update=i.find_element(By.CLASS_NAME, 'output__addition').text
        data = {'name': name,
                'age': age,
                'salary': salary,
                'stag': stag,
                #'post_job_place':post_job_place,
                'resume_link': resume_link,
                'photo_big': photo_big
                #'job_places':job_places,
                #'education':education,
                #'address':address,
                #'update':update
                }
        #print(data)
        write_csv(data)
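The parsing block stores age and salary as the raw strings shown on the page, so they have to be converted to numbers before they can be filtered or averaged. A minimal sketch follows, assuming each string starts with a number that may use spaces as thousands separators and is followed by a unit. The helper name `first_int` is introduced here for illustration.

```python
import re

def first_int(text):
    """Return the first integer found in text, or None if there is none.

    Whitespace inside the number (including non-breaking spaces, which
    \\s also matches) is treated as a thousands separator.
    """
    match = re.search(r'\d[\d\s]*', text)
    if match is None:
        return None
    return int(re.sub(r'\D', '', match.group()))
```

One caveat: two numbers separated only by whitespace would be merged into one, so this helper is not suitable for the experience field without further parsing.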


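Iterating over 50 pages of resumes
url = ''  # placeholder: paste the employer search URL with the filters listed above
browser.get(url)  # open the first page of results
time.sleep(7)
resume_get()

for x in range(50):
    browser.get(url + '&page=' + str(x + 1))
    time.sleep(7)
    resume_get()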


* The commented-out fragments are for those who want to add more fields to the selection.
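The loop above always requests 50 pages, even when the query returns fewer resumes, and each extra page costs a 7-second pause. One possible refinement is to stop at the first empty page. The sketch below is independent of Selenium: `fetch_page` is a hypothetical callable that loads page `n` and returns the list of items found on it, so `resume_get` would first have to be changed to return what it collected.

```python
def crawl_pages(fetch_page, max_pages=50):
    """Call fetch_page(n) for n = 0 .. max_pages - 1, stopping at the first empty page.

    fetch_page is a hypothetical callable returning the list of items on page n.
    """
    collected = []
    for page in range(max_pages):
        items = fetch_page(page)
        if not items:
            break  # nothing on this page, so later pages are skipped
        collected.extend(items)
    return collected
```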

After running the program and loading the results into Excel, we get this table:



How do you find your own resume? The easiest way is to filter the candidates by age.

How do you see where the resume ends up after some period of time? Run the program again after that period and find yourself once more. The row on which the resume lands in the table is its position in the search results, since the program collects all resumes for the query in order.
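The comparison between two runs can also be done in code instead of by eye. Here is a minimal sketch, assuming the rows of each run are loaded as dicts in the order they were collected and that the `resume_link` value of your resume stays the same between runs (if the link carries changing query parameters, it would need normalizing first). The function names are introduced for this example.

```python
def find_position(rows, resume_link):
    """Return the 1-based position of resume_link in rows, or None if it is absent."""
    for position, row in enumerate(rows, start=1):
        if row['resume_link'] == resume_link:
            return position
    return None

def sag(earlier_rows, later_rows, resume_link):
    """Positions lost between two runs; a positive value means the resume moved down."""
    before = find_position(earlier_rows, resume_link)
    after = find_position(later_rows, resume_link)
    if before is None or after is None:
        return None
    return after - before
```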

Finally, a small chart with statistics on the resumes of Moscow lawyers (2,000 resumes).





P.S. A resume that is not updated on the site sags by 2,000 positions per day for the query "Corporate + lawyer".

Program - download
Excel spreadsheet - download
