Parsing lawyer resumes on hh.ru without the API

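Hh.ru is a good site that needs no introduction. Searching for vacancies on it is convenient and uneventful. Sometimes, though, it is more interesting to look at it from the employer's side: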

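What do the resumes returned for a targeted query look like?
Where does your own resume appear in the search results?
How far does a resume "sink" over time?

I also needed to collect lawyers' resumes to build some small statistics.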

Although hh has its own api and it is well documented, access to it is carefully guarded.

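As with the apis of many social networks, access is granted by registering an application in advance from your account on the site (in this case the employer account, hh.ru/employer):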

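To get there, you need to register as an employer, confirm your affiliation with an organization (they will call you), and then follow the link: dev.hh.ru
Even at this stage, however, the api is not fully available, because a request to register an application on hh.ru can take up to 20 working days to review. That is a long time.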

So we will do without the api and use python with the selenium framework instead.

In selenium, logged in as an employer, we will request a search url that carries the following parameters:

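Keyword: lawyer;
Professional area: any;
Region: Moscow;
Salary: do not show resumes without a salary;
Education: does not matter;
Citizenship: any;
Work permit: any;
Age and photo: with photo only;
Gender: does not matter;
Sorting: by date of change;
Show: for the last month;
Items per page: 100 resumes.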
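
The article does not reproduce the search url itself, so here is a minimal sketch of how one could be assembled from such parameters. The endpoint and every parameter name below are assumptions made for illustration, not values taken from the article; copy the real ones from the browser's address bar after running the search by hand.

```python
from urllib.parse import urlencode

# Assumed address of the resume search page; verify it in the address bar.
SEARCH_PAGE = 'https://hh.ru/search/resume'


def build_search_url(params, base=SEARCH_PAGE):
    """Join the base address and the query parameters into one url."""
    # doseq=True lets a list value repeat the same key (a=1&a=2)
    return base + '?' + urlencode(params, doseq=True)


# Illustrative parameter names only; they are NOT taken from the article.
params = {
    'text': 'юрист',        # keyword: lawyer
    'items_on_page': 100,   # resumes per page
}
url = build_search_url(params)
```

The resulting `url` can then be passed to `browser.get()` once the login step has completed.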

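Although the query returns a lot of results, only 5,000 resumes are available for viewing. That is the limit for employers who use the site for free.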

In code, it looks like this:

Importing the modules and opening the site

from selenium import webdriver
import time, csv

browser = webdriver.Firefox()
time.sleep(5)  # give the browser time to start
browser.get('https://hh.ru/employer')
time.sleep(5)  # let the page finish loading

Logging in to the site

# Close the pop-up by clicking its cancel icon
a = browser.find_element_by_css_selector('.bloko-icon_cancel')
a.click()
time.sleep(2)
# Follow the link in the sixth item of the top navigation bar
a = browser.find_element_by_css_selector('div.supernova-navi-item:nth-child(6) > a:nth-child(1)')
a.click()
time.sleep(3)
# Type the email into the login field
emailElem = browser.find_element_by_css_selector('.HH-AuthForm-Login')
emailElem.click()
time.sleep(1)
emailElem.send_keys('example@yandex.ru')
time.sleep(1)
# Type the password and submit the form
passElem = browser.find_element_by_css_selector('.HH-AuthForm-Password')
passElem.click()
time.sleep(1)
passElem.send_keys('password')
passElem.submit()
time.sleep(3)
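
The fixed pauses above work, but they either waste time or turn out too short on a slow connection. As a hedged alternative, here is a minimal polling helper; `wait_until` is a hypothetical name introduced for this sketch and is not part of the article's program. Selenium also ships its own `WebDriverWait` class for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition was not met in time')
        time.sleep(poll)


# Possible use with the browser object from the article (not executed here):
# fields = wait_until(
#     lambda: browser.find_elements_by_css_selector('.HH-AuthForm-Login'))
```

`find_elements_*` returns an empty list while nothing matches, so the lambda stays falsy until the login field appears.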

Replace example@yandex.ru with the employer's email and password with the account password.

CSV writing block

def write_csv(data):
    # Append one resume as one row; newline='' stops the csv module
    # from producing blank lines between rows on Windows
    with open('hh.csv', 'a', encoding='utf8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow((data['name'],
                         data['age'],
                         data['salary'],
                         data['stag'],
                         #data['post_job_place'],
                         data['resume_link'],
                         data['photo_big']
                         #data['job_places'],
                         #data['education'],
                         #data['address'],
                         #data['update']
                         ))
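
Since `write_csv` writes no header row, the column names have to be supplied again when the file is read back. A minimal sketch of that, assuming the six uncommented fields in the order used above; `read_rows` and `FIELDS` are names introduced here for illustration.

```python
import csv

# Column order of the values written by write_csv() above
FIELDS = ['name', 'age', 'salary', 'stag', 'resume_link', 'photo_big']


def read_rows(path='hh.csv'):
    """Load the scraped file as a list of dicts keyed by column name."""
    with open(path, newline='', encoding='utf8') as f:
        return list(csv.DictReader(f, fieldnames=FIELDS))
```

If any of the commented-out fields are enabled in `write_csv`, `FIELDS` has to be extended in the same order.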

Parsing block

def resume_get():
    # Every resume card on the current results page (up to 100)
    a = browser.find_elements_by_class_name('resume-search-item__content-wrapper')
    #len(a)
    #resume-search-item__description-content: the description blocks of a card
    for i in a:
        b = i.find_element_by_class_name('resume-search-item__header')
        name = b.find_element_by_class_name('resume-search-item__name').text  # resume title
        age = b.find_element_by_class_name('resume-search-item__fullname').text  # age, e.g. 52
        salary = b.find_element_by_class_name('resume-search-item__compensation').text  # salary, e.g. 40000
        stag = i.find_elements_by_class_name('resume-search-item__description-content')[0].text  # work experience in years and months
        resume_link = i.find_element_by_class_name('resume-search-item__name').get_attribute('href')  # link to the resume
        #post_job_place=i.find_elements_by_class_name('resume-search-item__description-content')[1].text  # last place of work
        #job_places=b.find_elements_by_class_name('resume-search-item__description-content')[1:3]  # places of work
        #education=i.find_elements_by_class_name('resume-search-item__description-content')[-1].text  # education
        #photo_small=browser.find_element_by_class_name('resume-userpic').find_element_by_class_name('resume-userpic__photo').get_attribute('src')  # small photo
        try:
            photo_big = i.find_element_by_class_name('bloko-modal-content').find_element_by_tag_name('img').get_attribute('src')  # large photo
        except Exception:
            # The card has no large photo
            photo_big = ''

        #update=i.find_element_by_class_name('output__addition').text  # date the resume was updated
        data = {'name': name,
                'age': age,
                'salary': salary,
                'stag': stag,
                #'post_job_place':post_job_place,
                'resume_link': resume_link,
                'photo_big': photo_big
                #'job_places':job_places,
                #'education':education,
                #'address':address,
                #'update':update
                }
        #print(data)
        write_csv(data)
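
The parser stores `age` and `salary` as the raw text shown on the page, so they need to be turned into numbers before any statistics can be computed. A minimal sketch, assuming the text looks like a number followed by a unit (for example `52 года` or `40 000 руб.`); `first_int` is a helper introduced here, not part of the original program.

```python
import re


def first_int(text):
    """Return the first integer found in text, or None if there is none.

    Spaces (including non-breaking ones) between digits are treated as
    thousands separators, so '40 000' is read as 40000.
    """
    match = re.search(r'\d[\d \u00a0\u202f]*', text)
    if match is None:
        return None
    # Drop everything that is not a digit from the matched run
    return int(re.sub(r'\D', '', match.group()))
```

Because spaces between digits are joined, the helper suits fields that hold a single number; the experience field, which holds both years and months, would need separate handling.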

Iterating over the 50 pages of resumes

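# url must hold the resume search address with the parameters listed above;
# the browser is expected to be showing its first page at this point
resume_get()

for x in range(50):
    browser.get(url + '&page=' + str(x + 1))
    time.sleep(7)  # wait for the page to load
    resume_get()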
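
The number of pages follows from the limits mentioned earlier: 5,000 available resumes at 100 per page. Below is a small sketch of that arithmetic together with a generator of page addresses. It assumes that the bare search url is the first page and that `&page=1` is the second; this numbering is an assumption worth checking on the site, and both function names are introduced here for illustration.

```python
import math


def page_count(limit, per_page):
    """Number of result pages needed to show `limit` items."""
    return math.ceil(limit / per_page)


def page_urls(url, pages):
    """Yield the address of every results page, starting with the bare url."""
    yield url
    for page in range(1, pages):
        yield f'{url}&page={page}'
```

With the values from the article, `page_count(5000, 100)` gives 50, and each address from `page_urls(url, 50)` can be passed to `browser.get()` before calling `resume_get()`.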

*The commented-out fragments are left in for those who want to add more fields to the selection.

After the program has run and the results have been loaded into excel, we get a table:

How do you find your own resume in it? The easiest way is to filter the candidates by age.

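How do you see where a resume ends up after some period of time? Run the program again after that period and find yourself once more. The row the resume occupies in the table is its position in the search results, because the program collects all resumes for the query in order.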
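
The same lookup can be done without excel. The sketch below returns the row number of a resume in a saved file; it assumes the link is the fifth value in each row, as in `write_csv`, and compares links without their query string in case it differs between runs. `resume_position` and the file names in the usage comment are hypothetical.

```python
import csv

LINK_COLUMN = 4  # resume_link is the fifth value written by write_csv()


def resume_position(path, resume_link):
    """Return the 1-based row number of the resume in the file, or None."""
    wanted = resume_link.split('?')[0]
    with open(path, newline='', encoding='utf8') as f:
        for position, row in enumerate(csv.reader(f), start=1):
            if len(row) > LINK_COLUMN and row[LINK_COLUMN].split('?')[0] == wanted:
                return position
    return None


# Possible use with two runs saved under different names (not executed here):
# sunk_by = resume_position('hh_day2.csv', link) - resume_position('hh_day1.csv', link)
```

Note that `write_csv` opens `hh.csv` in append mode, so each run should be saved under its own name (or the old file moved away) for the row numbers to match positions in the results.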

Finally, some small graphical statistics on the resumes of Moscow lawyers (2,000 resumes).

ps. For the query "company + lawyer", a resume sinks on the site by no more than 2,000 positions per day.

Program: download
Excel spreadsheet: download
