A simple example of parsing and data analytics for World of Tanks

In this small example, I want to show how to parse data from sites and how to use them further for analysis. To do this, I parsed the clan ratings table from the World of Tanks game and looked at how the clan rating can correlate with other data.



1. Parsing data


import numpy as np
import pandas as pd
from scrapy.selector import Selector
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

sns.set(rc={'figure.figsize':(20, 5)})

javascript , - scrapy html , ( selenium) scrapy Selector .


#https://ru.wargaming.net/clans/wot/leaderboards/#ratingssearch&offset=0&limit=25&order=-cr
with open(' _ Wargaming.net.html', 'r') as f:
    html_text = f.read()

selector = Selector(text=html_text)

xpath - youtube.com, . , .. (. ), , text , ( ), , , .


df = pd.DataFrame()

#    +  
table = selector.xpath('//div[@class="js-widget-content"]/div[2]/div[2]/div') + selector.xpath('//div[@class="js-widget-content"]/div[2]/div[4]/div')

for row in table:
    text = row.xpath('*//text()').extract()
    clan = text[-10] 
    cr = int(text[-7].replace(' ',''))
    wpr = int(text[-6].replace(' ','')) 
    abd = float(text[-5].replace(',', '.')) 
    avl_10 = int(text[-4]) 
    fsh = int(text[-3].replace(' ','')) 
    wgm = int(text[-2].replace(' ','')) 
    wsh = int(text[-1].replace(' ',''))   

    df = df.append({'Clan' : clan,      # 
                    'CR' : cr,          #  
                    'wPR' : wpr,        #           (100)
                    'aB_D' : abd,       #           
                    'aVL10' : avl_10,   #   ,   ,      
                    'fSH' : fsh,        #   
                    'wGM' : wgm,        #      
                    'wSH' : wsh},       #     
                    ignore_index=True)

df, 25, 5 .


print('-   :', len(df))

-   : 25

df.head()


CRClanaB_DaVL10fSHwGMwPRwSH
015486.0[CM-1]18.2034.04083.02253.010325.02294.0
115148.0[R-BOY]18.8637.03745.01943.010267.02066.0
215041.0[CYS]17.4732.03649.02300.010251.01857.0
314984.0[I-YAN]16.8528.04080.02468.08992.02290.0
414952.0[YETT1]17.4129.04222.02159.08387.02474.0


2.


, - , . .


plt.xticks(rotation=45, ha="right")
ax = sns.lineplot(x='Clan', y='CR', data=df, marker='o', color='r', sort=False)
ax.set(xlabel='', ylabel=' ')
ax.set(xticks=df['Clan'].values);


. , ?


#      
def draw_corr(df, y1, y1_label, y2='CR', y2_label=' '):
    fig, ax = plt.subplots()
    plt.xticks(rotation=45, ha="right")
    sns.lineplot(x='Clan', y=y1, data=df, marker='o', color='b', label=y1_label, sort=False)
    ax.set(xlabel='', ylabel=y1_label)
    plt.legend(bbox_to_anchor=(0.01, 0.95), loc='upper left')
    ax2 = ax.twinx()
    sns.lineplot(x='Clan', y=y2, data=df, marker='o', color='r', label=y2_label, sort=False)
    ax2.set(ylabel=y2_label)
    plt.legend(bbox_to_anchor=(0.01, 0.85), loc='upper left');   

draw_corr(df, 'wPR', '   ')


draw_corr(df, 'aVL10', '  ')


3.


This article is specially written simply to show that there is nothing complicated in data parsing and analytics. From the data obtained, you can make more correlation graphs, build histograms, if necessary, make some predictions, etc.


β†’ Source Code


All Articles