坦克世界解析和数据分析的简单示例

在这个小示例中,我想展示如何解析站点中的数据以及如何将其进一步用于分析。为此,我从《战车世界》游戏中解析了战队等级表,并研究了战队等级如何与其他数据相关联。



1.解析数据


import numpy as np
import pandas as pd
from scrapy.selector import Selector
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

sns.set(rc={'figure.figsize':(20, 5)})

javascript , - scrapy html , ( selenium) scrapy Selector .


#https://ru.wargaming.net/clans/wot/leaderboards/#ratingssearch&offset=0&limit=25&order=-cr
with open(' _ Wargaming.net.html', 'r') as f:
    html_text = f.read()

selector = Selector(text=html_text)

xpath - youtube.com, . , .. (. ), , text , ( ), , , .


df = pd.DataFrame()

#    +  
table = selector.xpath('//div[@class="js-widget-content"]/div[2]/div[2]/div') + selector.xpath('//div[@class="js-widget-content"]/div[2]/div[4]/div')

for row in table:
    text = row.xpath('*//text()').extract()
    clan = text[-10] 
    cr = int(text[-7].replace(' ',''))
    wpr = int(text[-6].replace(' ','')) 
    abd = float(text[-5].replace(',', '.')) 
    avl_10 = int(text[-4]) 
    fsh = int(text[-3].replace(' ','')) 
    wgm = int(text[-2].replace(' ','')) 
    wsh = int(text[-1].replace(' ',''))   

    df = df.append({'Clan' : clan,      # 
                    'CR' : cr,          #  
                    'wPR' : wpr,        #           (100)
                    'aB_D' : abd,       #           
                    'aVL10' : avl_10,   #   ,   ,      
                    'fSH' : fsh,        #   
                    'wGM' : wgm,        #      
                    'wSH' : wsh},       #     
                    ignore_index=True)

df, 25, 5 .


print('-   :', len(df))

-   : 25

df.head()


CRClanaB_DaVL10fSHwGMwPRwSH
015486.0[CM-1]18.2034.04083.02253.010325.02294.0
115148.0[R-BOY]18.8637.03745.01943.010267.02066.0
215041.0[CYS]17.4732.03649.02300.010251.01857.0
314984.0[I-YAN]16.8528.04080.02468.08992.02290.0
414952.0[YETT1]17.4129.04222.02159.08387.02474.0


2.


, - , . .


plt.xticks(rotation=45, ha="right")
ax = sns.lineplot(x='Clan', y='CR', data=df, marker='o', color='r', sort=False)
ax.set(xlabel='', ylabel=' ')
ax.set(xticks=df['Clan'].values);


. , ?


#      
def draw_corr(df, y1, y1_label, y2='CR', y2_label=' '):
    fig, ax = plt.subplots()
    plt.xticks(rotation=45, ha="right")
    sns.lineplot(x='Clan', y=y1, data=df, marker='o', color='b', label=y1_label, sort=False)
    ax.set(xlabel='', ylabel=y1_label)
    plt.legend(bbox_to_anchor=(0.01, 0.95), loc='upper left')
    ax2 = ax.twinx()
    sns.lineplot(x='Clan', y=y2, data=df, marker='o', color='r', label=y2_label, sort=False)
    ax2.set(ylabel=y2_label)
    plt.legend(bbox_to_anchor=(0.01, 0.85), loc='upper left');   

draw_corr(df, 'wPR', '   ')


draw_corr(df, 'aVL10', '  ')


3.


本文专门写来只是为了表明数据解析和分析没有复杂的事情。根据获得的数据,您可以制作更多的相关图,构建直方图,必要时进行一些预测等。


源代码


All Articles