Building an interactive map of the spread of the 2019-nCoV coronavirus in Python

The 2019-nCoV coronavirus, first detected in an outbreak in the Chinese city of Wuhan, is spreading rapidly around the world. When the original article was written (January 30, 2020), more than 9,000 infections and 213 deaths had been reported. As of today (February 10, 2020), 40,570 infections and 910 deaths have been reported. Cases have been detected in France, Australia, Russia, Japan, Singapore, Malaysia, Germany, Italy, Sri Lanka, Cambodia, Nepal and many other countries. No one knows when the virus will be stopped, and in the meantime the number of confirmed cases keeps growing.

The author of the article we are translating today explains how to build a simple Python application that tracks the spread of the coronavirus. The result is an HTML page that displays a map of the spread of the virus and a slider for selecting the date whose data is shown on the map.


Interactive map of the spread of the 2019-nCoV coronavirus

We will use Python 3.7, Pandas, Plotly 4.1.0 and Jupyter Notebook.

Import libraries


Let's start by importing the Plotly and Pandas libraries into a Jupyter Notebook project.

import plotly.offline as go_offline
import plotly.graph_objects as go
import pandas as pd

Before moving on, try running the code. If you do not see any error messages, all the necessary libraries are installed and working correctly. If the code fails to run, take a look at the official pages of Plotly and Pandas and read the reference and installation materials for these libraries. If you do not have Jupyter Notebook running on your own machine, I recommend using Google Colab, a cloud-based platform that lets you work with Jupyter notebooks.

Data processing


The data we use here is kept in a shared Google spreadsheet that is updated daily. Many thanks to all those who keep it up to date! You are doing a very necessary job.

We will read the data using the Pandas read_csv method. But before loading the data from the table by its link, we need to adjust the link. Right now it looks like this:

https://docs.google.com/spreadsheets/d/18X1VM1671d99V_yd-cnUI1j8oSG2ZgfU_q1HfOizErA/edit#gid=0

We need to replace the edit#gid=0 fragment at the end of the link, bringing the link to this form:

https://docs.google.com/spreadsheets/d/18X1VM1671d99V_yd-cnUI1j8oSG2ZgfU_q1HfOizErA/export?format=csv&id

In the following code, we initialize the url variable with the link to the data, read the data using the read_csv method, and write 0 into the empty cells, which contain NaN values.

url='https://docs.google.com/spreadsheets/d/18X1VM1671d99V_yd-cnUI1j8oSG2ZgfU_q1HfOizErA/export?format=csv&id'
data=pd.read_csv(url)
data=data.fillna(0)
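
The link rewrite described above can also be done in code rather than by hand. The following is a minimal sketch, assuming the share link always contains an /edit segment as in the example above. The helper name to_csv_export_url is made up for illustration, and the suffix it appends is simply the one used in the hand-edited link.

```python
def to_csv_export_url(share_url: str) -> str:
    """Turn a Google spreadsheet '.../edit#gid=0' link into the CSV export link."""
    # Everything before '/edit' identifies the spreadsheet itself.
    base, sep, _ = share_url.partition("/edit")
    if not sep:
        raise ValueError("expected a link containing '/edit'")
    # Same suffix as in the hand-edited link from the article.
    return base + "/export?format=csv&id"


share_url = ("https://docs.google.com/spreadsheets/d/"
             "18X1VM1671d99V_yd-cnUI1j8oSG2ZgfU_q1HfOizErA/edit#gid=0")
url = to_csv_export_url(share_url)
print(url)
```

The resulting url value can then be passed to pd.read_csv in the same way as the hand-written one.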

Understanding the structure of the data is extremely important at this stage, because it determines how we will process it. View the data with the data.head() command, which prints the first 5 rows of the table.


The first 5 rows of coronavirus data

In the lower left corner of the output you can see that the table has 47 columns. The first five columns are country, location_id, location, latitude and longitude. The remaining columns come in pairs named confirmedcase_dd-mm-yyyy and deaths_dd-mm-yyyy. With 47 columns at the time of writing, I had data for 21 days at my disposal ((47-5) / 2 = 21). If data collection started on 10.01.2020, then the last day covered was 30.01.2020.
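
The column layout described above can be checked with a few lines of plain Python. This is only a sketch: it assumes the per-day columns alternate strictly as cases, deaths, cases, deaths, and it uses a made-up two-day header instead of the real spreadsheet. The split_columns helper is a hypothetical name.

```python
from datetime import date, datetime, timedelta

FIXED = ["country", "location_id", "location", "latitude", "longitude"]


def split_columns(columns):
    """Return (fixed columns, list of (day, cases column, deaths column))."""
    fixed, dated = columns[:len(FIXED)], columns[len(FIXED):]
    if len(dated) % 2:
        raise ValueError("expected one cases and one deaths column per day")
    days = []
    for case_col, death_col in zip(dated[0::2], dated[1::2]):
        # 'deaths_' is 7 characters long, so the date occupies [7:17].
        day = datetime.strptime(death_col[7:17], "%d-%m-%Y").date()
        days.append((day, case_col, death_col))
    return fixed, days


# Hypothetical header with two days of data.
columns = FIXED + [
    "confirmedcase_10-01-2020", "deaths_10-01-2020",
    "confirmedcase_11-01-2020", "deaths_11-01-2020",
]
fixed, days = split_columns(columns)

# The arithmetic from the text: 47 columns, 5 of them fixed, 2 per day.
n_days = (47 - 5) // 2
last_day = date(2020, 1, 10) + timedelta(days=n_days - 1)
print(n_days, last_day)
```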

The first five columns of the table do not change, but new columns are added to it over time. Our interactive map visualizes the spread of the coronavirus for a day chosen by the user. Therefore, we need to split the data set into per-day subsets, keeping in mind that the first 5 columns are fixed and that each day is described by two columns. If you take a closer look at the data for a single day, for example 10.01.2020, you will see that the table has many rows for that date, although coronavirus cases were confirmed in only one place, which holds a non-zero number. All the other rows contain only zeros for that date. This means that we need to exclude these rows when building the map.
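
The row filtering described above can be tried on a tiny table first. Below is a minimal sketch with Pandas, assuming a made-up three-row table with a single day of data. The column names follow the pattern described earlier, while the numbers are illustrative and not taken from the spreadsheet.

```python
import pandas as pd

# Hypothetical miniature of the spreadsheet: empty cells become NaN, then 0.
data = pd.DataFrame({
    "country": ["China", "Japan", "France"],
    "location": ["Wuhan", "Tokyo", "Paris"],
    "confirmedcase_10-01-2020": [41, None, None],
    "deaths_10-01-2020": [1, None, None],
}).fillna(0)

case_col = "confirmedcase_10-01-2020"

# Keep only the rows with at least one confirmed case on that day.
# copy() makes the subset independent, so adding a column to it is safe.
day = data[data[case_col] != 0].copy()
day["text"] = (day["country"] + "<br>" + day["location"]
               + "<br>confirmed cases: " + day[case_col].astype(int).astype(str))
print(day["text"].tolist())
```

Taking an explicit copy is a design choice of this sketch: it keeps Pandas from warning about assignment to a slice of the original table.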

The data is prepared for visualization in a loop.

# Initialize the figure and the helper variables
fig=go.Figure()
col_name=data.columns
n_col=len(data.columns)
date_list=[]
init=4
n_range=int((n_col-5)/2)

# Loop over the days; each day is described by a pair of columns
for i in range(n_range):
    col_case=init+1
    col_dead=col_case+1
    init=col_case+1
    df_split=data[['latitude','longitude','country','location',col_name[col_case],col_name[col_dead]]]
    # Keep only the rows with confirmed cases; copy() avoids a
    # SettingWithCopyWarning when the 'text' column is added below
    df=df_split[(df_split[col_name[col_case]]!=0)].copy()
    lat=df['latitude']
    lon=df['longitude']
    case=df[df.columns[-2]].astype(int)
    deaths=df[df.columns[-1]].astype(int)
    df['text']=df['country']+'<br>'+df['location']+'<br>'+'confirmed cases: '+ case.astype(str)+'<br>'+'deaths: '+deaths.astype(str)
    # The date follows the 7-character 'deaths_' prefix of the column name
    date_label=deaths.name[7:17]
    date_list.append(date_label)

    # Add a Scattergeo trace for this day
    fig.add_trace(go.Scattergeo(
        name='',
        lon=lon,
        lat=lat,
        visible=False,
        hovertemplate=df['text'],
        text=df['text'],
        mode='markers',
        marker=dict(size=15,opacity=0.6,color='Red', symbol='circle'),
    ))

On each iteration of the loop, the data for one day is added to the figure as a Scattergeo trace using fig.add_trace. At the time of writing, the figure holds 21 such traces, one per day. You can verify this by inspecting fig.data.

Slider creation


Here we create the slider that selects which data set is shown on the map. Here is the relevant code:

# Build the slider steps, one per trace
steps = []
for i in range(len(fig.data)):
    step = dict(
        method="restyle",
        args=["visible", [False] * len(fig.data)],
        label=date_list[i],
    )
    step["args"][1][i] = True  # Set the i-th trace to "visible"
    steps.append(step)
    
sliders = [dict(
    active=0,
    currentvalue={"prefix": "Date: "},
    pad={"t": 1},
    steps=steps
)]

The slider code consists of two main parts. The first is a loop that populates the steps list, which is used when the slider is moved: moving the slider shows the corresponding data set and hides whatever was displayed before. The second part puts the steps list built earlier into the slider object. When the slider moves, it selects the corresponding element of steps.
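
To see what the loop produces without building a figure, the same visibility logic can be run on a plain list of labels. This is a sketch that only assembles dictionaries in the shape used above. The build_steps helper and the three sample dates are made up for illustration.

```python
def build_steps(labels):
    """Build slider steps: step i shows trace i and hides all the others."""
    steps = []
    for i, label in enumerate(labels):
        # Start with every trace hidden, then switch on only the i-th one.
        visible = [False] * len(labels)
        visible[i] = True
        steps.append(dict(method="restyle", args=["visible", visible], label=label))
    return steps


steps = build_steps(["10-01-2020", "11-01-2020", "12-01-2020"])
for step in steps:
    print(step["label"], step["args"][1])
```

Each step carries its own visibility list, so exactly one trace is shown at every slider position.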

Displaying the map and saving it as an HTML file


Now we come to the final part of the material: displaying the map and saving it in HTML format. Here is the code that implements these operations:

# Make the first trace visible
fig.data[0].visible=True

# Show the map and save it as HTML
fig.update_layout(sliders=sliders,title='Coronavirus Spreading Map'+'<br>geodose.com',height=600)
fig.show()
go_offline.plot(fig,filename='F:/html/map_ncov.html',validate=True, auto_open=False)

When the map is first displayed, the first data set is visible. We then attach the slider so that the map contents update according to its position, set the map title and adjust its height. Finally, we display the map with fig.show and save it as HTML with go_offline.plot.
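
The output path in the code above is specific to the author's machine. One possible way to build a path that works elsewhere is sketched below, using only the standard library. The html_output_path helper is a hypothetical name, and the system temporary folder is used only so that the sketch runs anywhere.

```python
import tempfile
from pathlib import Path


def html_output_path(filename, base_dir):
    """Return base_dir/html/filename as a string, creating the folder if needed."""
    out_dir = Path(base_dir) / "html"
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / filename)


# Substitute any folder you can write to for the temporary one.
target = html_output_path("map_ncov.html", tempfile.gettempdir())
print(target)
```

The resulting string can be passed as the filename argument of go_offline.plot in place of the hard-coded path.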

Full project code


Here is the complete project code for creating the 2019-nCoV coronavirus map. Note that the path in the last line, which saves the HTML version of the map, must be replaced with one that is valid on your system.

import plotly.offline as go_offline
import plotly.graph_objects as go
import pandas as pd

# Load the data and replace empty cells with 0
url='https://docs.google.com/spreadsheets/d/18X1VM1671d99V_yd-cnUI1j8oSG2ZgfU_q1HfOizErA/export?format=csv&id'
data=pd.read_csv(url)
data=data.fillna(0)

# Initialize the figure and the helper variables
fig=go.Figure()
col_name=data.columns
n_col=len(data.columns)
date_list=[]
init=4
n_range=int((n_col-5)/2)

# Loop over the days; each day is described by a pair of columns
for i in range(n_range):
    col_case=init+1
    col_dead=col_case+1
    init=col_case+1
    df_split=data[['latitude','longitude','country','location',col_name[col_case],col_name[col_dead]]]
    # Keep only the rows with confirmed cases; copy() avoids a
    # SettingWithCopyWarning when the 'text' column is added below
    df=df_split[(df_split[col_name[col_case]]!=0)].copy()
    lat=df['latitude']
    lon=df['longitude']
    case=df[df.columns[-2]].astype(int)
    deaths=df[df.columns[-1]].astype(int)
    df['text']=df['country']+'<br>'+df['location']+'<br>'+'confirmed cases: '+ case.astype(str)+'<br>'+'deaths: '+deaths.astype(str)
    # The date follows the 7-character 'deaths_' prefix of the column name
    date_label=deaths.name[7:17]
    date_list.append(date_label)

    # Add a Scattergeo trace for this day
    fig.add_trace(go.Scattergeo(
        name='',
        lon=lon,
        lat=lat,
        visible=False,
        hovertemplate=df['text'],
        text=df['text'],
        mode='markers',
        marker=dict(size=15,opacity=0.6,color='Red', symbol='circle'),
    ))


# Build the slider steps, one per trace
steps = []
for i in range(len(fig.data)):
    step = dict(
        method="restyle",
        args=["visible", [False] * len(fig.data)],
        label=date_list[i],
    )
    step["args"][1][i] = True  # Set the i-th trace to "visible"
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Date: "},
    pad={"t": 1},
    steps=steps
)]

# Make the first trace visible
fig.data[0].visible=True

# Show the map and save it as HTML
fig.update_layout(sliders=sliders,title='Coronavirus Spreading Map'+'<br>geodose.com',height=600)
fig.show()
go_offline.plot(fig,filename='F:/html/map_ncov_slider.html',validate=True, auto_open=False)

Summary


This concludes the guide to creating an interactive map that visualizes the spread of the 2019-nCoV coronavirus. After working through this material, you have learned how to read data from a public Google spreadsheet, how to process it using Pandas, and how to visualize it on an interactive map with a slider using Plotly. The resulting HTML page is also available for download. The information displayed on the map depends on the data table: each time the project code is executed, the map is rebuilt from the latest data in the table. This is a very simple map, and there are many ways to improve it. For example, it can be supplemented with additional graphs, summary data, and so on. If you are interested, you can do all this and much more yourself.

Dear readers! What tasks do you use Jupyter Notebook for?

Source: https://habr.com/ru/post/undefined/

