Getting the IBM Data Science Professional Certificate

This article is an overview of the IBM Data Science Professional Certificate certification.


As a Python newcomer, I had to implement the following tasks:


  • Loading and parsing HTML tables
  • Cleaning the loaded data
  • Geocoding: finding the geographic coordinates of an object from its address
  • Downloading and processing GeoJSON
  • Building interactive heat maps
  • Building interactive choropleth maps
  • Transforming geographic coordinates between the spherical WGS84 system and the Cartesian UTM system
  • Representing spatial geographic objects as a hexagonal grid of circles
  • Finding geographic features located within a given distance of a point
  • Georeferencing geographic features to complex area polygons
  • Descriptive statistical analysis
  • Analysis of categorical variables and visualization of results
  • Correlation analysis and visualization of results
  • Segmentation using k-means clustering and the elbow method
  • Analysis and visualization of clusters

The certification required completing 9 courses:



3 .
«Python for Data Science and AI», «Databases and SQL for Data Science».


«Data Analysis with Python», «Data Visualization with Python», «Machine Learning with Python» (all taught in Python).


Applied Data Science Capstone: the final project, done in Python.


Applied Data Science Capstone.


Github.
— 700 .




Python




1. Introduction


1.1 Background


Moscow is divided into 12 administrative districts. Its area is 2561.5 km², and its population density is 4924.96 people/km² [1].


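The 12 districts include 125 boroughs, 2 urban okrugs and 19 settlements. Population density ranges from 30429 people/km² to 560 people/km² [2].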


Housing prices range from 68,768 rub/m² to 438,568 rub/m² [3].


1.2 Business Problem


, , , . () .


, , , .


, , , .


:


  • ()

2. Data acquisition and cleaning


2.2. Data requirements


, :


  1. , :


    • ,
    • ²,


  2. GEOJSON


  3. ( , , ...),



2.3. Describe data sources


HTML .


  • ²,

Nominatim. , , , .
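Geocoding with Nominatim turns an address into coordinates. The following is a minimal sketch, assuming the public Nominatim search endpoint (`https://nominatim.openstreetmap.org/search` with `format=json`), which returns a JSON list of matches whose `lat` and `lon` fields are strings. The network request itself is left out so the parsing step can be shown on a sample response; the helper names and the sample values are hypothetical.

```python
import json
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def build_geocode_url(address: str) -> str:
    """Build a Nominatim search URL for a free-form address (hypothetical helper)."""
    params = {"q": address, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"

def parse_geocode_response(body: str):
    """Return (lat, lon) of the first match, or None if nothing was found."""
    results = json.loads(body)
    if not results:
        return None
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them explicitly.
    return float(first["lat"]), float(first["lon"])

# Sample response shape; the values are made up for illustration.
sample = '[{"lat": "55.7558", "lon": "37.6173", "display_name": "Moscow"}]'
print(build_geocode_url("Tverskaya 1, Moscow"))
print(parse_geocode_response(sample))
```

When sending real requests, Nominatim's usage policy asks for an identifying User-Agent and at most one request per second, so a loop over many addresses should sleep between calls. The geopy package wraps the same service as `geopy.geocoders.Nominatim`.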


GEOJSON


( , ..) Foursquare API. , . 100 1000 . :


  • , 100 .
  • , Foursquare API .
  • , Foursquare API ( )
  • .
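The 100 and 1000 figures above are presumably search radii in meters. Whichever service returns the venues, keeping only features within a given distance of a point comes down to a distance check. Below is a minimal sketch using the haversine great-circle formula on WGS84 latitude/longitude; the spherical Earth radius and the venue tuples are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical-Earth assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def within_radius(center, venues, radius_m):
    """Keep (name, lat, lon) venues lying within radius_m of center=(lat, lon)."""
    lat0, lon0 = center
    return [v for v in venues if haversine_m(lat0, lon0, v[1], v[2]) <= radius_m]

venues = [
    ("A", 55.7558, 37.6173),  # at the center
    ("B", 55.7568, 37.6173),  # about 111 m north
    ("C", 55.7658, 37.6173),  # about 1.1 km north
]
print([v[0] for v in within_radius((55.7558, 37.6173), venues, 500)])
```

At city scale the spherical approximation is accurate to a fraction of a percent, which is enough for choosing search circles.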

2.4. Describe data cleansing


HTML , . :


  • ' \n\t', ' ', ''
  • "" ""
  • Conversion of values to float and int
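The cleaning rules above are only partly legible. The following is a minimal sketch, assuming the typical problems of numbers scraped from Russian-language HTML tables: stray `\n`/`\t` characters, spaces or non-breaking spaces used as thousands separators, and a comma as the decimal separator. The helper names are hypothetical.

```python
import re

def clean_number(raw: str) -> float:
    """Turn a scraped cell such as ' 12\xa0194,5\n' into a float."""
    text = raw.strip()                 # drop leading/trailing spaces, \n and \t
    text = re.sub(r"\s+", "", text)     # \s also matches non-breaking spaces in str patterns
    text = text.replace(",", ".")       # assumption: comma is the decimal separator
    return float(text)

def clean_int(raw: str) -> int:
    """Same as clean_number, but for integer columns such as population."""
    return int(round(clean_number(raw)))

print(clean_number(" 12\xa0194,5\n"), clean_int("253 943\t"))
```

With pandas, such a helper would be applied per column, e.g. `df["Borough_Population"] = df["Borough_Population"].map(clean_int)`. Note the comma assumption: a value written with comma thousands separators, such as "68,768", would be misread as 68.768, so the rule has to match the source table.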

146 :


  • Borough_Name — —
  • District_Name — ,
  • Borough_Type
  • OKATO_Borough_Code — OKATO
  • OKTMO_District_Code — OKTMO
  • Borough_Area
  • Borough_Population
  • Borough_Population_Density
  • Borough_Housing_Area — .
  • Borough_Housing_Area_Per_Person
  • Borough_Housing_Price




Moscow Boroughs dataset


Nominatim, , ( 10 ) .


. , Foursquare API.


Boroughs Population in Moscow City


, ( , , ...), , . 28 000 .


. Foursquare API.


Example of the hexagonal grid of area candidates
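A grid of circles like the one pictured is easiest to build in a Cartesian system measured in meters (UTM) and then convert back to WGS84. The following is a minimal sketch of the planar part: the centers of a hexagonal arrangement of circles covering a rectangle, plus the UTM zone and the EPSG code of "WGS 84 / UTM zone N" for a point. The radius, the rectangle and the helper names are assumptions; the actual coordinate conversion could be done with a library such as pyproj (for example `Transformer.from_crs("EPSG:4326", "EPSG:32637", always_xy=True)` for Moscow), which is not shown here.

```python
import math

def utm_zone(lon: float) -> int:
    """UTM zone number (1..60) for a longitude in degrees.

    Ignores the special zones around Norway and Svalbard.
    """
    return int((lon + 180) // 6) % 60 + 1

def utm_epsg(lat: float, lon: float) -> int:
    """EPSG code of 'WGS 84 / UTM zone N': 326xx north of the equator, 327xx south."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)

def hex_circle_centers(x_min, y_min, x_max, y_max, radius):
    """Circle centers on a hexagonal grid covering a rectangle (planar meters).

    Centers are sqrt(3) * radius apart within a row, rows are 1.5 * radius apart,
    and every other row is shifted by half a step. These are the centers of a
    regular hexagon tiling with side `radius`, so circles of that radius (the
    hexagons' circumcircles) cover the rectangle without gaps.
    """
    dx = math.sqrt(3) * radius
    dy = 1.5 * radius
    n_rows = math.ceil((y_max - y_min) / dy) + 1
    n_cols = math.ceil((x_max - x_min) / dx) + 1
    centers = []
    for row in range(n_rows):
        offset = dx / 2 if row % 2 else 0.0
        y = y_min + row * dy
        for col in range(n_cols):
            centers.append((x_min + offset + col * dx, y))
    return centers

print(utm_zone(37.6173), utm_epsg(55.7558, 37.6173))
print(len(hex_circle_centers(0, 0, 1000, 1000, 300)))
```

Covering circles overlap, so a venue near a circle boundary can be returned by two neighboring searches; deduplicating venues by their identifier afterwards avoids double counting.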


Foursquare API, 34460 ( , , ...) 7899 .


(350 ) , (300 ), .


27622 28 000 .


, , . GEOJSON .
, , .
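Assigning each venue to the borough whose polygon contains it is a point-in-polygon test. The following is a minimal sketch using the even-odd ray-casting rule on GeoJSON geometries, where coordinates are in [longitude, latitude] order, the first ring of a Polygon is its exterior and any further rings are holes. In practice a library such as Shapely does this more robustly; the `name` property key, the feature and its coordinates are made up for illustration.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: True if (lon, lat) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][:2]
        x2, y2 = ring[(i + 1) % n][:2]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def point_in_geometry(lon, lat, geometry):
    """Support GeoJSON Polygon and MultiPolygon geometries, respecting holes."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    for rings in polygons:
        exterior, holes = rings[0], rings[1:]
        if point_in_ring(lon, lat, exterior) and not any(
            point_in_ring(lon, lat, hole) for hole in holes
        ):
            return True
    return False

def find_borough(lon, lat, feature_collection, name_key="name"):
    """Name of the first feature whose geometry contains the point, else None."""
    for feature in feature_collection["features"]:
        if point_in_geometry(lon, lat, feature["geometry"]):
            return feature["properties"].get(name_key)
    return None

# A square "borough" with a square hole in the middle (made-up coordinates).
fc = {"type": "FeatureCollection", "features": [{
    "type": "Feature",
    "properties": {"name": "Square"},
    "geometry": {"type": "Polygon", "coordinates": [
        [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
        [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]],
    ]},
}]}
print(find_borough(5, 1, fc), find_borough(5, 5, fc))
```

Checking every polygon for every venue is O(venues × vertices); for tens of thousands of venues it still runs quickly, but a spatial index helps if the data grow.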


, .


20864 ( , , ...), 120 . .


Moscow venues dataset


" " " ".


Example of some Moscow boroughs and their venues


3. Methodology


:


  • /

:


  • , ,
  • K-Means

, . .


3.1. Exploratory Data Analysis


:


  • District_Name — ,
  • Borough_Area
  • Borough_Population
  • Borough_Population_Density
  • Borough_Housing_Area — .
  • Borough_Housing_Price



Descriptive statistical analysis


, 12 194 253 943 .
109 421 /² 438 568 /².
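pandas' `DataFrame.describe()` reports count, mean, standard deviation, minimum, quartiles and maximum for each numeric column. The following standard-library sketch produces the same summary for a single column, assuming linear-interpolation quartiles (`statistics.quantiles(..., method="inclusive")`, which matches pandas' default) and the sample standard deviation (ddof=1, also the pandas default).

```python
import statistics

def describe(values):
    """count/mean/std/min/25%/50%/75%/max for a list of numbers."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (ddof=1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }

print(describe([1, 2, 3, 4, 5]))
```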


District_Name Borough_Population. , District_Name , , «», «», «» «».


'District' and 'Population'


District_Name Borough_Housing_Price. , , District_Name .


'District' and 'Housing Price'
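The text compares numeric variables across the categories of District_Name. The test the author used is not legible here; one common way to check whether group means differ is a one-way ANOVA (this is an assumption, not necessarily the author's method). The following is a minimal sketch of the F statistic; `scipy.stats.f_oneway` returns the same statistic together with its p-value.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```

With pandas, the groups could be built as `[g["Borough_Population"].tolist() for _, g in df.groupby("District_Name")]`. A large F with a small p-value suggests the category matters for that variable.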




. (p-value):


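  • p-value < 0.001: strong evidence that the correlation is significant
  • p-value < 0.05: moderate evidence that the correlation is significant
  • p-value < 0.1: weak evidence that the correlation is significant
  • p-value > 0.1: no evidence that the correlation is significant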
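The following is a minimal sketch of the correlation check: the Pearson coefficient computed directly, plus a helper that maps a p-value onto the strong/moderate/weak/no-evidence thresholds at 0.001, 0.05 and 0.1. The p-value itself is assumed to come from a statistics library, for example `scipy.stats.pearsonr`, which returns the coefficient and its two-sided p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def evidence_level(p_value):
    """Map a p-value to the evidence level for a significant correlation."""
    if p_value < 0.001:
        return "strong"
    if p_value < 0.05:
        return "moderate"
    if p_value < 0.1:
        return "weak"
    return "none"

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]), evidence_level(0.02))
```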

Correlation matrix


Correlation matrix values


  • Borough_Area, Borough_Population_Density Borough_Population ,
  • Borough_Housing_Area Borough_Population ,
  • Borough_Area, Borough_Population_Density, Borough_Housing_Area Borough_Housing_Price ,
  • Borough_Area Borough_Population_Density ,

3.2. Clustering


, , .


, K-Means Elbow , , .


Elbow , ( 1 10). (distortion) (inertia).


Distortion Inertia .
(elbows) 3 5 .


3- .


'Elbow_Method_Distortion'


'Elbow_Method_Inertia'
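Distortion and inertia differ only in how the distances to the nearest centroid are aggregated: inertia (what scikit-learn reports as `KMeans.inertia_`) is the sum of squared distances, while distortion, as it is usually computed in elbow-method tutorials, is the mean of the plain Euclidean distances. The following self-contained sketch of Lloyd's k-means computes both for k = 1..k_max. The deterministic farthest-point initialization is a simplification chosen here instead of scikit-learn's k-means++, and the toy points are made up.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(far))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def elbow_scores(points, k_max=10):
    """List of (k, inertia, distortion) for k = 1..k_max."""
    scores = []
    for k in range(1, min(k_max, len(points)) + 1):
        centers = kmeans(points, k)
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        inertia = sum(d ** 2 for d in nearest)   # sum of squared distances
        distortion = sum(nearest) / len(points)  # mean distance
        scores.append((k, inertia, distortion))
    return scores

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
for k, inertia, distortion in elbow_scores(points, k_max=4):
    print(k, round(inertia, 3), round(distortion, 3))
```

Because inertia squares the distances, it punishes far-away points more than distortion does, which is one reason the two curves can bend at different k.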


, K-Means , :


  • % ( )
  • % ( )


'Moscow_Clustering'


:


  • "0" — (78538 ), (173695 /²) (10328 /²)
  • "1" — (153187 ), (160741 /²) (13312 /²)
  • "2" — (79805 ), (333794 /²) (10533 /²)

"1" :


  • 34 43% , 37% ,

boxplot .


'Cluster_Borough_Population_boxplot'


'Cluster_Borough_Housing_Price_boxplot'


(choropleth map)
, "1" .
, .


'Moscow_Clustering_map'


4. Results


:


  • :
    • ()
  • ,
  • (heat map) (choropleth map)



Moscow_Recomended_Borough_df


20864 ( , , ...), 928 "Gym / Fitness Center", 259 .
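Selecting fitness venues and counting them per borough is a filter followed by a group count, and the counts can then feed a choropleth map. The following is a minimal sketch, assuming each venue record carries a borough name and a Foursquare-style category string; only the category name "Gym / Fitness Center" comes from the text, while the record layout and the borough names are hypothetical.

```python
from collections import Counter

GYM_CATEGORY = "Gym / Fitness Center"

def gyms_per_borough(venues, boroughs):
    """Gym counts per borough, sorted from least to most competition."""
    counts = Counter(v["borough"] for v in venues if v["category"] == GYM_CATEGORY)
    # Boroughs without any gym still appear, with a count of 0.
    return sorted(((b, counts.get(b, 0)) for b in boroughs), key=lambda item: item[1])

venues = [
    {"borough": "A", "category": "Gym / Fitness Center"},
    {"borough": "A", "category": "Café"},
    {"borough": "B", "category": "Gym / Fitness Center"},
    {"borough": "B", "category": "Gym / Fitness Center"},
]
print(gyms_per_borough(venues, ["A", "B", "C"]))
```

In pandas the same is roughly `df[df["Venue Category"] == GYM_CATEGORY].groupby("Borough_Name").size()`, where the column names are again assumptions.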



Moscow_gym_venues_df


(heat map) (choropleth map) (HTML zip ) Interactive map
.


gym_heatmap_big


gym_heatmap_smal


gym_250


5. Discussion


:


  1. , :


    • ,
    • ²,

  2. GEOJSON
  3. ( , , ...),

, GitHub .csv .


K-Means Elbow . Distortion Inertia , 3- . 5 , .


To assess the proximity of competing fitness facilities, the fitness centers were displayed on an interactive heat map. This made it possible to analyze visually how close the existing fitness centers in each area are to one another and to identify areas with a low density of fitness centers.


Building on these data, further analysis could segment fitness facilities by category and automatically calculate recommended locations for new fitness centers, taking into account the density of competing facilities.

Source: https://habr.com/ru/post/undefined/

