Using Data Science for the Peaceful Purpose of Buying a Home

To sell something unnecessary, you must first buy something unnecessary, but we do not have money.
- Three from Prostokvashino

Introduction


As it happens, I live in my own apartment (a condo, in local terms) in Montreal. About a year ago it occurred to me that it would be nice to move into a house of my own. I already had some experience buying and selling housing, and I could have handled this the simple way, as most locals do: hire a realtor and let them deal with everything. But that would be boring.


So I decided to approach the matter scientifically. The task: figure out how much what I own is worth, and where to find what I can afford. Along the way, I wanted to understand which way the wind is blowing in the market, and to explore geospatial computation in R.


It was clear from the start that I could not afford a detached single-family house if I wanted to stay in a civilized area and keep fighting global warming with a daily bike ride. Another common local option is to buy a duplex or triplex, i.e. a house with two or three apartments: you live in one and, instead of breeding rabbits in the rest, rent them out to tenants. That introduces another unknown quantity: rental income.


So I wanted to build a map of the city showing sale prices and rental prices, and to be able to track how both change over time.


zillow, , , , , , : https://apciq.ca/en/real-estate-market/. , , .


, , , , , , : https://github.com/Froren/realtorca


— , , - requests beatifulsoap, .


— , , , , , ; , .


, openstreet map, .



— , , sqlite , , . , , , , ..


Tools: R with the tidyverse and Simple Features for R (sf); the book Geocomputation with R; ggplot2 (part of the tidyverse) and tmap.


, , (join?) .



, , dplyr , :


In R, read the listings and reproject them from WGS84 (EPSG:4326) to a metric projection (EPSG:32188):


library(tidyverse)
library(sf)

property<-read_csv("....") %>% 
 st_as_sf(coords=c("lng","lat"), crs=4326) %>% 
 st_transform(crs=32188)

Load the neighbourhood boundaries and keep the two of interest:


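# geojson_sf() comes from the geojsonsf package
neighbourhood<-geojsonsf::geojson_sf("quartierreferencehabitation.geojson") %>%
 st_transform(32188) %>% 
 filter(nom_qr %in% c("Saint-Louis", "Milton-Parc")) %>% 
 summarize() %>%   # merge the selected polygons into a single geometry
 st_buffer(dist=0) # zero-width buffer to clean up the merged geometry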

Keep only the listings that fall inside the selected neighbourhoods:


neighbors <- st_join(property, neighbourhood, left=F)

Fetch a background map for the area with read_osm:


osm_neighbourhood<-read_osm(st_bbox(neighbourhood%>%st_transform(4326)), ext=1.5, type="esri")

Draw the map with tmap:


library(tmap)
library(tmaptools)

tm_shape(osm_neighbourhood) + tm_rgb(alpha=0.7)+
  tm_shape(neighbourhood) + tm_borders(col='red',alpha=0.8)  + 
  tm_shape(neighbors) + tm_symbols(shape=3,size=0.2,alpha=0.8) +
  tm_shape(ref_home) + tm_symbols(col='red',shape=4,size=0.5,alpha=0.8)+
  tm_compass(position=c("right", "bottom"))+
  tm_scale_bar(position=c("right", "bottom"))

image
, :


image


( ):


lm(price ~ parking:area_interior)

The fitted coefficients:


## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                33776.10   22175.97   1.523    0.129    
## parkingFALSE:area_interior   444.28      23.54  18.876   <2e-16 ***
## parkingTRUE:area_interior    523.01      19.65  26.614   <2e-16 ***

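That is, on top of a base of about $33k, each unit of interior area (presumably a square foot) costs about $444 without parking and about $523 with parking.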


The resulting estimate: $443k, with an interval of [$433k, $453k].
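To make the model above concrete, here is a minimal sketch on simulated data (not the article's listings). The data frame `condos`, the rates of 440 and 520, and the 800-unit apartment are all hypothetical. The interval shown is a confidence interval for the mean price; the article does not say which kind of interval it reports.

```r
# Illustrative only: simulated listings, not the article's data.
set.seed(1)
n <- 200
condos <- data.frame(
  area_interior = runif(n, 500, 1500),               # interior area (assumed sq ft)
  parking = sample(c(TRUE, FALSE), n, replace = TRUE)
)
# Assumed "true" prices: a base amount plus a per-area rate that depends on parking
rate <- ifelse(condos$parking, 520, 440)
condos$price <- 30000 + rate * condos$area_interior + rnorm(n, sd = 20000)

# Same formula as in the article: a separate slope per parking level
fit_lm <- lm(price ~ parking:area_interior, data = condos)

# Estimate the price of an 800-unit apartment without and with parking
new_condo <- data.frame(area_interior = c(800, 800), parking = c(FALSE, TRUE))
pred <- predict(fit_lm, newdata = new_condo, interval = "confidence", level = 0.95)
print(round(pred))
```

The `fit`, `lwr` and `upr` columns of `pred` hold the point estimate and the bounds of the interval for each row of `new_condo`.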


, , :


image


.. , .. . , . , , , generalized linear model inverse Gaussian distribution , - , :


image


: 435k$, 95% [419k$ — 450k$] — , .


, , , — .
, , — .. , , ( X X ) .


, () , ( , ).


image


A generalized linear model with an inverse Gaussian distribution:


glm(price_sqft ~ parking + bedrooms, family=inverse.gaussian(link="log"))

:


## (Intercept)    parkingTRUE   bedrooms2   bedrooms3   bedrooms4 
## 503.1981961   1.1215828   0.9720589   0.9662187   0.8325715

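That is, the base price is about $503 per square foot; parking adds about 12%, and relative to the baseline two bedrooms lower the price per square foot by 2.8%, three by 3.3%, and four by 17%.
The resulting estimate: $430k [$413k, $448k].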
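With a log link the coefficients are additive on the log scale, so exponentiating them gives multipliers like the ones printed above. A minimal sketch on simulated data follows; the data frame `listings` and the effect sizes used to generate it are hypothetical and chosen only to resemble the article's output.

```r
# Illustrative only: simulated listings, not the article's data.
set.seed(2)
n <- 400
listings <- data.frame(
  parking = sample(c(TRUE, FALSE), n, replace = TRUE),
  bedrooms = factor(sample(1:4, n, replace = TRUE))
)
# Assumed multiplicative effects used to generate the prices
bedroom_effect <- c(1, 0.97, 0.96, 0.83)[as.integer(listings$bedrooms)]
mu <- 500 * ifelse(listings$parking, 1.12, 1) * bedroom_effect
listings$price_sqft <- mu * exp(rnorm(n, sd = 0.1))   # strictly positive prices

fit_glm <- glm(price_sqft ~ parking + bedrooms,
               family = inverse.gaussian(link = "log"), data = listings)

# exp() turns log-scale coefficients into a base price and multipliers
multipliers <- exp(coef(fit_glm))
print(round(multipliers, 3))
```

A multiplier of 1.12 reads as "12% more per square foot", and 0.83 as "17% less", relative to the baseline listing (no parking, one bedroom).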



. .
- , - , — - , ?
, loess.
image


, — - .
image


, . , ( ) .


“ ” Generalized additive model


In R this is done with the gam function from the mgcv package:


gam(price_sqft ~ parking + bedrooms + s(start_date, k=24), family=inverse.gaussian(link="log"))

, , inverse Gaussian distribution, , , 24 . gam — , k .
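A minimal sketch of such a model on simulated data, assuming the listing date has been converted to a numeric day count (the column `start_day`, the data frame `sales`, and the 10% drift are hypothetical). In mgcv, `k` sets the size of the spline basis, i.e. an upper limit on how wiggly the curve can be; the penalized fit then settles on an effective number of degrees of freedom that is usually much lower.

```r
# Illustrative only: simulated listings, not the article's data.
library(mgcv)

set.seed(3)
n <- 500
sales <- data.frame(
  start_day = sort(runif(n, 0, 730)),                 # days since the first listing
  parking = sample(c(TRUE, FALSE), n, replace = TRUE)
)
# Assumed slow upward drift of about 10% over two years
trend <- 1 + 0.1 * sales$start_day / 730
mu <- 480 * trend * ifelse(sales$parking, 1.1, 1)
sales$price_sqft <- mu * exp(rnorm(n, sd = 0.08))

fit_gam <- gam(price_sqft ~ parking + s(start_day, k = 24),
               family = inverse.gaussian(link = "log"),
               data = sales, method = "REML")

# Effective degrees of freedom actually used by the time smooth
smooth_edf <- summary(fit_gam)$edf
print(smooth_edf)
```

Calling `plot(fit_gam)` draws the estimated time trend on the log scale together with its uncertainty band.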


( 2 ):


image


, : 429k [413k-447k], . . , .



, , .
image


, 60 . , .



, , . , 1, - :


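# MLS number of the reference listing
selected_mls=17758383 
# search radius: 2 km, expressed in metres
max_distance=2000  
# keep only plexes: everything that is neither an apartment nor a house
plex_pe<-prop_geo_p %>% filter(type!='Apartment', type!='House')
ref<-plex_pe%>%filter(mls==selected_mls) 

# circular search area around the reference listing
search_roi <- st_buffer(ref, max_distance) 
# listings inside the search area, excluding the reference itself; drop implausible values
result <- st_intersection(plex_pe %>% filter(mls!=selected_mls), search_roi) %>% 
 filter(area_interior<10000, area_interior>100,area_land>0,price<1e7,price>100 ) 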

:


image


:


image


, , , - — , ( XX ), ..
, 523k$, [ 570k$ — 620k$]



, . , . sf :


, (), , :


aggregate(filter(kijiji_geo_p,bedrooms==2)%>%dplyr::select(price), mtl_p, median, join = st_contains)

image


, . . .
, :


gam(price_sqft ~ type + bedrooms + parking + s(x,y,k=100), family=inverse.gaussian(link="log"))

Predict on a raster with a cell size of 100:


pred_rent_whole <- raster(extent(mtl_land),res=100)
crs(pred_rent_whole)<-crs(mtl_land)
my_predict<-function(...) predict(...,type="response")
pred_rent_whole<- raster::interpolate(pred_rent_whole, model_rent_geo_whole, fun=my_predict, xyOnly=T,const=data.frame(bedrooms=2))

# keep only the cells that fall on land
pred_rent_whole <- mask(pred_rent_whole, mtl_land)
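The same idea can be sketched without the raster package: fit a two-dimensional smooth over the coordinates and predict on a regular grid with `type = "response"`. Everything below is simulated; the data frame `rent`, the 10 km square, the 500-unit grid step and `k = 30` are hypothetical choices made to keep the example small, not the article's settings.

```r
# Illustrative only: simulated rental listings on a 10 km x 10 km square.
library(mgcv)

set.seed(4)
n <- 600
rent <- data.frame(
  x = runif(n, 0, 10000),                             # projected coordinates, metres
  y = runif(n, 0, 10000),
  bedrooms = sample(1:3, n, replace = TRUE)
)
# Assumed pattern: rent per square foot falls with distance from the centre
centre_dist <- sqrt((rent$x - 5000)^2 + (rent$y - 5000)^2)
mu <- 1.8 * exp(-centre_dist / 10000) * exp(-0.05 * (rent$bedrooms - 1))
rent$price_sqft <- mu * exp(rnorm(n, sd = 0.1))

fit_geo <- gam(price_sqft ~ bedrooms + s(x, y, k = 30),
               family = inverse.gaussian(link = "log"), data = rent)

# Regular prediction grid; bedrooms is held constant, as const= does for interpolate()
grid <- expand.grid(x = seq(0, 10000, by = 500), y = seq(0, 10000, by = 500))
grid$bedrooms <- 2
grid$pred <- as.numeric(predict(fit_geo, newdata = grid, type = "response"))

print(summary(grid$pred))
```

Using `type = "response"` matters here: without it `predict` returns values on the log scale of the link function rather than prices.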

Plot the result with tmap:


tm_shape(osm_mtl)+tm_rgb(alpha=0.6)+
  tm_shape(mtl_arr) + tm_borders(alpha=0.8, col='black')+
  tm_shape(pred_rent_whole)+tm_raster(style="cont",alpha=0.7, title='$')+
  tm_shape(subway_stop_p%>%dplyr::select(stop_name))+tm_symbols(col='blue',alpha=0.2,size=0.03)+
  tm_shape(subway_p)+tm_lines(col='blue',alpha=0.2)+
  tm_compass(position=c("right", "bottom"))+
  tm_scale_bar(position=c("left", "bottom"))+
  tm_layout(scale=1.5)

image
— .


, .
image


, .
image


, ( /( * ).
.


image


( / ).
image


, , ( ).


image



So R can be used to evaluate what, when and where to buy or sell. Real life is more complicated, though: what is missing in practice is the actual selling price (in our area it is available only to registered realtors). So do not expect the forecasts to match reality 100%. And, as the hide-and-seek saying goes, whoever did not hide, it is not my fault.


Source


All data and source code are in the repository. Buy our elephants!


Bonus for those who have read to the end


Interactive map with the results: http://www.ilmarin.info/re_mtl/

