In most cases, when working with the response received from the API, or with any other data that has a complex tree structure, you are faced with the JSON and XML formats.
These formats have many advantages: they store data quite compactly and avoid unnecessary duplication of information.
The disadvantage of these formats is the complexity of their processing and analysis. Unstructured data cannot be used in calculations and visualization cannot be built on their basis.

This article is a logical continuation of the publication "R package tidyr and its new functions pivot_longer and pivot_wider" . It will help you bring unstructured data constructions to a familiar, and suitable for analysis, tabular form using the package tidyr
included in the core of the library tidyverse
and its family functions unnest_*()
.
Content
If you are interested in data analysis, you may be interested in my telegram and youtube channels. Most of the content is dedicated to R.
- Introduction
- GitHub Users
- Github repositories
- Game Of Thrones Characters
- Google
Rectangling (. , , .) ā , . tidyr
, - , :
unnest_longer()
- .unnest_wider()
- .unnest_auto()
unnest_longer()
unnest_wider()
.hoist()
unnest_wider()
.
, dplyr.
, repurrrsive
, , , -API.
library(tidyr)
library(dplyr)
library(repurrrsive)
GitHub
gh_users, , GitHub. gh_users tibble .:
users <- tibble( user = gh_users )
: gh_users, ? : , .
users
, .
names(users$user[[1]])
#> [1] "login" "id" "avatar_url"
#> [4] "gravatar_id" "url" "html_url"
#> [7] "followers_url" "following_url" "gists_url"
#> [10] "starred_url" "subscriptions_url" "organizations_url"
#> [13] "repos_url" "events_url" "received_events_url"
#> [16] "type" "site_admin" "name"
#> [19] "company" "blog" "location"
#> [22] "email" "hireable" "bio"
#> [25] "public_repos" "public_gists" "followers"
#> [28] "following" "created_at" "updated_at"
. unnest_wider()
:
users %>% unnest_wider(user)
#> # A tibble: 6 x 30
#> login id avatar_url gravatar_id url html_url followers_url
#> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 gabo⦠6.60e5 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 2 jenn⦠5.99e5 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 3 jtle⦠1.57e6 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 4 juli⦠1.25e7 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 5 leep⦠3.51e6 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 6 masa⦠8.36e6 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> # ⦠with 23 more variables: following_url <chr>, gists_url <chr>,
#> # starred_url <chr>, subscriptions_url <chr>, organizations_url <chr>,
#> # repos_url <chr>, events_url <chr>, received_events_url <chr>,
#> # type <chr>, site_admin <lgl>, name <chr>, company <chr>, blog <chr>,
#> # location <chr>, email <chr>, public_repos <int>, public_gists <int>,
#> # followers <int>, following <int>, created_at <chr>, updated_at <chr>,
#> # bio <chr>, hireable <lgl>
30 , , unnest_wider()
hoist()
. hoist()
, , purrr::pluck()
:
users %>% hoist(user,
followers = "followers",
login = "login",
url = "html_url"
)
#> # A tibble: 6 x 4
#> followers login url user
#> <int> <chr> <chr> <list>
#> 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2 780 jennybc https://github.com/jennybc <named list [27]>
#> 3 3958 jtleek https://github.com/jtleek <named list [27]>
#> 4 115 juliasilge https://github.com/juliasilge <named list [27]>
#> 5 213 leeper https://github.com/leeper <named list [27]>
#> 6 34 masalmon https://github.com/masalmon <named list [27]>
hoist()
- user, hoist()
.
Github
gh_repos
, tibble
:
repos <- tibble(repo = gh_repos)
repos
#> # A tibble: 6 x 1
#> repo
#> <list>
#> 1 <list [30]>
#> 2 <list [30]>
#> 3 <list [30]>
#> 4 <list [26]>
#> 5 <list [30]>
#> 6 <list [30]>
user , . , (. tidy data) , unnest_longer()
unnest_wider()
:
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 x 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> 7 <named list [68]>
#> 8 <named list [68]>
#> 9 <named list [68]>
#> 10 <named list [68]>
#> # ⦠with 166 more rows
unnest_wider()
hoist()
:
repos %>% hoist(repo,
login = c("owner", "login"),
name = "name",
homepage = "homepage",
watchers = "watchers_count"
)
#> # A tibble: 176 x 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> 7 gaborcsardi cmaker <NA> 0 <named list [65]>
#> 8 gaborcsardi cmark <NA> 0 <named list [65]>
#> 9 gaborcsardi conditions <NA> 0 <named list [65]>
#> 10 gaborcsardi crayon <NA> 52 <named list [65]>
#> # ⦠with 166 more rows
c("owner", "login")
: owner
. , owner
unnest_wider()
:
repos %>%
hoist(repo, owner = "owner") %>%
unnest_wider(owner)
#> # A tibble: 176 x 18
#> login id avatar_url gravatar_id url html_url followers_url
#> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 2 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 3 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 4 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 5 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 6 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 7 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 8 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 9 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> 10 gabo⦠660288 https://a⦠"" http⦠https:/⦠https://api.ā¦
#> # ⦠with 166 more rows, and 11 more variables: following_url <chr>,
#> # gists_url <chr>, starred_url <chr>, subscriptions_url <chr>,
#> # organizations_url <chr>, repos_url <chr>, events_url <chr>,
#> # received_events_url <chr>, type <chr>, site_admin <lgl>, repo <list>
, unnest_longer()
unnest_wider()
unnest_auto()
. , .
tibble(repo = gh_repos) %>%
unnest_auto(repo) %>%
unnest_auto(repo)
#> Using `unnest_longer(repo)`; no element has names
#> Using `unnest_wider(repo)`; elements have 68 names in common
#> # A tibble: 176 x 67
#> id name full_name owner private html_url description fork url
#> <int> <chr> <chr> <lis> <lgl> <chr> <chr> <lgl> <chr>
#> 1 6.12e7 after gaborcsa⦠<nam⦠FALSE https:/⦠Run Code i⦠FALSE httpā¦
#> 2 4.05e7 argu⦠gaborcsa⦠<nam⦠FALSE https:/⦠Declarativ⦠FALSE httpā¦
#> 3 3.64e7 ask gaborcsa⦠<nam⦠FALSE https:/⦠Friendly C⦠FALSE httpā¦
#> 4 3.49e7 base⦠gaborcsa⦠<nam⦠FALSE https:/⦠Do we get ⦠FALSE httpā¦
#> 5 6.16e7 cite⦠gaborcsa⦠<nam⦠FALSE https:/⦠Test R pac⦠TRUE httpā¦
#> 6 3.39e7 clis⦠gaborcsa⦠<nam⦠FALSE https:/⦠Unicode sy⦠FALSE httpā¦
#> 7 3.72e7 cmak⦠gaborcsa⦠<nam⦠FALSE https:/⦠port of cm⦠TRUE httpā¦
#> 8 6.80e7 cmark gaborcsa⦠<nam⦠FALSE https:/⦠CommonMark⦠TRUE httpā¦
#> 9 6.32e7 cond⦠gaborcsa⦠<nam⦠FALSE https:/⦠<NA> TRUE httpā¦
#> 10 2.43e7 cray⦠gaborcsa⦠<nam⦠FALSE https:/⦠R package ⦠FALSE httpā¦
#> # ⦠with 166 more rows, and 58 more variables: forks_url <chr>,
#> # keys_url <chr>, collaborators_url <chr>, teams_url <chr>,
#> # hooks_url <chr>, issue_events_url <chr>, events_url <chr>,
#> # assignees_url <chr>, branches_url <chr>, tags_url <chr>,
#> # blobs_url <chr>, git_tags_url <chr>, git_refs_url <chr>,
#> # trees_url <chr>, statuses_url <chr>, languages_url <chr>,
#> # stargazers_url <chr>, contributors_url <chr>, subscribers_url <chr>,
#> # subscription_url <chr>, commits_url <chr>, git_commits_url <chr>,
#> # comments_url <chr>, issue_comment_url <chr>, contents_url <chr>,
#> # compare_url <chr>, merges_url <chr>, archive_url <chr>,
#> # downloads_url <chr>, issues_url <chr>, pulls_url <chr>,
#> # milestones_url <chr>, notifications_url <chr>, labels_url <chr>,
#> # releases_url <chr>, deployments_url <chr>, created_at <chr>,
#> # updated_at <chr>, pushed_at <chr>, git_url <chr>, ssh_url <chr>,
#> # clone_url <chr>, svn_url <chr>, size <int>, stargazers_count <int>,
#> # watchers_count <int>, language <chr>, has_issues <lgl>,
#> # has_downloads <lgl>, has_wiki <lgl>, has_pages <lgl>,
#> # forks_count <int>, open_issues_count <int>, forks <int>,
#> # open_issues <int>, watchers <int>, default_branch <chr>,
#> # homepage <chr>
got_chars
gh_users
: , . got_chars
, , :
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 x 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> 7 <named list [18]>
#> 8 <named list [18]>
#> 9 <named list [18]>
#> 10 <named list [18]>
#> # ⦠with 20 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 http⦠1022 Theo⦠Male Ironbo⦠In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> 2 http⦠1052 Tyri⦠Male "" In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> 3 http⦠1074 Vict⦠Male Ironbo⦠In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> 4 http⦠1109 Will Male "" "" In 2⦠FALSE <chr ⦠<chr [⦠""
#> 5 http⦠1166 Areo⦠Male Norvos⦠In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> 6 http⦠1267 Chett Male "" At H⦠In 2⦠FALSE <chr ⦠<chr [⦠""
#> 7 http⦠1295 Cres⦠Male "" In 2⦠In 2⦠FALSE <chr ⦠<chr [⦠""
#> 8 http⦠130 Aria⦠Female Dornish In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> 9 http⦠1303 Daen⦠Female Valyri⦠In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> 10 http⦠1319 Davo⦠Male Wester⦠In 2⦠"" TRUE <chr ⦠<chr [⦠""
#> # ⦠with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>
got_chars
, gh_users
, .. char
, ā :
chars2 %>% select_if(is.list)
#> # A tibble: 30 x 7
#> titles aliases allegiances books povBooks tvSeries playedBy
#> <list> <list> <list> <list> <list> <list> <list>
#> 1 <chr [3]> <chr [4]> <chr [1]> <chr [3]> <chr [2]> <chr [6]> <chr [1]>
#> 2 <chr [2]> <chr [11]> <chr [1]> <chr [2]> <chr [4]> <chr [6]> <chr [1]>
#> 3 <chr [2]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [1]> <chr [1]>
#> 4 <chr [1]> <chr [1]> <???> <chr [1]> <chr [1]> <chr [1]> <chr [1]>
#> 5 <chr [1]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [2]> <chr [1]>
#> 6 <chr [1]> <chr [1]> <???> <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#> 7 <chr [1]> <chr [1]> <???> <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#> 8 <chr [1]> <chr [1]> <chr [1]> <chr [4]> <chr [1]> <chr [1]> <chr [1]>
#> 9 <chr [5]> <chr [11]> <chr [1]> <chr [1]> <chr [4]> <chr [6]> <chr [1]>
#> 10 <chr [4]> <chr [5]> <chr [2]> <chr [1]> <chr [3]> <chr [5]> <chr [1]>
#> # ⦠with 20 more rows
. , , :
chars2 %>%
select(name, books, tvSeries) %>%
pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
unnest_longer(value)
#> # A tibble: 180 x 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> 7 Theon Greyjoy tvSeries Season 4
#> 8 Theon Greyjoy tvSeries Season 5
#> 9 Theon Greyjoy tvSeries Season 6
#> 10 Tyrion Lannister books A Feast for Crows
#> # ⦠with 170 more rows
, , , :
chars2 %>%
select(name, title = titles) %>%
unnest_longer(title)
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # ⦠with 50 more rows
( , ""
title
, got_chars
: title
0, 1, .)
unnest_auto()
. , unnest_auto()
. , unnest_auto()
, - unnest_longer()
, unnest_wider()
, .
tibble(char = got_chars) %>%
unnest_auto(char) %>%
select(name, title = titles) %>%
unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title)`; no element has names
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # ⦠with 50 more rows
Google
, Google. API Google maps, API. API Google ; API Google Maps, .
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
message("No Google Maps API key found; code chunks will not be run")
}
# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
url <- "https://maps.googleapis.com/maps/api/geocode/json"
url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)
jsonlite::read_json(url)
}
, , :
houston <- geocode("Houston TX")
str(houston)
#> List of 2
#> $ results:List of 1
#> ..$ :List of 5
#> .. ..$ address_components:List of 4
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Houston"
#> .. .. .. ..$ short_name: chr "Houston"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "locality"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Harris County"
#> .. .. .. ..$ short_name: chr "Harris County"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "administrative_area_level_2"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Texas"
#> .. .. .. ..$ short_name: chr "TX"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "administrative_area_level_1"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "United States"
#> .. .. .. ..$ short_name: chr "US"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "country"
#> .. .. .. .. ..$ : chr "political"
#> .. ..$ formatted_address : chr "Houston, TX, USA"
#> .. ..$ geometry :List of 4
#> .. .. ..$ bounds :List of 2
#> .. .. .. ..$ northeast:List of 2
#> .. .. .. .. ..$ lat: num 30.1
#> .. .. .. .. ..$ lng: num -95
#> .. .. .. ..$ southwest:List of 2
#> .. .. .. .. ..$ lat: num 29.5
#> .. .. .. .. ..$ lng: num -95.8
#> .. .. ..$ location :List of 2
#> .. .. .. ..$ lat: num 29.8
#> .. .. .. ..$ lng: num -95.4
#> .. .. ..$ location_type: chr "APPROXIMATE"
#> .. .. ..$ viewport :List of 2
#> .. .. .. ..$ northeast:List of 2
#> .. .. .. .. ..$ lat: num 30.1
#> .. .. .. .. ..$ lng: num -95
#> .. .. .. ..$ southwest:List of 2
#> .. .. .. .. ..$ lat: num 29.5
#> .. .. .. .. ..$ lng: num -95.8
#> .. ..$ place_id : chr "ChIJAYWNSLS4QIYROwVl894CDco"
#> .. ..$ types :List of 2
#> .. .. ..$ : chr "locality"
#> .. .. ..$ : chr "political"
#> $ status : chr "OK"
, tidyr
. , :
city <- c ( "Houston" , "LA" , "New York" , "Chicago" , "Springfield" ) city_geo <- purrr::map (city, geocode)
tibble
, .
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 x 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>
status
result
, unnest_wider()
:
loc %>%
unnest_wider(json)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OK
, results
. 1 ( , API ), . unnest_longer()
:
loc %>%
unnest_wider(json) %>%
unnest_longer(results)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [5]> OK
, unnest_wider()
:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results)
#> # A tibble: 5 x 7
#> city address_componen⦠formatted_addre⦠geometry place_id types status
#> <chr> <list> <chr> <list> <chr> <lis> <chr>
#> 1 Houst⦠<list [4]> Houston, TX, USA <named ⦠ChIJAYWN⦠<lis⦠OK
#> 2 LA <list [4]> Los Angeles, CA⦠<named ⦠ChIJE9on⦠<lis⦠OK
#> 3 New Y⦠<list [3]> New York, NY, U⦠<named ⦠ChIJOwg_⦠<lis⦠OK
#> 4 Chica⦠<list [4]> Chicago, IL, USA <named ⦠ChIJ7cv0⦠<lis⦠OK
#> 5 Sprin⦠<list [5]> Springfield, MO⦠<named ⦠ChIJP5jI⦠<lis⦠OK
geometry
:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry)
#> # A tibble: 5 x 10
#> city address_compone⦠formatted_addre⦠bounds location location_type
#> <chr> <list> <chr> <list> <list> <chr>
#> 1 Hous⦠<list [4]> Houston, TX, USA <name⦠<named ⦠APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA⦠<name⦠<named ⦠APPROXIMATE
#> 3 New ⦠<list [3]> New York, NY, U⦠<name⦠<named ⦠APPROXIMATE
#> 4 Chic⦠<list [4]> Chicago, IL, USA <name⦠<named ⦠APPROXIMATE
#> 5 Spri⦠<list [5]> Springfield, MO⦠<name⦠<named ⦠APPROXIMATE
#> # ⦠with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
, location
:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone⦠formatted_addre⦠bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous⦠<list [4]> Houston, TX, USA <name⦠29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA⦠<name⦠34.1 -118. APPROXIMATE
#> 3 New ⦠<list [3]> New York, NY, U⦠<name⦠40.7 -74.0 APPROXIMATE
#> 4 Chic⦠<list [4]> Chicago, IL, USA <name⦠41.9 -87.6 APPROXIMATE
#> 5 Spri⦠<list [5]> Springfield, MO⦠<name⦠37.2 -93.3 APPROXIMATE
#> # ⦠with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
, unnest_auto()
, :
loc %>%
unnest_auto(json) %>%
unnest_auto(results) %>%
unnest_auto(results) %>%
unnest_auto(geometry) %>%
unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 4 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 x 11
#> city address_compone⦠formatted_addre⦠bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous⦠<list [4]> Houston, TX, USA <name⦠29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA⦠<name⦠34.1 -118. APPROXIMATE
#> 3 New ⦠<list [3]> New York, NY, U⦠<name⦠40.7 -74.0 APPROXIMATE
#> 4 Chic⦠<list [4]> Chicago, IL, USA <name⦠41.9 -87.6 APPROXIMATE
#> 5 Spri⦠<list [5]> Springfield, MO⦠<name⦠37.2 -93.3 APPROXIMATE
#> # ⦠with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
:
loc %>%
unnest_wider(json) %>%
hoist(results, first_result = 1) %>%
unnest_wider(first_result) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone⦠formatted_addre⦠bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous⦠<list [4]> Houston, TX, USA <name⦠29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA⦠<name⦠34.1 -118. APPROXIMATE
#> 3 New ⦠<list [3]> New York, NY, U⦠<name⦠40.7 -74.0 APPROXIMATE
#> 4 Chic⦠<list [4]> Chicago, IL, USA <name⦠41.9 -87.6 APPROXIMATE
#> 5 Spri⦠<list [5]> Springfield, MO⦠<name⦠37.2 -93.3 APPROXIMATE
#> # ⦠with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
hoist()
, lat
lng
.
loc %>%
hoist(json,
lat = list("results", 1, "geometry", "location", "lat"),
lng = list("results", 1, "geometry", "location", "lng")
)
#> # A tibble: 5 x 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 37.2 -93.3 <named list [2]>
ā . , , , . date_added
R.
discs <- tibble(disc = discog) %>%
unnest_wider(disc) %>%
mutate(date_added = as.POSIXct(strptime(date_added, "%Y-%m-%dT%H:%M:%S")))
discs
#> # A tibble: 155 x 5
#> instance_id date_added basic_information id rating
#> <int> <dttm> <list> <int> <int>
#> 1 354823933 2019-02-16 17:48:59 <named list [11]> 7496378 0
#> 2 354092601 2019-02-13 14:13:11 <named list [11]> 4490852 0
#> 3 354091476 2019-02-13 14:07:23 <named list [11]> 9827276 0
#> 4 351244906 2019-02-02 11:39:58 <named list [11]> 9769203 0
#> 5 351244801 2019-02-02 11:39:37 <named list [11]> 7237138 0
#> 6 351052065 2019-02-01 20:40:53 <named list [11]> 13117042 0
#> 7 350315345 2019-01-29 15:48:37 <named list [11]> 7113575 0
#> 8 350315103 2019-01-29 15:47:22 <named list [11]> 10540713 0
#> 9 350314507 2019-01-29 15:44:08 <named list [11]> 11260950 0
#> 10 350314047 2019-01-29 15:41:35 <named list [11]> 11726853 0
#> # ⦠with 145 more rows
, , . basic_information
:
discs %>% unnest_wider(basic_information)
#> Column name `id` must not be duplicated.
#> Use .name_repair to specify repair.
, .. basic_information
basic_information
. , , names_repair = "unique"
:
discs %>% unnest_wider(basic_information, names_repair = "unique")
#> New names:
#> * id -> id...6
#> * id -> id...14
#> # A tibble: 155 x 15
#> instance_id date_added labels year artists id...6 thumb title
#> <int> <dttm> <list> <int> <list> <int> <chr> <chr>
#> 1 354823933 2019-02-16 17:48:59 <list⦠2015 <list ⦠7.50e6 http⦠Demo
#> 2 354092601 2019-02-13 14:13:11 <list⦠2013 <list ⦠4.49e6 http⦠Obseā¦
#> 3 354091476 2019-02-13 14:07:23 <list⦠2017 <list ⦠9.83e6 http⦠I
#> 4 351244906 2019-02-02 11:39:58 <list⦠2017 <list ⦠9.77e6 http⦠OĆdoā¦
#> 5 351244801 2019-02-02 11:39:37 <list⦠2015 <list ⦠7.24e6 http⦠A Caā¦
#> 6 351052065 2019-02-01 20:40:53 <list⦠2019 <list ⦠1.31e7 http⦠Tashā¦
#> 7 350315345 2019-01-29 15:48:37 <list⦠2014 <list ⦠7.11e6 http⦠Demo
#> 8 350315103 2019-01-29 15:47:22 <list⦠2015 <list ⦠1.05e7 http⦠Let ā¦
#> 9 350314507 2019-01-29 15:44:08 <list⦠2017 <list ⦠1.13e7 "" Sub ā¦
#> 10 350314047 2019-01-29 15:41:35 <list⦠2017 <list ⦠1.17e7 http⦠Demo
#> # ⦠with 145 more rows, and 7 more variables: formats <list>,
#> # cover_image <chr>, resource_url <chr>, master_id <int>,
#> # master_url <chr>, id...14 <int>, rating <int>
, basic_information
id , :
discs %>%
select(-id) %>%
unnest_wider(basic_information)
#> # A tibble: 155 x 14
#> instance_id date_added labels year artists id thumb title
#> <int> <dttm> <list> <int> <list> <int> <chr> <chr>
#> 1 354823933 2019-02-16 17:48:59 <list⦠2015 <list ⦠7.50e6 http⦠Demo
#> 2 354092601 2019-02-13 14:13:11 <list⦠2013 <list ⦠4.49e6 http⦠Obseā¦
#> 3 354091476 2019-02-13 14:07:23 <list⦠2017 <list ⦠9.83e6 http⦠I
#> 4 351244906 2019-02-02 11:39:58 <list⦠2017 <list ⦠9.77e6 http⦠OĆdoā¦
#> 5 351244801 2019-02-02 11:39:37 <list⦠2015 <list ⦠7.24e6 http⦠A Caā¦
#> 6 351052065 2019-02-01 20:40:53 <list⦠2019 <list ⦠1.31e7 http⦠Tashā¦
#> 7 350315345 2019-01-29 15:48:37 <list⦠2014 <list ⦠7.11e6 http⦠Demo
#> 8 350315103 2019-01-29 15:47:22 <list⦠2015 <list ⦠1.05e7 http⦠Let ā¦
#> 9 350314507 2019-01-29 15:44:08 <list⦠2017 <list ⦠1.13e7 "" Sub ā¦
#> 10 350314047 2019-01-29 15:41:35 <list⦠2017 <list ⦠1.17e7 http⦠Demo
#> # ⦠with 145 more rows, and 6 more variables: formats <list>,
#> # cover_image <chr>, resource_url <chr>, master_id <int>,
#> # master_url <chr>, rating <int>
, hoist()
:
discs %>%
hoist(basic_information,
title = "title",
year = "year",
label = list("labels", 1, "name"),
artist = list("artists", 1, "name")
)
#> # A tibble: 155 x 9
#> instance_id date_added title year label artist
#> <int> <dttm> <chr> <int> <chr> <chr>
#> 1 354823933 2019-02-16 17:48:59 Demo 2015 Tobi⦠Mollot
#> 2 354092601 2019-02-13 14:13:11 Obse⦠2013 La V⦠Una Bā¦
#> 3 354091476 2019-02-13 14:07:23 I 2017 La V⦠S.H.Iā¦
#> 4 351244906 2019-02-02 11:39:58 OĆdo⦠2017 La V⦠Rata ā¦
#> 5 351244801 2019-02-02 11:39:37 A Ca⦠2015 Kato⦠Ivy (ā¦
#> 6 351052065 2019-02-01 20:40:53 Tash⦠2019 High⦠Tashme
#> 7 350315345 2019-01-29 15:48:37 Demo 2014 Mind⦠Desgrā¦
#> 8 350315103 2019-01-29 15:47:22 Let ⦠2015 Not ⦠Phantā¦
#> 9 350314507 2019-01-29 15:44:08 Sub ⦠2017 Not ⦠Sub Sā¦
#> 10 350314047 2019-01-29 15:41:35 Demo 2017 Pres⦠Smallā¦
#> # ⦠with 145 more rows, and 3 more variables: basic_information <list>,
#> # id <int>, rating <int>
, .
:
discs %>%
hoist(basic_information, artist = "artists") %>%
select(disc_id = id, artist) %>%
unnest_longer(artist) %>%
unnest_wider(artist)
#> # A tibble: 167 x 8
#> disc_id join name anv tracks role resource_url id
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <int>
#> 1 7496378 "" Mollot "" "" "" https://api.discog⦠4.62e6
#> 2 4490852 "" Una Bèstia⦠"" "" "" https://api.discog⦠3.19e6
#> 3 9827276 "" S.H.I.T. (⦠"" "" "" https://api.discog⦠2.77e6
#> 4 9769203 "" Rata Negra "" "" "" https://api.discog⦠4.28e6
#> 5 7237138 "" Ivy (18) "" "" "" https://api.discog⦠3.60e6
#> 6 13117042 "" Tashme "" "" "" https://api.discog⦠5.21e6
#> 7 7113575 "" Desgraciad⦠"" "" "" https://api.discog⦠4.45e6
#> 8 10540713 "" Phantom He⦠"" "" "" https://api.discog⦠4.27e6
#> 9 11260950 "" Sub Space ⦠"" "" "" https://api.discog⦠5.69e6
#> 10 11726853 "" Small Man ⦠"" "" "" https://api.discog⦠6.37e6
#> # ⦠with 157 more rows
discs %>%
hoist(basic_information, format = "formats") %>%
select(disc_id = id, format) %>%
unnest_longer(format) %>%
unnest_wider(format) %>%
unnest_longer(descriptions)
#> # A tibble: 280 x 5
#> disc_id descriptions text name qty
#> <int> <chr> <chr> <chr> <chr>
#> 1 7496378 Numbered Black Cassette 1
#> 2 4490852 LP <NA> Vinyl 1
#> 3 9827276 "7\"" <NA> Vinyl 1
#> 4 9827276 45 RPM <NA> Vinyl 1
#> 5 9827276 EP <NA> Vinyl 1
#> 6 9769203 LP <NA> Vinyl 1
#> 7 9769203 Album <NA> Vinyl 1
#> 8 7237138 "7\"" <NA> Vinyl 1
#> 9 7237138 45 RPM <NA> Vinyl 1
#> 10 13117042 "7\"" <NA> Vinyl 1
#> # ⦠with 270 more rows
.
tidyverse
.
In this article, we have analyzed a family of functions unnest_*()
that are aimed at working with extracting elements from nested lists. This package contains many other useful features that simplify data conversion according to the concept of Tidy Data .