Luaskan kolom bersarang - daftar menggunakan bahasa R (paket tidyr dan fungsi keluarga yang tidak patut)

Dalam kebanyakan kasus, ketika bekerja dengan respons yang diterima dari API, atau dengan data lain yang memiliki struktur struktur pohon yang kompleks, Anda dihadapkan dengan format JSON dan XML.


Format ini memiliki banyak keuntungan: mereka menyimpan data dengan cukup padat dan menghindari duplikasi informasi yang tidak perlu.


Kerugian dari format ini adalah kompleksitas pemrosesan dan analisisnya. Data yang tidak terstruktur tidak dapat digunakan dalam perhitungan dan visualisasi tidak dapat dibangun atas dasar mereka.




Artikel ini adalah kelanjutan logis dari publikasi "Paket R rapi dan fungsi barunya pivot_longer dan pivot_wider" . Ini akan membantu Anda membawa konstruksi data yang tidak terstruktur menjadi familiar, dan cocok untuk analisis, bentuk tabel dengan menggunakan paket yang tidyrtermasuk dalam inti perpustakaan tidyversedan fungsi keluarganya unnest_*().


Kandungan


Jika Anda tertarik pada analisis data, Anda mungkin tertarik pada saluran telegram dan youtube saya . Sebagian besar konten didedikasikan untuk R.


  1. pengantar
  2. Pengguna GitHub
  3. Repositori Github
  4. Karakter Game Of Thrones
  5. Google


Rectangling (. , , .) ā€” , . tidyr , - , :


  • unnest_longer() - .
  • unnest_wider() - .
  • unnest_auto()
    unnest_longer() unnest_wider().
  • hoist() unnest_wider() .

, dplyr.


, repurrrsive, , , -API.


library(tidyr)
library(dplyr)
library(repurrrsive)

GitHub


gh_users, , GitHub. gh_users tibble .:


users <-   tibble( user = gh_users ) 

: gh_users, ? : , .


users , .


names(users$user[[1]])
#>  [1] "login"               "id"                  "avatar_url"         
#>  [4] "gravatar_id"         "url"                 "html_url"           
#>  [7] "followers_url"       "following_url"       "gists_url"          
#> [10] "starred_url"         "subscriptions_url"   "organizations_url"  
#> [13] "repos_url"           "events_url"          "received_events_url"
#> [16] "type"                "site_admin"          "name"               
#> [19] "company"             "blog"                "location"           
#> [22] "email"               "hireable"            "bio"                
#> [25] "public_repos"        "public_gists"        "followers"          
#> [28] "following"           "created_at"          "updated_at"

. unnest_wider() :


users %>% unnest_wider(user)
#> # A tibble: 6 x 30
#>   login     id avatar_url gravatar_id url   html_url followers_url
#>   <chr>  <int> <chr>      <chr>       <chr> <chr>    <chr>        
#> 1 gaboā€¦ 6.60e5 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> 2 jennā€¦ 5.99e5 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> 3 jtleā€¦ 1.57e6 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> 4 juliā€¦ 1.25e7 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> 5 leepā€¦ 3.51e6 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> 6 masaā€¦ 8.36e6 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> # ā€¦ with 23 more variables: following_url <chr>, gists_url <chr>,
#> #   starred_url <chr>, subscriptions_url <chr>, organizations_url <chr>,
#> #   repos_url <chr>, events_url <chr>, received_events_url <chr>,
#> #   type <chr>, site_admin <lgl>, name <chr>, company <chr>, blog <chr>,
#> #   location <chr>, email <chr>, public_repos <int>, public_gists <int>,
#> #   followers <int>, following <int>, created_at <chr>, updated_at <chr>,
#> #   bio <chr>, hireable <lgl>

30 , , unnest_wider() hoist(). hoist() , , purrr::pluck():


users %>% hoist(user, 
  followers = "followers", 
  login = "login", 
  url = "html_url"
)
#> # A tibble: 6 x 4
#>   followers login       url                            user             
#>       <int> <chr>       <chr>                          <list>           
#> 1       303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2       780 jennybc     https://github.com/jennybc     <named list [27]>
#> 3      3958 jtleek      https://github.com/jtleek      <named list [27]>
#> 4       115 juliasilge  https://github.com/juliasilge  <named list [27]>
#> 5       213 leeper      https://github.com/leeper      <named list [27]>
#> 6        34 masalmon    https://github.com/masalmon    <named list [27]>

hoist() - user, hoist() .


Github


gh_repos , tibble:


repos <- tibble(repo = gh_repos)
repos
#> # A tibble: 6 x 1
#>   repo       
#>   <list>     
#> 1 <list [30]>
#> 2 <list [30]>
#> 3 <list [30]>
#> 4 <list [26]>
#> 5 <list [30]>
#> 6 <list [30]>

user , . , (. tidy data) , unnest_longer() unnest_wider():


repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 x 1
#>    repo             
#>    <list>           
#>  1 <named list [68]>
#>  2 <named list [68]>
#>  3 <named list [68]>
#>  4 <named list [68]>
#>  5 <named list [68]>
#>  6 <named list [68]>
#>  7 <named list [68]>
#>  8 <named list [68]>
#>  9 <named list [68]>
#> 10 <named list [68]>
#> # ā€¦ with 166 more rows

unnest_wider() hoist() :


repos %>% hoist(repo, 
  login = c("owner", "login"), 
  name = "name",
  homepage = "homepage",
  watchers = "watchers_count"
)
#> # A tibble: 176 x 5
#>    login       name        homepage watchers repo             
#>    <chr>       <chr>       <chr>       <int> <list>           
#>  1 gaborcsardi after       <NA>            5 <named list [65]>
#>  2 gaborcsardi argufy      <NA>           19 <named list [65]>
#>  3 gaborcsardi ask         <NA>            5 <named list [65]>
#>  4 gaborcsardi baseimports <NA>            0 <named list [65]>
#>  5 gaborcsardi citest      <NA>            0 <named list [65]>
#>  6 gaborcsardi clisymbols  ""             18 <named list [65]>
#>  7 gaborcsardi cmaker      <NA>            0 <named list [65]>
#>  8 gaborcsardi cmark       <NA>            0 <named list [65]>
#>  9 gaborcsardi conditions  <NA>            0 <named list [65]>
#> 10 gaborcsardi crayon      <NA>           52 <named list [65]>
#> # ā€¦ with 166 more rows

c("owner", "login"): owner. , owner unnest_wider() :


repos %>% 
  hoist(repo, owner = "owner") %>% 
  unnest_wider(owner)
#> # A tibble: 176 x 18
#>    login     id avatar_url gravatar_id url   html_url followers_url
#>    <chr>  <int> <chr>      <chr>       <chr> <chr>    <chr>        
#>  1 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  2 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  3 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  4 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  5 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  6 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  7 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  8 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#>  9 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> 10 gaboā€¦ 660288 https://aā€¦ ""          httpā€¦ https:/ā€¦ https://api.ā€¦
#> # ā€¦ with 166 more rows, and 11 more variables: following_url <chr>,
#> #   gists_url <chr>, starred_url <chr>, subscriptions_url <chr>,
#> #   organizations_url <chr>, repos_url <chr>, events_url <chr>,
#> #   received_events_url <chr>, type <chr>, site_admin <lgl>, repo <list>

, unnest_longer() unnest_wider() unnest_auto(). , .


tibble(repo = gh_repos) %>% 
  unnest_auto(repo) %>% 
  unnest_auto(repo)
#> Using `unnest_longer(repo)`; no element has names
#> Using `unnest_wider(repo)`; elements have 68 names in common
#> # A tibble: 176 x 67
#>        id name  full_name owner private html_url description fork  url  
#>     <int> <chr> <chr>     <lis> <lgl>   <chr>    <chr>       <lgl> <chr>
#>  1 6.12e7 after gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ Run Code iā€¦ FALSE httpā€¦
#>  2 4.05e7 arguā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ Declarativā€¦ FALSE httpā€¦
#>  3 3.64e7 ask   gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ Friendly Cā€¦ FALSE httpā€¦
#>  4 3.49e7 baseā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ Do we get ā€¦ FALSE httpā€¦
#>  5 6.16e7 citeā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ Test R pacā€¦ TRUE  httpā€¦
#>  6 3.39e7 clisā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ Unicode syā€¦ FALSE httpā€¦
#>  7 3.72e7 cmakā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ port of cmā€¦ TRUE  httpā€¦
#>  8 6.80e7 cmark gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ CommonMarkā€¦ TRUE  httpā€¦
#>  9 6.32e7 condā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ <NA>        TRUE  httpā€¦
#> 10 2.43e7 crayā€¦ gaborcsaā€¦ <namā€¦ FALSE   https:/ā€¦ R package ā€¦ FALSE httpā€¦
#> # ā€¦ with 166 more rows, and 58 more variables: forks_url <chr>,
#> #   keys_url <chr>, collaborators_url <chr>, teams_url <chr>,
#> #   hooks_url <chr>, issue_events_url <chr>, events_url <chr>,
#> #   assignees_url <chr>, branches_url <chr>, tags_url <chr>,
#> #   blobs_url <chr>, git_tags_url <chr>, git_refs_url <chr>,
#> #   trees_url <chr>, statuses_url <chr>, languages_url <chr>,
#> #   stargazers_url <chr>, contributors_url <chr>, subscribers_url <chr>,
#> #   subscription_url <chr>, commits_url <chr>, git_commits_url <chr>,
#> #   comments_url <chr>, issue_comment_url <chr>, contents_url <chr>,
#> #   compare_url <chr>, merges_url <chr>, archive_url <chr>,
#> #   downloads_url <chr>, issues_url <chr>, pulls_url <chr>,
#> #   milestones_url <chr>, notifications_url <chr>, labels_url <chr>,
#> #   releases_url <chr>, deployments_url <chr>, created_at <chr>,
#> #   updated_at <chr>, pushed_at <chr>, git_url <chr>, ssh_url <chr>,
#> #   clone_url <chr>, svn_url <chr>, size <int>, stargazers_count <int>,
#> #   watchers_count <int>, language <chr>, has_issues <lgl>,
#> #   has_downloads <lgl>, has_wiki <lgl>, has_pages <lgl>,
#> #   forks_count <int>, open_issues_count <int>, forks <int>,
#> #   open_issues <int>, watchers <int>, default_branch <chr>,
#> #   homepage <chr>


got_chars gh_users: , . got_chars , , :


chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 x 1
#>    char             
#>    <list>           
#>  1 <named list [18]>
#>  2 <named list [18]>
#>  3 <named list [18]>
#>  4 <named list [18]>
#>  5 <named list [18]>
#>  6 <named list [18]>
#>  7 <named list [18]>
#>  8 <named list [18]>
#>  9 <named list [18]>
#> 10 <named list [18]>
#> # ā€¦ with 20 more rows

chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 x 18
#>    url      id name  gender culture born  died  alive titles aliases father
#>    <chr> <int> <chr> <chr>  <chr>   <chr> <chr> <lgl> <list> <list>  <chr> 
#>  1 httpā€¦  1022 Theoā€¦ Male   Ironboā€¦ In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#>  2 httpā€¦  1052 Tyriā€¦ Male   ""      In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#>  3 httpā€¦  1074 Victā€¦ Male   Ironboā€¦ In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#>  4 httpā€¦  1109 Will  Male   ""      ""    In 2ā€¦ FALSE <chr ā€¦ <chr [ā€¦ ""    
#>  5 httpā€¦  1166 Areoā€¦ Male   Norvosā€¦ In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#>  6 httpā€¦  1267 Chett Male   ""      At Hā€¦ In 2ā€¦ FALSE <chr ā€¦ <chr [ā€¦ ""    
#>  7 httpā€¦  1295 Cresā€¦ Male   ""      In 2ā€¦ In 2ā€¦ FALSE <chr ā€¦ <chr [ā€¦ ""    
#>  8 httpā€¦   130 Ariaā€¦ Female Dornish In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#>  9 httpā€¦  1303 Daenā€¦ Female Valyriā€¦ In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#> 10 httpā€¦  1319 Davoā€¦ Male   Westerā€¦ In 2ā€¦ ""    TRUE  <chr ā€¦ <chr [ā€¦ ""    
#> # ā€¦ with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> #   allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> #   playedBy <list>

got_chars , gh_users, .. char , ā€” :


chars2 %>% select_if(is.list)
#> # A tibble: 30 x 7
#>    titles    aliases    allegiances books     povBooks  tvSeries  playedBy 
#>    <list>    <list>     <list>      <list>    <list>    <list>    <list>   
#>  1 <chr [3]> <chr [4]>  <chr [1]>   <chr [3]> <chr [2]> <chr [6]> <chr [1]>
#>  2 <chr [2]> <chr [11]> <chr [1]>   <chr [2]> <chr [4]> <chr [6]> <chr [1]>
#>  3 <chr [2]> <chr [1]>  <chr [1]>   <chr [3]> <chr [2]> <chr [1]> <chr [1]>
#>  4 <chr [1]> <chr [1]>  <???>       <chr [1]> <chr [1]> <chr [1]> <chr [1]>
#>  5 <chr [1]> <chr [1]>  <chr [1]>   <chr [3]> <chr [2]> <chr [2]> <chr [1]>
#>  6 <chr [1]> <chr [1]>  <???>       <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#>  7 <chr [1]> <chr [1]>  <???>       <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#>  8 <chr [1]> <chr [1]>  <chr [1]>   <chr [4]> <chr [1]> <chr [1]> <chr [1]>
#>  9 <chr [5]> <chr [11]> <chr [1]>   <chr [1]> <chr [4]> <chr [6]> <chr [1]>
#> 10 <chr [4]> <chr [5]>  <chr [2]>   <chr [1]> <chr [3]> <chr [5]> <chr [1]>
#> # ā€¦ with 20 more rows

. , , :


chars2 %>% 
  select(name, books, tvSeries) %>% 
  pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>% 
  unnest_longer(value)
#> # A tibble: 180 x 3
#>    name             media    value            
#>    <chr>            <chr>    <chr>            
#>  1 Theon Greyjoy    books    A Game of Thrones
#>  2 Theon Greyjoy    books    A Storm of Swords
#>  3 Theon Greyjoy    books    A Feast for Crows
#>  4 Theon Greyjoy    tvSeries Season 1         
#>  5 Theon Greyjoy    tvSeries Season 2         
#>  6 Theon Greyjoy    tvSeries Season 3         
#>  7 Theon Greyjoy    tvSeries Season 4         
#>  8 Theon Greyjoy    tvSeries Season 5         
#>  9 Theon Greyjoy    tvSeries Season 6         
#> 10 Tyrion Lannister books    A Feast for Crows
#> # ā€¦ with 170 more rows

, , , :


chars2 %>% 
  select(name, title = titles) %>% 
  unnest_longer(title)
#> # A tibble: 60 x 2
#>    name              title                                               
#>    <chr>             <chr>                                               
#>  1 Theon Greyjoy     Prince of Winterfell                                
#>  2 Theon Greyjoy     Captain of Sea Bitch                                
#>  3 Theon Greyjoy     Lord of the Iron Islands (by law of the green lands)
#>  4 Tyrion Lannister  Acting Hand of the King (former)                    
#>  5 Tyrion Lannister  Master of Coin (former)                             
#>  6 Victarion Greyjoy Lord Captain of the Iron Fleet                      
#>  7 Victarion Greyjoy Master of the Iron Victory                          
#>  8 Will              ""                                                  
#>  9 Areo Hotah        Captain of the Guard at Sunspear                    
#> 10 Chett             ""                                                  
#> # ā€¦ with 50 more rows

( , "" title, got_chars: title 0, 1, .)


unnest_auto(). , unnest_auto() . , unnest_auto() , - unnest_longer(), unnest_wider(), .


tibble(char = got_chars) %>% 
  unnest_auto(char) %>% 
  select(name, title = titles) %>% 
  unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title)`; no element has names
#> # A tibble: 60 x 2
#>    name              title                                               
#>    <chr>             <chr>                                               
#>  1 Theon Greyjoy     Prince of Winterfell                                
#>  2 Theon Greyjoy     Captain of Sea Bitch                                
#>  3 Theon Greyjoy     Lord of the Iron Islands (by law of the green lands)
#>  4 Tyrion Lannister  Acting Hand of the King (former)                    
#>  5 Tyrion Lannister  Master of Coin (former)                             
#>  6 Victarion Greyjoy Lord Captain of the Iron Fleet                      
#>  7 Victarion Greyjoy Master of the Iron Victory                          
#>  8 Will              ""                                                  
#>  9 Areo Hotah        Captain of the Guard at Sunspear                    
#> 10 Chett             ""                                                  
#> # ā€¦ with 50 more rows

Google


, Google. API Google maps, API. API Google ; API Google Maps, .


has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
  message("No Google Maps API key found; code chunks will not be run")
}

# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  url <- "https://maps.googleapis.com/maps/api/geocode/json"
  url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)

  jsonlite::read_json(url)
}

, , :


houston <- geocode("Houston TX")
str(houston)
#> List of 2
#>  $ results:List of 1
#>   ..$ :List of 5
#>   .. ..$ address_components:List of 4
#>   .. .. ..$ :List of 3
#>   .. .. .. ..$ long_name : chr "Houston"
#>   .. .. .. ..$ short_name: chr "Houston"
#>   .. .. .. ..$ types     :List of 2
#>   .. .. .. .. ..$ : chr "locality"
#>   .. .. .. .. ..$ : chr "political"
#>   .. .. ..$ :List of 3
#>   .. .. .. ..$ long_name : chr "Harris County"
#>   .. .. .. ..$ short_name: chr "Harris County"
#>   .. .. .. ..$ types     :List of 2
#>   .. .. .. .. ..$ : chr "administrative_area_level_2"
#>   .. .. .. .. ..$ : chr "political"
#>   .. .. ..$ :List of 3
#>   .. .. .. ..$ long_name : chr "Texas"
#>   .. .. .. ..$ short_name: chr "TX"
#>   .. .. .. ..$ types     :List of 2
#>   .. .. .. .. ..$ : chr "administrative_area_level_1"
#>   .. .. .. .. ..$ : chr "political"
#>   .. .. ..$ :List of 3
#>   .. .. .. ..$ long_name : chr "United States"
#>   .. .. .. ..$ short_name: chr "US"
#>   .. .. .. ..$ types     :List of 2
#>   .. .. .. .. ..$ : chr "country"
#>   .. .. .. .. ..$ : chr "political"
#>   .. ..$ formatted_address : chr "Houston, TX, USA"
#>   .. ..$ geometry          :List of 4
#>   .. .. ..$ bounds       :List of 2
#>   .. .. .. ..$ northeast:List of 2
#>   .. .. .. .. ..$ lat: num 30.1
#>   .. .. .. .. ..$ lng: num -95
#>   .. .. .. ..$ southwest:List of 2
#>   .. .. .. .. ..$ lat: num 29.5
#>   .. .. .. .. ..$ lng: num -95.8
#>   .. .. ..$ location     :List of 2
#>   .. .. .. ..$ lat: num 29.8
#>   .. .. .. ..$ lng: num -95.4
#>   .. .. ..$ location_type: chr "APPROXIMATE"
#>   .. .. ..$ viewport     :List of 2
#>   .. .. .. ..$ northeast:List of 2
#>   .. .. .. .. ..$ lat: num 30.1
#>   .. .. .. .. ..$ lng: num -95
#>   .. .. .. ..$ southwest:List of 2
#>   .. .. .. .. ..$ lat: num 29.5
#>   .. .. .. .. ..$ lng: num -95.8
#>   .. ..$ place_id          : chr "ChIJAYWNSLS4QIYROwVl894CDco"
#>   .. ..$ types             :List of 2
#>   .. .. ..$ : chr "locality"
#>   .. .. ..$ : chr "political"
#>  $ status : chr "OK"

, tidyr. , :


  city <-   c ( "Houston" , "LA" , "New York" , "Chicago" , "Springfield" )  city_geo <-   purrr::map (city, geocode) 

tibble, .


loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 x 2
#>   city        json            
#>   <chr>       <list>          
#> 1 Houston     <named list [2]>
#> 2 LA          <named list [2]>
#> 3 New York    <named list [2]>
#> 4 Chicago     <named list [2]>
#> 5 Springfield <named list [2]>

status result, unnest_wider() :


loc %>%
  unnest_wider(json)
#> # A tibble: 5 x 3
#>   city        results    status
#>   <chr>       <list>     <chr> 
#> 1 Houston     <list [1]> OK    
#> 2 LA          <list [1]> OK    
#> 3 New York    <list [1]> OK    
#> 4 Chicago     <list [1]> OK    
#> 5 Springfield <list [1]> OK

, results . 1 ( , API ), . unnest_longer() :


loc %>%
  unnest_wider(json) %>% 
  unnest_longer(results)
#> # A tibble: 5 x 3
#>   city        results          status
#>   <chr>       <list>           <chr> 
#> 1 Houston     <named list [5]> OK    
#> 2 LA          <named list [5]> OK    
#> 3 New York    <named list [5]> OK    
#> 4 Chicago     <named list [5]> OK    
#> 5 Springfield <named list [5]> OK

, unnest_wider():


loc %>%
  unnest_wider(json) %>% 
  unnest_longer(results) %>% 
  unnest_wider(results)
#> # A tibble: 5 x 7
#>   city   address_componenā€¦ formatted_addreā€¦ geometry place_id  types status
#>   <chr>  <list>            <chr>            <list>   <chr>     <lis> <chr> 
#> 1 Houstā€¦ <list [4]>        Houston, TX, USA <named ā€¦ ChIJAYWNā€¦ <lisā€¦ OK    
#> 2 LA     <list [4]>        Los Angeles, CAā€¦ <named ā€¦ ChIJE9onā€¦ <lisā€¦ OK    
#> 3 New Yā€¦ <list [3]>        New York, NY, Uā€¦ <named ā€¦ ChIJOwg_ā€¦ <lisā€¦ OK    
#> 4 Chicaā€¦ <list [4]>        Chicago, IL, USA <named ā€¦ ChIJ7cv0ā€¦ <lisā€¦ OK    
#> 5 Sprinā€¦ <list [5]>        Springfield, MOā€¦ <named ā€¦ ChIJP5jIā€¦ <lisā€¦ OK

geometry:


loc %>%
  unnest_wider(json) %>% 
  unnest_longer(results) %>% 
  unnest_wider(results) %>% 
  unnest_wider(geometry)
#> # A tibble: 5 x 10
#>   city  address_componeā€¦ formatted_addreā€¦ bounds location location_type
#>   <chr> <list>           <chr>            <list> <list>   <chr>        
#> 1 Housā€¦ <list [4]>       Houston, TX, USA <nameā€¦ <named ā€¦ APPROXIMATE  
#> 2 LA    <list [4]>       Los Angeles, CAā€¦ <nameā€¦ <named ā€¦ APPROXIMATE  
#> 3 New ā€¦ <list [3]>       New York, NY, Uā€¦ <nameā€¦ <named ā€¦ APPROXIMATE  
#> 4 Chicā€¦ <list [4]>       Chicago, IL, USA <nameā€¦ <named ā€¦ APPROXIMATE  
#> 5 Spriā€¦ <list [5]>       Springfield, MOā€¦ <nameā€¦ <named ā€¦ APPROXIMATE  
#> # ā€¦ with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> #   status <chr>

, location:


loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 x 11
#>   city  address_componeā€¦ formatted_addreā€¦ bounds   lat    lng location_type
#>   <chr> <list>           <chr>            <list> <dbl>  <dbl> <chr>        
#> 1 Housā€¦ <list [4]>       Houston, TX, USA <nameā€¦  29.8  -95.4 APPROXIMATE  
#> 2 LA    <list [4]>       Los Angeles, CAā€¦ <nameā€¦  34.1 -118.  APPROXIMATE  
#> 3 New ā€¦ <list [3]>       New York, NY, Uā€¦ <nameā€¦  40.7  -74.0 APPROXIMATE  
#> 4 Chicā€¦ <list [4]>       Chicago, IL, USA <nameā€¦  41.9  -87.6 APPROXIMATE  
#> 5 Spriā€¦ <list [5]>       Springfield, MOā€¦ <nameā€¦  37.2  -93.3 APPROXIMATE  
#> # ā€¦ with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> #   status <chr>

, unnest_auto() , :


loc %>%
  unnest_auto(json) %>%
  unnest_auto(results) %>%
  unnest_auto(results) %>%
  unnest_auto(geometry) %>%
  unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 4 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 x 11
#>   city  address_componeā€¦ formatted_addreā€¦ bounds   lat    lng location_type
#>   <chr> <list>           <chr>            <list> <dbl>  <dbl> <chr>        
#> 1 Housā€¦ <list [4]>       Houston, TX, USA <nameā€¦  29.8  -95.4 APPROXIMATE  
#> 2 LA    <list [4]>       Los Angeles, CAā€¦ <nameā€¦  34.1 -118.  APPROXIMATE  
#> 3 New ā€¦ <list [3]>       New York, NY, Uā€¦ <nameā€¦  40.7  -74.0 APPROXIMATE  
#> 4 Chicā€¦ <list [4]>       Chicago, IL, USA <nameā€¦  41.9  -87.6 APPROXIMATE  
#> 5 Spriā€¦ <list [5]>       Springfield, MOā€¦ <nameā€¦  37.2  -93.3 APPROXIMATE  
#> # ā€¦ with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> #   status <chr>

:


loc %>%
  unnest_wider(json) %>%
  hoist(results, first_result = 1) %>%
  unnest_wider(first_result) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 x 11
#>   city  address_componeā€¦ formatted_addreā€¦ bounds   lat    lng location_type
#>   <chr> <list>           <chr>            <list> <dbl>  <dbl> <chr>        
#> 1 Housā€¦ <list [4]>       Houston, TX, USA <nameā€¦  29.8  -95.4 APPROXIMATE  
#> 2 LA    <list [4]>       Los Angeles, CAā€¦ <nameā€¦  34.1 -118.  APPROXIMATE  
#> 3 New ā€¦ <list [3]>       New York, NY, Uā€¦ <nameā€¦  40.7  -74.0 APPROXIMATE  
#> 4 Chicā€¦ <list [4]>       Chicago, IL, USA <nameā€¦  41.9  -87.6 APPROXIMATE  
#> 5 Spriā€¦ <list [5]>       Springfield, MOā€¦ <nameā€¦  37.2  -93.3 APPROXIMATE  
#> # ā€¦ with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> #   status <chr>

hoist() , lat lng.


loc %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )
#> # A tibble: 5 x 4
#>   city          lat    lng json            
#>   <chr>       <dbl>  <dbl> <list>          
#> 1 Houston      29.8  -95.4 <named list [2]>
#> 2 LA           34.1 -118.  <named list [2]>
#> 3 New York     40.7  -74.0 <named list [2]>
#> 4 Chicago      41.9  -87.6 <named list [2]>
#> 5 Springfield  37.2  -93.3 <named list [2]>


ā€” . , , , . date_added R.


discs <- tibble(disc = discog) %>% 
  unnest_wider(disc) %>% 
  mutate(date_added = as.POSIXct(strptime(date_added, "%Y-%m-%dT%H:%M:%S"))) 
discs
#> # A tibble: 155 x 5
#>    instance_id date_added          basic_information       id rating
#>          <int> <dttm>              <list>               <int>  <int>
#>  1   354823933 2019-02-16 17:48:59 <named list [11]>  7496378      0
#>  2   354092601 2019-02-13 14:13:11 <named list [11]>  4490852      0
#>  3   354091476 2019-02-13 14:07:23 <named list [11]>  9827276      0
#>  4   351244906 2019-02-02 11:39:58 <named list [11]>  9769203      0
#>  5   351244801 2019-02-02 11:39:37 <named list [11]>  7237138      0
#>  6   351052065 2019-02-01 20:40:53 <named list [11]> 13117042      0
#>  7   350315345 2019-01-29 15:48:37 <named list [11]>  7113575      0
#>  8   350315103 2019-01-29 15:47:22 <named list [11]> 10540713      0
#>  9   350314507 2019-01-29 15:44:08 <named list [11]> 11260950      0
#> 10   350314047 2019-01-29 15:41:35 <named list [11]> 11726853      0
#> # ā€¦ with 145 more rows

, , . basic_information:


discs %>% unnest_wider(basic_information)
#> Column name `id` must not be duplicated.
#> Use .name_repair to specify repair.

, .. basic_information basic_information. , , names_repair = "unique":


discs %>% unnest_wider(basic_information, names_repair = "unique")
#> New names:
#> * id -> id...6
#> * id -> id...14
#> # A tibble: 155 x 15
#>    instance_id date_added          labels  year artists id...6 thumb title
#>          <int> <dttm>              <list> <int> <list>   <int> <chr> <chr>
#>  1   354823933 2019-02-16 17:48:59 <listā€¦  2015 <list ā€¦ 7.50e6 httpā€¦ Demo 
#>  2   354092601 2019-02-13 14:13:11 <listā€¦  2013 <list ā€¦ 4.49e6 httpā€¦ Obseā€¦
#>  3   354091476 2019-02-13 14:07:23 <listā€¦  2017 <list ā€¦ 9.83e6 httpā€¦ I    
#>  4   351244906 2019-02-02 11:39:58 <listā€¦  2017 <list ā€¦ 9.77e6 httpā€¦ OĆ­doā€¦
#>  5   351244801 2019-02-02 11:39:37 <listā€¦  2015 <list ā€¦ 7.24e6 httpā€¦ A Caā€¦
#>  6   351052065 2019-02-01 20:40:53 <listā€¦  2019 <list ā€¦ 1.31e7 httpā€¦ Tashā€¦
#>  7   350315345 2019-01-29 15:48:37 <listā€¦  2014 <list ā€¦ 7.11e6 httpā€¦ Demo 
#>  8   350315103 2019-01-29 15:47:22 <listā€¦  2015 <list ā€¦ 1.05e7 httpā€¦ Let ā€¦
#>  9   350314507 2019-01-29 15:44:08 <listā€¦  2017 <list ā€¦ 1.13e7 ""    Sub ā€¦
#> 10   350314047 2019-01-29 15:41:35 <listā€¦  2017 <list ā€¦ 1.17e7 httpā€¦ Demo 
#> # ā€¦ with 145 more rows, and 7 more variables: formats <list>,
#> #   cover_image <chr>, resource_url <chr>, master_id <int>,
#> #   master_url <chr>, id...14 <int>, rating <int>

, basic_information id , :


discs %>% 
  select(-id) %>% 
  unnest_wider(basic_information)
#> # A tibble: 155 x 14
#>    instance_id date_added          labels  year artists     id thumb title
#>          <int> <dttm>              <list> <int> <list>   <int> <chr> <chr>
#>  1   354823933 2019-02-16 17:48:59 <listā€¦  2015 <list ā€¦ 7.50e6 httpā€¦ Demo 
#>  2   354092601 2019-02-13 14:13:11 <listā€¦  2013 <list ā€¦ 4.49e6 httpā€¦ Obseā€¦
#>  3   354091476 2019-02-13 14:07:23 <listā€¦  2017 <list ā€¦ 9.83e6 httpā€¦ I    
#>  4   351244906 2019-02-02 11:39:58 <listā€¦  2017 <list ā€¦ 9.77e6 httpā€¦ OĆ­doā€¦
#>  5   351244801 2019-02-02 11:39:37 <listā€¦  2015 <list ā€¦ 7.24e6 httpā€¦ A Caā€¦
#>  6   351052065 2019-02-01 20:40:53 <listā€¦  2019 <list ā€¦ 1.31e7 httpā€¦ Tashā€¦
#>  7   350315345 2019-01-29 15:48:37 <listā€¦  2014 <list ā€¦ 7.11e6 httpā€¦ Demo 
#>  8   350315103 2019-01-29 15:47:22 <listā€¦  2015 <list ā€¦ 1.05e7 httpā€¦ Let ā€¦
#>  9   350314507 2019-01-29 15:44:08 <listā€¦  2017 <list ā€¦ 1.13e7 ""    Sub ā€¦
#> 10   350314047 2019-01-29 15:41:35 <listā€¦  2017 <list ā€¦ 1.17e7 httpā€¦ Demo 
#> # ā€¦ with 145 more rows, and 6 more variables: formats <list>,
#> #   cover_image <chr>, resource_url <chr>, master_id <int>,
#> #   master_url <chr>, rating <int>

, hoist():


discs %>% 
  hoist(basic_information,
    title = "title",
    year = "year",
    label = list("labels", 1, "name"),
    artist = list("artists", 1, "name")
  )
#> # A tibble: 155 x 9
#>    instance_id date_added          title  year label artist
#>          <int> <dttm>              <chr> <int> <chr> <chr> 
#>  1   354823933 2019-02-16 17:48:59 Demo   2015 Tobiā€¦ Mollot
#>  2   354092601 2019-02-13 14:13:11 Obseā€¦  2013 La Vā€¦ Una Bā€¦
#>  3   354091476 2019-02-13 14:07:23 I      2017 La Vā€¦ S.H.Iā€¦
#>  4   351244906 2019-02-02 11:39:58 OĆ­doā€¦  2017 La Vā€¦ Rata ā€¦
#>  5   351244801 2019-02-02 11:39:37 A Caā€¦  2015 Katoā€¦ Ivy (ā€¦
#>  6   351052065 2019-02-01 20:40:53 Tashā€¦  2019 Highā€¦ Tashme
#>  7   350315345 2019-01-29 15:48:37 Demo   2014 Mindā€¦ Desgrā€¦
#>  8   350315103 2019-01-29 15:47:22 Let ā€¦  2015 Not ā€¦ Phantā€¦
#>  9   350314507 2019-01-29 15:44:08 Sub ā€¦  2017 Not ā€¦ Sub Sā€¦
#> 10   350314047 2019-01-29 15:41:35 Demo   2017 Presā€¦ Smallā€¦
#> # ā€¦ with 145 more rows, and 3 more variables: basic_information <list>,
#> #   id <int>, rating <int>

, .


:


discs %>% 
  hoist(basic_information, artist = "artists") %>% 
  select(disc_id = id, artist) %>% 
  unnest_longer(artist) %>% 
  unnest_wider(artist)
#> # A tibble: 167 x 8
#>     disc_id join  name        anv   tracks role  resource_url            id
#>       <int> <chr> <chr>       <chr> <chr>  <chr> <chr>                <int>
#>  1  7496378 ""    Mollot      ""    ""     ""    https://api.discogā€¦ 4.62e6
#>  2  4490852 ""    Una BĆØstiaā€¦ ""    ""     ""    https://api.discogā€¦ 3.19e6
#>  3  9827276 ""    S.H.I.T. (ā€¦ ""    ""     ""    https://api.discogā€¦ 2.77e6
#>  4  9769203 ""    Rata Negra  ""    ""     ""    https://api.discogā€¦ 4.28e6
#>  5  7237138 ""    Ivy (18)    ""    ""     ""    https://api.discogā€¦ 3.60e6
#>  6 13117042 ""    Tashme      ""    ""     ""    https://api.discogā€¦ 5.21e6
#>  7  7113575 ""    Desgraciadā€¦ ""    ""     ""    https://api.discogā€¦ 4.45e6
#>  8 10540713 ""    Phantom Heā€¦ ""    ""     ""    https://api.discogā€¦ 4.27e6
#>  9 11260950 ""    Sub Space ā€¦ ""    ""     ""    https://api.discogā€¦ 5.69e6
#> 10 11726853 ""    Small Man ā€¦ ""    ""     ""    https://api.discogā€¦ 6.37e6
#> # ā€¦ with 157 more rows

discs %>% 
  hoist(basic_information, format = "formats") %>% 
  select(disc_id = id, format) %>% 
  unnest_longer(format) %>% 
  unnest_wider(format) %>% 
  unnest_longer(descriptions)
#> # A tibble: 280 x 5
#>     disc_id descriptions text  name     qty  
#>       <int> <chr>        <chr> <chr>    <chr>
#>  1  7496378 Numbered     Black Cassette 1    
#>  2  4490852 LP           <NA>  Vinyl    1    
#>  3  9827276 "7\""        <NA>  Vinyl    1    
#>  4  9827276 45 RPM       <NA>  Vinyl    1    
#>  5  9827276 EP           <NA>  Vinyl    1    
#>  6  9769203 LP           <NA>  Vinyl    1    
#>  7  9769203 Album        <NA>  Vinyl    1    
#>  8  7237138 "7\""        <NA>  Vinyl    1    
#>  9  7237138 45 RPM       <NA>  Vinyl    1    
#> 10 13117042 "7\""        <NA>  Vinyl    1    
#> # ā€¦ with 270 more rows

.



tidyverse .


Dalam artikel ini, kami telah menganalisis serangkaian fungsi unnest_*()yang bertujuan untuk bekerja dengan mengekstraksi elemen dari daftar bersarang. Paket ini berisi banyak fitur berguna lainnya yang menyederhanakan konversi data sesuai dengan konsep Tidy Data .


All Articles