This function consolidates the datasets in a 'many* package' datacube into a single dataset that combines their rows, columns, and observations. Separate arguments control which rows and which columns are retained, and how conflicts between observations across datasets are resolved. This gives users considerable flexibility in how they combine data. For example, users may wish to keep only units that appear in every dataset but include variables coded in any dataset, or keep units that appear in any dataset but only those variables that appear in every dataset. Even then there may be conflicts, because the value recorded for a given unit and variable may differ from dataset to dataset. The resolve argument offers a number of methods for choosing how such conflicts are resolved.

Usage

consolidate(
  datacube,
  rows = "any",
  cols = "any",
  resolve = "coalesce",
  key = "manyID"
)

Arguments

datacube

A datacube from one of the many packages.

rows

Which rows or units to retain. By default "any" (or all) units are retained, but another option is "every", which retains only those units that appear in all parent datasets.

cols

Which columns or variables to retain. By default "any" (or all) variables are retained, but another option is "every", which retains only those variables that appear in all parent datasets.
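
As a rough illustration of what "any" and "every" mean for rows and cols, the base R sketch below treats "any" as the union and "every" as the intersection of unit IDs or column names across datasets. The toy data frames d1 and d2 are hypothetical, and the sketch reflects the descriptions above rather than the package's actual implementation.

```r
# Two toy datasets sharing an ID column (hypothetical data for illustration)
d1 <- data.frame(ID = c("a", "b", "c"), Begin = c(1, 2, 3), End = c(4, 5, 6))
d2 <- data.frame(ID = c("b", "c", "d"), Begin = c(2, 3, 4),
                 Notes = c("x", "y", "z"))
datasets <- list(d1, d2)

# rows = "any": units that appear in at least one dataset
rows_any <- Reduce(union, lapply(datasets, function(d) d$ID))
# rows = "every": units that appear in all datasets
rows_every <- Reduce(intersect, lapply(datasets, function(d) d$ID))

# cols = "any" / "every": the same logic applied to column names
cols_any <- Reduce(union, lapply(datasets, names))
cols_every <- Reduce(intersect, lapply(datasets, names))

print(rows_any)
print(rows_every)
print(cols_any)
print(cols_every)
```

With these toy inputs, "every" keeps only the units and variables the two datasets share, while "any" keeps everything found in either.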

resolve

How conflicts between observations should be resolved. By default "coalesce"; other options are "min", "max", "mean", "median", and "random". "coalesce" takes the first non-NA value, "min" the smallest value, "max" the largest value, "mean" the average value, "median" the median value, and "random" a randomly chosen value. To resolve different variables differently, pass a named vector giving the method for each variable (e.g. resolve = c(var1 = "min", var2 = "max")). In this case, only the variables named will be resolved and returned.
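
To make the resolve options concrete, the sketch below applies each one to a single cell whose values conflict across datasets. The helper resolve_cell() and the example values are hypothetical, and dropping NA values before resolving is an assumption. It shows the idea behind each option in base R, not how consolidate() is implemented.

```r
# Conflicting values for one unit-variable cell, one value per dataset
# (hypothetical numbers for illustration)
values <- c(NA, 270, 275, 271)

# Hypothetical helper mapping each resolve option to a base R operation
resolve_cell <- function(x, method = "coalesce") {
  x <- x[!is.na(x)]                 # ignore missing values (an assumption)
  if (length(x) == 0) return(NA)    # nothing to resolve
  switch(method,
    coalesce = x[1],                # first non-NA value
    min      = min(x),              # smallest value
    max      = max(x),              # largest value
    mean     = mean(x),             # average value
    median   = stats::median(x),    # median value
    random   = x[sample.int(length(x), 1)],  # one value drawn at random
    stop("Unknown resolve method: ", method)
  )
}

# Apply every deterministic option to the same cell
methods <- c("coalesce", "min", "max", "mean", "median")
resolved <- vapply(methods, function(m) resolve_cell(values, m), numeric(1))
print(resolved)
```

Because "random" draws a value at random, its result can differ between runs unless a seed is set with set.seed().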

key

An ID column to collapse by. By default "manyID". Users can also specify multiple key variables as a vector (e.g. key = c("key1", "key2")), in which case each key variable must be present in all the datasets in the datacube. Equivalent key columns that have different names across datasets can be matched by declaring them as a named vector (e.g. key = c("key1" = "key2")). Observations with missing values in the key variable are removed.
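
The effect of using several key variables can be sketched in base R as follows. The toy data frame d is hypothetical, and the sketch only illustrates the two behaviours described above (rows with a missing key are removed, and units are identified by the combination of key values). It is not the package's implementation.

```r
# Toy dataset with a missing ID in the second row (hypothetical data)
d <- data.frame(ID = c("a", NA, "b", "b"),
                Begin = c(1, 2, 3, 3),
                End = c(5, 6, 7, NA))
key <- c("ID", "Begin")

# Remove observations with a missing value in any key variable
d_keyed <- d[stats::complete.cases(d[, key, drop = FALSE]), ]

# Identify units by the combination of key values
unit <- paste(d_keyed$ID, d_keyed$Begin, sep = "_")

print(nrow(d_keyed))
print(unique(unit))
```

Here the two "b" rows share the same ID and Begin, so they would be treated as one unit whose End values still need resolving.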

Value

A single tibble/data frame.

Details

Text variables are dropped for more efficient consolidation.

Examples

# \donttest{
consolidate(datacube = emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 138 × 15
#>    ID       Begin End   FullName Birth Death CityBirth ProvinceBirth Rise  Cause
#>    <chr>    <chr> <chr> <chr>    <chr> <chr> <chr>     <chr>         <chr> <chr>
#>  1 Aemilian 253-… 253-… CAESAR … 207-… 253-… NA        Africa        Appo… Assa…
#>  2 Allectus 0293  0297  ?        ?     297   NA        NA            NA    NA   
#>  3 Anastas… 0491  0518  Flavius… 430   518   NA        NA            NA    NA   
#>  4 Anthemi… 0467  0472  Procopi… 420   472   NA        NA            NA    NA   
#>  5 Antonin… 0138  0161  Titus A… 86    161   NA        NA            NA    NA   
#>  6 Antoniu… 138-… 161-… CAESAR … 86-0… 161-… Lanuvium  Italia        Birt… Natu…
#>  7 Arcadius 0395  0408  Flavius… 377   408   NA        NA            NA    NA   
#>  8 Augustus -26-… 14-0… IMPERAT… 62-0… 14-0… Rome      Italia        Birt… Assa…
#>  9 Aulus V… 0069… 0069… NA       NA    NA    NA        NA            NA    NA   
#> 10 Aurelian 270-… 275-… CAESAR … 214-… 275-… Sirmium   Pannonia      Appo… Assa…
#> # ℹ 128 more rows
#> # ℹ 5 more variables: Killer <chr>, Dynasty <chr>, Era <chr>, Notes <chr>,
#> #   Verif <chr>
consolidate(datacube = favour(emperors, "UNRV"), rows = "every",
            cols = "every", resolve = "coalesce", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin End  
#>    <chr>          <chr> <chr>
#>  1 Aemilian       0253  0253 
#>  2 Augustus       -0027 -0014
#>  3 Aurelian       0270  0275 
#>  4 Balbinus       0238  0238 
#>  5 Caracalla      0211  0217 
#>  6 Carinus        0283  0285 
#>  7 Carus          0282  0283 
#>  8 Claudius       0041  0054 
#>  9 Commodus       0180  0192 
#> 10 Constantine II 0337  0340 
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "any", cols = "every",
            resolve = "min", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 138 × 3
#>    ID              Begin      End       
#>    <chr>           <date>     <date>    
#>  1 Aemilian        0253-01-01 0253-01-01
#>  2 Allectus        0293-01-01 0297-01-01
#>  3 Anastasius      0491-01-01 0518-01-01
#>  4 Anthemius       0467-01-01 0472-01-01
#>  5 Antoninus Pius  0138-01-01 0161-01-01
#>  6 Antonius Pius   0138-07-10 0161-03-07
#>  7 Arcadius        0383-01-01 0395-01-01
#>  8 Augustus        -031-01-01 -014-01-01
#>  9 Aulus Vitellius 0069-07-01 0069-12-01
#> 10 Aurelian        0270-01-01 0275-01-01
#> # ℹ 128 more rows
consolidate(datacube = emperors, rows = "every", cols = "any",
            resolve = "max", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 15
#>    ID         Begin      End        FullName Birth Death CityBirth ProvinceBirth
#>    <chr>      <date>     <date>     <chr>    <chr> <chr> <chr>     <chr>        
#>  1 Aemilian   0253-12-31 0253-12-31 "Marcus… 207?  253-… NA        Africa       
#>  2 Augustus   -026-01-16 0014-12-31 "IMPERA… 63 BC 14-0… Rome      Italia       
#>  3 Aurelian   0270-12-31 0275-12-31 "Lucius… 214-… 275-… Sirmium   Pannonia     
#>  4 Balbinus   0238-12-31 0238-12-31 "Decimu… 178-… 238-… NA        Unknown      
#>  5 Caracalla  0211-12-31 0217-12-31 "born L… 188-… 217-… Lugdunum  Gallia Lugdu…
#>  6 Carinus    0283-12-31 0285-12-31 "Marcus… ?     285-… NA        Unknown      
#>  7 Carus      0282-12-31 0283-12-31 "Marcus… 230?  283-… Narbo     Gallia Narbo…
#>  8 Claudius   0041-12-31 0054-12-31 "Tiberi… 9-08… 54-1… Lugdunum  Gallia Lugdu…
#>  9 Commodus   0180-12-31 0192-12-31 "Marcus… 161-… 192-… Lanuvium  Italia       
#> 10 Constanti… 0337-12-31 0340-12-31 "Flaviu… 317   340-… Arelate   Gallia Narbo…
#> # ℹ 31 more rows
#> # ℹ 7 more variables: Rise <chr>, Cause <chr>, Killer <chr>, Dynasty <chr>,
#> #   Era <chr>, Notes <chr>, Verif <chr>
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "median", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin      End       
#>    <chr>          <date>     <date>    
#>  1 Aemilian       0253-07-02 0253-07-02
#>  2 Augustus       -027-07-02 0014-07-02
#>  3 Aurelian       0270-07-02 0275-07-02
#>  4 Balbinus       0238-04-22 0238-07-29
#>  5 Caracalla      0198-07-02 0217-07-02
#>  6 Carinus        0283-07-02 0285-07-02
#>  7 Carus          0282-07-02 0283-07-02
#>  8 Claudius       0041-07-02 0054-07-02
#>  9 Commodus       0177-07-02 0192-07-02
#> 10 Constantine II 0337-07-02 0340-07-02
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "mean", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin      End       
#>    <chr>          <date>     <date>    
#>  1 Aemilian       0253-07-16 0253-08-06
#>  2 Augustus       -028-05-07 0005-03-18
#>  3 Aurelian       0270-07-27 0275-07-27
#>  4 Balbinus       0238-05-15 0238-07-20
#>  5 Caracalla      0202-11-01 0217-06-03
#>  6 Carinus        0283-07-12 0285-07-12
#>  7 Carus          0282-08-01 0283-07-12
#>  8 Claudius       0041-05-10 0054-08-05
#>  9 Commodus       0178-07-02 0192-08-31
#> 10 Constantine II 0337-06-18 0340-05-01
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "random", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin      End       
#>    <chr>          <chr>      <chr>     
#>  1 Aemilian       0253-08-15 0253-02-12
#>  2 Augustus       -027-12-24 -014-06-26
#>  3 Aurelian       0270-09-23 0275-09-15
#>  4 Balbinus       0238-04-22 0238-12-05
#>  5 Caracalla      0198-08-22 0217-01-26
#>  6 Carinus        0283-04-12 0285-08-07
#>  7 Carus          0282-06-29 0283-08-01
#>  8 Claudius       0041-01-25 0054-09-25
#>  9 Commodus       0177-06-22 0192-11-25
#> 10 Constantine II 0337-05-22 0340-01-01
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = c(Begin = "min", End = "max"), key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin      End       
#>    <chr>          <date>     <date>    
#>  1 Aemilian       0253-01-01 0253-12-31
#>  2 Augustus       -031-01-01 0014-12-31
#>  3 Aurelian       0270-01-01 0275-12-31
#>  4 Balbinus       0238-01-01 0238-12-31
#>  5 Caracalla      0198-01-01 0217-12-31
#>  6 Carinus        0283-01-01 0285-12-31
#>  7 Carus          0282-01-01 0283-12-31
#>  8 Claudius       0041-01-01 0054-12-31
#>  9 Commodus       0177-01-01 0192-12-31
#> 10 Constantine II 0337-01-01 0340-12-31
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "any", cols = "any",
            resolve = c(Death = "max", Cause = "coalesce"),
            key = c("ID", "Begin"))
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 203 × 4
#>    ID             Begin      Cause          Death     
#>    <chr>          <mdate>    <chr>          <chr>     
#>  1 Aemilian       0253       NA             253       
#>  2 Aemilian       253-08-15~ Assassination  253-10-15~
#>  3 Allectus       0293       NA             297       
#>  4 Anastasius     0491       NA             518       
#>  5 Anthemius      0467       NA             472       
#>  6 Antoninus Pius 0138       NA             161       
#>  7 Antonius Pius  138-07-10  Natural Causes 161-03-07 
#>  8 Arcadius       0383       NA             NA        
#>  9 Arcadius       0395       NA             408       
#> 10 Augustus       -0027      NA             14        
#> # ℹ 193 more rows
# }