Compare categories in 'many' datacubes

Usage

compare_categories(
  datacube,
  dataset = "all",
  key = "manyID",
  variable = "all",
  category = "all"
)

Arguments

datacube

A datacube from one of the many packages.

dataset

The name of one or more datasets in a datacube from one of the many packages. By default "all", meaning that every dataset in the datacube is used. To select two or more datasets, pass their names as a character vector.

key

The key variable used to join datasets. By default "manyID".

variable

The name of one or more variables to compare, as present in one or more datasets in the 'many' datacube. By default "all". For multiple variables, pass the variable names as a character vector.

category

The code category, or categories, to return. By default "all" categories are returned. Other options are "confirmed", "unique", "missing", "conflict", and "majority". For multiple categories, pass them as a character vector.
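As a rough illustration of what joining on a key means, the sketch below combines two toy data frames with a base R full join. It assumes, without confirmation from this page, that datasets are combined so that unmatched observations are kept; the data frames `wikipedia_toy` and `unrv_toy` and their values are hypothetical and are not taken from the `emperors` datacube.

```r
# Hypothetical toy datasets sharing an "ID" key (illustrative values only).
wikipedia_toy <- data.frame(
  ID = c("Augustus", "Tiberius", "Caligula"),
  Begin = c("-0026", "0014", "0037"),
  stringsAsFactors = FALSE
)
unrv_toy <- data.frame(
  ID = c("Augustus", "Tiberius", "Nero"),
  Begin = c("-0027", "0014", "0054"),
  stringsAsFactors = FALSE
)

# Observations present in both datasets under the key.
matched <- length(intersect(wikipedia_toy$ID, unrv_toy$ID))

# Full join on the key: all = TRUE keeps observations found in only one
# dataset; suffixes disambiguate the shared "Begin" column.
joined <- merge(
  wikipedia_toy, unrv_toy,
  by = "ID", all = TRUE,
  suffixes = c(".wikipedia", ".UNRV")
)

print(matched)
print(joined)
```

Under this assumption, the joined table can have more rows than there are matched observations, since observations found in only one dataset are retained with NA in the other dataset's columns.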

Details

Confirmed values are the same in all datasets in the datacube. Unique values appear only once across the datasets in the datacube. Missing values are missing in all datasets in the datacube. Conflict values differ across datasets, with each competing value appearing in the same number of datasets. Majority values are the same in multiple, but not all, datasets in the datacube.
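The category definitions can be read as a small decision rule applied to the values that one observation takes across datasets. The following is a minimal sketch of that reading, not the package's actual implementation; the function name `classify_values()` is hypothetical, and the rule compares values as plain character strings, which is an assumption.

```r
# Hypothetical helper: classify the values one observation takes across
# datasets (one element per dataset, NA where the dataset has no value).
classify_values <- function(x) {
  x <- as.character(x)
  observed <- x[!is.na(x)]
  # Missing: no dataset reports a value.
  if (length(observed) == 0) return("missing")
  # Unique: exactly one dataset reports a value.
  if (length(observed) == 1) return("unique")
  counts <- table(observed)
  # Confirmed: every dataset reports the same value.
  if (length(counts) == 1 && length(observed) == length(x)) return("confirmed")
  # Conflict: the most common values are tied across datasets.
  if (sum(counts == max(counts)) > 1) return("conflict")
  # Majority: one value is shared by multiple, but not all, datasets.
  "majority"
}

classify_values(c("0041", "0041", "0041"))
classify_values(c("0041-01-25", "0041", "0041"))
classify_values(c("0037-03-18", NA, "0037"))
classify_values(c("0041-01-24", NA))
classify_values(c(NA, NA))
```

Because the sketch compares strings exactly, dates recorded at different precision (a full date versus a year only) count as different values, which matches how such rows are labelled in the examples on this page.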

Examples

# \donttest{
compare_categories(emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> # A tibble: 139 × 37
#>    ID        `wikipedia$Begin` `UNRV$Begin` `britannica$Begin` `Begin (3)`
#>    <chr>     <mdate>           <mdate>      <mdate>            <chr>      
#>  1 Augustus  -26-01-16         -0027        -0031              conflict   
#>  2 Tiberius  14-09-18          -0014        0014               conflict   
#>  3 Caligula  37-03-18          NA           0037               conflict   
#>  4 Claudius  41-01-25          0041         0041               majority   
#>  5 Nero      54-10-13          0054         0054               majority   
#>  6 Galba     68-06-08          0068         0068               majority   
#>  7 Otho      69-01-15          0069         0069-01            conflict   
#>  8 Vitellius 69-04-17          0069         NA                 conflict   
#>  9 Vespasian 69-12-21          0069         0069               majority   
#> 10 Titus     79-06-24          0079         0079               majority   
#> # ℹ 129 more rows
#> # ℹ 32 more variables: `wikipedia$End` <mdate>, `UNRV$End` <mdate>,
#> #   `britannica$End` <mdate>, `End (3)` <chr>, `wikipedia$FullName` <chr>,
#> #   `UNRV$FullName` <chr>, `FullName (2)` <chr>, `wikipedia$Birth` <chr>,
#> #   `UNRV$Birth` <chr>, `Birth (2)` <chr>, `wikipedia$Death` <chr>,
#> #   `UNRV$Death` <chr>, `Death (2)` <chr>, `wikipedia$CityBirth` <chr>,
#> #   `CityBirth (1)` <chr>, `wikipedia$ProvinceBirth` <chr>, …
compare_categories(datacube = emperors, dataset = c("wikipedia", "UNRV"),
                   key = "ID", variable = c("Beg", "End"),
                   category = c("conflict", "unique"))
#> There were 49 matched observations by ID variable across datasets in datacube.
#> # A tibble: 119 × 4
#>    ID        `wikipedia$End` `UNRV$End` `End (2)`
#>    <chr>     <mdate>         <mdate>    <chr>    
#>  1 Augustus  14-08-19        -0014      conflict 
#>  2 Tiberius  37-03-16        0037       conflict 
#>  3 Caligula  41-01-24        NA         unique   
#>  4 Claudius  54-10-13        0054       conflict 
#>  5 Nero      68-06-09        0068       conflict 
#>  6 Galba     69-01-15        0069       conflict 
#>  7 Otho      69-04-16        0069       conflict 
#>  8 Vitellius 69-12-20        0069       conflict 
#>  9 Vespasian 79-06-24        0079       conflict 
#> 10 Titus     81-09-13        0081       conflict 
#> # ℹ 109 more rows
plot(compare_categories(emperors, key = "ID"))
#> There were 116 matched observations by ID variable across datasets in datacube.

plot(compare_categories(datacube = emperors, dataset = c("wikipedia", "UNRV"),
                        key = "ID", variable = c("Beg", "End"),
                        category = c("conflict", "unique")))
#> There were 49 matched observations by ID variable across datasets in datacube.

# }