Compare categories in 'many' datacubes
Usage
compare_categories(
datacube,
dataset = "all",
key = "manyID",
variable = "all",
category = "all"
)
Arguments
- datacube
A datacube from one of the many packages.
- dataset
One or more datasets in a datacube from one of the many packages. By default "all", meaning that every dataset in the datacube is used. To select two or more datasets, pass their names as a vector.
- key
A variable key to join datasets. 'manyID' by default.
- variable
One or more variables, present in one or more datasets in the 'many' datacube, to focus the comparison on. By default "all". For multiple variables, pass the variable names as a vector.
- category
One or more code categories to focus on. By default "all" categories are returned. Other options are "confirmed", "unique", "missing", "conflict", or "majority". For multiple categories, pass the category names as a vector.
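To illustrate what joining datasets on a key can look like, here is a minimal base R sketch. It is not the package's implementation: `toy_cube` and `join_by_key()` are hypothetical names introduced for illustration, and the sketch assumes a datacube can be treated as a named list of data frames that share a key column. The `dataset$variable` column names mirror those shown in the printed output under Examples.

```r
# Hypothetical toy datacube: a named list of data frames sharing an "ID" key.
toy_cube <- list(
  wikipedia = data.frame(ID = c("Augustus", "Tiberius"),
                         Begin = c("-0026-01-16", "0014-09-18")),
  UNRV = data.frame(ID = c("Augustus", "Caligula"),
                    Begin = c("-0027", "0037"))
)

# Illustrative helper (an assumption, not part of any 'many' package):
# prefix every non-key column with its dataset name, then full-join on the key.
join_by_key <- function(cube, key) {
  prefixed <- lapply(names(cube), function(nm) {
    d <- cube[[nm]]
    others <- setdiff(names(d), key)
    names(d)[match(others, names(d))] <- paste0(nm, "$", others)
    d
  })
  # all = TRUE keeps observations that appear in only some of the datasets
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), prefixed)
}

joined <- join_by_key(toy_cube, key = "ID")
print(joined)
```

Observations found in only one dataset are kept and padded with NA in the other datasets' columns, so every observation can still be compared across datasets.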
Details
The categories are defined as follows:
- Confirmed values are the same in all datasets in the datacube.
- Unique values appear only once across the datasets in the datacube.
- Missing values are missing in all datasets in the datacube.
- Conflict values are different in the same number of datasets in the datacube.
- Majority values have the same value in multiple, but not all, datasets in the datacube.
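The category definitions above can be sketched for a single observation as follows. This is a rough base R illustration under stated assumptions, not the function's actual code: `classify_values()` is a hypothetical helper, values are compared as character strings, and a value counts as "majority" whenever it appears in more datasets than any other value without being present in all of them.

```r
# Hypothetical helper: classify one observation's values across datasets.
# `values` holds one element per dataset; NA marks a missing value.
classify_values <- function(values) {
  values <- as.character(values)
  observed <- values[!is.na(values)]
  if (length(observed) == 0) return("missing")  # missing in every dataset
  if (length(observed) == 1) return("unique")   # present in a single dataset
  counts <- sort(table(observed), decreasing = TRUE)
  # One distinct value and no gaps: every dataset agrees
  if (length(counts) == 1 && length(observed) == length(values)) {
    return("confirmed")
  }
  # One value leads (assumption: a plurality is enough), but not in all datasets
  if (length(counts) == 1 || counts[[1]] > counts[[2]]) return("majority")
  "conflict"                                    # distinct values are tied
}

# Toy values, one column per dataset and one row per observation
toy <- data.frame(
  wikipedia  = c("0041", "0037", "0014", NA),
  UNRV       = c("0041", NA, "-0014", NA),
  britannica = c("0041", NA, "0014", NA)
)
labels <- apply(toy, 1, classify_values)
print(labels)
```

Under these assumptions the four toy rows are labelled "confirmed", "unique", "majority", and "missing". Two different non-missing values with a third missing, as for Caligula's `Begin` in the first example below, come out as "conflict".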
See also
Other compare_: compare_dimensions(), compare_missing(), compare_overlap(), compare_ranges()
Examples
# \donttest{
compare_categories(emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> # A tibble: 139 × 37
#> ID `wikipedia$Begin` `UNRV$Begin` `britannica$Begin` `Begin (3)`
#> <chr> <mdate> <mdate> <mdate> <chr>
#> 1 Augustus -26-01-16 -0027 -0031 conflict
#> 2 Tiberius 14-09-18 -0014 0014 conflict
#> 3 Caligula 37-03-18 NA 0037 conflict
#> 4 Claudius 41-01-25 0041 0041 majority
#> 5 Nero 54-10-13 0054 0054 majority
#> 6 Galba 68-06-08 0068 0068 majority
#> 7 Otho 69-01-15 0069 0069-01 conflict
#> 8 Vitellius 69-04-17 0069 NA conflict
#> 9 Vespasian 69-12-21 0069 0069 majority
#> 10 Titus 79-06-24 0079 0079 majority
#> # ℹ 129 more rows
#> # ℹ 32 more variables: `wikipedia$End` <mdate>, `UNRV$End` <mdate>,
#> # `britannica$End` <mdate>, `End (3)` <chr>, `wikipedia$FullName` <chr>,
#> # `UNRV$FullName` <chr>, `FullName (2)` <chr>, `wikipedia$Birth` <chr>,
#> # `UNRV$Birth` <chr>, `Birth (2)` <chr>, `wikipedia$Death` <chr>,
#> # `UNRV$Death` <chr>, `Death (2)` <chr>, `wikipedia$CityBirth` <chr>,
#> # `CityBirth (1)` <chr>, `wikipedia$ProvinceBirth` <chr>, …
compare_categories(datacube = emperors, dataset = c("wikipedia", "UNRV"),
key = "ID", variable = c("Beg", "End"), category = c("conflict", "unique"))
#> There were 49 matched observations by ID variable across datasets in datacube.
#> # A tibble: 119 × 4
#> ID `wikipedia$End` `UNRV$End` `End (2)`
#> <chr> <mdate> <mdate> <chr>
#> 1 Augustus 14-08-19 -0014 conflict
#> 2 Tiberius 37-03-16 0037 conflict
#> 3 Caligula 41-01-24 NA unique
#> 4 Claudius 54-10-13 0054 conflict
#> 5 Nero 68-06-09 0068 conflict
#> 6 Galba 69-01-15 0069 conflict
#> 7 Otho 69-04-16 0069 conflict
#> 8 Vitellius 69-12-20 0069 conflict
#> 9 Vespasian 79-06-24 0079 conflict
#> 10 Titus 81-09-13 0081 conflict
#> # ℹ 109 more rows
plot(compare_categories(emperors, key = "ID"))
#> There were 116 matched observations by ID variable across datasets in datacube.
plot(compare_categories(datacube = emperors, dataset = c("wikipedia", "UNRV"),
key = "ID", variable = c("Beg", "End"), category = c("conflict", "unique")))
#> There were 49 matched observations by ID variable across datasets in datacube.
# }