Appendix A — Implementation
ccn Validation
# TODO Need to handle NAs
is_valid_nchar <- function(x) {
  N <- collapse::vlengths(x)
  in_between(N, 6L, 14L)
}
# is_valid_nchar(examples[1:10])
all_valid_nchar <- function(x) {
  collapse::allv(is_valid_nchar(x), TRUE)
}
# all_valid_nchar(examples)
which_not_valid_nchar <- function(x) {
  collapse::whichv(is_valid_nchar(x), FALSE)
}
# which_not_valid_nchar(examples)
# examples[which_not_valid_nchar(examples)]
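The TODO above leaves missing values unhandled. One possible way to handle them, sketched in base R: the function name is_valid_nchar_na(), the na_ok argument, and the assumption that the 6 to 14 character bounds are inclusive are illustrative choices, not part of the original code.

```r
# Hypothetical NA-aware variant of is_valid_nchar(), written in base R.
# Assumes the length bounds (6 and 14) are inclusive.
is_valid_nchar_na <- function(x, na_ok = TRUE) {
  # keepNA = TRUE makes nchar() return NA for missing strings
  n <- nchar(x, type = "chars", keepNA = TRUE)
  ok <- n >= 6L & n <= 14L
  # Decide explicitly whether a missing value counts as valid
  ok[is.na(n)] <- na_ok
  ok
}

res_lenient <- is_valid_nchar_na(c("001500", NA, "123"))
res_strict <- is_valid_nchar_na(c("001500", NA, "123"), na_ok = FALSE)
```

Making the treatment of NA an explicit argument keeps the decision visible at the call site instead of relying on how a comparison happens to propagate missing values.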
ccn Checks
# TODO Need to handle NAs
check_valid_nchar <- function(x) {
  if (!all_valid_nchar(x)) {
    i <- which_not_valid_nchar(x)
    n <- length(i)
    cli::cli_abort(
      c(
        "{.strong {n}} Invalid {cli::qty(n)} CCN{?s}:",
        "x" = "{.val {x[i]}}"
      )
    )
  }
}
# check_valid_nchar(examples)
check_character <- function(x) {
  if (!is_character(x)) {
    cli::cli_abort("`x` must be a character vector.")
  }
}
ccn class
First, make a ccn class, i.e., a character vector that is printed as a ccn. Start by defining a low-level constructor to check types and/or sizes and call new_vctr().
ccn is built on a character vector of any length and doesn’t have any attributes.
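The constructor itself is not shown here. A minimal sketch, assuming the type check is a plain is.character() test (the original may use its own check_character() helper instead), could look like this:

```r
library(vctrs)

# Low-level constructor: check the type, then tag the vector with class "ccn".
# The is.character() check is an assumption standing in for the original check.
new_ccn <- function(x = character()) {
  if (!is.character(x)) {
    stop("`x` must be a character vector.", call. = FALSE)
  }
  new_vctr(x, class = "ccn")
}

x <- new_ccn(c("001500", "001502", NA))
```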
<ccn[25]>
[1] 001500 001502 001503 001509 NA 001511 001513 001514 001515 001517
[11] 001518 NA 001527 001528 001529 001530 001534 NA 001536 001541
[21] 001545 NA 001552 001556 001557
str(x)
 ccn [1:25] 001500, 001502, 001503, 001509, NA , 001511, 001513, 001514,...
We then follow up with a user-friendly helper. Here we’ll use vec_cast() to allow it to accept anything coercible to a character:
Before you go on, check that the user-friendly constructor returns a zero-length vector when called with no arguments. This makes it easy to use as a prototype.
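The helper is not shown here either. A sketch of what it might look like follows; the simplified new_ccn() is repeated only so that the block runs on its own, and is an assumption about the constructor.

```r
library(vctrs)

# Simplified stand-in for the low-level constructor (assumption)
new_ccn <- function(x = character()) new_vctr(x, class = "ccn")

# User-facing helper: cast the input to character first, then construct
ccn <- function(x = character()) {
  x <- vec_cast(x, character())
  new_ccn(x)
}

y <- ccn(c("001500", NA))
```

Note that vec_cast() is stricter than as.character(): a bare NA is accepted, but a numeric input such as ccn(1) is rejected rather than silently converted.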
new_ccn()
<ccn[0]>
ccn()
<ccn[0]>
For the convenience of your users, consider implementing an is_ccn() function:
is_ccn <- function(x) {
  inherits(x, "ccn")
}
format() method
The first method for every class should almost always be a format() method. This should return a character vector the same length as x. The easiest way to do this is to rely on one of R’s low-level formatting functions like formatC():
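The method definition is missing at this point. A sketch that is consistent with the description might be the following; the simplified new_ccn() is an assumption, and formatC() is applied only to the non-missing elements so that NA never becomes the string "NA".

```r
library(vctrs)

# Simplified stand-in for the low-level constructor (assumption)
new_ccn <- function(x = character()) new_vctr(x, class = "ccn")

format.ccn <- function(x, ...) {
  # vec_data() strips the class, so format() is not called recursively
  raw <- vec_data(x)
  out <- rep(NA_character_, length(raw))
  ok <- !is.na(raw)
  # Format only the non-missing values; missing ones stay NA
  out[ok] <- formatC(raw[ok])
  out
}

fmt <- format(new_ccn(c("001500", NA, "001502")))
```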
<ccn[20]>
[1] 001500 001502 001503 001509 <NA> 001511 001513 001514 001515 001517
[11] 001518 <NA> 001527 001528 001529 001530 001534 <NA> 001536 001541
Note the use of vec_data() so format() doesn’t get stuck in an infinite loop, and that I take a little care to not convert NA to "NA"; this leads to better printing.
The format method is also used by data frames, tibbles, and str():
data.frame(x)
x
1 001500
2 001502
3 001503
4 001509
5 <NA>
6 001511
7 001513
8 001514
9 001515
10 001517
11 001518
12 <NA>
13 001527
14 001528
15 001529
16 001530
17 001534
18 <NA>
19 001536
20 001541
For optimal display, I recommend also defining an abbreviated type name, which should be 4-5 letters for commonly used vectors. This is used in tibbles and in str():
vec_ptype_abbr.ccn <- function(x, ...) {
  "ccn"
}
tibble::tibble(x)
# A tibble: 20 × 1
x
<ccn>
1 001500
2 001502
3 001503
4 001509
5 NA
6 001511
7 001513
8 001514
9 001515
10 001517
11 001518
12 NA
13 001527
14 001528
15 001529
16 001530
17 001534
18 NA
19 001536
20 001541
str(x)
 ccn [1:20] 001500, 001502, 001503, 001509, <NA>, 001511, 001513, 001514, 0...
If you need more control over printing in tibbles, implement a method for pillar::pillar_shaft(). See vignette("pillar", package = "vctrs") for details.
Casting and coercion
The next set of methods you are likely to need are those related to coercion and casting. Coercion and casting are two sides of the same coin: changing the prototype of an existing object. When the change happens implicitly (e.g. in c()) we call it coercion; when the change happens explicitly (e.g. with as.integer(x)), we call it casting.
One of the main goals of vctrs is to put coercion and casting on a robust theoretical footing so it’s possible to make accurate predictions about what (e.g.) c(x, y) should do when x and y have different prototypes. vctrs achieves this goal through two generics:
- vec_ptype2(x, y) defines a possible set of coercions. It returns a prototype if x and y can be safely coerced to the same prototype; otherwise it returns an error. The set of automatic coercions is usually quite small because too many tend to make code harder to reason about and silently propagate mistakes.
- vec_cast(x, to) defines the possible sets of casts. It returns x translated to have prototype to, or throws an error if the conversion isn’t possible. The set of possible casts is a superset of possible coercions because they’re requested explicitly.
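The two generics can be tried out on base types before any ccn methods exist. The following sketch uses integer and double vectors only as an illustration:

```r
library(vctrs)

# Coercion: integer and double share a common prototype, double
common <- vec_ptype2(1L, 2.5)

# Casting: a lossless double-to-integer cast succeeds...
cast_ok <- vec_cast(c(1, 2), integer())

# ...while a lossy cast is refused with an error
cast_lossy <- tryCatch(vec_cast(1.5, integer()), error = function(e) e)

# No common prototype exists for character and double, so this is an error too
no_common <- tryCatch(vec_ptype2("a", 1), error = function(e) e)
```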
Double dispatch
Both generics use double dispatch which means that the implementation is selected based on the class of two arguments, not just one. S3 does not natively support double dispatch, so we implement our own dispatch mechanism. In practice, this means:
- You end up with method names with two classes, like vec_ptype2.foo.bar().
- You don’t need to implement default methods (they would never be called if you do).
- You can’t call NextMethod().
CCN class
We’ll make our ccn class coercible back and forth with character vectors.
vec_ptype2() provides a user-friendly error message if the coercion doesn’t exist and makes sure NA is handled in a standard way. NA is technically a logical vector, but we want it to stand in for a missing value of any type.
vec_ptype2("bogus", ccn())
Error:
! Can't combine `"bogus"` <character> and `ccn()` <ccn>.
vec_ptype2(ccn(), NA)
<ccn[0]>
vec_ptype2(NA, ccn())
<ccn[0]>
By default and in simple cases, an object of the same class is compatible with itself:
vec_ptype2(ccn(), ccn())
<ccn[0]>
However, this only works if the attributes for both objects are the same. The default methods are also a bit slower. It is always a good idea to provide an explicit coercion method for the case of identical classes.
So we’ll start by saying that a ccn combined with a ccn yields a ccn, which we indicate by returning a prototype generated by the constructor.
vec_ptype2.ccn.ccn <- function(x, y, ...) new_ccn()
Next we define methods that say that combining a ccn and character should yield a character.
Because double dispatch is a bit of a hack, we need to provide two methods. It’s your responsibility to ensure that each member of the pair returns the same result: if they don’t you will get weird and unpredictable behaviour.
The double dispatch mechanism requires us to refer to the underlying type, character, in the method name. A method named after an implicit class rather than the type, such as vec_ptype2.ccn.numeric(), would never be called.
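The pair of methods described above is not shown. A sketch, with a simplified new_ccn() repeated as an assumption so that the block runs on its own:

```r
library(vctrs)

# Simplified stand-in for the low-level constructor (assumption)
new_ccn <- function(x = character()) new_vctr(x, class = "ccn")

# Both orderings must return the same prototype: character
vec_ptype2.ccn.character <- function(x, y, ...) character()
vec_ptype2.character.ccn <- function(x, y, ...) character()

left <- vec_ptype2(new_ccn(), character())
right <- vec_ptype2(character(), new_ccn())
```

This sketch defines the methods at the top level of a script. Inside a package they would also need to be registered as S3 methods for the vctrs generic.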
We can check that we’ve implemented this correctly with vec_ptype_show():
vec_ptype_show(ccn(), character(), ccn())
Prototype: <character>
0. ( , <ccn> ) = <ccn>
1. ( <ccn> , <character> ) = <character>
2. ( <character> , <ccn> ) = <character>
The vec_ptype2() methods define which input is the richer type that vctrs should coerce to. However, they don’t perform any conversion.
This is the job of vec_cast(), which we implement next. We’ll provide a method to cast a ccn to a ccn:
vec_cast.ccn.ccn <- function(x, to, ...) x
Then we add methods for converting back and forth between ccn and character. To convert a character to a ccn we use the ccn() helper (not the constructor; this is un-validated user input). To convert a ccn to a character, we strip the attributes.
Note that, for historical reasons, the order of the classes in the method name is the reverse of the argument order, unlike vec_ptype2(): the class of to comes first, and the class of x comes second.
Again, the double dispatch mechanism requires us to refer to the underlying type, character, in the method name:
vec_cast.ccn.character <- function(x, to, ...) ccn(x)
vec_cast.character.ccn <- function(x, to, ...) vec_data(x)
Then we can check this works with vec_cast():
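The check itself is missing here. A self-contained sketch of such a check, in which the simplified new_ccn() and ccn() definitions are assumptions repeated so that the block runs on its own:

```r
library(vctrs)

# Simplified stand-ins for the constructor and helper (assumptions)
new_ccn <- function(x = character()) new_vctr(x, class = "ccn")
ccn <- function(x = character()) new_ccn(vec_cast(x, character()))

# Cast methods as described in the text
vec_cast.ccn.ccn <- function(x, to, ...) x
vec_cast.ccn.character <- function(x, to, ...) ccn(x)
vec_cast.character.ccn <- function(x, to, ...) vec_data(x)

to_ccn <- vec_cast("030113", ccn())             # character -> ccn
to_chr <- vec_cast(ccn("030113"), character())  # ccn -> character
```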
Once you’ve implemented vec_ptype2() and vec_cast(), you get vec_c(), [<-, and [[<- implementations for free.
vec_c(ccn("030113"), "1")
[1] "030113" "1"
vec_c(NA, ccn("030113"))
<ccn[2]>
[1] <NA> 030113
# but
vec_c(TRUE, ccn("030113"))
Error in `vec_c()`:
! Can't combine `..1` <logical> and `..2` <ccn>.
x <- ccn(examples[1:5])
x[1:2] <- LETTERS[1:2]
x[[3]] <- LETTERS[5]
x
<ccn[5]>
[1] A B E 001509 <NA>
You’ll also get mostly correct behaviour for c(). The exception is when you use c() with a base R class:
# Correct
c(ccn("030113"), "AB")
[1] "030113" "AB"
Error in `vec_c()`:
! Can't combine `..1` <ccn> and `..2` <factor<61b97>>.
[1] "1" "030113"
Unfortunately there’s no way to fix this problem with the current design of c().
Again, as a convenience, consider providing an as_ccn() function that makes use of the casts defined in your vec_cast.ccn() methods:
as_ccn <- function(x) {
  vec_cast(x, new_ccn())
}
Occasionally, it is useful to provide conversions that go beyond what’s allowed in casting. For example, we could offer a parsing method for character vectors.
In this case, as_ccn() should be generic, the default method should cast, and then additional methods should implement more flexible conversion:
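The generic version is not shown. A sketch of the pattern follows; the parsing rule in as_ccn.character() (trim surrounding whitespace and drop hyphens) is purely hypothetical and chosen only to illustrate a conversion that goes beyond casting, and the simplified new_ccn() is likewise an assumption.

```r
library(vctrs)

# Simplified stand-in for the low-level constructor (assumption)
new_ccn <- function(x = character()) new_vctr(x, class = "ccn")
vec_cast.ccn.ccn <- function(x, to, ...) x

# Generic: the default method casts, other methods may be more flexible
as_ccn <- function(x, ...) UseMethod("as_ccn")

as_ccn.default <- function(x, ...) vec_cast(x, new_ccn())

# Hypothetical parsing rule: trim whitespace and remove hyphens
as_ccn.character <- function(x, ...) {
  cleaned <- gsub("-", "", trimws(x), fixed = TRUE)
  new_ccn(cleaned)
}

parsed <- as_ccn(" 03-0113 ")
unchanged <- as_ccn(new_ccn("030113"))
```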