Difference between char and character objects

The two R types char and character correspond, on the internal C side, to CHARSXP and STRSXP respectively. At the R level, one always deals with character objects; a single string, like:

y <- "My name is hasnain"

is actually a character vector of length 1. Internally, each element of a character vector is a char, but R doesn't provide (AFAIK) a direct way to extract, create and/or use a char.
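To make the "character object of length 1" point concrete, here is a small check using only base R functions on the same string. Note that length counts the elements of the vector, while nchar counts the characters inside the single element:

```r
y <- "My name is hasnain"

typeof(y)        # "character": the R-level type, i.e. a STRSXP internally
is.character(y)  # TRUE
length(y)        # 1: the vector holds one element (one string)
nchar(y)         # 18: number of characters inside that single element
```

So what looks like "a string" at the R level is always a one-element vector wrapping one char.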

Although you can't create a char/CHARSXP object with pure R, it's straightforward to get it through the R/C interface using the mkChar function, which takes a standard C string and turns it into a CHARSXP. For instance, one can create a char.c file:

#include <R.h>
#include <Rinternals.h>

/* Returns a bare CHARSXP to R; called from R with .Call("returnCHAR"). */
SEXP returnCHAR(void) {
   /* mkChar copies a standard C string into a CHARSXP */
   SEXP ret = PROTECT(mkChar("Hello World!"));
   UNPROTECT(1);
   return ret;
}

After compiling it through R CMD SHLIB char.c, from the R side:

dyn.load("char.so")  # shared library on Linux; the extension varies across platforms
x <- .Call("returnCHAR")
x
# <CHARSXP: "Hello World!">
typeof(x)
#[1] "char"
length(x)
#[1] 12

Besides typeof and length, I didn't find many other R functions that act on char objects. Even as.character doesn't work! I could neither extract a char from a standard character vector nor insert this char into an existing character vector (assignment doesn't work).

When one of its arguments is a char, the c function returns a list:

c(1,"a",x)
#[[1]]
#[1] 1
#
#[[2]]
#[1] "a"
#
#[[3]]
#<CHARSXP: "Hello World!">

We can use .Internal(inspect()) to get a glimpse of the internal structure of an object (warning: inspect is an internal, unexported function, so it might change in future releases; don't rely on it). As far as I know, char/CHARSXP objects are shared between character vectors to save memory. For instance:

let<-letters[1:2]
.Internal(inspect(let))
#@1aff2a8 16 STRSXP g0c2 [NAM(1)] (len=2, tl=0)
#  @1368c60 09 CHARSXP g0c1 [MARK,gp=0x61] [ASCII] [cached] "a"
#  @16dc7c0 09 CHARSXP g0c1 [MARK,gp=0x60] [ASCII] [cached] "b"
mya<-"a"
.Internal(inspect(mya))
#@3068710 16 STRSXP g0c1 [NAM(3)] (len=1, tl=0)
#  @1368c60 09 CHARSXP g0c1 [MARK,gp=0x61] [ASCII] [cached] "a"

From the above output, we note two things:

  • STRSXP objects are vectors of CHARSXP objects, as we mentioned;
  • strings are stored in a "global pool": the "a" string is stored at the same address despite being created independently in two different objects.
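
The second point can also be checked programmatically rather than by eye. The following is only a rough sketch: charsxp_address is a hypothetical helper written for this answer (it is not part of R), and it assumes that the second line printed by .Internal(inspect()) for a character vector describes its first CHARSXP and starts with the address, as in the output above. Since inspect is internal, that layout may differ between R versions.

```r
# Hypothetical helper (not part of R): returns the address token that
# .Internal(inspect()) prints for the first CHARSXP of a character vector.
charsxp_address <- function(x) {
  stopifnot(is.character(x), length(x) >= 1L)
  out <- capture.output(.Internal(inspect(x)))
  # Assumption: line 1 describes the STRSXP itself, line 2 its first element.
  sub("^\\s*(\\S+).*$", "\\1", out[2])
}

s1 <- "pool-demo"
s2 <- paste0("pool", "-demo")   # same text, built independently at run time

# If both vectors point to the same pooled CHARSXP, the addresses match.
charsxp_address(s1) == charsxp_address(s2)
```

If the pool works as described, the comparison should come out TRUE even though s1 and s2 are distinct STRSXP objects.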
