R error "Can't join on ... because of incompatible types"

This question is viewed frequently, so many people evidently run into this error, and it deserves a more complete answer.

The simplest fix for this join error is to change the class of the column(s) causing the problem so that the join keys have the same type in both data frames. This can be done as follows:

  1. Use glimpse() to inspect the column classes in the data frames to be joined
  2. Use mutate() to convert the mismatched column to the matching class with as.numeric, as.logical or as.character. For example:

    df2 <- df2 %>%  
        mutate(column1 = as.numeric(column1))
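
To make the two steps concrete, here is a minimal sketch. The data frames and their contents are hypothetical sample data, not taken from the question; only column1 and the mutate call mirror the snippet above.

```r
library(dplyr)

# Hypothetical sample data: the key column1 is numeric in df1 but character in df2
df1 <- tibble(column1 = c(1, 2, 3), x = c("a", "b", "c"))
df2 <- tibble(column1 = c("1", "2", "4"), y = c(TRUE, FALSE, TRUE))

# Step 1: inspect the column classes of both data frames
glimpse(df1)
glimpse(df2)

# At this point left_join(df1, df2, by = "column1") would stop with the
# "incompatible types" error, because a double key cannot be matched to a character key

# Step 2: convert the key in df2 to the class used in df1, then join
df2 <- df2 %>%
  mutate(column1 = as.numeric(column1))

joined <- left_join(df1, df2, by = "column1")
```

Note that as.numeric turns any value it cannot parse into NA (with a warning), so it is worth checking the converted column before joining.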
    

A more robust solution, suitable for production environments, is the matchColClasses function below, which does the following:

  1. Identify columns that share the same name (sharedColNames)
  2. Use the master data frame (df1) to determine the classes of the shared columns
  3. Reassign the column classes in df2 to match df1

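    matchColClasses <- function(df1, df2) {

      # Names of the columns present in both data frames
      sharedColNames <- intersect(names(df1), names(df2))
      # Class of each shared column in the master data frame (df1);
      # df1[sharedColNames] stays a data frame even when only one column is shared
      sharedColTypes <- lapply(df1[sharedColNames], class)

      for (n in sharedColNames) {
        # [[ ]] extracts a single column from both base data frames and tibbles
        class(df2[[n]]) <- sharedColTypes[[n]]
      }

      return(df2)
    }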
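
A usage sketch, assuming small hypothetical data frames with mismatched id and flag columns. The function is restated in a compact form (indexing columns with [[ ]]) so that the block runs on its own; it is meant as an illustration, not as a definitive implementation.

```r
library(dplyr)

# Compact restatement of matchColClasses; [[ ]] indexing behaves the same
# for base data frames and tibbles
matchColClasses <- function(df1, df2) {
  sharedColNames <- intersect(names(df1), names(df2))
  for (n in sharedColNames) {
    class(df2[[n]]) <- class(df1[[n]])
  }
  df2
}

# Hypothetical sample data: id and flag are stored as character in df2
df1 <- data.frame(id = c(1, 2, 3), flag = c(TRUE, FALSE, TRUE))
df2 <- data.frame(id = c("1", "2", "3"),
                  flag = c("TRUE", "FALSE", "TRUE"),
                  score = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# Align the shared columns of df2 with df1, then check the result
df2 <- matchColClasses(df1, df2)
sapply(df2, class)

# The join keys now have compatible types
joined <- inner_join(df1, df2, by = c("id", "flag"))
```

Columns that exist only in df2 (score here) are left untouched, since only shared column names are reassigned.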
    

This function works well in our production environment with heterogeneous data types: character, numeric and logical.

Tags:

R

Dplyr