Difference between rbind() and bind_rows() in R

Among other differences, one of the main reasons to use bind_rows over rbind is to combine two data frames that have different numbers of columns. rbind throws an error in that case, whereas bind_rows fills any column missing from one of the data frames with NA in the rows that come from that data frame.

Try out the following code to see the difference:

a <- data.frame(a = 1:2, b = 3:4, c = 5:6)
b <- data.frame(a = 7:8, b = 2:3, c = 3:4, d = 8:9)

Results for the two calls are as follows:

rbind(a, b)
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

library(dplyr)
bind_rows(a, b)
  a b c  d
1 1 3 5 NA
2 2 4 6 NA
3 7 2 3  8
4 8 3 4  9
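
If you have to stay in base R, the missing columns can be padded by hand before calling rbind. The following is only a minimal sketch of that idea, reusing the data frames a and b from above; the names missing_cols, a_padded and combined are made up for illustration.

```r
a <- data.frame(a = 1:2, b = 3:4, c = 5:6)
b <- data.frame(a = 7:8, b = 2:3, c = 3:4, d = 8:9)

# Columns that exist in `b` but not in `a` (here only "d").
missing_cols <- setdiff(names(b), names(a))

# Add those columns to a copy of `a`, filled with NA.
a_padded <- a
a_padded[missing_cols] <- NA

# Put the columns in the same order as `b`, then bind as usual.
combined <- rbind(a_padded[names(b)], b)
print(combined)
```

This only covers the case where b has every column of a; if both data frames have columns the other lacks, each of them would need the same padding.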

Since none of the answers here offers a systematic review of the differences between base::rbind and dplyr::bind_rows, and the answer from @bob regarding performance is incorrect, I decided to add the following.

Let's create a test data frame:

df_1 = data.frame(
  v1_dbl = 1:1000,
  v2_lst = I(as.list(1:1000)),
  v3_fct = factor(sample(letters[1:10], 1000, replace = TRUE)),
  v4_raw = raw(1000),
  v5_dtm = as.POSIXct(paste0("2019-12-0", sample(1:9, 1000, replace = TRUE)))
)

df_1$v2_lst = unclass(df_1$v2_lst) #remove the AsIs class introduced by `I()`

1. base::rbind handles list inputs differently

rbind(list(df_1, df_1))
     [,1]   [,2]  
[1,] List,5 List,5

# You have to combine it with `do.call()` to achieve the same result:
head(do.call(rbind, list(df_1, df_1)), 3)
  v1_dbl v2_lst v3_fct v4_raw     v5_dtm
1      1      1      b     00 2019-12-02
2      2      2      h     00 2019-12-08
3      3      3      c     00 2019-12-09

head(dplyr::bind_rows(list(df_1, df_1)), 3)
  v1_dbl v2_lst v3_fct v4_raw     v5_dtm
1      1      1      b     00 2019-12-02
2      2      2      h     00 2019-12-08
3      3      3      c     00 2019-12-09

2. base::rbind can cope with (some) mixed types

While both base::rbind and dplyr::bind_rows fail when trying to bind, e.g., a raw or datetime column to a column of some other type, base::rbind can cope with some degree of discrepancy.

Combining a list and a non-list column produces a list column. Combining a factor and something else produces a warning but not an error:

df_2 = data.frame(
  v1_dbl = 1,
  v2_lst = 1,
  v3_fct = 1,
  v4_raw = raw(1),
  v5_dtm = as.POSIXct("2019-12-01")
)

head(rbind(df_1, df_2), 3)
  v1_dbl v2_lst v3_fct v4_raw     v5_dtm
1      1      1      b     00 2019-12-02
2      2      2      h     00 2019-12-08
3      3      3      c     00 2019-12-09
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1) : invalid factor level, NA generated

# Fails on the lst, num combination:
head(dplyr::bind_rows(df_1, df_2), 3)
Error: Column `v2_lst` can't be converted from list to numeric

# Fails on the fct, num combination:
head(dplyr::bind_rows(df_1[-2], df_2), 3)
Error: Column `v3_fct` can't be converted from factor to numeric
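
One way around the factor/numeric failure is to convert the offending column to a common type on both sides before binding. This is a minimal sketch with two small made-up data frames (x and y are not part of the example above), and it assumes that character is an acceptable common type for your data.

```r
library(dplyr)

x <- data.frame(id = 1:2, grp = factor(c("a", "b")))
y <- data.frame(id = 3L, grp = 1)

# Bring both `grp` columns to character so the types agree.
x$grp <- as.character(x$grp)
y$grp <- as.character(y$grp)

res <- bind_rows(x, y)
print(res)
```

The column can be turned back into a factor afterwards with factor(res$grp) if that is what the rest of the code expects.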

3. base::rbind keeps rownames

Tidyverse advocates making rownames into a dedicated column, so its functions drop them.

rbind(mtcars[1:2, 1:4], mtcars[3:4, 1:4])
                mpg cyl disp  hp
Mazda RX4      21.0   6  160 110
Mazda RX4 Wag  21.0   6  160 110
Datsun 710     22.8   4  108  93
Hornet 4 Drive 21.4   6  258 110

dplyr::bind_rows(mtcars[1:2, 1:4], mtcars[3:4, 1:4])
   mpg cyl disp  hp
1 21.0   6  160 110
2 21.0   6  160 110
3 22.8   4  108  93
4 21.4   6  258 110
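
If the rownames carry information, they can be moved into a regular column before binding so that they do not depend on how bind_rows treats rownames. A minimal sketch, assuming the tibble package is installed; the column name "model" is my own choice.

```r
# Move the rownames into a regular column called "model" first.
top <- tibble::rownames_to_column(mtcars[1:2, 1:4], var = "model")
bottom <- tibble::rownames_to_column(mtcars[3:4, 1:4], var = "model")

combined <- dplyr::bind_rows(top, bottom)
print(combined)
```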

4. base::rbind cannot cope with missing columns

Just for completeness, since Abhilash Kandwal already said so in their answer.

5. base::rbind handles named arguments differently

While base::rbind prepends argument names to rownames, dplyr::bind_rows has the option to add a dedicated ID column:

rbind(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4])
                    mpg cyl disp  hp
hi.Mazda RX4       21.0   6  160 110
hi.Mazda RX4 Wag   21.0   6  160 110
bye.Datsun 710     22.8   4  108  93
bye.Hornet 4 Drive 21.4   6  258 110

dplyr::bind_rows(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4], .id = "my_id")
  my_id  mpg cyl disp  hp
1    hi 21.0   6  160 110
2    hi 21.0   6  160 110
3   bye 22.8   4  108  93
4   bye 21.4   6  258 110
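
The same ID column can be produced from a named list, which is handy when the data frames are already collected in a list. A minimal sketch; the names parts and labelled are placeholders.

```r
library(dplyr)

# A named list of data frames; the list names become the ID values.
parts <- list(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4])

labelled <- bind_rows(parts, .id = "my_id")
print(labelled)
```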

6. base::rbind makes vector arguments into rows (and recycles them)

In contrast, dplyr::bind_rows adds columns (and therefore requires the elements of x to be named):

rbind(mtcars[1:2, 1:4], x = 1:2)
              mpg cyl disp  hp
Mazda RX4      21   6  160 110
Mazda RX4 Wag  21   6  160 110
x               1   2    1   2

dplyr::bind_rows(mtcars[1:2, 1:4], x = c(a = 1, b = 2))
  mpg cyl disp  hp  a  b
1  21   6  160 110 NA NA
2  21   6  160 110 NA NA
3  NA  NA   NA  NA  1  2

7. base::rbind is slower and requires more RAM

To bind a hundred medium-sized data frames (1k rows each), base::rbind requires over forty times more RAM and is more than twelve times slower in the run below:

dfs = rep(list(df_1), 100)
bench::mark(
  "base::rbind" = do.call(rbind, dfs),
  "dplyr::bind_rows" = dplyr::bind_rows(dfs)
)[, 1:5]

# A tibble: 2 x 5
  expression            min   median `itr/sec` mem_alloc
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 base::rbind       47.23ms  48.05ms      20.0  104.48MB
2 dplyr::bind_rows   3.69ms   3.75ms     261.     2.39MB

Since I needed to bind lots of small data frames, here is a benchmark for that too. The difference in speed, and especially in RAM, is quite striking:

dfs = rep(list(df_1[1:2, ]), 10^4)
bench::mark(
  "base::rbind" = do.call(rbind, dfs),
  "dplyr::bind_rows" = dplyr::bind_rows(dfs)
)[, 1:5]

# A tibble: 2 x 5
  expression            min   median `itr/sec` mem_alloc
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 base::rbind         1.65s    1.65s     0.605    1.56GB
2 dplyr::bind_rows  19.31ms  20.21ms    43.7    566.69KB

Finally, help("rbind") and help("bind_rows") are interesting to read, too.
