Merge two data frames while keeping the original row order

You can also check out the inner_join function in Hadley's dplyr package (next iteration of plyr). It preserves the row order of the first data set. The minor difference to your desired solution is that it also preserves the original column order of the first data set. So it does not necessarily put the column we used for merging at the first position.

Using your example above, the inner_join result looks like this:

inner_join(df.2,df.1)
Joining by: "class"
  object class prob
1      A     2  0.7
2      B     1  0.5
3      D     2  0.7
4      F     3  0.3
5      C     1  0.5

You just need to create a variable which gives the row number in df.2. Then, once you have merged your data, you sort the new data set according to this variable. Here is an example :

df.1<-data.frame(class=c(1,2,3), prob=c(0.5,0.7,0.3))
df.2<-data.frame(object=c('A','B','D','F','C'), class=c(2,1,2,3,1))
df.2$id  <- 1:nrow(df.2)
out  <- merge(df.2,df.1, by = "class")
out[order(out$id), ]

From data.table v1.9.5+, you can do:

require(data.table) # v1.9.5+
setDT(df.1)[df.2, on="class"]

The performs a join on column class by finding out matching rows in df.1 for each row in df.2 and extracting corresponding columns.


Check out the join function in the plyr package. It's like merge, but it allows you to keep the row order of one of the data sets. Overall, it's more flexible than merge.

Using your example data, we would use join like this:

> join(df.2,df.1)
Joining by: class
  object class prob
1      A     2  0.7
2      B     1  0.5
3      D     2  0.7
4      F     3  0.3
5      C     1  0.5

Here are a couple of links describing fixes to the merge function for keeping the row order:

http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/

http://r.789695.n4.nabble.com/patching-merge-to-allow-the-user-to-keep-the-order-of-one-of-the-two-data-frame-objects-merged-td4296561.html