Identify and mark duplicate rows in R
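
df.now itself is not shown here. A data frame consistent with the outputs below (the values are inferred from those outputs, so treat this as an assumed reconstruction; the quoted, matrix-style printing in the base R answer suggests the original object may even have been a character matrix) could look like this, with every fit/sit pair appearing twice, once in each order:

df.now <- data.frame(
  value1 = c(1, NA, 2, NA, 5, NA, NA, NA),
  value2 = c(NA, 3, 3, NA, NA, NA, 4, NA),
  value3 = c(NA, 2, 4, NA, NA, 2, NA, NA),
  fit    = c("it1", "it2", "it3", "it4", "it5", "it6", "it7", "it9"),
  sit    = c("it2", "it1", "it4", "it3", "it6", "it5", "it9", "it7")
)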

One dplyr option could be:

library(dplyr)

df.now %>%
 group_by(pair = paste(pmax(fit, sit), pmin(fit, sit), sep = "_")) %>%
 summarise_at(vars(starts_with("value")), ~ ifelse(all(is.na(.)),
                                                   NA,
                                                   first(na.omit(.))))

  pair    value1 value2 value3
  <chr>    <dbl>  <dbl>  <dbl>
1 it2_it1      1      3      2
2 it4_it3      2      3      4
3 it6_it5      5     NA      2
4 it9_it7     NA      4     NA
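
Why this groups reversed rows together: pmax() and pmin() work element-wise on the two character columns, so every row gets the lexicographically larger item first and the smaller one second, and the key is the same whichever way round fit and sit were entered. For example:

paste(pmax("it1", "it2"), pmin("it1", "it2"), sep = "_")
# [1] "it2_it1"
paste(pmax("it2", "it1"), pmin("it2", "it1"), sep = "_")
# [1] "it2_it1"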

And if you also need the two pair members in individual columns, then with the addition of tidyr you can do:

library(tidyr)

df.now %>%
 group_by(pair = paste(pmax(fit, sit), pmin(fit, sit), sep = "_")) %>%
 summarise_at(vars(starts_with("value")), ~ ifelse(all(is.na(.)),
                                                   NA,
                                                   first(na.omit(.)))) %>%
 separate(pair, into = c("fit", "hit"), sep = "_", remove = FALSE)

  pair    fit   hit   value1 value2 value3
  <chr>   <chr> <chr>  <dbl>  <dbl>  <dbl>
1 it2_it1 it2   it1        1      3      2
2 it4_it3 it4   it3        2      3      4
3 it6_it5 it6   it5        5     NA      2
4 it9_it7 it9   it7       NA      4     NA
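
summarise_at() still works but is superseded in current dplyr; assuming dplyr 1.0 or later, the same summary can be written with across() (an equivalent rewrite, not what the original answer used):

df.now %>%
 group_by(pair = paste(pmax(fit, sit), pmin(fit, sit), sep = "_")) %>%
 summarise(across(starts_with("value"),
                  ~ ifelse(all(is.na(.x)), NA, first(na.omit(.x)))))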

A base R option: use !duplicated() after sorting the fit/sit values within each row, so that reversed pairs compare as equal.

df.now[!duplicated(t(apply(df.now[, c("fit", "sit")], 1, sort))), ]
#       value1 value2 value3 fit   sit  
# [1,] "1"    NA     NA     "it1" "it2"
# [2,] "2"    "3"    "4"    "it3" "it4"
# [3,] "5"    NA     NA     "it5" "it6"
# [4,] NA     "4"    NA     "it7" "it9"
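
Since the title asks to mark the duplicates as well, the same row-wise sort can feed a logical flag instead of a subset. The dup column name is just an illustrative choice, and this assumes df.now is a data frame as in the reconstruction above (for a matrix you would cbind() the flag instead):

# flag the second occurrence of each unordered fit/sit pair
df.now$dup <- duplicated(t(apply(df.now[, c("fit", "sit")], 1, sort)))
df.now[df.now$dup, ]  # rows that repeat an earlier pair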

Tags: R, Dataframe