How to pass data.frame for UPDATE with R DBI

Here is a little helper function I put together using REPLACE INTO to update a table using DBI, replacing old duplicate entries with the new values. It's basic and for my own needs, but should be easy to modify. All you need to pass to the function is the connection, table name, and dataframe. Note that the table must have a PRIMARY KEY column. I've also included a simple working example.

row_to_list <- function(Y)  suppressWarnings(split(Y, f = row(Y)))

sql_val <- function(y){
  if(!is.numeric(y)){
    return(paste0("'",y,"'"))
  }else{
    if(is.na(y)){
      return("NULL")
    }else{
      return(as.character(y))
    }
  }
}

to_sql_row <- function(x) paste0("(",paste(do.call("c", lapply(x, FUN = sql_val)), collapse = ", "),")")

bracket <- function(x) paste0("`",x,"`")

to_sql_string <- function(x) paste0("(",paste(sapply(x, FUN = bracket), collapse = ", "),")")

replace_into_table <- function(con, table_name, new_data){
  #new_data <- data.table(new_data)
  cols <- to_sql_string(names(new_data))
  vals <- paste(lapply(row_to_list(new_data), FUN = to_sql_row), collapse = ", ")
  query <- paste("REPLACE INTO", table_name, cols, "VALUES", vals)
  rs <- dbExecute(con, query)
  return(rs)
}

tb <- data.frame("id" = letters[1:20], "A" = 1:20, "B" = seq(.1,2,.1)) # sample data
dbWriteTable(con, "test_table", tb) # create table
dbExecute(con, "ALTER TABLE test_table ADD PRIMARY KEY (id)") # set primary key

new_data <- data.frame("id" = letters[19:23], "A" = 1:5, "B" = seq(101,105)) # new data
new_data[4,2] <- NA # add some NA values
new_data[5,3] <- NA
table_name <- "test_table"
replace_into_table(con, "test_table", new_data)

result <- dbReadTable(con, "test_table")

Really late, my answer, but maybe still helpful...

There is no single function (I know) in the DBI/odbc package but you can replicate the update behavior using a prepared update statement (which should work faster than RODBC's sqlUpdate since it sends the parameter values as a batch to the SQL server:

library(DBI)
library(odbc)

con <- dbConnect(odbc::odbc(), driver="{SQL Server Native Client 11.0}", server="dbserver.domain.com\\default,1234", Trusted_Connection = "yes", database = "test")  # assumes Microsoft SQL Server

dbWriteTable(con, "iris", iris, row.names = TRUE)      # create and populate a table (adding the row names as a separate columns used as row ID)

update <- dbSendQuery(con, 'update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Length"=?, "Petal.Width"=?, "Species"=? WHERE row_names=?')

# create a modified version of `iris`
iris2 <- iris
iris2$Sepal.Length <- 5
iris2$Petal.Width[2] <- 1
iris2$row_names <- rownames(iris)  # use the row names as unique row ID

dbBind(update, iris2)  # send the updated data

dbClearResult(update)  # release the prepared statement

# now read the modified data - you will see the updates did work
data1 <- dbReadTable(con, "iris")

dbDisconnect(con)

This works only if you have a primary key which I created in the above example by using the row names which are a unique number increased by one for each row...

For more information about the odbc package I have used in the DBI dbConnect statement see: https://github.com/rstats-db/odbc

Building on R Yoda's answer, I made myself the helper function below. This allows using a dataframe to specify update conditions.

While I built this to run transaction updates (i.e. single rows), it can in theory update multiple rows passing a condition. However, that's not the same as updating multiple rows using an input dataframe. Maybe somebody else can build on this...


dbUpdateCustom = function(x, key_cols, con, schema_name, table_name) {
  
  if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")
  if (!all(key_cols %in% colnames(x))) stop("All columns specified in 'key_cols' must be present in 'x'")
  
  # Build the update string --------------------------------------------------

  df_key     <- dplyr::select(x,  one_of(key_cols))
  df_upt     <- dplyr::select(x, -one_of(key_cols))
  
  set_str    <- purrr::map_chr(colnames(df_upt), ~glue::glue_sql('{`.x`} = {x[[.x]]}', .con = con))
  set_str    <- paste(set_str, collapse = ", ")
  
  where_str  <- purrr::map_chr(colnames(df_key), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))
  where_str  <- paste(where_str, collapse = " AND ")
  
  update_str <- glue::glue('UPDATE {schema_name}.{table_name} SET {set_str} WHERE {where_str}')
  
  # Execute ------------------------------------------------------------------
  
  query_res <- DBI::dbSendQuery(con, update_str)
  DBI::dbClearResult(query_res)

  return (invisible(TRUE))
}

Where

x: 1-row dataframe that contains 1+ key columns, and 1+ update columns.
key_cols: character vector, of 1 or more column names that are the keys (i.e. used in the WHERE clause)

How to pass data.frame for UPDATE with R DBI

Tags:

Database

R

Dataframe

Rodbc

R Dbi

Related

Recent Posts