pandas DataFrame concat / update ("upsert")?

One solution is to conatenate df1 with new rows in df2 (i.e. where the index does not match). Then update the values with those from df2.

df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
df.update(df2)

>>> df
             A   B
2015-10-01  A1  B1
2015-10-02  a1  b1
2015-10-03  a2  b2
2015-10-04  a3  b3

EDIT: Per the suggestion of @chrisb, this can further be simplified as follows:

pd.concat([df1[~df1.index.isin(df2.index)], df2])

Thanks Chris!


In addition to the correct answer, watch out for columns that do not exist in both dataframes:

    df1 = pd.DataFrame([['test',1, True], ['test2',2, True]]).set_index(0)
    df2 = pd.DataFrame([['test2',4], ['test3',3]]).set_index(0)

If you just use the aforementioned solution as-is, you get:

    >>>     1   2
    0       
    test    1   True
    test2   4   NaN
    test3   3   NaN

But if you are expecting the following output:

    >>>     1   2
    0       
    test    1   True
    test2   4   True
    test3   3   NaN

Just change the statement to:

    df1 = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
    df1.update(df2)

As of pandas 1.0.3, the desired functionality is directly given by combine_first:

combined = df2.combine_first(df1)

print(combined)
#               A   B
# 2015-10-01    A1  B1
# 2015-10-02    a1  b1
# 2015-10-03    a2  b2
# 2015-10-04    a3  b3

To get this behaviour, the dataframe whose data has priority (the updating one, in this case df2) must be the one calling the function.

It basically: (1) harmonizes rows and columns, (2) gives priority to non-NaN data, and (3) if datapoints defined in both dataframes, gives priority to data in df2, which is essentially what you want.

EDIT: My understanding is that combine_first does fulfill the requested "update-if-present-insert-if-absent" behaviour. However, according to Vijchti in the comments (thanks), this does not strictly correspond to an SQL UPSERT manipulation, as the logic is applied value-by-value instead of the entire row. I removed any reference to UPSERT from my answer.

Tags:

Python

Pandas