set new index for pandas DataFrame (interpolating?)

This works well:

import numpy as np
import pandas as pd

def interp(df, new_index):
    """Return a new DataFrame with all columns values interpolated
    to the new_index values."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name

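    # DataFrame.items() replaces iteritems(), which was removed in pandas 2.0.
    # np.interp expects df.index to be sorted in increasing order.
    for colname, col in df.items():
        df_out[colname] = np.interp(new_index, df.index, col)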

    return df_out
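
As a quick illustration of how this function might be called, here is a minimal sketch. The sample frame, the column names `a` and `b`, and the target values are made up for the example. Keep in mind that `np.interp` assumes the original index is sorted in increasing order and holds the end values constant outside the original range.

```python
import numpy as np
import pandas as pd

def interp(df, new_index):
    """Return a new DataFrame with all columns interpolated to new_index."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name
    for colname, col in df.items():
        df_out[colname] = np.interp(new_index, df.index, col)
    return df_out

# Hypothetical sample data: two columns on an irregular, increasing index.
df = pd.DataFrame(
    {"a": [0.0, 10.0, 40.0], "b": [1.0, 2.0, 5.0]},
    index=pd.Index([0.0, 1.0, 4.0], name="x"),
)

# Two targets inside the original range, one below it and one above it.
new_index = np.array([-1.0, 0.5, 2.5, 6.0])
result = interp(df, new_index)
print(result)
```

The first and last rows fall outside the original index, so they should simply repeat the first and last values of each column rather than extrapolate.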

I wonder if you're up against one of pandas' limitations: there seem to be few options for aligning your df to an arbitrary set of numbers (your newindex).

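For example, your stated newindex only overlaps with the first and last numbers in index. Because reindex keeps only the rows whose labels appear in newindex, every other original point is dropped, so linear interpolation (rightly) draws a straight line between the start (2) and end (27) of your index.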

import numpy as np
import pandas as pd
%matplotlib inline

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)

df = pd.DataFrame(x, index=index)

newindex = np.linspace(min(index), max(index), 100)

df_reindexed = df.reindex(index=newindex)
df_reindexed.interpolate(method='linear', inplace=True)

df.plot()
df_reindexed.plot()

image1

If you change newindex so that more of its points coincide with your original data set, interpolation behaves more as expected:

newindex = np.linspace(min(index), max(index), 26)

df_reindexed = df.reindex(index=newindex)
df_reindexed.interpolate(method='linear', inplace=True)

df.plot()
df_reindexed.plot()

image2
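
To make the difference between the two grids concrete, one could count how many of the original index values also appear in each newindex. This is only a sketch: the helper name `count_overlap` is made up here, and it relies on exact floating-point equality, which happens to be safe for these particular values but is not a robust test in general.

```python
import numpy as np

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))

def count_overlap(n_points):
    """Count original index values that also appear in an n_points linspace."""
    newindex = np.linspace(index.min(), index.max(), n_points)
    # Exact equality test; only the matching labels survive df.reindex(newindex).
    return int(np.isin(index, newindex).sum())

overlap_100 = count_overlap(100)
overlap_26 = count_overlap(26)
print(overlap_100, overlap_26)
```

With 100 points only the two endpoints line up, whereas the 26-point grid lands on every integer from 2 to 27 and so keeps every original point except 2.5.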

There are other methods that do not require one to manually align the indices, but the resulting curve (while technically correct) is probably not what one wants:

newindex = np.linspace(min(index), max(index), 1000)

df_reindexed = df.reindex(index=newindex, method='ffill')

df.plot()
df_reindexed.plot()

image3
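
One way to see why the forward-filled curve looks like a staircase is to count its distinct values. The sketch below reuses the same data; the variable names `filled`, `n_rows` and `n_distinct` are just illustrative.

```python
import numpy as np
import pandas as pd

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame(np.sin(index / 10), index=index)
newindex = np.linspace(index.min(), index.max(), 1000)

# Each new label takes the value of the closest original label at or before it.
filled = df.reindex(index=newindex, method="ffill")

n_rows = len(filled)
n_distinct = filled[0].nunique()
print(n_rows, n_distinct)
```

Although the result has 1000 rows, it only ever repeats the 10 original values, so no new information is created between the sample points.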

I looked at the pandas docs but I couldn't identify an easy solution.

https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-reindexing


I have adopted the following solution:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def reindex_and_interpolate(df, new_index):
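    # Index.union replaces `df.index | new_index`, which no longer means set union in pandas 2.0+.
    combined = df.index.union(new_index)
    return df.reindex(combined).interpolate(method='index', limit_direction='both').loc[new_index]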

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)

df = pd.DataFrame(x, index=index)

# pd.Float64Index was removed in pandas 2.0; pd.Index infers a float64 dtype here.
newindex = pd.Index(np.linspace(min(index)-5, max(index)+5, 50))

df_reindexed = reindex_and_interpolate(df, newindex)

plt.figure()
plt.scatter(df.index, df.values, color='red', alpha=0.5)
plt.scatter(df_reindexed.index, df_reindexed.values, color='green', alpha=0.5)
plt.show()

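
As a sanity check, the reindex-then-interpolate approach can be compared against a direct `np.interp` call on the same data. This is a minimal sketch under a few assumptions: it uses `Index.union` to combine the two indexes, the target grid `new_index` is invented for the example, and it expects `limit_direction='both'` to hold the end values constant outside the original range, just as `np.interp` does.

```python
import numpy as np
import pandas as pd

def reindex_and_interpolate(df, new_index):
    # Union of old and new labels, so no original point is lost before interpolating.
    combined = df.index.union(new_index)
    return (
        df.reindex(combined)
        .interpolate(method="index", limit_direction="both")
        .loc[new_index]
    )

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame(np.sin(index / 10), index=index)

# Hypothetical target grid that extends past both ends of the data.
new_index = pd.Index(np.linspace(0, 30, 13))

via_pandas = reindex_and_interpolate(df, new_index)[0].to_numpy()
via_numpy = np.interp(new_index.to_numpy(), df.index.to_numpy(), df[0].to_numpy())

matches = np.allclose(via_pandas, via_numpy)
print(matches)
```

If the two results agree, either approach can be used; the pandas version keeps everything in DataFrame form, while `np.interp` works column by column.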