Set values on the diagonal of pandas.DataFrame

In [21]: df.values[tuple([np.arange(df.shape[0])] * 2)] = 0

In [22]: df
Out[22]: 
          0         1         2         3         4
0  0.000000  0.931374  0.604412  0.863842  0.280339
1  0.531528  0.000000  0.641094  0.204686  0.997020
2  0.137725  0.037867  0.000000  0.983432  0.458053
3  0.594542  0.943542  0.826738  0.000000  0.753240
4  0.357736  0.689262  0.014773  0.446046  0.000000

Note that this only works if df has the same number of rows as columns. Another way, which works for arbitrary shapes, is to use np.fill_diagonal:

In [36]: np.fill_diagonal(df.values, 0)
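As a rough illustration of the non-square case, here is a minimal sketch. The 3x5 frame and the names wide, arr and result are made up for the example, and it fills an explicit copy from to_numpy rather than df.values, on the assumption that you may not want to rely on df.values being a writable view:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x5 frame (more columns than rows), all ones.
wide = pd.DataFrame(np.ones((3, 5)))

# Fill the diagonal on an explicit copy of the data, then rebuild the
# frame with the original labels.
arr = wide.to_numpy(copy=True)
np.fill_diagonal(arr, 0)
result = pd.DataFrame(arr, index=wide.index, columns=wide.columns)

print(result)
```

Only the cells at positions (0, 0), (1, 1) and (2, 2) are changed; the original frame is left untouched.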

Using np.fill_diagonal(df.values, 1) is the easiest, but you need to make sure your columns all have the same data type. I had a mixture of np.float64 and Python floats, and it would only affect the NumPy values. To fix this, you have to cast everything to a NumPy dtype.
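One way the cast could look, as a sketch only: the frame below is invented (one float64 column, one object column holding Python floats), and casting with astype(np.float64) before filling is my assumption about what "cast everything to numpy" means here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a float64 column with an object column
# that holds plain Python floats.
mixed = pd.DataFrame({
    "a": np.array([1.0, 2.0]),
    "b": pd.Series([3.0, 4.0], dtype=object),
})

# Cast every column to one NumPy dtype so the data forms a single array.
arr = mixed.astype(np.float64).to_numpy(copy=True)
np.fill_diagonal(arr, 1)
uniform = pd.DataFrame(arr, index=mixed.index, columns=mixed.columns)

print(uniform.dtypes)
print(uniform)
```

After the cast, both columns are float64 and both diagonal cells are set.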

Both approaches in unutbu's answer assume that labels are irrelevant (they operate on the underlying values).

The OP's code works with .loc and so is label-based instead (i.e. it puts a 0 in the cells whose row and column labels match, rather than in the cells located on the diagonal; admittedly, this is irrelevant in the specific example given, in which the labels are just positions).

Being in need of the "label-based" diagonal filling (working with a DataFrame describing an incomplete adjacency matrix), the simplest approach I could come up with was:

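import pandas as pd

def pd_fill_diagonal(df, value):
    # Labels that appear in both the index and the columns.
    idces = df.index.intersection(df.columns)
    # Long format keyed by (row label, column label); keep NaN cells.
    # Note: dropna=False follows the stack() API this was written against.
    stacked = df.stack(dropna=False)
    # Overwrite only the (label, label) entries.
    stacked.update(pd.Series(value,
                             index=pd.MultiIndex.from_arrays([idces,
                                                              idces])))
    # Write the result back into df in place.
    df.loc[:, :] = stacked.unstack()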
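To see how label-based filling differs from positional filling, here is a small sketch. It is not the function above: it swaps the stack/update approach for a plain loop over df.at, and the name fill_diagonal_by_label and the sample labels are made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_diagonal_by_label(df, value):
    """Return a copy of df with value set wherever row label == column label."""
    out = df.copy()
    for label in out.index.intersection(out.columns):
        out.at[label, label] = value
    return out

# Hypothetical incomplete adjacency matrix: rows a, b, c; columns b, c, d.
adj = pd.DataFrame(np.ones((3, 3)),
                   index=["a", "b", "c"],
                   columns=["b", "c", "d"])

filled = fill_diagonal_by_label(adj, 0.0)
print(filled)
```

Here only the cells ("b", "b") and ("c", "c") are set to 0, while the positional diagonal, e.g. ("a", "b"), is left alone. A loop like this is likely slower than a vectorised approach on large frames, but it makes the label matching easy to follow.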
