Construct NetworkX graph from Pandas DataFrame

NetworkX expects a square matrix (of nodes and edges), perhaps* you want to pass it:

In [11]: df2 = pd.concat([df, df.T]).fillna(0)

Note: It's important that the index and columns are in the same order!

In [12]: df2 = df2.reindex(df2.columns)

In [13]: df2
Out[13]: 
       Bar  Bat  Baz  Foo  Loc 1  Loc 2  Loc 3  Loc 4  Loc 5  Loc 6  Loc 7  Quux
Bar      0    0    0    0      0      0      1      1      0      1      1     0
Bat      0    0    0    0      0      0      1      0      0      1      0     0
Baz      0    0    0    0      0      0      1      0      0      0      0     0
Foo      0    0    0    0      0      0      1      1      0      0      0     0
Loc 1    0    0    0    0      0      0      0      0      0      0      0     1
Loc 2    0    0    0    0      0      0      0      0      0      0      0     0
Loc 3    1    1    1    1      0      0      0      0      0      0      0     0
Loc 4    1    0    0    1      0      0      0      0      0      0      0     0
Loc 5    0    0    0    0      0      0      0      0      0      0      0     0
Loc 6    1    1    0    0      0      0      0      0      0      0      0     0
Loc 7    1    0    0    0      0      0      0      0      0      0      0     0
Quux     0    0    0    0      1      0      0      0      0      0      0     0

In[14]: graph = nx.from_numpy_matrix(df2.values)

This doesn't pass the column/index names to the graph, if you wanted to do that you could use relabel_nodes (you may have to be wary of duplicates, which are allowed in pandas' DataFrames):

In [15]: graph = nx.relabel_nodes(graph, dict(enumerate(df2.columns))) # is there nicer  way than dict . enumerate ?

*It's unclear exactly what the columns and index represent for the desired graph.

A little late answer, but now networkx can read data from pandas dataframes, in that case ideally the format is the following for a simple directed graph:

+----------+---------+---------+
|   Source |  Target |  Weight |
+==========+=========+=========+
| Node_1   | Node_2  |   0.2   |
+----------+---------+---------+
| Node_2   | Node_1  |   0.6   |   
+----------+---------+---------+

If you are using adjacency matrixes then Andy Hayden is right, you should take care of the correct format. Since in your question you used 0 and 1, I guess you would like to see an undirected graph. It may seem counterintuitive first since you said Index represents e.g. a person, and columns represent groups to which a given person belongs, but it's correct also in the other way a group (membership) belongs to a person. Following this logic, you should actually put the groups in indexes and the persons in columns too.

Just a side note: You can also define this problem in the sense of a directed graph, for example you would like to visualize an association network of hierarchical categories. There, the association e.g. from Samwise Gamgee to Hobbits is stronger than in the other direction usually (since Frodo Baggins is more likely the Hobbit prototype)

You can also use scipy to create the square matrix like this:

import scipy.sparse as sp

cols = df.columns
X = sp.csr_matrix(df.astype(int).values)
Xc = X.T * X  # multiply sparse matrix
Xc.setdiag(0)  # reset diagonal

# create dataframe from co-occurence matrix in dense format
df = pd.DataFrame(Xc.todense(), index=cols, columns=cols)

Later on you can create an edge list from the dataframe and import it into Networkx:

df = df.stack().reset_index()
df.columns = ['source', 'target', 'weight']

df = df[df['weight'] != 0]  # remove non-connected nodes

g = nx.from_pandas_edgelist(df, 'source', 'target', ['weight'])

Construct NetworkX graph from Pandas DataFrame

Tags:

Python

Pandas

Networkx

Related

Recent Posts