NetworkX - Setting node attributes from dataframe

As of NetworkX 2.0, you can pass a dictionary of dictionaries to nx.set_node_attributes to set attributes for multiple nodes at once. This is more streamlined than iterating over each node manually. The outer dictionary's keys are the nodes, and each inner dictionary's keys are the attribute names to set on that node. Something like this:

attrs = {
    node0: {attr0: val00, attr1: val01},
    node1: {attr0: val10, attr1: val11},
    node2: {attr0: val20, attr1: val21},
}
nx.set_node_attributes(G, attrs)

You can find more detail in the documentation for nx.set_node_attributes.
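As a concrete, minimal sketch of that dict-of-dicts form (the node names and the color/size attributes are made up for illustration):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c")])

# Outer keys are nodes; each inner dict maps attribute name -> value.
attrs = {
    "a": {"color": "red", "size": 1},
    "b": {"color": "blue", "size": 2},
}
nx.set_node_attributes(G, attrs)

print(G.nodes["a"])  # attributes now stored on node "a"
print(G.nodes["c"])  # "c" was not listed in attrs, so it has no attributes
```

Nodes that are left out of the outer dictionary are simply not touched.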

Using your example, and assuming the id column holds the node names, you can convert your dataframe df_attributes_only of node attributes to this format and add it to your graph:

df_attributes_only = pd.DataFrame(
    [['jim', 'tall', 'red', 'fat'], ['john', 'small', 'blue', 'fat']],
    columns=['id', 'attribute1', 'attribute2', 'attribute3']
)
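# Index by node id, then convert to {node: {attribute: value}}
node_attr = df_attributes_only.set_index('id').to_dict('index')
# g is your existing graph; ids that are not nodes of g are silently ignored
nx.set_node_attributes(g, node_attr)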

>>> g.nodes['jim']
{'attribute1': 'tall', 'attribute2': 'red', 'attribute3': 'fat'}
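One thing to watch for: nx.set_node_attributes only updates nodes that are already in the graph. If some ids in the dataframe may be missing from the graph, a possible workaround is add_nodes_from, which accepts (node, attribute_dict) pairs and creates the node when needed. A small sketch, where the edge to 'mary' is a made-up stand-in for your real graph:

```python
import networkx as nx
import pandas as pd

df_attributes_only = pd.DataFrame(
    [["jim", "tall", "red", "fat"], ["john", "small", "blue", "fat"]],
    columns=["id", "attribute1", "attribute2", "attribute3"],
)
node_attr = df_attributes_only.set_index("id").to_dict("index")

g = nx.Graph()
g.add_edge("jim", "mary")  # hypothetical graph: "john" is not a node yet

nx.set_node_attributes(g, node_attr)
john_present_before = "john" in g  # set_node_attributes does not create nodes

# add_nodes_from takes (node, attr_dict) pairs: it adds missing nodes
# and updates the attributes of existing ones.
g.add_nodes_from(node_attr.items())
print(g.nodes(data=True))
```

Whether isolated nodes like 'john' belong in the graph at all depends on your data, so treat this as an option rather than a default.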

nx.from_pandas_dataframe (from_pandas_edgelist as of the latest stable version, 2.2) conceptually converts an edge list to a graph. That is, each row in the dataframe represents an edge, which is a pair of two different nodes.

This API cannot read node attributes. That makes sense, because each row contains two different nodes, and keeping separate columns for each node's attributes would be cumbersome and could lead to inconsistencies. For example, consider the following dataframe:

node_from  node_to  src_attr_1  tgt_attr_1
a          b        0           3
a          c        2           4

What should the value of src_attr_1 be for node a? Is it 0 or 2? Moreover, we would need two columns for each attribute (since it is a node attribute, both nodes in each edge should have it). In my opinion it would be bad design to support this, and I guess that's why the NetworkX API doesn't.
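If your edge list does carry per-node columns like these, it may be worth checking for such conflicts before copying anything onto the graph. A rough sketch using the dataframe above:

```python
import pandas as pd

df = pd.DataFrame({
    "node_from": ["a", "a"],
    "node_to": ["b", "c"],
    "src_attr_1": [0, 2],
    "tgt_attr_1": [3, 4],
})

# Count the distinct src_attr_1 values seen for each source node;
# more than one distinct value means the rows disagree about that node.
distinct = df.groupby("node_from")["src_attr_1"].nunique()
conflicting = distinct[distinct > 1].index.tolist()
print(conflicting)
```

Here node a is reported, because its two rows give different values for src_attr_1.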

You can still read the node attributes from the df after converting it to a graph, as follows:

import networkx as nx
import pandas as pd

# Build a sample dataframe (with 2 edges: 0 -> 1, 0 -> 2, node 0 has attr_1 value of 'a', node 1 has 'b', node 2 has 'c')
d = {'node_from': [0, 0], 'node_to': [1, 2], 'src_attr_1': ['a','a'], 'tgt_attr_1': ['b', 'c']}
df = pd.DataFrame(data=d)
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# Iterate over df rows and set the source and target nodes' attributes for each row:
for index, row in df.iterrows():
    G.nodes[row['node_from']]['attr_1'] = row['src_attr_1']
    G.nodes[row['node_to']]['attr_1'] = row['tgt_attr_1']

print(G.edges())
print(G.nodes(data=True))
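For larger dataframes, iterrows can be slow. One possible alternative, sketched here under the assumption that each node has a single consistent attr_1 value, is to stack the source and target columns into one node table and set the attribute in a single call:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "node_from": [0, 0],
    "node_to": [1, 2],
    "src_attr_1": ["a", "a"],
    "tgt_attr_1": ["b", "c"],
})
G = nx.from_pandas_edgelist(df, "node_from", "node_to")

# Rename both column pairs to a common (node, attr_1) layout and stack them.
src = df[["node_from", "src_attr_1"]].rename(
    columns={"node_from": "node", "src_attr_1": "attr_1"}
)
tgt = df[["node_to", "tgt_attr_1"]].rename(
    columns={"node_to": "node", "tgt_attr_1": "attr_1"}
)
nodes = pd.concat([src, tgt]).drop_duplicates("node", keep="last")

# A flat {node: value} dict plus name= sets one attribute on many nodes.
nx.set_node_attributes(G, nodes.set_index("node")["attr_1"].to_dict(), name="attr_1")
print(G.nodes(data=True))
```

If a node appears with conflicting values, the keep argument of drop_duplicates decides which one wins, and that may not match what the row-by-row loop would produce.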

Edit:

If you have a large list of attributes for the source node, you can extract the dictionary of these columns automatically as follows:

# List of desired source attributes (these columns must exist in df):
src_attributes = ['src_attr_1', 'src_attr_2', 'src_attr_3']

# Iterate over df rows and set source node attributes:
for index, row in df.iterrows():
    # Select only the wanted columns from the row, then convert to a dict
    src_attr_dict = row[src_attributes].to_dict()
    G.nodes[row['node_from']].update(src_attr_dict)
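The same idea can be combined with the dict-of-dicts form of nx.set_node_attributes to avoid the explicit loop. This is only a sketch: the sample data and the src_attr_2 column are invented for illustration, and it assumes every row for a given source node carries the same attribute values.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "node_from": [0, 0, 1],
    "node_to": [1, 2, 2],
    "src_attr_1": ["a", "a", "b"],
    "src_attr_2": [10, 10, 20],
})
G = nx.from_pandas_edgelist(df, "node_from", "node_to")

src_attributes = ["src_attr_1", "src_attr_2"]

# Keep one row per source node, index by node, keep only the attribute columns.
per_node = df.drop_duplicates("node_from").set_index("node_from")[src_attributes]

# to_dict("index") yields {node: {attribute: value}}, the format shown earlier.
nx.set_node_attributes(G, per_node.to_dict("index"))
print(G.nodes(data=True))
```

Node 2 never appears as a source in this sample, so it ends up with no attributes.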
