UndefinedVariableError when querying pandas DataFrame

data1 = [np.array(df.query('type == @i')['continuous'])
         for i in ('Type1', 'Type2', 'Type3', 'Type4')]

Use '@' to refer to variables.

Please refer to the documentation, which says:

You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.
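For anyone who wants to try this out, here is a minimal sketch of the '@' approach. The DataFrame contents are made up for illustration (the question does not show the real data); only the column names type and continuous come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "type": ["Type1", "Type2", "Type1", "Type3"],
    "continuous": [1.0, 2.0, 3.0, 4.0],
})

# '@i' tells query() to look up the Python variable i
# instead of searching for a column named i.
data1 = [
    np.array(df.query("type == @i")["continuous"])
    for i in ("Type1", "Type2", "Type3", "Type4")
]
```

Each element of data1 holds the continuous values for one type; a type with no matching rows (Type4 in this made-up data) simply gives an empty array.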


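I know this is late, but maybe it helps somebody: putting i in double quotes also avoids the error, although the query then compares against the literal string "i" rather than the value of the variable i:

data1 = [np.array(df.query('type == "i"')['continuous'])
         for i in ('Type1', 'Type2', 'Type3', 'Type4')]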


The i in your query expression

df.query('type == i')

is literally just the string 'i'. Since there are no extra enclosing quotes around it, pandas interprets it as the name of another column in your DataFrame, i.e. it looks for cases where

df['type'] == df['i']

Since there is no i column, you get an UndefinedVariableError.
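To make the two cases concrete, here is a small sketch with made-up data. The first query works only because this hypothetical DataFrame happens to have a column called i; the second reproduces the error. It relies on UndefinedVariableError being a subclass of NameError, so catching NameError works regardless of where your pandas version exposes the exception class.

```python
import pandas as pd

# Hypothetical frame that deliberately includes a column named "i".
df = pd.DataFrame({
    "type": ["Type1", "Type2"],
    "i": ["Type1", "TypeX"],
})

# With a column named i present, the bare name resolves to that column,
# so this behaves like df[df["type"] == df["i"]].
matches = df.query("type == i")

# Without such a column the bare name cannot be resolved and
# pandas raises UndefinedVariableError (a NameError subclass).
try:
    df.drop(columns="i").query("type == i")
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
```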

It looks like you intended to query where the values in the type column are equal to the string variable named i, i.e. where

df['type'] == 'Type1'
df['type'] == 'Type2' # etc.

In this case you need to actually insert the value of i into the query expression:

df.query('type == "%s"' % i)

The extra set of quotes is necessary if 'Type1', 'Type2' etc. are values within the type column, but not if they are the names of other columns in the dataframe.
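As a rough illustration of the quoting rule, the sketch below builds the expression both ways on made-up data. The label column and the col variable are hypothetical names added only for this example. It also checks that interpolation gives the same rows as an '@' reference to the variable.

```python
import pandas as pd

# Hypothetical data; "label" exists only to show the column-name case.
df = pd.DataFrame({
    "type": ["Type1", "Type2", "Type1"],
    "label": ["Type1", "Type1", "Type2"],
    "continuous": [1.0, 2.0, 3.0],
})

i = "Type1"

# Value case: the inner quotes make pandas see a string literal.
expr = 'type == "%s"' % i          # the text: type == "Type1"
by_interpolation = df.query(expr)

# Same rows, letting pandas read the variable directly.
by_reference = df.query("type == @i")

# Column case: no inner quotes, so the inserted name is treated as a column.
col = "label"
same_as_label = df.query("type == %s" % col)
```

One practical difference, as far as I can tell: the '@' form does not depend on the value being safe to paste between quotes, so it is the less fragile choice when the strings might themselves contain quote characters.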

Tags:

Python

Pandas