Expressions with "== True" and "is True" give different results

I think that in pandas, element-wise comparison only works with ==, and the result is a boolean Series. With "is", the output is a single False.

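import pandas as pd

# Sample frame inferred from the output shown below
df = pd.DataFrame([True, False, True])

print(df[0] == True)
0     True
1    False
2     True
Name: 0, dtype: bool

print(df[df[0]])
      0
0  True
2  True

print(df[df[0] == True])
      0
0  True
2  True

print(df[0] is True)
False

print(df[df[0] is True])
0     True
1    False
2     True
Name: 0, dtype: bool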

The catch here is that in df[df[0] == True], you are not comparing a single object to True; the comparison is applied to each element of the column.

As the other answers say, == is overloaded in pandas to produce a Series instead of the single bool it normally returns. [] is overloaded, too, so that it interprets that Series as a mask and returns the filtered result. The code is essentially equivalent to:

series = df[0].__eq__(True)
df.__getitem__(series)

So you're not violating PEP 8 by keeping == here.
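To check that equivalence yourself, here is a minimal sketch. It assumes a sample frame rebuilt from the output in the question; the variable names are just for illustration.

```python
import pandas as pd

# Hypothetical sample frame matching the output shown in the question.
df = pd.DataFrame([True, False, True])

# The operator form and the explicit method call build the same boolean mask.
mask_operator = df[0] == True      # noqa: E712 (element-wise, returns a Series)
mask_method = df[0].__eq__(True)

# Indexing with [] and calling __getitem__ directly filter the same rows.
filtered_operator = df[mask_operator]
filtered_method = df.__getitem__(mask_method)

print(type(mask_operator).__name__)
print(filtered_operator.equals(filtered_method))
```

Both spellings go through the same overloaded methods, so the filtered frames should be identical.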


Essentially, pandas gives familiar syntax unusual semantics - that is what caused the confusion.
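A toy illustration of how the same syntax can take on different semantics. The Column class below is hypothetical and is not how pandas is implemented; it only shows that == can be redefined by a class while "is" cannot.

```python
class Column:
    """Hypothetical stand-in for a Series; not part of pandas."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Return one bool per element instead of a single bool.
        return [value == other for value in self.values]


col = Column([True, False, True])

elementwise = col == True   # noqa: E712 (dispatches to Column.__eq__)
identity = col is True      # identity check, cannot be customized

print(elementwise)
print(identity)
```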

According to Stroustrup (sec. 3.3.3), operator overloading has been causing trouble for this reason ever since its invention (and he had to think hard about whether to include it in C++). Seeing even more abuse of it in C++, Gosling ran to the other extreme in Java, banning it completely, and that proved to be exactly that, an extreme.

As a result, modern languages and code tend to support operator overloading, but take care not to overuse it and to keep its semantics consistent.


In Python, "is" tests whether two expressions refer to the same object. pandas.Series defines == to act element-wise; "is" cannot be redefined, so it never does.

Because of that, df[0] is True checks whether df[0] and True are the same object. The result is False, which in turn is equal to 0, so you get the column labeled 0 when doing df[df[0] is True].
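A small sketch of the two steps in that explanation. The dictionary is only a stand-in for a label lookup (an assumption for illustration); whether a given pandas version accepts a boolean as a column label is not something this example shows.

```python
import pandas as pd

s = pd.Series([True, False, True])

# "is" compares identities, so the result is a single plain bool.
identity = s is True

# == is element-wise, so the result is a boolean Series.
elementwise = s == True  # noqa: E712

# False compares equal to 0 and hashes the same, so a lookup keyed by 0
# also answers to False. The dict stands in for a label lookup.
lookup = {0: "column 0"}

print(identity)
print(elementwise.tolist())
print(lookup[False])
```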


One workaround that avoids linter complaints while keeping a reasonable syntax for subsetting could be:

import pandas as pd

s = pd.Series([True] * 10 + [False])

s.loc[s == True]  # flagged by linters (pycodestyle E712), though valid pandas
s.loc[s.isin([True])]  # passes linters, not as ugly as s.__eq__(True)

Both versions also take about the same time to run.
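For comparison, here is a sketch of two more spellings that select the same rows. It assumes s already has a boolean dtype with no missing values; no timing claim is made for these.

```python
import pandas as pd

s = pd.Series([True] * 10 + [False])

# A boolean Series can be used as the mask directly.
by_mask = s.loc[s]

# Series.eq is the named-method form of the == operator.
by_eq_method = s.loc[s.eq(True)]

# The two spellings from above, for reference.
by_isin = s.loc[s.isin([True])]
by_operator = s.loc[s == True]  # noqa: E712

print(len(by_mask), len(by_eq_method), len(by_isin), len(by_operator))
```

If s is already boolean, the plain s.loc[s] form is probably the simplest option, and it also keeps linters quiet.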

In addition, for dataframes one can use query:

df = pd.DataFrame(
    [[True] * 10 + [False], list(range(11))],
    index=['T', 'N'],
).T
df.query("T == True")  # also okay
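A quick way to see that query selects the same rows as a plain boolean mask. In this sketch the frame is built column-wise so that T keeps a boolean dtype, and engine="python" is passed so the result does not depend on whether the optional numexpr package is installed; both choices are assumptions made for the example.

```python
import pandas as pd

# Column-wise construction keeps "T" as a boolean column.
df = pd.DataFrame({"T": [True] * 10 + [False], "N": list(range(11))})

# String expression evaluated by pandas; no == True appears in Python code.
by_query = df.query("T == True", engine="python")

# The same selection written as a plain boolean mask.
by_mask = df[df["T"]]

print(by_query.equals(by_mask))
```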