Compare Series containing None

The None values get cast to NaN, and NaN has the property that it is not equal to itself:

In[54]:
b = pd.Series([None, None, 4, 5])
b

Out[54]: 
0    NaN
1    NaN
2    4.0
3    5.0
dtype: float64

As you can see here:

In[55]:
b==b

Out[55]: 
0    False
1    False
2     True
3     True
dtype: bool
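
The same property shows up outside this particular Series. As a minimal sketch (variable names are illustrative, and only standard pandas/NumPy calls are used), missing values are usually detected with `isna()` rather than `==`:

```python
import numpy as np
import pandas as pd

b = pd.Series([None, None, 4, 5])

# NaN is never equal to itself, so == cannot be used to find it
nan_equals_itself = np.nan == np.nan

# isna() flags the missing entries instead
missing = b.isna()

# Series.equals treats NaNs in the same position as equal
same_series = b.equals(b)

print(nan_equals_itself, missing.tolist(), same_series)
```

This is why `b == b` is False on the NaN rows while `b.equals(b)` still reports the two Series as identical.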

I'm not sure of the cleanest way to make this work, but the following expression treats consecutive NaN values as equal:

In[68]:
(b == b.shift()) | ((b != b.shift()) & (b != b))

Out[68]: 
0     True
1     True
2    False
3    False
dtype: bool

The True in the first row is a false positive: shifting down means row 0 is compared against a row that does not exist, which shift fills with NaN:

In[69]:
b.shift()

Out[69]: 
0    NaN
1    NaN
2    NaN
3    4.0
dtype: float64

So the first row evaluates to True because it is NaN in b and the shifted series' first row is NaN as well, which satisfies the NaN branch of the boolean expression.

To work around the first-row false positive, you could slice the result to ignore the first row:

In[70]:
((b == b.shift()) | ((b != b.shift()) & (b != b)))[1:]

Out[70]: 
1     True
2    False
3    False
dtype: bool

As to why the values get cast: pandas tries to coerce the data to a compatible NumPy dtype. Here float64 is selected because the data mixes ints and None values. None is converted to NaN, and NaN cannot be represented in an integer dtype.
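
The coercion can be checked directly by inspecting the dtype. The sketch below assumes pandas 1.0 or later for the nullable "Int64" extension dtype, which is not part of the original answer and is shown only as a point of comparison:

```python
import pandas as pd

# Mixed ints and None: pandas falls back to float64 so it can store NaN
as_float = pd.Series([None, None, 4, 5])

# Ints only: no missing values, so an integer dtype is used
as_int = pd.Series([4, 5])

# Assumption: pandas >= 1.0, where the nullable "Int64" extension dtype
# keeps the integers and represents missing values as pd.NA
as_nullable = pd.Series([None, None, 4, 5], dtype="Int64")

print(as_float.dtype, as_int.dtype, as_nullable.dtype)
```

Note that comparisons on the nullable dtype propagate missing values instead of returning False, so the expressions in this answer would behave differently there.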

To get the same result as a in your example, overwrite the first row with False, since that comparison should always fail:

In[78]:
result = (b == b.shift()) | ((b != b.shift()) & (b != b))
result.iloc[0] = False
result

Out[78]: 
0    False
1     True
2    False
3    False
dtype: bool
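
One caveat: the expression above only checks that the current row is NaN, not that the previous row is NaN too, so a NaN that follows a real number would also be reported as equal. A stricter variant might look like the sketch below. The helper name `same_as_previous` and the second test Series `c` are hypothetical, introduced only for illustration:

```python
import pandas as pd

def same_as_previous(s: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a row equals the row above it,
    treating two missing values as equal."""
    prev = s.shift()
    # Both the current and the previous row must be missing
    both_missing = s.isna() & prev.isna()
    result = (s == prev) | both_missing
    if len(result) > 0:
        # Row 0 has nothing above it, so it can never match
        result.iloc[0] = False
    return result

b = pd.Series([None, None, 4, 5])
print(same_as_previous(b).tolist())   # [False, True, False, False]

c = pd.Series([4, None, None, 4, 4])
print(same_as_previous(c).tolist())   # [False, False, True, False, True]
```

For b this reproduces the output above. For c, row 1 (a NaN following 4) is False, whereas the original expression would mark it True.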
