np.isnan on arrays of dtype "object"

Because np.isnan does not support arrays of dtype object (it raises a TypeError), you could just use a list comprehension to get the indices of any NaNs:

import numpy as np

obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)

# str(np.nan) is "nan", so compare each element's string form
inds = [i for i, n in enumerate(obj_arr) if str(n) == "nan"]

Or if you want a boolean mask:

mask = [str(n) == "nan" for n in obj_arr]
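The str-based test can also be turned into a NumPy boolean mask and index array. A minimal sketch; note that it also flags the literal string "nan", since str("nan") == "nan":

```python
import numpy as np

obj_arr = np.array([1, 2, np.nan, "A", "nan"], dtype=object)

# Boolean mask: True wherever the element's string form is "nan".
# Caveat: this matches real NaN floats AND the literal string "nan".
mask = np.array([str(n) == "nan" for n in obj_arr], dtype=bool)

# Indices of the True entries, equivalent to the enumerate() list comp.
inds = np.flatnonzero(mask)

print(mask.tolist())  # [False, False, True, False, True]
print(inds.tolist())  # [2, 4]
```

If your data can contain the text "nan" as a genuine value, prefer one of the type-aware checks below.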

Using x is np.nan also seems to work, without needing to convert to str:

In [29]: obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)

In [30]: [x is np.nan for x in obj_arr]
Out[30]: [False, False, True, False]
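The identity check only works because the array stores the np.nan object itself. A NaN created any other way, for example with float("nan"), is a different object, so `is` misses it. A quick sketch to check this:

```python
import numpy as np

arr = np.array([np.nan, float("nan"), "A"], dtype=object)

# Identity test: only True for the np.nan object itself.
by_identity = [x is np.nan for x in arr]

# Type check plus np.isnan catches any Python float NaN, whichever object it is.
by_value = [isinstance(x, float) and bool(np.isnan(x)) for x in arr]

print(by_identity)  # [True, False, False]
print(by_value)     # [True, True, False]
```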

To handle both 1-D and 2-D arrays, you could check the number of dimensions:

def masks(a):
    # 2-D: one list of flags per row
    if a.ndim > 1:
        return [[x is np.nan for x in sub] for sub in a]
    # 1-D: one flag per element
    return [x is np.nan for x in a]
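The shape check above only goes one level deep, so it covers 1-D and 2-D arrays. A sketch that works for any number of dimensions uses np.frompyfunc to apply the same identity test element by element (the helper name nan_mask_any_ndim is made up for illustration):

```python
import numpy as np

def nan_mask_any_ndim(a):
    """Hypothetical helper: boolean mask of np.nan elements for an array of any shape."""
    # frompyfunc wraps a Python function as a ufunc; its result has dtype object,
    # so convert it to a proper boolean array.
    is_np_nan = np.frompyfunc(lambda x: x is np.nan, 1, 1)
    return is_np_nan(a).astype(bool)

arr3d = np.array([np.nan, 1, "A", np.nan, 2.5, "b", 3, np.nan], dtype=object).reshape(2, 2, 2)
mask = nan_mask_any_ndim(arr3d)
print(mask.tolist())  # [[[True, False], [False, True]], [[False, False], [False, True]]]
```

Unlike the list version, this returns a NumPy bool array of the same shape, so it can be used directly for indexing.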

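If x is np.nan can fail, for example when a NaN was created as float("nan") rather than being the np.nan object itself, check the type first and then use np.isnan: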

def masks(a):
    # np.isnan raises TypeError on strings, so only call it on floats
    if a.ndim > 1:
        return [[isinstance(x, float) and np.isnan(x) for x in sub] for sub in a]
    return [isinstance(x, float) and np.isnan(x) for x in a]
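A similar N-D sketch for the type-checked test, assuming you also want to catch NumPy float scalars: np.float64 subclasses Python float, but np.float32 does not, so the check below also accepts np.floating. The name nan_mask_checked is hypothetical:

```python
import numpy as np

def nan_mask_checked(a):
    """Hypothetical helper: True where an element is a float NaN, for any array shape."""
    def is_float_nan(x):
        # Only call np.isnan on float-like values; it raises TypeError on strings.
        return isinstance(x, (float, np.floating)) and bool(np.isnan(x))

    # otypes=[bool] fixes the output dtype instead of letting vectorize guess it.
    return np.vectorize(is_float_nan, otypes=[bool])(a)

arr = np.array([[np.nan, "A"], [float("nan"), 1.5]], dtype=object)
arr[1, 1] = np.float32("nan")  # a NumPy float32 scalar, not a Python float
mask = nan_mask_checked(arr)
print(mask.tolist())  # [[True, False], [True, True]]
```

np.vectorize is a convenience wrapper around a Python-level loop, so expect roughly list-comprehension speed rather than true vectorized performance.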

Interestingly, x is np.nan works when the dtype is object, but not when NumPy infers a string dtype:

In [76]: arr = np.array([np.nan,np.nan,"3"],dtype=object)

In [77]: [x is np.nan  for x in arr]
Out[77]: [True, True, False]

In [78]: arr = np.array([np.nan,np.nan,"3"])

In [79]: [x is np.nan  for x in arr]
Out[79]: [False, False, False]

Depending on the dtype, different things happen:

In [90]: arr = np.array([np.nan,np.nan,"3"])

In [91]: arr.dtype
Out[91]: dtype('S32')

In [92]: arr
Out[92]: 
array(['nan', 'nan', '3'], 
      dtype='|S32')

In [93]: [x == "nan"  for x in arr]
Out[93]: [True, True, False]

In [94]: arr = np.array([np.nan,np.nan,"3"],dtype=object)

In [95]: arr.dtype
Out[95]: dtype('O')

In [96]: arr
Out[96]: array([nan, nan, '3'], dtype=object)

In [97]: [x == "nan"  for x in arr]
Out[97]: [False, False, False]

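When the array mixes NaNs and strings and no dtype is given, NumPy coerces the NaNs to strings (numpy.string_ here, with dtype 'S32'; under Python 3 you get a '<U32' array of numpy.str_ instead), so x == "nan" works in that case. With dtype=object the NaNs stay Python floats, so comparing them to "nan" gives False. If you always use the object dtype, the behaviour should be consistent.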
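If you cannot guarantee the object dtype, one option is to branch on arr.dtype.kind. The sketch below is a hypothetical helper (nan_mask_by_dtype), assuming string arrays spell NaN as "nan", which is how NumPy formats it when coercing:

```python
import numpy as np

def nan_mask_by_dtype(arr):
    """Hypothetical helper: NaN mask whose strategy depends on the array's dtype."""
    kind = arr.dtype.kind
    if kind in "fc":
        # Float or complex arrays: np.isnan works directly.
        return np.isnan(arr)
    if kind in "US":
        # Unicode or bytes string arrays: the NaNs were converted to the text "nan".
        return arr == ("nan" if kind == "U" else b"nan")
    if kind == "O":
        # Object arrays: elements keep their Python types, so test each one.
        flags = [isinstance(x, float) and bool(np.isnan(x)) for x in arr.ravel()]
        return np.array(flags, dtype=bool).reshape(arr.shape)
    # Other kinds (integers, booleans, ...) are treated as having no NaNs.
    return np.zeros(arr.shape, dtype=bool)

str_arr = np.array([np.nan, np.nan, "3"])                # string dtype
obj_arr = np.array([np.nan, np.nan, "3"], dtype=object)  # object dtype
print(nan_mask_by_dtype(str_arr).tolist())  # [True, True, False]
print(nan_mask_by_dtype(obj_arr).tolist())  # [True, True, False]
```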


If you are willing to use the pandas library, a handy function that covers this case is pd.isnull:

pandas.isnull(obj)

Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

Here is an example:

$ python
>>> import numpy   
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False,  True])
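As the docstring says, pd.isnull also treats None as missing in object arrays, and it keeps the array's shape. A short sketch (newer pandas versions also expose the same function as pd.isna):

```python
import numpy as np
import pandas as pd

arr = np.array([["a", None], [float("nan"), 1]], dtype=object)

# None and NaN are both reported as missing; the 2-D shape is preserved.
mask = pd.isnull(arr)
print(mask.tolist())  # [[False, True], [True, False]]
```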
