Reasonable way to have different versions of None?

The simplest way to go would be with strings: 'not counted', 'unknown' and 'N/A'. However, if you want to process the data quickly in numpy, arrays that mix numbers and objects are not your friend.

My suggestion would be to add several arrays of the same shape as your data, consisting of 0s and 1s: the array missing is 1 wherever a value is missing and 0 elsewhere, and likewise for an array not_measured, and so on.

Then you can use NaNs everywhere, and later mask your data with, say, np.where(missing == 1) to easily find the specific NaNs you need.
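A minimal sketch of that idea (the names data, missing and not_measured are made up here, not taken from the question):

import numpy as np

data = np.array([1.4, np.nan, 0.5, np.nan, 0.7])   # example values, replace with your own
missing = np.array([0, 1, 0, 0, 0])                 # 1 where the value is missing
not_measured = np.array([0, 0, 0, 1, 0])            # 1 where the value was not measured

# pick out the NaNs that specifically mean "missing"
missing_idx = np.where(missing == 1)
print(missing_idx)        # (array([1]),)
print(data[missing_idx])  # [nan]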


If you want an object that is not equal to any known value, and is also not None, just make a new object:

NOT_APPLICABLE = object()
NOT_MEASURED = object()
UNKNOWN = object()

Now you can just use those values exactly like you would use None:

[1.4, .9, .5, .7, UNKNOWN]

...

if value is UNKNOWN:
    # do something

etc.
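As a hypothetical end-to-end sketch (the readings list is invented for illustration), you can filter on the sentinels with identity checks, just as you would with None:

readings = [1.4, .9, UNKNOWN, .5, NOT_MEASURED, .7]   # invented example data
sentinels = (NOT_APPLICABLE, NOT_MEASURED, UNKNOWN)

# keep only real measurements, comparing by identity as you would with None
valid = [v for v in readings if not any(v is s for s in sentinels)]
print(sum(valid) / len(valid))   # 0.875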

If you need a value that can be represented as a float (e.g. in a numpy array), you can create a NaN value with "extra" data encoded in the mantissa. It may not be safe to do so, however, because there is no guarantee that those bits are preserved through various operations on the values.
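For illustration only, here is a minimal sketch of that encoding in plain Python with the struct module; tag_nan and nan_payload are made-up helper names, and the caveat above still applies: arithmetic may not preserve the payload.

import math
import struct

def tag_nan(payload):
    # hypothetical helper: add a small integer payload to the quiet-nan bit pattern
    bits = struct.unpack('<Q', struct.pack('<d', math.nan))[0]
    return struct.unpack('<d', struct.pack('<Q', bits + payload))[0]

def nan_payload(x):
    # hypothetical helper: read the payload back out of the bit pattern
    base = struct.unpack('<Q', struct.pack('<d', math.nan))[0]
    return struct.unpack('<Q', struct.pack('<d', x))[0] - base

tagged = tag_nan(3)
print(math.isnan(tagged))    # True
print(nan_payload(tagged))   # 3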


Here is a solution (disclaimer: HACK!) that avoids speed bumps such as object dtype or separate masks:

There appears to be quite a bit of "dead space" around the floating-point representation of nan:

>>> nan_as_int = np.array(np.nan).view(int)[()]
>>> nan_as_int
9221120237041090560

>>> custom_nan = np.arange(nan_as_int, nan_as_int+10).view(float)
>>> custom_nan
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])

We have created ten distinct nans. Note that this is different from creating multiple instances with float("nan"): those all map to the same value in numpy and hence become indistinguishable once put into a non-object array.
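A quick sanity check of that claim (a sketch; the exact bit pattern is platform-dependent, but CPython and numpy typically share the canonical quiet nan):

>>> plain = np.array([float("nan"), float("nan"), np.nan])
>>> np.unique(plain.view(np.int64)).size
1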

Even though our ten nans have distinct bit-level representations, at the float level they are difficult to tell apart, because by definition nan != nan, even when a nan is compared with itself. So we need a little helper:

>>> def which_nan(a):
...     # boolean mask marking the nan positions
...     some_nan = np.isnan(a)
...     # where a is nan, recover the offset encoded in its bit pattern; -1 elsewhere
...     return np.where(some_nan, np.subtract(a.view(int), nan_as_int, where=some_nan), -1)

Example:

>>> exmpl = np.array([0.1, 1.2, custom_nan[3], custom_nan[0]])
>>> exmpl
array([0.1, 1.2, nan, nan])
>>> which_nan(exmpl)
array([-1, -1,  3,  0], dtype=int64)

Perhaps surprisingly, this appears to survive at least some basic numpy operations:

>>> which_nan(np.sin(exmpl))
array([-1, -1,  3,  0], dtype=int64)