cannot convert nan to int (but there are no nans)

Basically, the error is telling you that you have NaN values, and I will show why your attempts didn't reveal this:

In [7]:
import numpy as np
import pandas as pd
# set up some data with one missing value
df = pd.DataFrame({'a':[1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
     a
0  1.0
1  NaN
2  3.0
3  4.0

now try to cast:

df['a'].astype(int)

this raises:

ValueError: Cannot convert NA to integer

but then you tried something like this:

In [5]:
for index, row in df['a'].items():
    if row == np.nan:
        print('index:', index, 'isnull')

This printed nothing, but NaN cannot be detected with an equality check. It has the special property that it never compares equal to anything, including itself, so an == test is always False, while row != row is True only for NaN:

In [6]:
for index, row in df['a'].items():
    if row != row:
        print('index:', index, 'isnull')

index: 1 isnull

Now it prints the row. For readability, you should use isnull instead:

In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')

index: 1 isnull
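
The same check can also be done without a loop. The following is a minimal sketch, assuming the same toy DataFrame as above; it contrasts an equality comparison with Series.isna() and lists the index labels of the missing rows. The variable names eq_mask and na_mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as above: one missing value at index 1
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Equality never matches NaN, so this mask is False everywhere
eq_mask = df['a'] == np.nan

# isna() returns a boolean mask that is True where a value is missing
na_mask = df['a'].isna()

print(int(eq_mask.sum()))          # 0
print(int(na_mask.sum()))          # 1
print(df.index[na_mask].tolist())  # [1]
```

On a large column this is usually much faster than iterating row by row, and df[na_mask] shows the offending rows directly.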

So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the missing values using fillna:

In [8]:
df['a'].fillna(0).astype(int)

Out[8]:
0    1
1    0
2    3
3    4
Name: a, dtype: int32
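
For comparison, the dropna route might look like the sketch below, again assuming the toy DataFrame from the setup step. The name cleaned is just illustrative, and the .copy() is a defensive choice to avoid chained-assignment warnings rather than something the cast itself requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Drop the rows where column 'a' is missing, then cast what is left
cleaned = df.dropna(subset=['a']).copy()
cleaned['a'] = cleaned['a'].astype(int)

print(cleaned['a'].tolist())   # [1, 3, 4]
print(cleaned.index.tolist())  # [0, 2, 3]
```

Note that the original index labels are kept, so the row at index 1 is simply gone. Which option is right depends on whether 0 is a meaningful stand-in for "missing" in your data.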

When your series contains floats and NaNs and you want to convert it to integers, casting to a NumPy integer raises an error, because NumPy integers cannot represent missing values.

DON'T DO:

df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)

Since pandas 0.24 there is a built-in nullable pandas integer type, which allows missing values in an integer column. Notice the capital 'I' in 'Int64': this is the pandas integer, not the NumPy integer.

SO, DO THIS:

df['VEHICLE_ID'] = df['VEHICLE_ID'].astype('Int64')
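
As a small illustration of what this cast does, here is a sketch on a made-up series (the values are invented for the example; only the column name comes from the snippet above). The missing entry survives the conversion as a missing value instead of raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical data: whole-number floats with one missing value
s = pd.Series([1.0, np.nan, 3.0, 4.0], name='VEHICLE_ID')

# Capital 'I': the nullable pandas integer dtype
converted = s.astype('Int64')

print(converted.dtype)               # Int64
print(converted.isna().tolist())     # [False, True, False, False]
print(converted.dropna().tolist())   # [1, 3, 4]
```

This assumes the non-missing floats are whole numbers; as far as I know, values with a fractional part make the cast fail rather than being silently truncated, so round them first if needed.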

More info on pandas integer na values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
