pandas: read_csv how to force bool data to dtype bool instead of object

You can use the dtype parameter, which accepts a dictionary mapping column names to types:

dtype : Type name or dict of column -> type
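    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}

import io

import pandas as pd

# using your sample; the last row has no value in column D
csv_file = io.StringIO('''
A    B    C    D
a    1    2    true
b    5    7    false
c    3    2    true
d    9    4''')

# A plain bool column cannot hold a missing value, so read D with the
# nullable 'boolean' dtype (pandas >= 1.0), which keeps the gap as <NA>
df = pd.read_csv(csv_file, sep=r'\s+', dtype={'D': 'boolean'})
# then fillna to convert the missing value to False and cast to plain bool
df['D'] = df['D'].fillna(False).astype(bool)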

df 
   A  B  C      D
0  a  1  2   True
1  b  5  7  False
2  c  3  2   True
3  d  9  4  False

df.D.dtypes
dtype('bool')

Because your CSV has a missing value, column D is read as object when no dtype is given. The column holds mixed types: the first three values are booleans, and the missing one is NaN, which is a float.
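
The mixed types can be checked directly. The sketch below is only an illustration: it reads the same sample without a dtype argument and lists the Python type of each value in D. The names csv_text and value_types are made up for the example, and the exact dtype reported may depend on your pandas version.

```python
import io

import pandas as pd

# Same sample as above: the last row has no value in column D
csv_text = """A B C D
a 1 2 true
b 5 7 false
c 3 2 true
d 9 4
"""

# No dtype is forced here, so pandas infers one for each column
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# dtype that pandas inferred for the column with the gap
print(df["D"].dtype)

# Python type of each individual value stored in the column
value_types = [type(v).__name__ for v in df["D"]]
print(value_types)
```

If the explanation above holds for your version, the column is reported as object and the last entry in value_types is float rather than bool.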

To replace the NaN values, use fillna. It accepts a dict that maps each column to its fill value, so every column ends up holding values of a single type:

>>> t = """
A   B   C    D
a   1  NaN  true
b   5   7   false
c   3   2   true
d   9   4 """
>>> df = pd.read_csv(io.StringIO(t), sep=r'\s+')
>>> df
   A  B    C      D
0  a  1  NaN   True
1  b  5  7.0  False
2  c  3  2.0   True
3  d  9  4.0    NaN
>>> df.fillna({'C':0, 'D':False})
   A  B    C      D
0  a  1  0.0   True
1  b  5  7.0  False
2  c  3  2.0   True
3  d  9  4.0  False
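
Whether fillna alone leaves D as bool or as object can differ between pandas versions, so an explicit cast afterwards is a safe addition. This is a minimal sketch under that assumption, not the only way to do it; csv_text and filled are illustrative names.

```python
import io

import pandas as pd

# Same data as the session above: one gap in C and one in D
csv_text = """A B C D
a 1 NaN true
b 5 7 false
c 3 2 true
d 9 4
"""

df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# Fill each column with its own default value
filled = df.fillna({"C": 0, "D": False})

# Cast explicitly so the result does not rely on automatic downcasting
filled = filled.astype({"C": "int64", "D": "bool"})

print(filled.dtypes)
```

The cast is safe only after the NaN values are gone: converting a column that still contains NaN to bool would silently turn each NaN into True.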
