Pandas scatter_matrix - plot categorical variables

You need to transform the categorical variables into numbers to plot them.

Example (assuming that the column 'Sex' holds the gender data, with 'M' for males and 'F' for females):

import numpy as np

df['Sex_int'] = np.nan                    # default for unknown values
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0   # males -> 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1   # females -> 1

Now all males are represented by 0 and females by 1. Unknown genders (if there are any) stay NaN and will be left out of the plot.

The rest of your code should process the updated dataframe nicely.
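The same encoding can also be written more compactly. The following is only a sketch on made-up sample data (the small dataframe and the column names 'Sex_int' and 'Sex_code' are assumptions for illustration): a dict passed to .map() gives explicit codes, while pd.factorize() assigns codes automatically.

```python
import pandas as pd

# Hypothetical sample data; 'Sex' holds 'M', 'F', or a missing value.
df = pd.DataFrame({"Sex": ["M", "F", "F", None, "M"]})

# Explicit mapping: values that are not keys of the dict become NaN.
df["Sex_int"] = df["Sex"].map({"M": 0, "F": 1})

# Automatic codes in order of first appearance; missing values get -1.
codes, labels = pd.factorize(df["Sex"])
df["Sex_code"] = codes
```

The dict version keeps control over which category gets which number; factorize() is handier when the set of categories is not known in advance.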


After some googling, and remembering the .map() function, I fixed it in the following way:

from pandas import Series

colors = ['red', 'green']  # color codes for survived: 0 = red, 1 = green

# create mapping Series for gender so it can be plotted
gender = Series([0, 1], index=['male', 'female'])
df['gender'] = df.Sex.map(gender)

# create mapping Series for Embarked so it can be plotted
# (one integer per distinct value, rather than hard-coding four)
ports = df.Embarked.unique()
embarked = Series(range(len(ports)), index=ports)
df['embarked'] = df.Embarked.map(embarked)

# add survived also back to the df
df['survived']=target

Now I can plot it again, and drop the added columns afterwards.
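For completeness, here is a rough sketch of the plot-then-drop step. The sample data, the 'Age' column, and the point_colors name are made up for illustration, and the data contains no missing values so that the color list lines up with the plotted rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for the real data; no missing values.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", "S", "Q", "S"],
})
target = pd.Series([0, 1, 1, 1, 0, 0])  # survived: 0 or 1

colors = ["red", "green"]

# Encode the categorical columns as integers.
df["gender"] = df["Sex"].map({"male": 0, "female": 1})
port_codes = {port: i for i, port in enumerate(df["Embarked"].unique())}
df["embarked"] = df["Embarked"].map(port_codes)

# One color per row, chosen by the survived flag.
point_colors = [colors[s] for s in target]
axes = scatter_matrix(df[["Age", "gender", "embarked"]],
                      c=point_colors, diagonal="hist")

# Remove the helper columns once the plot has been drawn.
df = df.drop(columns=["gender", "embarked"])
```

In an interactive session, leave out the matplotlib.use("Agg") line and call matplotlib.pyplot.show() to display the figure.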

Thanks, everyone, for responding.