Using mca package in Python

One other method is to use the library prince which enables easy usage of tools such as:

  1. Multiple correspondence analysis (MCA)
  2. Principal component analysis (PCA)
  3. Multiple factor analysis (MFA)

You can begin first by installing with:

pip install --user prince

To use MCA, it is fairly simple and can be done in a couple of steps (just like sklearn PCA method.) We first build our dataframe.

import pandas as pd 
import prince

X = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/balloons/adult+stretch.data')
X.columns = ['Color', 'Size', 'Action', 'Age', 'Inflated']

print(X.head())

mca = prince.MCA()

# outputs
>>     Color   Size   Action    Age Inflated
   0  YELLOW  SMALL  STRETCH  ADULT        T
   1  YELLOW  SMALL  STRETCH  CHILD        F
   2  YELLOW  SMALL      DIP  ADULT        F
   3  YELLOW  SMALL      DIP  CHILD        F
   4  YELLOW  LARGE  STRETCH  ADULT        T

Followed by calling the fit and transform method.

mca = mca.fit(X) # same as calling ca.fs_r(1)
mca = mca.transform(X) # same as calling ca.fs_r_sup(df_new) for *another* test set.
print(mca)

# outputs
>>         0             1
0   0.705387  8.373126e-15
1  -0.386586  8.336230e-15
2  -0.386586  6.335675e-15
3  -0.852014  6.726393e-15
4   0.783539 -6.333333e-01
5   0.783539 -6.333333e-01
6  -0.308434 -6.333333e-01
7  -0.308434 -6.333333e-01
8  -0.773862 -6.333333e-01
9   0.783539  6.333333e-01
10  0.783539  6.333333e-01
11 -0.308434  6.333333e-01
12 -0.308434  6.333333e-01
13 -0.773862  6.333333e-01
14  0.861691 -5.893240e-15
15  0.861691 -5.893240e-15
16 -0.230282 -5.930136e-15
17 -0.230282 -7.930691e-15
18 -0.695710 -7.539973e-15

You can even print out the picture diagram of it, since it incorporates matplotlib library.

ax = mca.plot_coordinates(
     X=X,
     ax=None,
     figsize=(6, 6),
     show_row_points=True,
     row_points_size=10,
     show_row_labels=False,
     show_column_points=True,
     column_points_size=30,
     show_column_labels=False,
     legend_n_cols=1
     )

ax.get_figure().savefig('images/mca_coordinates.svg')

mca


The documentation of the mca package is not very clear with that regard. However, there are a few cues which suggest that ca.fs_r_sup(df_new) should be used to project new (unseen) data onto the factors obtained in the analysis.

  1. The package author refers to new data as supplementary data which is the terminology used in following paper: Abdi, H., & Valentin, D. (2007). Multiple correspondence analysis. Encyclopedia of measurement and statistics, 651-657.
  2. The package has only two functions which accept new data as parameter DF: fs_r_sup(self, DF, N=None) and fs_c_sup(self, DF, N=None). The latter is to find the column factor scores.
  3. The usage guide demonstrates this based on a new data frame which has not been used throughout the component analysis.