How to calculate mutual information

Take the first feature $f_1$ and build the joint histogram over $(feature\ state,\ class\ state)$. Your features have $3$ possible states $\{-1,0,1\}$, and your classes have $2$ possible states $\{c=1,c=2\}$. To build the histogram, simply count the joint occurrences:

$$ \begin{array}{|c|c|c|} \hline & c=1 & c=2 \\ \hline f_1=-1 & 0 & 1 \\ \hline f_1=0 & 1 & 1 \\ \hline f_1=+1 & 1 & 0 \\ \hline \end{array} $$
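To make the counting step concrete, here is a minimal sketch, assuming hypothetical sample vectors `f1` and `c` chosen only so that they reproduce the counts in the table (your actual data will differ):

```python
import numpy as np

# Hypothetical (feature, class) samples, consistent with the table above
f1 = np.array([-1, 0, 0, 1])
c  = np.array([ 2, 1, 2, 1])

feature_states = [-1, 0, 1]
class_states = [1, 2]

# Count joint occurrences into a 3x2 table: rows = feature states, columns = classes
counts = np.zeros((len(feature_states), len(class_states)))
for fv, cv in zip(f1, c):
    counts[feature_states.index(fv), class_states.index(cv)] += 1

print(counts)
# [[0. 1.]
#  [1. 1.]
#  [1. 0.]]
```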

You can see that $f_1=0$ is uninformative, because $c=1$ and $c=2$ are equally probable there. However, if $f_1=-1$, then with the data we have the class must be $c=2$ (because the count for $c=1$ is zero). Mutual information quantifies exactly this. To compute it, first normalize the 2D histogram so that $\sum_{ij} h_{ij}=1$, then compute the marginals $p(feature)$ and $p(class)$:

$$ p(feature,class)=\left(\begin{array}{cc} 0 & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & 0 \\ \end{array}\right),\quad p(feature)=\left(\begin{array}{c} \frac{1}{4} \\ \frac{1}{2} \\ \frac{1}{4} \\ \end{array}\right),\quad p(class)=\left(\frac{1}{2},\ \frac{1}{2}\right) $$

Then compute the mutual information $I(x,y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy$, which in this discrete case becomes:

$$ I(feature, class)=\sum_{i=1,2,3}\sum_{j=1,2}p(feature\ i,class\ j)\log\frac{p(feature\ i,class\ j)}{p(feature\ i)\,p(class\ j)} $$

(Terms with $p(feature\ i,class\ j)=0$ contribute nothing, by the convention $0\log 0=0$.) Then repeat the same computation for features $f_2$ and $f_3$. The one with the highest mutual information is the most discriminative for guessing the class.
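As a sanity check, here is a minimal sketch in Python/NumPy that normalizes a joint count table, forms the marginals, and evaluates the sum above for $f_1$ (the function name `mutual_information` and the use of natural logarithms are my own choices, not part of the answer):

```python
import numpy as np

def mutual_information(joint_counts):
    """Mutual information (in nats) from a 2D table of joint counts."""
    p_xy = joint_counts / joint_counts.sum()   # normalize so the entries sum to 1
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal p(feature), column vector
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal p(class), row vector
    nz = p_xy > 0                              # skip zero cells: 0 * log 0 = 0
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz]))

# Joint counts for f_1, copied from the table above
h_f1 = np.array([[0., 1.],
                 [1., 1.],
                 [1., 0.]])
print(mutual_information(h_f1))   # ~0.3466 nats (= 0.5 bits)
```

For the table above this evaluates to $\tfrac{1}{2}\log 2$; applying the same function to the count tables of $f_2$ and $f_3$ and comparing the values gives the ranking of the features.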