Probability: MultivariateHypergeometricDistribution broken with more than five categories?

This is one of my few gripes about the otherwise quite nice probability functionality: sometimes performance is inexplicably poor.

Usually, a trivial manual transformation gets the desired results quickly, e.g., for your example cases:

ClearAll["Global`*"]

dist = MultivariateHypergeometricDistribution[5, ConstantArray[2, 10]];

(* All 10-tuples of non-negative integers that sum to 5 *)
outcomes = Join @@ Permutations /@ IntegerPartitions[5, {10}, Range[0, 5]];

(* P(a == 1) *)
PDF[dist, Cases[outcomes, {a_, __} /; a == 1]] // Tr

(* P(a == 1 && b >= 0) *)
PDF[dist, Cases[outcomes, {a_, b_, __} /; a == 1 && b >= 0]] // Tr

(* P(a == b) *)
PDF[dist, Cases[outcomes, {a_, b_, __} /; a == b]] // Tr

(* P(a != b) *)
PDF[dist, Cases[outcomes, {a_, b_, __} /; a != b]] // Tr

All return quickly.

When only some of the categories are involved, lumping the remaining ones into a single category speeds things up hugely:

dist = MultivariateHypergeometricDistribution[5, Append[ConstantArray[2, 2], 16]];

PDF[dist, 
  Cases[Join @@ 
    Permutations /@ IntegerPartitions[5, {3}, Range[0, 5]], 
    {a_, b_, c_} /; a != b]] // Tr

For cases with equal category sizes, further speed can be had by dispensing with the permutations altogether: evaluate the PDF once per partition and multiply it by the number of that partition's permutations that satisfy the condition.
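A minimal sketch of that idea, under assumptions that are mine rather than part of the question: the event is symmetric in all ten categories ("no category contributes more than one item"), so every distinct ordering of a qualifying partition counts. The helper name `orderings` and the variable `pDistinct` are hypothetical.

```mathematica
(* Assumption: all 10 categories have size 2 and the event is symmetric *)
dist = MultivariateHypergeometricDistribution[5, ConstantArray[2, 10]];

(* Partitions only, no Permutations; keep those meeting the condition *)
parts = Select[IntegerPartitions[5, {10}, Range[0, 5]], Max[#] <= 1 &];

(* Number of distinct orderings of a partition:
   multinomial coefficient of the multiplicities of its parts *)
orderings[p_List] := Multinomial @@ (Last /@ Tally[p]);

(* One PDF evaluation per partition, weighted by its number of orderings *)
pDistinct = Total[orderings[#] PDF[dist, #] & /@ parts]
```

Only one partition, {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}, survives the filter here, so a single PDF evaluation replaces 252 of them. Counting by hand gives 252 * 2^5 / Binomial[20, 5] = 168/323, which the result should match. For a condition that is not symmetric (such as a == 1), the weight would instead have to count only the orderings that satisfy it.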

Lastly, for ranged queries, see "How to get probabilities for multinomial & hypergeometric distribution ranges more quickly?", a little sorcery I cooked up...


Although @ciao's solution gets you to the answer, I would like to offer another angle on it.

Given a tuple $\{X_1, X_2, \ldots, X_n\}$ that follows a multivariate hypergeometric distribution with parameters $N$ and $\{M_1, \ldots, M_n\}$, the tuple $\{X_1, X_2, \sum_{k=3}^n X_k \}$ also follows a multivariate hypergeometric distribution, with parameters $N$ and $\{M_1, M_2, \sum_{k=3}^n M_k\}$.
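A short sketch of why this holds, taking $N$ as the number of draws and $M_k$ as the category sizes (the argument order of MultivariateHypergeometricDistribution), and writing $M = \sum_{k=1}^n M_k$ for the population size (a symbol introduced here for convenience). The joint probability mass function is

$$P(X_1 = x_1, \ldots, X_n = x_n) = \frac{\prod_{k=1}^n \binom{M_k}{x_k}}{\binom{M}{N}}, \qquad \sum_{k=1}^n x_k = N.$$

Summing over all $(x_3, \ldots, x_n)$ with $x_3 + \cdots + x_n = s$ and applying the generalized Vandermonde identity,

$$\sum_{x_3 + \cdots + x_n = s} \; \prod_{k=3}^n \binom{M_k}{x_k} = \binom{\sum_{k=3}^n M_k}{s},$$

gives

$$P\Big(X_1 = x_1,\; X_2 = x_2,\; \sum_{k=3}^n X_k = s\Big) = \frac{\binom{M_1}{x_1} \binom{M_2}{x_2} \binom{\sum_{k=3}^n M_k}{s}}{\binom{M}{N}}, \qquad x_1 + x_2 + s = N,$$

which is the three-category multivariate hypergeometric mass function.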

Hence the problem can be reformulated with fewer variables, which leads to a faster solution:

AbsoluteTiming[
 Probability[a == 1 && b >= 0, 
  Distributed[{a, b, c}, 
   MultivariateHypergeometricDistribution[
    5, {2, 2, Plus[2, 2, 2, 2, 2, 2, 2, 2]}]]]]

(* Out[29]= {0.0504801, 15/38} *)

AbsoluteTiming[
 Probability[a == b, 
  Distributed[{a, b, c}, 
   MultivariateHypergeometricDistribution[
    5, {2, 2, Plus[2, 2, 2, 2, 2, 2, 2, 2]}]]]]

(* Out[30]= {0.0075801, 138/323} *)

AbsoluteTiming[
 Probability[a > b, 
  Distributed[{a, b, c}, 
   MultivariateHypergeometricDistribution[
    5, {2, 2, Plus[2, 2, 2, 2, 2, 2, 2, 2]}]]]]

(* Out[32]= {0.00749303, 185/646} *)