Number of equivalence classes of $w \times h$ matrices under switching rows and columns

This has a very straightforward answer using the Burnside lemma. With $n$ rows, $m$ columns and $q$ possible values we simply compute the cycle index of the cartesian product group ($S_n \times S_m$, consult Harary and Palmer, Graphical Enumeration, section 4.3) and evaluate it at $a[p]=q$ as we have $q$ possibilities for an assignment that is constant on the cycle. The cycle index is easy too -- for two cycles of length $p_1$ and $p_2$ that originate in a permutation $\alpha$ from $S_n$ and $\beta$ from $S_2$ the contribution is $a[\mathrm{lcm}(p_1, p_2)]^{\gcd(p_1, p_2)}.$

We get for a $3\times3$ the following colorings of at most $q$ colors:

$$1, 36, 738, 8240, 57675, 289716, 1144836, 3780288,\ldots$$

which points us to OEIS A058001 where these values are confirmed.

We get for a $4\times 4$ the following colorings of at most $q$ colors:

$$1, 317, 90492, 7880456, 270656150, 4947097821, \\ 58002778967, 490172624992,\ldots$$

which points us to OEIS A058002 where again these values are confirmed.

We get for a $5\times 5$ the following colorings of at most $q$ colors:

$$1, 5624, 64796982, 79846389608, 20834113243925, 1979525296377132, \\ 93242242505023122, 2625154125717590496,\ldots$$

which points us to OEIS A058003 where here too these values are confirmed.

This was the Maple code.

with(combinat);

pet_cycleind_symm :=
proc(n)
option remember;

    if n=0 then return 1; fi;

    expand(1/n*add(a[l]*pet_cycleind_symm(n-l), l=1..n));
end;

pet_flatten_term :=
proc(varp)
local terml, d, cf, v;

    terml := [];

    cf := varp;
    for v in indets(varp) do
        d := degree(varp, v);
        terml := [op(terml), seq(v, k=1..d)];
        cf := cf/v^d;
    od;

    [cf, terml];
end;


pet_cycles_prod :=
proc(cyca, cycb)
local ca, cb, lena, lenb, res, vlcm;

    res := 1;

    for ca in cyca do
        lena := op(1, ca);

        for cb in cycb do
            lenb := op(1, cb);

            vlcm := lcm(lena, lenb);
            res := res*a[vlcm]^(lena*lenb/vlcm);
        od;
    od;

    res;
end;

pet_cycleind_symmNM :=
proc(n, m)
local indA, indB, res, termA, termB, flatA, flatB;
option remember;

    if n=1 then
        indA := [a[1]];
    else
        indA := pet_cycleind_symm(n);
    fi;

    if m=1 then
        indB := [a[1]];
    else
        indB := pet_cycleind_symm(m);
    fi;

    res := 0;

    for termA in indA do
        flatA := pet_flatten_term(termA);
        for termB in indB do
            flatB := pet_flatten_term(termB);

            res := res + flatA[1]*flatB[1]*
            pet_cycles_prod(flatA[2], flatB[2]);
        od;

    od;

    res;
end;

mat_count :=
proc(n, m, q)

    subs([seq(a[p]=q, p=1..n*m)],
         pet_cycleind_symmNM(n, m));
end;

Addendum. The above can be optimized so that the contribution from a pair $(\alpha,\beta)$ does not require computing $l_\alpha \times l_\beta$ cycle pairs (product of total number of cycles) but only $m_\alpha \times m_\beta$ cycle pairs (product of number of different cycle sizes present). This is shown below.

with(combinat);

pet_cycleind_symm :=
proc(n)
option remember;

    if n=0 then return 1; fi;

    expand(1/n*add(a[l]*pet_cycleind_symm(n-l), l=1..n));
end;

pet_flatten_termA :=
proc(varp)
local terml, d, cf, v;

    terml := [];

    cf := varp;
    for v in indets(varp) do
        d := degree(varp, v);
        terml := [op(terml), [op(1,v), d]];
        cf := cf/v^d;
    od;

    [cf, terml];
end;


pet_cycles_prodA :=
proc(cyca, cycb)
local ca, cb, lena, lenb, insta, instb, res, vlcm;

    res := 1;

    for ca in cyca do
        lena := op(1, ca);
        insta := op(2, ca);

        for cb in cycb do
            lenb := op(1, cb);
            instb := op(2, cb);

            vlcm := lcm(lena, lenb);
            res := res*
            a[vlcm]^(insta*instb*lena*lenb/vlcm);
        od;
    od;

    res;
end;

pet_cycleind_symmNM :=
proc(n, m)
local indA, indB, res, termA, termB, flatA, flatB;
option remember;

    if n=1 then
        indA := [a[1]];
    else
        indA := pet_cycleind_symm(n);
    fi;

    if m=1 then
        indB := [a[1]];
    else
        indB := pet_cycleind_symm(m);
    fi;

    res := 0;

    for termA in indA do
        flatA := pet_flatten_termA(termA);
        for termB in indB do
            flatB := pet_flatten_termA(termB);

            res := res + flatA[1]*flatB[1]*
            pet_cycles_prodA(flatA[2], flatB[2]);
        od;

    od;

    res;
end;

mat_count :=
proc(n, m, q)

    subs([seq(a[p]=q, p=1..n*m)],
         pet_cycleind_symmNM(n, m));
end;

Addendum Nov 17 2018. There is additional simplification possible here, based on the simple observation that a product of powers of variables implements the multiset concept through indets (distinct elements) and degree (number of occurrences). This means there is no need to flatten the terms of the cycle indices $Z(S_n)$ and $Z(S_m)$ to build multisets, we have multisets already and we may instead iterate over the variables present in pairs of monomials representing a conjugacy class from $Z(S_n)$ and $Z(S_m)$ and compute $a[\mathrm{lcm}(p_1, p_2)]^{\gcd(p_1, p_2)}$ for pairs of cycles $a_{p_1}$ and $a_{p_2}.$ This makes for a highly compact algorithm, which will produce e.g. for a three by four,

$${\frac {{a_{{1}}}^{12}}{144}}+1/24\,{a_{{1}}}^{6}{a_{{2}}}^{3} +1/18\,{a_{{1}}}^{3}{a_{{3}}}^{3}+1/12\,{a_{{2}}}^{6} \\+1/6\,{a_{{4}}}^{3}+1/48\,{a_{{1}}}^{4}{a_{{2}}}^{4} +1/8\,{a_{{2}}}^{5}{a_{{1}}}^{2}+1/6\,a_{{1}}a_{{2}}a_{{3}}a_{{6}} \\+1/8\,{a_{{3}}}^{4}+1/12\,{a_{{3}}}^{2}a_{{6}} +1/24\,{a_{{6}}}^{2}+1/12\,a_{{12}}.$$

This is the Maple code.

with(combinat);

pet_cycleind_symm :=
proc(n)
option remember;

    if n=0 then return 1; fi;

    expand(1/n*add(a[l]*pet_cycleind_symm(n-l), l=1..n));
end;

pet_cycleind_symmNM :=
proc(n, m)
local indA, indB, res, termA, termB, varA, varB,
    lenA, lenB, instA, instB, p, lcmv;
option remember;

    if n=1 then
        indA := [a[1]];
    else
        indA := pet_cycleind_symm(n);
    fi;

    if m=1 then
        indB := [a[1]];
    else
        indB := pet_cycleind_symm(m);
    fi;

    res := 0;

    for termA in indA do
        for termB in indB do
            p := 1;

            for varA in indets(termA) do
                lenA := op(1, varA);
                instA := degree(termA, varA);

                for varB in indets(termB) do
                    lenB := op(1, varB);
                    instB := degree(termB, varB);

                    lcmv := lcm(lenA, lenB);
                    p :=
                    p*a[lcmv]^(instA*instB*lenA*lenB/lcmv);
                od;
            od;

            res := res + lcoeff(termA)*lcoeff(termB)*p;
        od;
    od;

    res;
end;

mat_count :=
proc(n, m, q)

    subs([seq(a[p]=q, p=1..n*m)],
         pet_cycleind_symmNM(n, m));
end;

Some have asked me about my Python version. It turns out python is missing a lot of what maple provides for symbolic manipulation. Here is my python version. It follows very closely @Marko Riedel's version, and executes on my machine in 0.6 seconds:

from fractions import *
from copy import *


def expand(frac, terml):
    for term in terml:
        term[0] *= frac
    return terml


def multiplyTerm(sub, terml):
    terml = deepcopy(terml)
    for term in terml:
        alreadyIncluded = False
        for a in term[1]:    # term[1] is a list like [[1,1],[2,3]]  where the
            if a[0] == sub:  # first item is subscript and second the exponent
                alreadyIncluded = True
                a[1] += 1
                break
        if not alreadyIncluded:
            term[1].append([sub, 1])

    return terml


def add(termla, termlb):
    terml = termla + termlb

    # now combine any terms with same a's
    if len(terml) <= 1:
        return terml
    #print "t", terml
    for i in range(len(terml) - 1):
        for j in range(i + 1, len(terml)):
            #print "ij", i, j
            if set([(a[0], a[1]) for a in terml[i][1]]) == set([(b[0], b[1]) for b in terml[j][1]]):
                terml[i][0] = terml[i][0] + terml[j][0]
                terml[j][0] = Fraction(0, 1)

    return [term for term in terml if term[0] != Fraction(0, 1)]


def lcm(a, b):
    return abs(a * b) / gcd(a, b) if a and b else 0

pet_cycnn_cache = {}
def pet_cycleind_symm(n):
    global pet_cycnn_cache
    if n == 0:
        return [ [Fraction(1.0), []] ]

    if n in pet_cycnn_cache:
        #print "hit", n
        return pet_cycnn_cache[n]

    terml = []
    for l in range(1, n + 1):
        terml = add(terml, multiplyTerm(l,  pet_cycleind_symm(n - l)) )

    pet_cycnn_cache[n] = expand(Fraction(1, n), terml)
    return pet_cycnn_cache[n]


def pet_cycles_prodA(cyca, cycb):
    alist = []
    for ca in cyca:
        lena = ca[0]
        insta = ca[1]

        for cb in cycb:
            lenb = cb[0]
            instb = cb[1]

            vlcm = lcm(lena, lenb)
            alist.append([vlcm, (insta * instb * lena * lenb) / vlcm])

    #combine terms (this actually ends up being faster than if you don't)
    if len(alist) <= 1:
        return alist

    for i in range(len(alist) - 1):
        for j in range(i + 1, len(alist)):
            if alist[i][0] == alist[j][0] and alist[i][1] != -1:
                alist[i][1] += alist[j][1]
                alist[j][1] = -1

    return [a for a in alist if a[1] != -1]


def pet_cycleind_symmNM(n, m):
    indA = pet_cycleind_symm(n)
    indB = pet_cycleind_symm(m)
    #print "got ind", len(indA), len(indB), len(indA) * len(indB)
    terml = []

    for flatA in indA:
        for flatB in indB:
            newterml = [
                [flatA[0] * flatB[0], pet_cycles_prodA(flatA[1], flatB[1])]
            ]
            #print "b",len(terml)
            #terml = add(terml, newterml)
            terml.extend(newterml)

    #print "got nm"
    return terml


def substitute(term, v):
    total = 1
    for a in term[1]:
    #need to cast the v and a[1] to int or 
    #they will be silently converted to double in python 3 
    #causing answers to be wrong with larger inputs
    total *= int(v)**int(a[1])
    return (term[0] * total)


def answer(w, h, s):
    terml = pet_cycleind_symmNM(w, h)
    #print terml
    total = 0
    for term in terml:
        total += substitute(term, s)

    return int(total)

print answer(12, 12, 20)

After struggling with this problem for a couple weeks and attempting to understand the given code and explanation, I believe I have come up with a somewhat more elegant solution for Python. For those like me who have very little experience with combinatorics, I am also including my explanation of the math behind the code that will hopefully be easy to understand for people new to this stuff. First, the solution in Python (interactive example here):

from math import factorial
from fractions import Fraction
import math

def answer(w, h, s):
    total = 0 # initialize return value
    # generate cycle indices for the set of rows and set of columns
    cycidx_cols = cycle_index(w)
    cycidx_rows = cycle_index(h)
    # combine every possible pair of row and column permutations
    for col_coeff, col_cycle in cycidx_cols:
        for row_coeff, row_cycle in cycidx_rows:
            coeff = col_coeff * row_coeff # combine coefficients
            cycle = combine(col_cycle, row_cycle) # combine cycles
            # substitute each variable for s
            value = 1
            for x, power in cycle:
                value *= s ** power
            # multiply by the coefficient and add to the total
            total += coeff * value
    return str(total)

## combines sets of variables with their coefficients to generate a complete cycle index
## returns [ ( Fraction:{coeff}, [ ( int:{length}, int:{frequency} ):{cycle}, ... ]:{cycles} ):{term}, ... ]
def cycle_index(n):
    return [(coeff(term), term) for term in gen_vars(n, n)]

## calculates the coefficient of a term based on values associated with its variable(s)
## this is based off part of the general formula for finding the cycle index of a symmetric group
def coeff(term):
    val = 1
    for x, y in term:
        val *= factorial(y) * x ** y
    return Fraction(1, val)

## generates the solution set to the problem: what are all combinations of numbers <= n that sum to n?
## this corresponds to the set of variables in each term of the cycle index of symmetric group S_n
def gen_vars(n, lim):
    soln_set = [] # store the solution set in a list
    if n > 0: # breaks recursive loop when false and returns an empty list
        for x in range(lim, 0, -1): # work backwards from the limit
            if x == 1: # breaks recursive loop when true and returns a populated list
                soln_set.append([(1, n)])
            else: # otherwise, enter recursion based on how many x go into n
                for y in range(int(n / x), 0, -1):
                    # use recursion on the remainder across all values smaller than x
                    recurse = gen_vars(n - x * y, x - 1)
                    # if recursion comes up empty, add the value by itself to the solution set
                    if len(recurse) == 0:
                        soln_set.append([(x, y)])
                    # otherwise, append the current value to each solution and add that to the solution set
                    for soln in recurse:
                        soln_set.append([(x, y)] + soln)
    return soln_set # return the list of solutions

## combines two terms of a cycle index of the form [ ( int:{length}, int:{frequency} ):{cycle}, ... ]
def combine(term_a, term_b):
    combined = []
    # combine all possible pairs of variables
    for len_a, freq_a in term_a:
        for len_b, freq_b in term_b:
            # new subscript = lcm(len_a, len_b)
            # new superscript = len_a * freq_a * len_b * freq_b / lcm(len_a, len_b)
            lcm = len_a * len_b / math.gcd(len_a, len_b)
            combined.append((lcm, int(len_a * freq_a * len_b * freq_b / lcm)))
    return combined

Now, the explanation: We are asked to find the number of unique matrices given the width $w$, height $h$, and number of possible values $s$. Normally, this would be as simple as counting permutations, which would give us $(w \cdot h)^s$ unique matrices. However, the challenge of this problem comes from the equivalency relationship defined by the ability to shuffle the rows and columns of the matrix. So, we should first consider what happens when we shuffle around rows and columns. We will begin by considering the set of rows separately from the set of columns, but the same methods can be applied to both. Later, we will combine the two results to create a representation of the whole matrix.

We will begin by figuring out the different possible ways to transform one row into another. (In a matrix, this would be equivalent to shuffling the order of the columns.) Let us consider a row of length 4. One possible transformation on would be $\begin{pmatrix}1&2&3&4\\3&1&2&4\end{pmatrix}$, where the top row transforms into the bottom row. If we continually apply this transformation on the same row, you will notice that the value in position 4 stays put while the other three values will follow the cycle $1\rightarrow3\rightarrow2\rightarrow1$. Interestingly, every single possible transformation can be mapped to a unique group of cycles. For example, the above transformation can be mapped to the cycle group $g_8=(132)(4)$. This is one of $4!=24$ unique cycle groups for a row or column of length 4. The complete list is shown here:

$$G=\{(1234), (1243), (1324), (1342), (1423), (1432), (123)(4), (132)(4), (124)(3), (142)(3), (134)(2), (143)(2), (234)(1), (243)(1), (12)(34), (13)(24), (14)(23), (12)(3)(4), (13)(2)(4), (14)(2)(3), (23)(1)(4), (24)(1)(3), (34)(1)(2), (1)(2)(3)(4)\}$$

You may notice that the cycle groups can be categorized into five unique types (represented with five unique terms): $a_4=(abcd)$, $a_1a_3=(abc)(d)$, $a_2^2=(ab)(cd)$, $a_1^2a_2=(ab)(c)(d)$, $a_1^4=(a)(b)(c)(d)$, where each variable $a_p^q$ represents a cycle of length $p$ appearing $q$ times in the cycle group. We can generate the complete list of these types for any $n$ by answering the question, "What are all of the different ways for a set of numbers $\{x \in X : 1 \leq x \leq n\}$ to sum to $n$?" For $n=4$, this would be $(4)$, $(3+1)$, $(2+2)$, $(2+1+1)$, and $(1+1+1+1)$. We can rewrite these as a set of vectors $\textbf{j}=(j_1,j_2,j_3,j_4)$, where $j_x$ represents the frequency of $x$ in the sum:

$$J_4=\{(0,0,0,1),(1,0,1,0),(0,2,0,0),(2,1,0,0),(4,0,0,0)\}$$

We will make use of this set later. The function gen_vars(n, lim) recursively generates $J_n$ for any $n$ (initially, lim == n). However, it is returned by the function in the form of a list of lists of pairs of integers [[(p,q),...],...] where each inner list represents a unique sum and each pair represents the value p and its frequency q in the sum. This list representation speeds up calculations later on.

Returning to the notation $a_p^q$ representing cycles, we can form an equation that represents the entire set of possible cycle groups. We do this by adding each of these terms multiplied by their frequency in $G$:

$$6a_4+8a_1a_3+3a_2^2+6a_1^2a_2+a_1^4$$

Furthermore, if we divide the entire polynomial by the total number of cycles, we get each term's contribution to the complete set of cycle groups:

$$\frac{1}{4}a_4+\frac{1}{3}a_1a_3+\frac{1}{8}a_2^2+\frac{1}{4}a_1^2a_2+\frac{1}{24}a_1^4=Z(S_4)$$

This is known as the cycle index $Z(X)$ for the symmetric group $S_4$. This link includes the cycle indices for the first 5 symmetric groups, and you can reverse these steps to verify that each $Z(S_n)$ accurately represents all possible cycle groups for a set of length $n$. Importantly, we are also given a general formula for finding the cycle index of any $S_n$ (cleaned up a little):

$$Z(S_n)=\sum_{\textbf{j} \in J_n} \left(\frac{1}{\prod_{k=0}^n(k^{j_k} \cdot j_k!)}a_1^{j_1}a_2^{j_2}...a_n^{j_n}\right)$$

This is where that set $J_4$ from earlier comes into play. Indeed, if you plug in the associated values, you will come up with the cycle index for the symmetric group $S_4$. The function coeff(term) calculates the $\frac{1}{\prod_{k=0}^n(k^{j_k} \cdot j_k!)}$ portion of the equation. The cycle_index(n) function puts the coefficients with their terms, returning a list that is representative of the appropriate cycle index.

The cycle index will tell us how many different rows are possible such that no row can be transformed into another using any of the transformations that we found. All we have to do is plug in the number of possible values $s$ in for each variable $a_x$ in our equation (regardless of the value of $x$). For example, if we use $s=3$, we find that there should be 15 unique rows. Here is the list of all possible rows for $s=3$ to confirm this result:

$$R=\{(1,1,1,1),(1,1,1,2),(1,1,1,3),(1,1,2,2),(1,1,2,3),(1,1,3,3),(1,2,2,2),(1,2,2,3),(1,2,3,3),(1,3,3,3),(2,2,2,2),(2,2,2,3),(2,2,3,3),(2,3,3,3),(3,3,3,3)\}$$

This same result can be found using the formula for combinations with replacement, however, this equation fails when applied to a matrix, which is why we are using cycle indices. So, once the cycle indices have been calculated for both the set of rows and the set of columns in our matrix, we must combine them to form the cycle index for the entire matrix. This is done term by term, combining each term of the first with each term in the second. Marko Riedel has an excellent step-by-step explanation of how to do this for a $2 \times 3$ matrix in another post linked here. However, I would like to clarify one part that confused me when I first read it. In order to combine two variables $a_p^q$ and $b_x^y$, use the following template (where $\text{lcm}(a,b)$ is the least common multiple of $a$ and $b$):

$$C(a_p^q,b_x^y)=a_{\text{lcm}(p,x)}^{p\cdot q\cdot x\cdot y/\text{lcm}(p,x)}$$

The combining of terms (ignoring the coefficients, which are multiplied in answer(w, h, s)) is done by the function combine(term_a, term_b) which returns the combined term. This entire process is brought together within the function answer(w, h, s). It calls each of the other functions in turn to create the cycle index for the matrix then plugs in $s$ for each variable to give us our final result.

Hope this helps! I will be more than happy to clarify anything in the comments.