What's the fastest way to take the Mean of a Tensor in a given slot?

Here is a compiled version of Total that is faster for intermediate levels:

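total[d_, l_] := With[
    (* o: vector of ones along level l; n: rank of the subarrays that start at level l *)
    {o = ConstantArray[1., Dimensions[d][[l]]], n = Length@Dimensions@d-l+1},
    Switch[l,
        (* first level: contract from the left *)
        1, o.d,

        (* last level: contract from the right *)
        Length@Dimensions@d, d.o,

        (* intermediate level: the Listable compiled function threads over the outer l-1 levels *)
        _,
        With[{fc = Compile[{{data, _Real, n}}, o.data, RuntimeAttributes->{Listable}]},
            fc[d]
        ]
    ]
]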
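
To see why the intermediate-level branch works, here is a small sketch of the Listable threading on its own. The array sizes and the names fcDemo and x are made up for illustration. The compiled function is declared for rank-2 input, so when it is given a rank-3 array it is applied to each matrix along the first level, and the dot product with a vector of ones sums over what is level 2 of the full array.

```wolfram
(* fcDemo and x are illustrative names, not part of the answer above *)
fcDemo = Compile[{{data, _Real, 2}}, {1., 1.}.data,
   RuntimeAttributes -> {Listable}];

x = ConstantArray[1., {3, 2, 4}];

(* rank-3 input to a rank-2 function: threads over the first level *)
res = fcDemo[x];

Dimensions[res]        (* {3, 4} *)
res == Total[x, {2}]   (* True *)
```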

For your higher rank example:

d = RandomReal[{-1, 1}, {400, 500, 300}];

r1 = total[d, 2]; //RepeatedTiming
r2 = Total[d, {2}]; //RepeatedTiming

r1 == r2

{0.0632, Null}

{0.11, Null}

True

I will leave modifying this to compute the mean to you.
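
For reference, one possible way to make that modification is to divide the sum by the length of the summed level. This is only a sketch: meanAt is a hypothetical helper name, and it takes the summing function as an optional third argument that defaults to the built-in Total, so the block runs on its own.

```wolfram
(* meanAt is a hypothetical name; sum is any function called as sum[array, level] *)
meanAt[d_, l_, sum_ : (Total[#1, {#2}] &)] := sum[d, l]/Dimensions[d][[l]]

a = RandomReal[{-1, 1}, {4, 5, 6}];

Dimensions[meanAt[a, 2]]                          (* {4, 6} *)
Max[Abs[meanAt[a, 2] - Map[Mean, a]]] < 10^-12    (* True *)
```

With the compiled total from above defined, meanAt[d, 2, total] should use it in place of Total.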


First, don't use TensorQ. It is undocumented and may change in the future. The documented function to use is ArrayQ. Second, here is an arbitrary-rank version of your function:

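mean[A_?ArrayQ, slot_] := With[
   {d = Part[Dimensions[A], slot], r = ArrayDepth[A]},
   (* bring level slot to the front while keeping the remaining levels in order *)
   ConstantArray[1./d, d].Transpose[A, Join[RotateLeft[Range[slot]], Range[slot + 1, r]]]
]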

Unfortunately, it is slower than Total for higher-rank arrays, because the cost of rearranging memory in Transpose is too high. I suspect that Carl's approach is probably the best you can do, but I can't immediately prove that.
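
A rough way to check the memory-rearrangement cost on your own machine is to time the Transpose step by itself against Total. This is a sketch under assumptions: the array is smaller than the one in the example above just to keep memory use modest, the names small, tTranspose and tTotal are illustrative, and the timings will vary by machine and version.

```wolfram
(* smaller than the 400 x 500 x 300 example, only to limit memory use *)
small = RandomReal[{-1, 1}, {200, 250, 150}];

(* time only the level swap, without any summation *)
tTranspose = First@RepeatedTiming[Transpose[small, 1 <-> 2];];

(* time the built-in sum over level 2 *)
tTotal = First@RepeatedTiming[Total[small, {2}];];

{tTranspose, tTotal}
```

If the first number is comparable to or larger than the second, the transposition alone already costs as much as the whole Total call.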