Pytorch softmax: What dimension to use?

Steven's answer is not correct. See the snapshot below. It is actually the other way around.


Image transcribed as code:

>>> x = torch.tensor([[1,2],[3,4]],dtype=torch.float)
>>> F.softmax(x,dim=0)
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
>>> F.softmax(x,dim=1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])

Let's consider the example in two dimensions:

x = [[1,2],
     [3,4]]

do you want your final result to be

y = [[0.27,0.73],
     [0.27,0.73]]

or

y = [[0.12,0.12],
     [0.88,0.88]]

If it's the first option then you want dim = 1. If it's the second option you want dim = 0.

Notice that in the second example each column sums to 1, i.e. the values are normalized along the zeroth dimension (down the columns).
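
As a quick sanity check (a minimal sketch reusing the same 2x2 tensor), summing the softmax output along the chosen dim gives ones:

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]])
print(F.softmax(x, dim=0).sum(dim=0))  # ~tensor([1., 1.]) -> each column sums to 1
print(F.softmax(x, dim=1).sum(dim=1))  # ~tensor([1., 1.]) -> each row sums to 1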

Updated 2018-07-10: to reflect that zeroth dimension refers to columns in pytorch.


The easiest way I can think of to explain it is this: say you are given a tensor of shape (s1, s2, s3, s4) and, as you mentioned, you want the sum of all the entries along the last axis to be 1.

sum = torch.sum(input, dim = 3) # input is of shape (s1, s2, s3, s4)

Then you should call the softmax as:

softmax(input, dim = 3)
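
As a minimal runnable sketch (the tensor x and its shape here are just assumptions for illustration), you can verify that the entries along the last axis then sum to 1:

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 5)    # stands in for a tensor of shape (s1, s2, s3, s4)
y = F.softmax(x, dim=3)        # normalize along the last axis
print(torch.sum(y, dim=3))     # all entries are ~1.0; result has shape (2, 3, 4)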

To see this more easily, you can consider a 4d tensor of shape (s1, s2, s3, s4) as a 2d tensor or matrix of shape (s1*s2*s3, s4). Now, if you want each column (normalizing along axis 0) or each row (normalizing along axis 1) of that matrix to sum to 1, you can simply call the softmax function on the 2d tensor as follows:

softmax(input, dim = 0) # normalizes values along axis 0
softmax(input, dim = 1) # normalizes values along axis 1
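
To make the 2d analogy concrete, here is a small sketch (shapes chosen arbitrarily) showing that softmax over dim = 3 of the 4d tensor agrees with softmax over dim = 1 of its (s1*s2*s3, s4) reshape:

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 5)                    # shape (s1, s2, s3, s4)
flat = x.reshape(2 * 3 * 4, 5)                 # viewed as a (s1*s2*s3, s4) matrix
out_4d = F.softmax(x, dim=3)                   # normalize along the last axis of the 4d tensor
out_2d = F.softmax(flat, dim=1)                # normalize each row of the matrix
print(torch.allclose(out_4d.reshape(-1, 5), out_2d))  # True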

You can see the example that Steven mentioned in his answer.


I am not 100% sure what your question means, but I think your confusion is simply that you don't understand what the dim parameter means. So I will explain it and provide examples.

If we have:

m0 = nn.Softmax(dim=0)

what that means is that m0 will normalize elements along the zeroth coordinate of the tensor it receives. Formally, given a tensor b of size (d0, d1), the following will be true:

sum_{i0=0}^{d0-1} m0(b)[i0, i1] = 1,  forall i1 \in {0, ..., d1-1}

You can easily check this with a PyTorch example:

>>> b = torch.arange(0,4,1.0).view(-1,2)
>>> b 
tensor([[0., 1.],
        [2., 3.]])
>>> m0 = nn.Softmax(dim=0) 
>>> b0 = m0(b)
>>> b0 
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])

Now, since dim=0 means going through i0 \in {0,1} (i.e. going down the rows), if we choose any column i1 and sum its elements (i.e. over the rows), we should get 1. Check it:

>>> b0[:,0].sum()
tensor(1.0000)
>>> b0[:,1].sum()
tensor(1.0000)

as expected.

Note that we can check that every column sums to 1 all at once by "summing out the rows" with torch.sum(b0, dim=0). Check it out:

>>> torch.sum(b0,0)
tensor([1.0000, 1.0000])
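
For contrast, here is the dim=1 case on the same b (a small sketch continuing the session above), where each row sums to 1 instead:

>>> m1 = nn.Softmax(dim=1)
>>> b1 = m1(b)
>>> b1
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
>>> torch.sum(b1, 1)
tensor([1.0000, 1.0000])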

We can create a more complicated example to make sure it's really clear.

>>> a = torch.arange(0,24,1.0).view(-1,3,4)
>>> a
tensor([[[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],

        [[12., 13., 14., 15.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]])
>>> a0 = m0(a)
>>> a0[:,0,0].sum()
tensor(1.0000)
>>> a0[:,1,0].sum()
tensor(1.0000)
>>> a0[:,2,0].sum()
tensor(1.0000)
>>> a0[:,1,1].sum()
tensor(1.0000)
>>> a0[:,2,3].sum()
tensor(1.0000)

So, as we expected, if we sum all the elements along the first coordinate, from the first value to the last, we get 1. Everything is normalized along dimension 0 (the coordinate i0).

>>> torch.sum(a0,0)
tensor([[1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000]])

Also, "along dimension 0" means that you vary the coordinate along that dimension and consider each element. It's a bit like having a for loop going through the values the first coordinate can take, i.e.

for i0 in range(0, d0):
    a[i0, i1, i2]   # the other coordinates i1, i2 stay fixed
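
As a concrete check of that picture (reusing a0 from the session above, with the other coordinates fixed arbitrarily), accumulating along i0 gives 1:

total = 0.0
for i0 in range(a0.shape[0]):      # vary only the first coordinate
    total += a0[i0, 1, 2].item()   # i1 = 1, i2 = 2 stay fixed
print(total)                       # ~1.0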

Tags:

Python

Pytorch