What is the class definition of nn.Linear in PyTorch?

The network is defined as having two layers, hidden and output. Roughly speaking, the function of the hidden layer is to hold parameters you can optimize during training.



From the documentation:


CLASS torch.nn.Linear(in_features, out_features, bias=True)

Applies a linear transformation to the incoming data: y = x*W^T + b

Parameters:

  • in_features – size of each input sample (i.e. size of x)
  • out_features – size of each output sample (i.e. size of y)
  • bias – If set to False, the layer will not learn an additive bias. Default: True
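
For example, with bias=False the layer simply has no bias parameter at all (a quick check; the sizes 3 and 2 are arbitrary):

>>> import torch
>>> m = torch.nn.Linear(3, 2, bias=False)
>>> m.bias is None
True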

Note that the weights W have shape (out_features, in_features) and the biases b have shape (out_features,). They are initialized randomly and can be changed later (e.g. during the training of a neural network they are updated by some optimization algorithm).
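
A quick sanity check of those shapes, plus an example of setting the parameters by hand (a minimal sketch; the sizes 3 and 2 are arbitrary):

import torch
import torch.nn as nn

m = nn.Linear(3, 2)
print(m.weight.shape)  # torch.Size([2, 3]), i.e. (out_features, in_features)
print(m.bias.shape)    # torch.Size([2]),    i.e. (out_features,)

# the parameters can also be overwritten manually:
with torch.no_grad():
    m.weight.fill_(1.0)  # set every weight to 1
    m.bias.zero_()       # set every bias to 0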

In your neural network, self.hidden = nn.Linear(784, 256) defines a hidden, fully connected linear layer ("hidden" meaning that it sits between the input and output layers). It takes an input x of shape (batch_size, 784), where batch_size is the number of inputs (each of size 784) that are passed to the network at once as a single tensor, and transforms it by the linear equation y = x*W^T + b into a tensor y of shape (batch_size, 256). The result is further transformed by the sigmoid function, x = F.sigmoid(self.hidden(x)) (which is not part of nn.Linear but an additional step).
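
That step in isolation looks like this (a minimal sketch; the batch size of 64 is my own choice, and torch.sigmoid is used in place of F.sigmoid, which computes the same function):

import torch
import torch.nn as nn

hidden = nn.Linear(784, 256)   # W has shape (256, 784), b has shape (256,)
x = torch.randn(64, 784)       # a batch of 64 inputs, each of size 784
x = torch.sigmoid(hidden(x))   # x*W^T + b, followed by the sigmoid
print(x.shape)                 # torch.Size([64, 256])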

Let's see a concrete example:

import torch
import torch.nn as nn

x = torch.tensor([[1.0, -1.0],
                  [0.0,  1.0],
                  [0.0,  0.0]])

in_features = x.shape[1]  # = 2
out_features = 2

m = nn.Linear(in_features, out_features)

where x contains three inputs (i.e. the batch size is 3), x[0], x[1] and x[2], each of size 2, and the output is going to be of shape (batch_size, out_features) = (3, 2).
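
You can confirm the output shape directly:

>>> m(x).shape
torch.Size([3, 2])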

The values of the parameters (weights and biases) are:

>>> m.weight
tensor([[-0.4500,  0.5856],
        [-0.1807, -0.4963]])

>>> m.bias
tensor([ 0.2223, -0.6114])

(Because they are initialized randomly, you will most likely get values different from the ones above.)
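
As an aside, if you need reproducible values, you can seed the random number generator before constructing the layer:

torch.manual_seed(0)  # fixes the RNG seed, so the initial weights become deterministic
m = nn.Linear(in_features, out_features)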

The output is:

>>> y = m(x)
>>> y
tensor([[-0.8133, -0.2959],
        [ 0.8079, -1.1077],
        [ 0.2223, -0.6114]])

(Notice that the last row equals the bias: the third input x[2] is all zeros, so y[2] = 0*W^T + b = b.)

Behind the scenes, it is computed as:

y = x.matmul(m.weight.t()) + m.bias  # y = x*W^T + b
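
You can confirm this equivalence on the example above:

>>> torch.allclose(y, x.matmul(m.weight.t()) + m.bias)
True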

or, written out elementwise:

y[i,j] == x[i,0] * m.weight[j,0] + x[i,1] * m.weight[j,1] + m.bias[j]

where i ranges over the interval [0, batch_size) and j over [0, out_features).
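
Here is the same formula spelled out as an explicit (and inefficient) double loop, purely for verification; the helper name linear_by_hand is my own:

import torch
import torch.nn as nn

def linear_by_hand(x, weight, bias):
    # computes y[i,j] = x[i,0]*weight[j,0] + ... + x[i,in_features-1]*weight[j,in_features-1] + bias[j]
    batch_size, in_features = x.shape
    out_features = weight.shape[0]
    y = torch.empty(batch_size, out_features)
    for i in range(batch_size):        # i in [0, batch_size)
        for j in range(out_features):  # j in [0, out_features)
            y[i, j] = sum(x[i, k] * weight[j, k] for k in range(in_features)) + bias[j]
    return y

# matches m(x) up to floating-point rounding:
with torch.no_grad():
    m = nn.Linear(2, 2)
    x = torch.tensor([[1.0, -1.0], [0.0, 1.0], [0.0, 0.0]])
    print(torch.allclose(linear_by_hand(x, m.weight, m.bias), m(x)))  # True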