What exactly is the definition of a 'Module' in PyTorch?

It's a simple container.

From the docs of nn.Module:

Base class for all neural network modules. Your models should also subclass this class. Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes. Submodules assigned in this way will be registered, and will have their parameters converted too when you call .cuda(), etc.

From the tutorial:

All network components should inherit from nn.Module and override the forward() method. That is about it, as far as the boilerplate is concerned. Inheriting from nn.Module provides functionality to your component. For example, it makes it keep track of its trainable parameters, you can swap it between CPU and GPU with the .to(device) method, where device can be a CPU device torch.device("cpu") or CUDA device torch.device("cuda:0").

A module is a container from which layers, model subparts (e.g. BasicBlock in resnet in torchvision) and whole models should all inherit. Why should they? Because inheriting from nn.Module lets you easily call methods like .to("cuda:0"), .eval(), .parameters(), or register hooks.
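For a concrete picture, here is a minimal sketch (the TinyNet name and layer sizes are made up for illustration) of what subclassing nn.Module buys you:

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)  # assigned as an attribute, so it is registered as a submodule

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
net.eval()                       # switch to evaluation mode
params = list(net.parameters())  # weight and bias of self.fc, found automatically
net.to("cpu")                    # or net.to("cuda:0") if a GPU is available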

  • why not just call the 'module' a model, and call the layers 'layers'? I suppose maybe it's just semantics and splitting hairs, but still...

That's an API design choice, and I find having a single Module class, instead of two separate Model and Layer classes, cleaner and more flexible (it's easier to send just a part of the model to the GPU, or to get the parameters of only some layers...).
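As a sketch of that freedom (the layer sizes here are arbitrary), you can move one submodule or optimize one layer's parameters without touching the rest:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
# send just a part of the model to the GPU (uncomment if CUDA is available):
# model[0].to("cuda:0")
# build an optimizer over the parameters of only the last layer:
optimizer = torch.optim.SGD(model[2].parameters(), lr=0.01)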


why not just call the 'module' a model, and call the layers 'layers'?

This naming is inherited: PyTorch descends from Torch, which was originally written in Lua, and there these components were already called modules.

What exactly is the definition of a 'Module' in PyTorch?

There are different kinds of definitions in general.

Here is a pragmatic one:

  • A module is something that has a structure and runs forward through that structure to get the output (the return value).

This one is structural:

  • A module also knows its state, since you can ask it for the list of its parameters: module.parameters().
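For example (a minimal sketch with a single Linear layer standing in for a model):

import torch.nn as nn

module = nn.Linear(5, 1)
for p in module.parameters():
    print(p.shape, p.requires_grad)  # weight (1, 5) and bias (1,), both trainable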

This one is functional:

  • You can call module.zero_grad() to set the gradients of all its parameters to zero. We should do this after every optimization step, so gradients don't accumulate into the next backprop step. This shows a module also has to deal with backprop, the step in which gradients are computed for the parameters marked for update (the update itself is then applied by the optimizer).

Module parameters that are marked for update have requires_grad=True, like this:

Parameter containing:
tensor([-0.4411, -0.2094, -0.5322, -0.0154, -0.1009], requires_grad=True)

You can say parameters are just like tensors, except they have a requires_grad attribute that lets you decide whether they should be updated during backprop or not.
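A small sketch tying these together (again with a single Linear layer): after a backward pass the gradients live on the parameters, zero_grad() resets them, and requires_grad_ lets you freeze a parameter:

import torch
import torch.nn as nn

module = nn.Linear(5, 1)
out = module(torch.randn(3, 5)).sum()
out.backward()                       # backprop: fills .grad for params with requires_grad=True
module.zero_grad()                   # reset gradients before the next backward pass

module.weight.requires_grad_(False)  # freeze the weight: it no longer receives gradients
print(module.bias.requires_grad)     # True, the bias stays trainable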

Finally, back to forward step to get an important note:

import torch.nn as nn

class ZebraNet(nn.Module):

    def __init__(self, num_classes=1000):
        super().__init__()
        self.convpart = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpooling = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.convpart(x)
        x = self.avgpooling(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x

You can see how the structure is set in __init__ and how forward() tells you what will happen to the input x and what will be returned. The return value has the dimension of the output we need. How precisely we predict that output gives us better or worse accuracy, which is usually the metric we track to measure progress.
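As a quick sanity check of that forward pass (assuming the ZebraNet definition above and a standard 224x224 input):

import torch

model = ZebraNet(num_classes=10)
x = torch.randn(1, 3, 224, 224)  # a batch with one RGB image
out = model(x)                   # calling the module runs forward() under the hood
print(out.shape)                 # torch.Size([1, 10])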


Without being a PyTorch expert, my understanding is that a module in the context of PyTorch is simply a container which receives tensors as input and computes tensors as output.

So, in conclusion, your model is quite likely to be composed of multiple modules; for example, you might have 3 modules, each representing a layer of a neural network. Thus, they are related in the sense that you need modules to actualise your model, but they aren't the same thing.

Hope that helps