What does backbone mean in a neural network?

Backbone is a term used in DeepLab models/papers to refer to the feature extractor network. These feature extractor networks compute features from the input image and then these features are upsampled by a simple decoder module of DeepLab models to generate segmented masks. The authors of DeepLab models have shown performance with different feature extractors (backbones) like MobileNet, ResNet, and Xception network.


In my understanding, the "backbone" refers to the feature extracting network which is used within the DeepLab architecture. This feature extractor is used to encode the network's input into a certain feature representation. The DeepLab framework "wraps" functionalities around this feature extractor. By doing so, the feature extractor can be exchanged and a model can be chosen to fit the task at hand in terms of accuracy, efficiency, etc.

In case of DeepLab, the term backbone might refer to models like the ResNet, Xception, MobileNet, etc.


TL;DR Backbone is not a universal technical term in deep learning.

(Disclaimer: yes, there may be a specific kind of method, layer, tool etc. that is called "backbone", but there is no "backbone of a neural network" in general.)

If authors use the word "backbone" as they are describing a neural network architecture, they mean

  • feature extraction ( a part of the network that "sees" the input), but this interpretation is not quite universal in the field: for instance, in my opinion, computer vision researchers would use the term to mean feature extraction, whereas natural language processing researchers would not.
  • in informal language, that this part in question is crucial to the overall method.