How is a convolution calculated on an image with three (RGB) channels?

Let's say we have a 3-channel (RGB) image given by some matrix A:


    A = [[[198 218 227]
          [196 216 225]
          [196 214 224]
          ...
          ...
          [185 201 217]
          [176 192 208]
          [162 178 194]]

and a blur kernel K:


    K = [[0.1111, 0.1111, 0.1111],
         [0.1111, 0.1111, 0.1111],
         [0.1111, 0.1111, 0.1111]]

    # each entry is 0.1111 ~= 1/9

The convolution can be represented as shown in the image below.

[Image: convolution of RGB channels]

As you can see in the image, each channel is convolved individually with the kernel, and the three results are then combined to form the output pixel.
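The per-channel blur described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the function name `blur_rgb` and the small uniform test image are hypothetical (the full matrix A is not shown above), and only the "valid" region is computed, so the output is smaller than the input.

```python
import numpy as np

def blur_rgb(image, kernel):
    """Apply one 2D kernel to each channel of an H x W x C image (valid region only)."""
    kh, kw = kernel.shape
    h, w, c = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1, c))
    # True convolution flips the kernel; for a symmetric blur kernel this changes nothing.
    flipped = kernel[::-1, ::-1]
    for ch in range(c):                      # each channel is handled on its own
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + kh, j:j + kw, ch]
                out[i, j, ch] = np.sum(patch * flipped)
    return out

# Hypothetical 4 x 4 RGB image in which every pixel equals the first pixel of A.
image = np.tile(np.array([198.0, 218.0, 227.0]), (4, 4, 1))
K = np.full((3, 3), 1.0 / 9.0)

blurred = blur_rgb(image, K)
print(blurred.shape)  # (2, 2, 3): one blurred matrix per channel, stacked back into RGB
```

Because the test image is uniform, averaging leaves every pixel unchanged (up to floating-point rounding), which makes the result easy to check by eye.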


In a convolutional neural network, the convolution operation is implemented as follows (note: this is distinct from the convolution used in a blur/filter operation above):

For RGB-like inputs, the filter is actually 2×2×3: each 2×2 slice of the filter corresponds to one color channel, giving three filter responses. These three responses are added up into a single value, followed by the bias and the activation. The result is one pixel in the output map.
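A minimal sketch of that CNN-style computation, assuming stride 1, no padding, and ReLU as the activation (none of these are specified above). The function name `conv_single_filter` and the all-ones test data are hypothetical. As is common in CNN libraries, the filter is not flipped here, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv_single_filter(image, filt, bias):
    """Slide one fh x fw x C filter over an H x W x C input; returns one 2D output map."""
    fh, fw, fc = filt.shape
    h, w, c = image.shape
    assert fc == c, "filter depth must match the number of input channels"
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + fh, j:j + fw, :]
            # One response per color channel ...
            responses = [np.sum(patch[:, :, ch] * filt[:, :, ch]) for ch in range(c)]
            # ... added up into a single value, then the bias is added.
            out[i, j] = sum(responses) + bias
    return np.maximum(out, 0.0)  # ReLU, used here only as an example activation

image = np.ones((3, 3, 3))   # hypothetical 3 x 3 RGB input
filt = np.ones((2, 2, 3))    # one 2 x 2 x 3 filter
feature_map = conv_single_filter(image, filt, bias=-2.0)

print(feature_map.shape)  # (2, 2): a single-channel output map
print(feature_map[0, 0])  # 10.0 = (4 + 4 + 4) - 2
```

Note the contrast with the blur case: one filter here produces one output channel, not three, because the per-channel responses are summed before the activation.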


The convolution is computed exactly as it would be for a single-channel image, except that you get three matrices instead of one. This is a lecture note about CNN fundamentals, which I think might be helpful for you.