Target array shape different to expected output using Tensorflow

You need to uncomment the Flatten layer when creating your model. Essentially what this layer does is that it takes a 4D input (batch_size, height, width, num_filters) and unrolls it into a 2D one (batch_size, height * width * num_filters). This is needed to get the output shape you want.


Un-comment the flatten layer before your output layer in create_model(self), conv layers don't work with 1D tensors/arrays, and so for you to get the output layer of the right shape to add a Flatten() layer right before your output layer, like this:

def create_model(self):
        '''
        Creating the ConvNet model.
        '''
        self.model = Sequential()
        self.model.add(Conv2D(64, (3, 3), input_shape=self.training_images.shape[1:]), activation='relu')
        #self.model.add(Activation("relu"))
        self.model.add(MaxPooling2D(pool_size=(2,2)))

        self.model.add(Conv2D(64, (3,3), activation='relu'))
        #self.model.add(Activation("relu"))
        self.model.add(MaxPooling2D(pool_size=(2,2)))

        # self.model.add(Dense(64))
        # self.model.add(Activation('relu'))
        self.model.add(Flatten())

        self.model.add(Dense(10, activation='softmax'))
        #self.model.add(Activation(activation='softmax'))

        self.model.compile(loss="categorical_crossentropy", optimizer="adam", 
                           metrics=['accuracy'])

        print ('model output shape:', self.model.output_shape)#prints out the output shape of your model

The code above will give you a model with an output shape of (None, 10).

Also please use activation as a layer parameter in the future.


Use model.summary() to inspect the output shapes of your model. Without the commented out Flatten() layer the shapes of your layers retain the original dimensions of the image and the shape of the output layer is (None, 6, 6, 10).

What you want to do here is roughly:

  1. start with a shape of (batch_size, img width, img heigh, channels)
  2. use convolutions to detect patterns through the image by applying a filter
  3. reduce the img width and height with max pooling
  4. then Flatten() the dimensions of the image so that instead of (width, heigh, features) you end up with just a set of features.
  5. match against your classes.

The commented out code does step 4; when you remove the Flatten() layer you end up with the wrong set of dimensions at the end.