What is the difference between Keras' MaxPooling1D and GlobalMaxPooling1D functions?

TL;DR: GlobalMaxPooling1D for temporal data takes the max over the steps dimension. So a tensor with shape [10, 4, 10] becomes a tensor with shape [10, 10] after global pooling. MaxPooling1D takes the max over the steps too, but constrained to a pool_size for each stride. So a [10, 4, 10] tensor becomes a [10, 3, 10] tensor after MaxPooling1D(pool_size=2, strides=1).
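
Here is a quick pure-numpy sketch of those two shape transformations (numpy's max stands in for the Keras layers here; the list comprehension mimics a pool of size 2 sliding with stride 1):

import numpy as np

x = np.random.rand(10, 4, 10)  # (batch, steps, features)

# GlobalMaxPooling1D: one max per feature, taken over the steps axis
print(x.max(axis=1).shape)  # (10, 10)

# MaxPooling1D(pool_size=2, strides=1): max over each window of 2 steps
pooled = np.stack([x[:, i:i + 2].max(axis=1) for i in range(x.shape[1] - 1)],
                  axis=1)
print(pooled.shape)  # (10, 3, 10)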

Long answer with a graphical aid

Let's say we have a simple sentence with 4 words, and we have some vector encoding for the words (like word2vec embeddings). Of course you won't normally max pool over an embedding tensor, but this should do for an example. Also, global pooling works across channels, but I'll leave that out of this illustration. Finally, things get slightly more complicated with padding, but we don't need that here either.

Suppose we have MaxPooling1D(pool_size=2, strides=1). Then

the  [[.7, -.2, .1]   | pool size is two
boy   [.8, -.3, .2]   | so look at two words at a time    | strides=1 will
will  [.2, -.1, .4]     and take the elementwise max      | move the pool down
live  [.4, -.4, .8]]    over those 2 vectors. Here we       1 word. Now we look
                          are looking at 'the' and 'boy'.   at 'boy' and 'will'
                                                            and take the max.

So that will result in a [1, 3, 3] tensor, with each timestep being the elementwise max over a pool of 2 words. And since we had 3 pools, we have effectively downsampled our timesteps from 4 to 3.
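
You can check that by hand, with numpy standing in for MaxPooling1D (the toy embedding values are the ones from the diagram above):

import numpy as np

sentence = np.array([[[.7, -.2, .1],    # 'the'
                      [.8, -.3, .2],    # 'boy'
                      [.2, -.1, .4],    # 'will'
                      [.4, -.4, .8]]])  # 'live' -> shape (1, 4, 3)

# elementwise max over each window of 2 consecutive words, stride 1
pooled = np.stack([sentence[:, i:i + 2].max(axis=1) for i in range(3)], axis=1)
print(pooled.shape)  # (1, 3, 3)
print(pooled[0])
# [[ 0.8 -0.2  0.2]    max over 'the', 'boy'
#  [ 0.8 -0.1  0.4]    max over 'boy', 'will'
#  [ 0.4 -0.1  0.8]]   max over 'will', 'live'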

However, if we use GlobalMaxPooling1D, we take a single elementwise max over the whole sentence (tensor), collapsing the steps dimension entirely. Note that the result is the max of each feature across all 4 words, which is not necessarily any single word's vector; here it mixes features from 'boy', 'will' and 'live'.
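
In numpy terms, on the same toy sentence:

import numpy as np

sentence = np.array([[[.7, -.2, .1],
                      [.8, -.3, .2],
                      [.2, -.1, .4],
                      [.4, -.4, .8]]])  # the same (1, 4, 3) sentence as above

# global max pooling: one max per feature across all 4 timesteps
print(sentence.max(axis=1))  # [[ 0.8 -0.1  0.8]] -> shape (1, 3)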

Indeed, here is how GlobalMaxPooling1D is defined in Keras:

class GlobalMaxPooling1D(_GlobalPooling1D):
    """Global max pooling operation for temporal data.
    # Input shape
        3D tensor with shape: `(batch_size, steps, features)`.
    # Output shape
        2D tensor with shape:
        `(batch_size, features)`
    """

    def call(self, inputs):
        return K.max(inputs, axis=1)
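
Note that K.max(inputs, axis=1) is exactly the max over the steps axis used in the numpy sketches above.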

Hopefully that helps; please ask me to clarify anything.

Additionally, here is an example that you can play with:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, GlobalMaxPooling1D, MaxPooling1D

# a sample batch: 10 sequences, each with 6 timesteps of 10 features
D = np.random.rand(10, 6, 10)

model = Sequential()
model.add(LSTM(16, input_shape=(6, 10), return_sequences=True))
model.add(MaxPooling1D(pool_size=2, strides=1))
model.add(LSTM(10))
model.add(Dense(1))
model.compile(loss='binary_crossentropy', optimizer='sgd')

# print the summary to see how the dimensions change after each layer is
# applied

model.summary()

# try a model with GlobalMaxPooling1D now

model = Sequential()
model.add(LSTM(16, input_shape=(6, 10), return_sequences=True))
model.add(GlobalMaxPooling1D())
model.add(Dense(1))
model.compile(loss='binary_crossentropy', optimizer='sgd')

model.summary()
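
If you run both summaries, you should see the steps dimension shrink from 6 to 5 after MaxPooling1D(pool_size=2, strides=1) ((None, 6, 16) becomes (None, 5, 16)), while GlobalMaxPooling1D removes the steps dimension entirely ((None, 6, 16) becomes (None, 16)). You can also pass the sample batch through with model.predict(D) to confirm the final output shape of (10, 1).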