Keras Binary Classification - Sigmoid activation function

You can assign the threshold explicitly in compile() by using

tf.keras.metrics.BinaryAccuracy(
    name="binary_accuracy", dtype=None, threshold=0.5
)

like the following:

import tensorflow as tf

# `model` is assumed to be an already-built tf.keras model.
model.compile(optimizer='sgd',
              loss='mse',
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)])

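With a sigmoid output, a binary classifier produces a value between 0 and 1 that is interpreted as the probability of a sample belonging to the positive class. The threshold is what turns that probability into a class label when the metric is computed.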
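To make the role of the threshold concrete, here is a minimal pure-Python sketch of the kind of computation a thresholded accuracy performs. It is not Keras's implementation: the function name binary_accuracy, the sample values, and the strict ">" comparison are assumptions made for illustration.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    # Assumption: a probability strictly above the threshold counts as class 1.
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    matches = [int(t == p) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7]  # hypothetical sigmoid outputs

acc_default = binary_accuracy(y_true, y_prob)        # threshold 0.5
acc_strict = binary_accuracy(y_true, y_prob, 0.65)   # threshold 0.65
print(acc_default, acc_strict)
```

The same probabilities give a different accuracy once the threshold moves, which is why it can be worth setting it explicitly rather than relying on the default.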

How is Keras distinguishing between the use of sigmoid in a binary classification problem and in a regression problem?

It does not need to. It uses the loss function to calculate the loss, then computes the derivatives and updates the weights.

In other words:

  • During training the framework minimizes the loss. The user must specify a loss function, either one provided by the framework or their own. The network only cares about the scalar value this function returns; its two arguments are the predicted ŷ and the actual y.
  • Each activation function implements a forward-propagation function and a back-propagation function. The framework is only interested in these two functions. It does not care what the activation does exactly, as long as it is differentiable, so that gradient descent can work.
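The two points above can be sketched in plain Python. This is only an illustration under stated assumptions, not how Keras is implemented: the Sigmoid class, the mse and bce helpers, the train function, and the single training sample are all hypothetical. The point is that the activation exposes only forward and backward, and the training loop is identical whichever loss is plugged in.

```python
import math


class Sigmoid:
    """Activation that exposes only a forward and a backward function."""

    def forward(self, z):
        self.out = 1.0 / (1.0 + math.exp(-z))
        return self.out

    def backward(self, grad_out):
        # Chain rule: d(sigmoid)/dz = out * (1 - out)
        return grad_out * self.out * (1.0 - self.out)


def mse(y_hat, y):
    # Returns (loss, d loss / d y_hat) for squared error.
    return (y_hat - y) ** 2, 2.0 * (y_hat - y)


def bce(y_hat, y):
    # Returns (loss, d loss / d y_hat) for binary cross-entropy.
    loss = -(y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat))
    grad = (y_hat - y) / (y_hat * (1.0 - y_hat))
    return loss, grad


def train(loss_fn, x=1.5, y=1.0, lr=0.5, steps=200):
    """Gradient descent on one sample with a single sigmoid unit."""
    w, b = 0.0, 0.0
    act = Sigmoid()
    history = []
    for _ in range(steps):
        y_hat = act.forward(w * x + b)     # forward propagation
        loss, dloss = loss_fn(y_hat, y)    # scalar loss and its gradient
        dz = act.backward(dloss)           # back-propagation through sigmoid
        w -= lr * dz * x                   # weight update
        b -= lr * dz
        history.append(loss)
    return history


mse_history = train(mse)  # sigmoid output treated as a regression target
bce_history = train(bce)  # sigmoid output treated as a class probability
print(mse_history[0], mse_history[-1])
print(bce_history[0], bce_history[-1])
```

Nothing in train or Sigmoid knows whether the task is classification or regression; only the choice of loss_fn differs.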