Difference between Xavier and He (he_normal) initialization

See this discussion on Stats.SE:

In summary, the main difference for machine learning practitioners is the following:

  • He initialization works better for layers with ReLU activation.
  • Xavier initialization works better for layers with sigmoid activation.

Recommended weight (kernel) initialization for each type of activation function:

  1. Xavier/Glorot initialization: none (i.e., linear), hyperbolic tangent (tanh), logistic (sigmoid), softmax.
  2. He initialization: Rectified Linear Unit (ReLU) and its variants.
  3. LeCun initialization: Scaled Exponential Linear Unit (SELU).
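
For intuition, each scheme only differs in how it scales the initial weights as a function of the layer's fan-in and fan-out. Here is a minimal sketch of the standard deviations used by the normal variants of these initializers (the layer sizes are arbitrary, chosen only for illustration):

    import numpy as np

    # Example layer sizes (arbitrary, for illustration only)
    fan_in, fan_out = 512, 256

    # Standard deviations of the *_normal variants, from the respective papers:
    glorot_std = np.sqrt(2.0 / (fan_in + fan_out))  # Xavier/Glorot: Var(W) = 2 / (fan_in + fan_out)
    he_std     = np.sqrt(2.0 / fan_in)              # He:            Var(W) = 2 / fan_in
    lecun_std  = np.sqrt(1.0 / fan_in)              # LeCun:         Var(W) = 1 / fan_in

    print(glorot_std, he_std, lecun_std)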

Application in Keras:

    keras.layers.Dense(10, activation="relu", kernel_initializer="he_normal")
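
As a fuller sketch (assuming TensorFlow's bundled Keras; the layer sizes and input shape are arbitrary), here is how each activation might be paired with its matching initializer in a small model:

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(784,)),
        keras.layers.Dense(256, activation="relu",
                           kernel_initializer="he_normal"),      # He for ReLU
        keras.layers.Dense(128, activation="tanh",
                           kernel_initializer="glorot_normal"),  # Xavier/Glorot for tanh
        keras.layers.Dense(64, activation="selu",
                           kernel_initializer="lecun_normal"),   # LeCun for SELU
        keras.layers.Dense(10, activation="softmax",
                           kernel_initializer="glorot_uniform"), # Glorot for the softmax output
    ])
    model.summary()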

Here's a link to the research paper by Xavier Glorot and Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks", in case you want to understand the importance of weight initialization and the math behind it: http://proceedings.mlr.press/v9/glorot10a.html