Neural network bias for each neuron

Both approaches represent the same bias concept. For each unit (excluding input nodes) you apply the activation function to the dot product of the weight vector and the vector of activations from the previous layer (in the case of a feed-forward network), plus a scalar bias value:

 f(w · a + b)
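As a minimal sketch of this (assuming NumPy, a sigmoid activation, and a made-up toy layer of 3 units fed by 2 inputs; none of these numbers come from the courses), the explicit per-neuron bias form looks like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy layer: 2 inputs -> 3 units
a_prev = np.array([0.5, -1.2])            # activations from the previous layer
W = np.array([[0.1, 0.4],
              [-0.3, 0.8],
              [0.7, -0.2]])               # one weight row per unit, shape (3, 2)
b = np.array([0.05, -0.1, 0.2])           # one bias value per unit, shape (3,)

a = sigmoid(W @ a_prev + b)               # f(w · a + b) for every unit at once
print(a)
```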

In Andrew Ng's course this value is computed using a vectorisation trick: you concatenate your activations with a specified bias constant (usually 1), and that does the job, because the constant gets its own weight for each node, which is exactly the same as having a separate bias value for each node.
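Continuing the toy example above (again just an illustrative sketch, not code from either course), the trick folds the bias column into the weight matrix and prepends the constant 1 to the activation vector; the result is identical to the explicit-bias version:

```python
# fold the bias into the weight matrix: each unit's bias becomes the weight
# it assigns to the constant input 1
W_aug = np.hstack([b.reshape(-1, 1), W])      # shape (3, 3)
a_prev_aug = np.concatenate([[1.0], a_prev])  # [1, a_prev]

a_vectorised = sigmoid(W_aug @ a_prev_aug)
print(np.allclose(a, a_vectorised))           # True -- the two formulations match
```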


Regarding the differences between the two, @Marcin has answered them beautifully.

It's interesting that in his Deep Learning Specialization by deeplearning.ai, Andrew takes a different approach from his Machine Learning course (where he used one bias term for every hidden layer) and associates a bias term with each individual neuron.

Though both approaches try to achieve the same result, in my opinion associating a bias with each neuron is much more explicit and helps immensely with hyperparameter tuning, especially when you're dealing with large architectures such as CNNs and deep neural networks.
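To make the "bias per neuron" layout concrete (a small sketch using Keras purely for illustration; the layer sizes are made up and not taken from either course), a Dense layer with 3 units keeps a bias vector with one entry per unit:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(3, activation="sigmoid")
layer.build(input_shape=(None, 2))    # 2 inputs feeding 3 units

kernel, bias = layer.get_weights()
print(kernel.shape)                   # (2, 3) -- one weight per (input, unit) pair
print(bias.shape)                     # (3,)   -- one bias per neuron, explicit and tunable
```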