Why are the weight parameters of logistic regression initialized to zeros?

If all the weights are initialized to zero, backpropagation will not work as expected, because the gradients for the intermediate and earlier neurons die out (become zero) and never update. The reason is that in the backward pass of the NN, the gradient at an intermediate neuron is multiplied by the weights of the edges going out from that neuron to the next layer; since those weights are zero, the gradient at that intermediate neuron is zero too. Consequently those weights never improve, and the model ends up correcting only the weights that are directly connected to the output neurons.
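To make that concrete, here is a minimal NumPy sketch (the tiny 4-neuron hidden layer and the single input are made up purely for illustration) of one forward/backward pass with every weight set to zero: the output-layer gradient is non-zero, but the gradient that reaches the hidden layer is multiplied by the all-zero outgoing weights and vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))      # one input with 3 features (made up)
y = np.array([[1.0]])            # target

W1 = np.zeros((4, 3))            # hidden layer: 4 neurons, all-zero weights
W2 = np.zeros((1, 4))            # output layer, all-zero weights

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# forward pass
z1 = W1 @ x                      # = 0
a1 = sigmoid(z1)                 # = 0.5 for every hidden neuron
z2 = W2 @ a1                     # = 0
y_hat = sigmoid(z2)              # = 0.5

# backward pass (binary cross-entropy loss)
dz2 = y_hat - y                  # non-zero
dW2 = dz2 @ a1.T                 # non-zero, but identical for every hidden unit
da1 = W2.T @ dz2                 # = 0, because W2 is all zeros
dz1 = da1 * a1 * (1 - a1)        # = 0
dW1 = dz1 @ x.T                  # = 0: the hidden layer receives no gradient

print(dW2)   # equal non-zero entries
print(dW1)   # all zeros
```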


In the case of neural networks there are n neurons in each layer. So if you initialize the weights of every neuron to 0, then even after backpropagation each of them will still have the same weights:

[figure: shallow neural network]

Neurons a1 and a2 in the first layer will have the same weights no matter how long you iterate, since they are computing the same function.
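A rough way to check this claim is to train a zero-initialized two-layer network for many steps and observe that the incoming weights of the two hidden neurons remain identical. The architecture and synthetic data below are assumptions made only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

W1 = np.zeros((3, 2)); b1 = np.zeros((1, 2))   # 2 hidden neurons (a1, a2)
W2 = np.zeros((2, 1)); b2 = np.zeros((1, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(2000):
    # forward
    A1 = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(A1 @ W2 + b2)
    # backward (binary cross-entropy)
    dZ2 = (y_hat - y) / len(X)
    dW2 = A1.T @ dZ2; db2 = dZ2.sum(0, keepdims=True)
    dA1 = dZ2 @ W2.T
    dZ1 = dA1 * A1 * (1 - A1)
    dW1 = X.T @ dZ1; db1 = dZ1.sum(0, keepdims=True)
    # update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# The two columns of W1 (incoming weights of a1 and a2) are still identical:
print(np.allclose(W1[:, 0], W1[:, 1]))   # True, symmetry is never broken
```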

That is not the case with logistic regression, which is simply y = σ(Wx + b): there is only a single layer, so there are no hidden neurons whose symmetry needs to be broken.
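For contrast, a minimal sketch of logistic regression started from all-zero weights (the synthetic data is made up): the gradient (y_hat − y)·x is already non-zero and differs per feature at w = 0, so gradient descent moves immediately and zero initialization causes no problem.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = (X @ true_w > 0).astype(float)

w = np.zeros(3)          # zero initialization
b = 0.0
lr = 0.1

for _ in range(1000):
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad_w = X.T @ (y_hat - y) / len(X)   # distinct gradient per weight
    grad_b = np.mean(y_hat - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w)   # moves toward the direction of true_w; zero init is no obstacle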


I think the above answers are a bit misleading. Actually, the sigmoid (logistic) function is always used in logistic regression because of its special properties. For example,

P(y=1 \mid x;\theta) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}

(Sorry for the ugly formula.) Its graph is the familiar S-shaped sigmoid curve. With all-zero weights, \theta^T x = 0 for every input, so every prediction starts at g(0) = 0.5, right in the middle of the curve where the sigmoid is nearly linear and its gradient is largest, which makes the propagation easier.
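A quick numeric check of that claim (a sketch, not part of the original answer): at z = 0 the sigmoid outputs 0.5 and its derivative g(z)(1 − g(z)) reaches its maximum value of 0.25.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

z = 0.0                                    # what theta^T x equals with zero weights
print(sigmoid(z))                          # 0.5
print(sigmoid(z) * (1 - sigmoid(z)))       # 0.25, the maximum of the sigmoid's derivative
```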