How Do Activation Functions Shape the Training Dynamics?