Why Use a Non-Linear Activation Function?
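We add hidden layers so that the network can represent complex, non-linear functions. The non-linear activation function is what gives the model this capacity.
If we use no activation, or a linear activation, in the hidden layers, every layer is just a linear map, and a composition of linear maps is itself linear. The final model is then equivalent to a single linear regression (with a linear output) or logistic regression (with a sigmoid output), no matter how many layers it has.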
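The collapse can be checked numerically. The sketch below is only an illustration: the weights `W1`, `b1`, `W2`, `b2` and the helper names are made-up values and hypothetical functions, not taken from any particular model. It relies on the identity W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2).

```python
# Minimal sketch: two layers with a linear (identity) activation
# behave exactly like one linear layer. All values are arbitrary.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A (m x k) by matrix B (k x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical parameters: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[1.0, -1.0, 2.0]]
b2 = [0.5]

def two_linear_layers(x):
    a1 = add(matvec(W1, x), b1)      # hidden layer, no non-linearity
    return add(matvec(W2, a1), b2)   # output layer

# Fold both layers into a single weight matrix and bias.
W_eff = matmul(W2, W1)
b_eff = add(matvec(W2, b1), b2)

def single_linear_layer(x):
    return add(matvec(W_eff, x), b_eff)

x = [2.0, -3.0]
print(two_linear_layers(x), single_linear_layer(x))
```

Both calls print the same value up to floating-point rounding. Inserting a non-linear function between the two layers is what breaks this equivalence.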
Pros and Cons of Activation Functions
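- For hidden layers, tanh almost always works better than sigmoid, because its outputs are centered around 0 rather than 0.5, which makes learning easier for the next layer.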
- With both sigmoid and tanh, the slope of the function approaches 0 when the absolute value of z becomes large, so gradient descent slows down.
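- In most cases we use ReLU for hidden layers. Its slope is 1 for all positive z, so networks usually train faster with it than with tanh.
- ReLU has zero slope for negative z. Leaky ReLU keeps a small slope there (for example 0.01) and is worth trying as an alternative.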
- Sigmoid is commonly used as the activation function of the output layer in binary classification, since its output lies between 0 and 1 and can be read as a probability.
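To make the slope behaviour in the list above concrete, here is a small sketch of the four activations and their derivatives using only the standard library. The function names and the leaky slope `alpha=0.01` are illustrative assumptions, and this plain `sigmoid` is not hardened against overflow for very large negative z.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # peaks at 0.25 when z = 0

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2     # peaks at 1.0 when z = 0

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0       # slope at z = 0 set to 0 by convention

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z   # alpha is an assumed small slope

def leaky_relu_grad(z, alpha=0.01):
    return 1.0 if z > 0 else alpha

# Compare the slopes at small and large |z|.
for z in (-10.0, -1.0, 0.5, 10.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z), leaky_relu_grad(z))
```

At z = 10 and z = -10 the sigmoid and tanh slopes are close to zero, while ReLU keeps a slope of 1 for positive z and Leaky ReLU keeps a small non-zero slope for negative z.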