# Activation Functions

# Why Use Non-Linear Activation Function?

The reason we use hidden layers is to represent complex, non-linear models. A non-linear activation function is what adds this complexity to the model.

If we don't use an activation function, or use a linear one, the final model collapses to a single linear regression or logistic regression, no matter how many layers we stack.
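This collapse is easy to check numerically. The sketch below (NumPy, with made-up weight shapes) composes two linear "hidden layers" and shows the result equals one linear layer whose weights are the product of the two:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with linear (i.e. no) activation.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = rng.standard_normal(3)

# Forward pass through both linear layers.
deep = W2 @ (W1 @ x + b1) + b2

# The same map as a single linear layer: W x + b.
W = W2 @ W1
b = W2 @ b1 + b2
shallow = W @ x + b

assert np.allclose(deep, shallow)  # the two layers add no expressive power
```

Inserting any non-linearity between the two layers breaks this factorization, which is exactly why hidden layers need one.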

# Pros and Cons of Activation Function

- The tanh function almost always works better than sigmoid in hidden layers, because its output is zero-centered.
- With both sigmoid and tanh, when the absolute value of z becomes large, the slope of the function approaches 0, so gradient descent slows down.
- Most of the time we use ReLU, which trains faster than tanh because its gradient does not saturate for positive z.
- Leaky ReLU is also worth trying; its small slope for negative z keeps units from "dying".
- Sigmoid is commonly used as the output-layer activation in binary classification, since it maps z to a value in (0, 1) that can be read as a probability.
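The saturation point above can be seen directly from the derivatives. A minimal sketch (NumPy; function names are my own) of the activations discussed and their gradients:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2  # peaks at 1.0 when z = 0

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # alpha is the conventional small negative-side slope (assumed 0.01 here)
    return np.where(z > 0, z, alpha * z)

# For large |z| the sigmoid/tanh gradients vanish, which is what
# slows gradient descent down.
for z in (0.0, 5.0, 10.0):
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.2e}  tanh'={tanh_grad(z):.2e}")

# ReLU keeps gradient 1 for every positive z; Leaky ReLU keeps a small
# slope alpha for negative z, so the gradient never becomes exactly 0.
```

Running it shows both gradients are already below 1e-4 at z = 10, while ReLU's gradient stays at 1 for any positive input.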