Categories:

Updated:

# Notations

The input layer is the 0th layer, the first hidden layer is the 1st layer, and so on.

The term in square brackets after each definition gives the dimensions of that matrix. Matrices store one training example per row.

$L$: total number of layers, excluding the input layer

$m$: number of training examples

$n^{[l]}$: number of units in the $l$-th layer

$W^{[l]}$: weight matrix of the linear transformation that produces layer $l$ [$n^{[l-1]}\times n^{[l]}$]

$b^{[l]}$: bias of the linear transformation that produces layer $l$ [$1 \times n^{[l]}$]

$Z^{[l]}$: output of the linear transformation in layer $l$ [$m\times n^{[l]}$]

$A^{[l]}$: matrix of unit values (activations) of layer $l$ [$m\times n^{[l]}$]

• $A^{[0]}$ equals the input matrix $X$. When $l>0$, $A^{[l]}$ is the activation of $Z^{[l]}$.
• $A^{[L]}$ equals $\hat{Y}$, our prediction.
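The shapes above can be checked with a short NumPy sketch. The helper name `init_params`, the dictionary keys such as `"W1"`, and the 0.01 initialization scale are illustrative assumptions, not part of the notation; the point is only that $W^{[l]}$ is $n^{[l-1]}\times n^{[l]}$, $b^{[l]}$ is $1\times n^{[l]}$, and $A^{[l-1]}W^{[l]} + b^{[l]}$ comes out as $m\times n^{[l]}$.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Create W^[l] and b^[l] for l = 1..L (hypothetical helper).

    layer_sizes is [n^[0], n^[1], ..., n^[L]]. The 0.01 scale is an
    illustrative choice, not something the notation prescribes.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_sizes)):
        # W^[l] maps n^[l-1] units to n^[l] units: shape (n^[l-1], n^[l])
        params[f"W{l}"] = 0.01 * rng.standard_normal((layer_sizes[l - 1], layer_sizes[l]))
        # b^[l] is a row vector that broadcasts over the m examples: shape (1, n^[l])
        params[f"b{l}"] = np.zeros((1, layer_sizes[l]))
    return params

params = init_params([4, 5, 3, 1])  # n^[0] = 4 inputs, L = 3 layers
for name, value in params.items():
    print(name, value.shape)

m = 10
X = np.ones((m, 4))                 # A^[0] = X, shape (m, n^[0])
Z1 = X @ params["W1"] + params["b1"]
print(Z1.shape)                     # (10, 5), i.e. (m, n^[1])
```

Storing examples as rows is why the weight matrix sits on the right of the product: $(m\times n^{[l-1]})(n^{[l-1]}\times n^{[l]})$ yields one row of $n^{[l]}$ values per example.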

$g^{[l]}$: activation function of layer $l$

$J$: cost

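$dZ^{[l]}, dW^{[l]}, db^{[l]}$ abbreviate the partial derivatives $\frac{\partial J}{\partial Z^{[l]}}, \frac{\partial J}{\partial W^{[l]}}, \frac{\partial J}{\partial b^{[l]}}$ respectively. Each has the same shape as the matrix it is taken with respect to.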
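Because $dW$ collects the derivative of the scalar cost $J$ with respect to every entry of $W$, it can be estimated one entry at a time with central differences. The sketch below is a minimal illustration under assumed names (`numerical_grad`, and a toy cost $J = \sum Z$ with $Z = AW$ and no bias); it is not how gradients are computed in practice, only a way to see what the symbol means.

```python
import numpy as np

def numerical_grad(cost, W, eps=1e-6):
    """Central-difference estimate of dJ/dW, entry by entry (hypothetical helper).

    cost is any function mapping the matrix W to a scalar J.
    W is perturbed in place and restored after each entry.
    """
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        j_plus = cost(W)
        W[idx] = original - eps
        j_minus = cost(W)
        W[idx] = original  # restore the entry
        grad[idx] = (j_plus - j_minus) / (2 * eps)
    return grad

# Toy cost (assumption): J = sum of all entries of Z = A W.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # m = 6 examples, 3 input units
W = rng.standard_normal((3, 2))   # 3 -> 2 units
dW = numerical_grad(lambda W_: np.sum(A @ W_), W)
print(dW.shape)                   # (3, 2): same shape as W
```

For this toy cost, each entry $dW_{jk}$ equals $\sum_i A_{ij}$, so `dW` matches `A.T @ np.ones((6, 2))`.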

# Forward Propagation

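For $l=1,2,\ldots,L$:

$$Z^{[l]} = A^{[l-1]} W^{[l]} + b^{[l]}$$

$$A^{[l]} = g^{[l]}\left(Z^{[l]}\right)$$

The bias row $b^{[l]}$ is added to every row of $A^{[l-1]} W^{[l]}$, i.e. it is broadcast over the $m$ examples. Remember that $A^{[0]}$ is the input matrix $X$ and $A^{[L]}$ is our prediction $\hat{Y}$.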
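A minimal NumPy sketch of the forward loop, computing $Z^{[l]} = A^{[l-1]}W^{[l]} + b^{[l]}$ and $A^{[l]} = g^{[l]}(Z^{[l]})$ for $l = 1, \ldots, L$. The choice of ReLU for hidden layers and sigmoid for the output, the function name `forward`, and the `cache` dictionary layout are all assumptions made for illustration.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward(X, params, activations):
    """Forward propagation for l = 1..L (hypothetical helper).

    activations is a list [g^[1], ..., g^[L]]. Returns A^[L] and a cache
    holding every Z^[l] and A^[l] (with A^[0] = X) for backpropagation.
    """
    L = len(activations)
    cache = {"A0": X}
    A = X
    for l in range(1, L + 1):
        Z = A @ params[f"W{l}"] + params[f"b{l}"]   # shape (m, n^[l])
        A = activations[l - 1](Z)                    # g^[l] applied element-wise
        cache[f"Z{l}"], cache[f"A{l}"] = Z, A
    return A, cache

# Usage: n^[0] = 4, n^[1] = 5, n^[2] = 1, with m = 8 examples.
rng = np.random.default_rng(0)
sizes = [4, 5, 1]
params = {}
for l in range(1, len(sizes)):
    params[f"W{l}"] = rng.standard_normal((sizes[l - 1], sizes[l]))
    params[f"b{l}"] = np.zeros((1, sizes[l]))
X = rng.standard_normal((8, 4))
Y_hat, cache = forward(X, params, [relu, sigmoid])
print(Y_hat.shape)  # (8, 1), i.e. (m, n^[L])
```

Caching every $Z^{[l]}$ and $A^{[l]}$ is a design choice: the backward pass needs them, and recomputing them would repeat the forward work.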

# Backward Propagation

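Backpropagation starts from $dZ^{[L]}$, whose form depends on the cost $J$ and the output activation $g^{[L]}$. For example, with a sigmoid output and the mean binary cross-entropy cost, $dZ^{[L]} = \frac{1}{m}\left(A^{[L]} - Y\right)$, where $Y$ holds the true labels.

For $l=L-1,L-2,\ldots,1$, the error is passed back one layer:

$$dZ^{[l]} = \left(dZ^{[l+1]} \left(W^{[l+1]}\right)^T\right) \odot g^{[l]\prime}\left(Z^{[l]}\right)$$

where $\odot$ denotes element-wise multiplication. For every layer $l=1,\ldots,L$, the parameter gradients follow from $dZ^{[l]}$:

$$dW^{[l]} = \left(A^{[l-1]}\right)^T dZ^{[l]}, \qquad db^{[l]} = \sum_{i=1}^{m} dZ^{[l]}_{i,:}$$

so $db^{[l]}$ is the column-wise sum of $dZ^{[l]}$, with shape [$1 \times n^{[l]}$].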
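The backward pass can be sketched in NumPy under explicit assumptions: ReLU hidden layers, a single sigmoid output unit, and the mean binary cross-entropy cost, so that $dZ^{[L]} = \frac{1}{m}(A^{[L]} - Y)$. Each step then applies $dW^{[l]} = (A^{[l-1]})^T dZ^{[l]}$, $db^{[l]} = $ column sums of $dZ^{[l]}$, and $dZ^{[l-1]} = (dZ^{[l]}(W^{[l]})^T) \odot g^{[l-1]\prime}(Z^{[l-1]})$. The names `forward`, `backward`, `cost`, and `relu_grad` are hypothetical, and the result is compared against a finite-difference estimate.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def relu_grad(Z):
    return (Z > 0).astype(Z.dtype)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward(X, params, L):
    # Assumed architecture: ReLU hidden layers, sigmoid output layer.
    cache = {"A0": X}
    A = X
    for l in range(1, L + 1):
        Z = A @ params[f"W{l}"] + params[f"b{l}"]
        A = sigmoid(Z) if l == L else relu(Z)
        cache[f"Z{l}"], cache[f"A{l}"] = Z, A
    return A, cache

def cost(Y_hat, Y):
    # Mean binary cross-entropy over the m examples (single output unit).
    return -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))

def backward(Y, params, cache, L):
    m = Y.shape[0]
    grads = {}
    # Output layer: sigmoid + mean cross-entropy gives dZ^[L] = (A^[L] - Y) / m.
    dZ = (cache[f"A{L}"] - Y) / m
    for l in range(L, 0, -1):
        grads[f"dW{l}"] = cache[f"A{l-1}"].T @ dZ          # (n^[l-1], n^[l])
        grads[f"db{l}"] = dZ.sum(axis=0, keepdims=True)    # (1, n^[l])
        if l > 1:
            # dZ^[l-1] = (dZ^[l] W^[l]^T) * g'(Z^[l-1]), element-wise product
            dZ = (dZ @ params[f"W{l}"].T) * relu_grad(cache[f"Z{l-1}"])
    return grads

# Usage: a 3 -> 4 -> 1 network on m = 5 random examples.
rng = np.random.default_rng(1)
sizes, L = [3, 4, 1], 2
params = {}
for l in range(1, L + 1):
    params[f"W{l}"] = rng.standard_normal((sizes[l - 1], sizes[l]))
    params[f"b{l}"] = rng.standard_normal((1, sizes[l]))
X = rng.standard_normal((5, 3))
Y = (rng.random((5, 1)) > 0.5).astype(float)
Y_hat, cache = forward(X, params, L)
grads = backward(Y, params, cache, L)

# Finite-difference check of dW^[1].
eps = 1e-6
num = np.zeros_like(params["W1"])
for idx in np.ndindex(num.shape):
    params["W1"][idx] += eps
    j_plus = cost(forward(X, params, L)[0], Y)
    params["W1"][idx] -= 2 * eps
    j_minus = cost(forward(X, params, L)[0], Y)
    params["W1"][idx] += eps  # restore the entry
    num[idx] = (j_plus - j_minus) / (2 * eps)
print(np.max(np.abs(num - grads["dW1"])))  # expected to be close to zero
```

Because $dZ^{[L]}$ here is the exact derivative of a mean cost, the $\frac{1}{m}$ factor lives in $dZ^{[L]}$ and is not repeated in $dW^{[l]}$ or $db^{[l]}$.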
