Forward and Backward Propagation in Binary Logistic Regression

We use forward propagation to get our prediction and cost, and backward propagation (‘backprop’) to get the derivatives we need for gradient descent.

Notations

$X$: training examples stacked top to bottom ($R^{m\times n}$)

$x^{(i)}, a^{(i)}, z^{(i)}$: the row vector or scalar corresponding to example $i$

$w$: weight vector ($R^{n\times 1}$)

$b$: bias ($R$)

$z$: output of linear transformation ($R^{m\times 1}$)

$a$: prediction ($R^{m\times 1}$)

$y$: labels ($R^{m\times 1}$)

$J$: cost ($R$)

Forward Propagation
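
With the notation above, the forward pass can be written as follows. This is a sketch of the standard binary logistic regression formulation, where $\sigma$ denotes the sigmoid function and the scalar $b$ is broadcast over all $m$ rows:

$$z = Xw + b, \qquad a = \sigma(z) = \frac{1}{1+e^{-z}}, \qquad J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + (1-y^{(i)})\log\left(1-a^{(i)}\right)\right]$$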

Backward Propagation

This is our gradient descent: we repeatedly update $w$ and $b$ in the direction that decreases the cost $J$.
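
As a sketch, writing $dw = \frac{\partial J}{\partial w}$ and $db = \frac{\partial J}{\partial b}$ (with $dw$ reshaped to match $w$ where needed) and using a learning rate $\alpha$, a hyperparameter not listed in the notation above, each iteration applies:

$$w := w - \alpha\, dw, \qquad b := b - \alpha\, db$$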

We calculate $dw, db$ with back propagation.

When $m=1$

We will first look at backprop for the case of a single example ($m=1$) and then generalize it to any $m\geq1$.

When $m=1$, note that we call our cost the ‘loss’, $L$.
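
Concretely, assuming the standard binary cross-entropy loss (consistent with the cost $J$ above, with $a$ and $y$ now scalars):

$$L(a, y) = -\left[y\log a + (1-y)\log(1-a)\right]$$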

Backprop First Step
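
Differentiating the loss with respect to the prediction $a$ (a sketch assuming the cross-entropy loss written above):

$$\frac{dL}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$$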

Backprop Second Step
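
Applying the chain rule with the sigmoid derivative $\frac{da}{dz} = a(1-a)$:

$$\frac{dL}{dz} = \frac{dL}{da}\cdot\frac{da}{dz} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)a(1-a) = a - y$$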

Backprop Third Step
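
Since $z = xw + b$ for a single example, $\frac{dz}{dw} = x$ and $\frac{dz}{db} = 1$, which gives:

$$\frac{dL}{dw} = \frac{dL}{dz}\cdot\frac{dz}{dw} = x(a-y), \qquad \frac{dL}{db} = \frac{dL}{dz}\cdot\frac{dz}{db} = a - y$$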

Note that $x\in R^{1\times n}$ and $a-y$ is a scalar. So the resulting $\frac{dL}{dw} \in R^{1\times n}$.

Generalized BackProp

Our cost is the mean of the losses over all examples: $J = \frac{1}{m}\sum_{i=1}^{m} L(a^{(i)}, y^{(i)})$.

Now that we know $\frac{dL}{dw} = x(a-y)$ and $\frac{dL}{db} = a-y$, we can calculate the generalized version of backprop.
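
Averaging the per-example gradients over all $m$ examples (a sketch that follows from the single-example results, keeping the $1\times n$ row-gradient convention used above):

$$\frac{dJ}{dw} = \frac{1}{m}(a-y)^{T}X \in R^{1\times n}, \qquad \frac{dJ}{db} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$

The NumPy sketch below (function and variable names are illustrative, not from the original post) implements one forward and backward pass with these formulas; here $dw$ is returned with the same shape as $w$, i.e. the transpose of the $1\times n$ gradient above:

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(X, y, w, b):
    """One pass of forward and backward propagation for binary logistic regression.

    X: (m, n) training examples stacked top to bottom
    y: (m, 1) labels in {0, 1}
    w: (n, 1) weight vector
    b: scalar bias
    Returns the cost J and the gradients dw (n, 1) and db (scalar).
    """
    m = X.shape[0]

    # Forward propagation: linear transform, sigmoid, cross-entropy cost.
    z = X @ w + b                      # (m, 1)
    a = sigmoid(z)                     # (m, 1)
    J = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

    # Backward propagation: dJ/dz = (a - y) / m, then chain through z = Xw + b.
    dz = (a - y) / m                   # (m, 1)
    dw = X.T @ dz                      # (n, 1): transpose of the 1 x n gradient
    db = np.sum(dz)                    # scalar

    return J, dw, db

# One gradient-descent step on a small random problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
w = np.zeros((3, 1))
b = 0.0

alpha = 0.1                            # learning rate
J, dw, db = forward_backward(X, y, w, b)
w -= alpha * dw
b -= alpha * db
```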
