Regularized Linear Regression and Logistic Regression Cheatsheet

Notations

I will represent a scalar as $x$, a vector as $\vec{x}$, and a matrix as $X$.

Basic

$m$: the number of training examples

$n$: the number of features

$k$: the number of output classes

$\alpha$: learning rate

$\lambda$: regularization parameter

$X$: the input matrix, where each row is a training example and each column is a feature. Note that the first column $X_1$ is a vector composed only of 1s (the bias column).

$X^{(i)}$: the row vector of all the feature inputs of the $i$th training example.

$X_j$: the column vector of the $j$th feature across all training examples.

$X_j^{(i)}$: the value of feature $j$ in the $i$th training example.

For Linear Regression and Binary Classification

$\vec{y}$: an output column vector, where each row corresponds to a training example.

$\vec{\theta}$: a column vector of weights

For Binary Classification

$\vec{z}$: the output vector of the regression step, which is also the input to the sigmoid function.

For Multiclass Classification

$Y$: an output matrix, where each row corresponds to a training example and each column to a class.

$Z$: the output matrix of the regression step, which is also the input to the softmax function.

$\Theta$: an $n \times k$ matrix of weights

Regularized Linear Regression

Hypothesis
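
In this notation, the usual vectorized hypothesis (writing the vector of predictions as $\vec{h}$, a symbol not defined above) is:

$$\vec{h} = X\vec{\theta}, \qquad h^{(i)} = X^{(i)}\vec{\theta} = \sum_{j=1}^{n} \theta_j X_j^{(i)}$$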

Cost
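
A standard form of the regularized squared-error cost (writing the cost as $J$, and assuming the bias weight $\theta_1$, paired with the all-ones column $X_1$, is left out of the penalty) is:

$$J(\vec{\theta}) = \frac{1}{2m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=2}^{n}\theta_j^2$$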

Gradient Descent
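
The corresponding update, repeated until convergence and again leaving $\theta_1$ unregularized, is:

$$\theta_1 := \theta_1 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)X_1^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)X_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j = 2, \dots, n)$$

In vectorized form this is $\vec{\theta} := \vec{\theta} - \frac{\alpha}{m}\left[X^T(X\vec{\theta} - \vec{y}) + \lambda\,\vec{\theta}_{reg}\right]$, where $\vec{\theta}_{reg}$ is $\vec{\theta}$ with its first entry zeroed out. A minimal NumPy sketch of one such step (the function and variable names are my own, not from the original):

```python
import numpy as np

def gradient_descent_step(X, y, theta, alpha, lam):
    """One step of regularized linear-regression gradient descent.

    X     : (m, n) input matrix whose first column is all 1s
    y     : (m,)   target vector
    theta : (n,)   weight vector
    alpha : learning rate
    lam   : regularization parameter (lambda)
    """
    m = X.shape[0]
    h = X @ theta                      # hypothesis: h = X theta
    grad = (X.T @ (h - y)) / m         # unregularized gradient
    reg = (lam / m) * theta
    reg[0] = 0.0                       # do not regularize the bias weight
    return theta - alpha * (grad + reg)
```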

Regularized Logistic Regression

Binary Classification

Hypothesis
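
The usual hypothesis passes the regression output through the sigmoid $g$, applied element-wise ($g$ and $\vec{h}$ are symbols I introduce here):

$$\vec{z} = X\vec{\theta}, \qquad \vec{h} = g(\vec{z}) = \frac{1}{1 + e^{-\vec{z}}}$$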

Cost
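
A standard regularized cross-entropy cost, again assuming $\theta_1$ is excluded from the penalty:

$$J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - h^{(i)}\right)\right] + \frac{\lambda}{2m}\sum_{j=2}^{n}\theta_j^2$$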

Gradient Descent
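
Because the gradient of this cost takes the same form as in linear regression (only the hypothesis changes), the update is:

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)X_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$$

with the $\frac{\lambda}{m}\theta_j$ term dropped for $j = 1$.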

Multiclass Classification

Hypothesis
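
The hypothesis is the regression step followed by a row-wise softmax (writing the matrix of predicted class probabilities as $H$, a symbol I introduce here):

$$Z = X\Theta, \qquad H^{(i)} = \frac{e^{Z^{(i)}}}{\sum e^{Z^{(i)}}}$$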

where $\sum \vec{a}$ is the sum of all elements of vector $\vec{a}$.

Cost
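
A standard regularized cross-entropy cost summed over all $k$ classes, with the first (bias) row of $\Theta$ left out of the penalty:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{c=1}^{k} Y_c^{(i)}\log H_c^{(i)} + \frac{\lambda}{2m}\sum_{j=2}^{n}\sum_{c=1}^{k}\Theta_{j,c}^2$$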

Gradient Descent
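
The vectorized update mirrors the binary case, with matrices in place of vectors:

$$\Theta := \Theta - \frac{\alpha}{m}\left[X^T(H - Y) + \lambda\,\Theta_{reg}\right]$$

where $\Theta_{reg}$ is $\Theta$ with its first (bias) row zeroed out. A minimal NumPy sketch of this step, using a numerically stable softmax (names are my own, not from the original):

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def multiclass_step(X, Y, Theta, alpha, lam):
    """One regularized gradient-descent step for softmax regression.

    X     : (m, n) input matrix whose first column is all 1s
    Y     : (m, k) one-hot target matrix
    Theta : (n, k) weight matrix
    """
    m = X.shape[0]
    H = softmax_rows(X @ Theta)        # hypothesis: H = softmax(X Theta)
    grad = (X.T @ (H - Y)) / m
    reg = (lam / m) * Theta
    reg[0, :] = 0.0                    # do not regularize the bias row
    return Theta - alpha * (grad + reg)
```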
