# Notations

I will write a scalar as $x$, a vector as $\vec{x}$, and a matrix as $X$.

## Basic

$m$: the number of training examples

$n$: the number of features

$k$: the number of output classes

$\alpha$: learning rate

$\lambda$: regularization parameter (the strength of the regularization term)

$X$: the input matrix, where each row is one training example and each column is one feature. Note that the first column $X_1$ is a vector composed only of 1s.

$X^{(i)}$: the row vector of all feature inputs of the $i$th training example

$X_j$: the column vector of the $j$th feature across all training examples

$X_j^{(i)}$: the value of feature $j$ in the $i$th training example
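To make the indexing concrete, here is a minimal sketch in plain Python. The toy matrix and the helper names `row`, `column`, and `entry` are illustrative assumptions, not part of the notation itself. The notation counts from 1 while Python lists count from 0, so the helpers subtract 1.

```python
# Hypothetical toy input matrix: m = 3 training examples, 3 columns.
# The first column is the all-ones column X_1 described above.
X = [
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 6.0],
    [1.0, 4.0, 7.0],
]

def row(X, i):
    """X^{(i)}: all feature inputs of the i-th training example (1-based i)."""
    return X[i - 1]

def column(X, j):
    """X_j: the j-th feature across all training examples (1-based j)."""
    return [example[j - 1] for example in X]

def entry(X, i, j):
    """X_j^{(i)}: value of feature j in the i-th training example (both 1-based)."""
    return X[i - 1][j - 1]

m = len(X)               # number of training examples
first_column = column(X, 1)
second_example = row(X, 2)
value = entry(X, 3, 2)
```

With an array library the same lookups would typically be slices, but lists keep the sketch dependency-free.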

## For Linear Regression and Binary Classification

$\vec{y}$: the output column vector, where each row corresponds to one training example

$\vec{\theta}$: column vector of weights

## For Binary Classification

$\vec{z}$: the output vector of the regression step, which is also the input vector of the sigmoid
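The role of $\vec{z}$ can be sketched as follows, assuming the regression step is $\vec{z} = X\vec{\theta}$ and the sigmoid is the usual logistic function $1 / (1 + e^{-z})$ applied element-wise. The function names and the toy values are hypothetical.

```python
import math

def linear_step(X, theta):
    """Assumed regression step z = X theta: one dot product per training example."""
    return [sum(x_ij * t_j for x_ij, t_j in zip(example, theta)) for example in X]

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + e^{-z})."""
    return [1.0 / (1.0 + math.exp(-z_i)) for z_i in z]

X = [[1.0, 2.0], [1.0, -2.0]]   # first column is the all-ones column
theta = [0.0, 1.0]              # hypothetical weights
z = linear_step(X, theta)       # output of the regression step
h = sigmoid(z)                  # z is then fed into the sigmoid
```

Each entry of `h` lies strictly between 0 and 1, so it can be read as a probability for the positive class.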

## For Multiclass Classification

$Y$: the output matrix, where each row corresponds to one training example and each column to one class

$Z$: the output matrix of the regression step, which is also the input matrix of the softmax

$\Theta$: the $n \times k$ matrix of weights
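The shapes of these matrices can be checked with a small sketch. It assumes that $Y$ is one-hot encoded and that the regression step is $Z = X\Theta$. Both are assumptions made for illustration, as are the helper names `one_hot` and `matmul` and the toy values.

```python
def one_hot(labels, k):
    """Build Y (m x k) from integer class labels in 0..k-1 (assumed encoding)."""
    return [[1.0 if c == label else 0.0 for c in range(k)] for label in labels]

def matmul(A, B):
    """Plain-Python matrix product: A is m x n, B is n x k, result is m x k."""
    return [[sum(a * b for a, b in zip(a_row, b_col)) for b_col in zip(*B)]
            for a_row in A]

X = [[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]       # m = 3 examples, n = 2 columns
Theta = [[0.0, 1.0, -1.0], [0.5, 0.0, 1.0]]    # n = 2 rows, k = 3 classes
Z = matmul(X, Theta)                           # m x k, same shape as Y
Y = one_hot([0, 2, 1], k=3)                    # hypothetical labels
```

Here `Z` and `Y` both have one row per training example and one column per class.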

# Regularized Logistic Regression

## Multiclass Classification

### Hypothesis

where $\sum \vec{a}$ is the sum of all elements of the vector $\vec{a}$.
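As a minimal sketch, assume the hypothesis is the standard row-wise softmax applied to $Z$, so that the $i$th row is $e^{Z^{(i)}} / \sum e^{Z^{(i)}}$, with the exponential taken element-wise and $\sum$ used as defined above. The function names are hypothetical, and subtracting the row maximum is a common numerical safeguard rather than part of the definition.

```python
import math

def softmax_row(z):
    """Softmax of one row of Z. Subtracting max(z) avoids overflow
    and leaves the result unchanged."""
    shift = max(z)
    exps = [math.exp(z_c - shift) for z_c in z]
    total = sum(exps)   # the "sum of all elements" of the exponentiated row
    return [e / total for e in exps]

def hypothesis(Z):
    """Apply softmax to each row of Z, one row per training example."""
    return [softmax_row(z_row) for z_row in Z]

Z = [[1.0, 1.0, 1.0], [2.0, 1.0, 3.0]]   # hypothetical regression outputs
H = hypothesis(Z)
```

Each row of `H` sums to 1, so its entries can be read as class probabilities. Equal scores give equal probabilities, and the largest score gets the largest probability.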