Multivariate Linear Regression

Multivariate linear regression is linear regression with multiple input variables (features).


I will represent a scalar as $x$, a vector as $\vec{x}$, and a matrix as $X$.


$m$: the number of training examples

$n$: the number of features

$k$: the number of output classes

$\alpha$: learning rate

$X$: the input matrix, where each row represents a training example and each column represents a feature. Note that the first column $X_1$ consists only of 1s.

$X^{(i)}$: the row vector of all feature inputs of the $i$th training example

$X_j$: the column vector of the $j$th feature across all training examples

$X_j^{(i)}$: the value of feature $j$ in the $i$th training example

$\vec{y}$: the output column vector, where each row corresponds to a training example

$\vec{\theta}$: the column vector of weights


First, assume there is only one training example $\vec{x}$ (with $x_0 = 1$). The hypothesis is

$$h_\theta(\vec{x}) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

The equation below is the same as the above, written as an inner product.

$$h_\theta(\vec{x}) = \vec{\theta}^T \vec{x}$$
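As a quick sketch (assuming NumPy, with made-up example values), the single-example hypothesis is just a dot product of the weight vector and the feature vector:

```python
import numpy as np

# Hypothetical weights and a single training example (x_0 = 1 is the bias term).
theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2
x = np.array([1.0, 4.0, 5.0])      # x_0 = 1, then two feature values

# h_theta(x) = theta^T x
h = theta @ x
print(h)  # 1*1 + 2*4 + 3*5 = 24.0
```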

Vectorized Version

So far, the implementation handles one training example at a time. By ‘vectorizing’ the implementation, we can process all training examples at the same time, which saves a tremendous amount of computing time and resources.

Think of vectorizing as stacking the training examples from top to bottom of the input: the training examples are stored row-wise in the matrix $X$, and the thetas are likewise stacked vertically in the vector $\vec{\theta}$.

Then the generalized hypothesis is as follows.

$$h_\theta(X) = X\vec{\theta}$$
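A minimal NumPy sketch of this, with hypothetical data: build the design matrix by prepending the column of 1s, then one matrix–vector product evaluates the hypothesis for every example at once.

```python
import numpy as np

# Hypothetical data: m = 3 training examples, n = 2 features.
X_raw = np.array([[4.0, 5.0],
                  [6.0, 7.0],
                  [8.0, 9.0]])

# Prepend the column of 1s so the first column multiplies theta_0.
m = X_raw.shape[0]
X = np.hstack([np.ones((m, 1)), X_raw])

theta = np.array([1.0, 2.0, 3.0])

# h_theta(X) = X @ theta computes the hypothesis for all m examples at once.
h = X @ theta
print(h)  # [24. 34. 44.]
```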

Cost Function

The cost function has the same form as in the univariate case; only the hypothesis inside it now uses all $n$ features.

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(X^{(i)}) - y^{(i)} \right)^2$$

Vectorized Version

$$J(\theta) = \frac{1}{2m} \left( X\vec{\theta} - \vec{y} \right)^T \left( X\vec{\theta} - \vec{y} \right)$$

where $\vec{y}$ denotes the vector of all $y$ values.
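A sketch of the vectorized cost in NumPy, using hypothetical data (the residual vector is dotted with itself to get the sum of squared errors):

```python
import numpy as np

# Hypothetical data and labels (m = 3; first column of X is the bias column).
X = np.array([[1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0],
              [1.0, 8.0, 9.0]])
y = np.array([24.0, 30.0, 44.0])
theta = np.array([1.0, 2.0, 3.0])

def cost(X, y, theta):
    """Vectorized squared-error cost: (1/2m) * (X theta - y)^T (X theta - y)."""
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)

print(cost(X, y, theta))  # residuals are [0, 4, 0], so J = 16 / (2*3) ≈ 2.667
```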

Gradient Descent

Vectorized Version

The vectorized version of gradient descent is basically this:

$$\vec{\theta} := \vec{\theta} - \alpha \nabla J(\vec{\theta})$$

where $\nabla J(\vec{\theta})$ is the vector of partial derivatives $\frac{\partial J(\theta)}{\partial \theta_j}$.


As we’ve seen in univariate linear regression, we calculate the derivative for each $\theta_j$ as follows.

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X^{(i)}) - y^{(i)} \right) X_j^{(i)}$$
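As a sketch with hypothetical data, the per-parameter derivative can be translated directly into loops over $j$ and $i$ (which is exactly what vectorization later replaces):

```python
import numpy as np

# Hypothetical data (m = 3 examples, n + 1 = 3 columns including the bias).
X = np.array([[1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0],
              [1.0, 8.0, 9.0]])
y = np.array([24.0, 30.0, 44.0])
theta = np.array([1.0, 2.0, 3.0])
m, n_plus_1 = X.shape

# Loop form: one partial derivative per theta_j, summed over all examples.
grad = np.zeros(n_plus_1)
for j in range(n_plus_1):
    for i in range(m):
        grad[j] += (X[i] @ theta - y[i]) * X[i, j]
    grad[j] /= m

print(grad)
```

The vectorized expression `X.T @ (X @ theta - y) / m` computes this same gradient vector without the explicit loops.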

We can vectorize the above equation.

$$\nabla J(\vec{\theta}) = \frac{1}{m} X^T \left( X\vec{\theta} - \vec{y} \right)$$


In conclusion, the vectorized gradient descent rule is:

$$\vec{\theta} := \vec{\theta} - \frac{\alpha}{m} X^T \left( X\vec{\theta} - \vec{y} \right)$$
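Putting it all together, here is a sketch of the full vectorized update loop in NumPy. The data is synthetic, generated from known weights, so we can check that descent recovers them; the learning rate and iteration count are arbitrary choices for this toy problem.

```python
import numpy as np

# Synthetic data generated from known weights, to sanity-check convergence.
rng = np.random.default_rng(0)
m = 100
X = np.hstack([np.ones((m, 1)), rng.uniform(-1, 1, size=(m, 2))])
true_theta = np.array([1.0, 2.0, 3.0])
y = X @ true_theta

# Vectorized gradient descent: theta := theta - (alpha/m) * X^T (X theta - y)
theta = np.zeros(3)
alpha = 0.1
for _ in range(5000):
    theta -= (alpha / m) * (X.T @ (X @ theta - y))

print(theta)  # close to [1. 2. 3.]
```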
