# Machine Learning: Cost Function

## Machine Learning - The Supervised kind

Machine learning is the ability of computer algorithms to improve continuously through experience. One of the most common types of machine learning is supervised learning. In supervised learning algorithms, we have two sets of values: the **Input Features** and the **Output Variables**. Most supervised learning problems fall into one of two types:

- **Regression**: In regression problems, we have a set of input features mapped to continuous output values. The problem is to predict a real-valued output for an unseen input, as close to the actual value as possible.
- **Classification**: In classification problems, we have a set of inputs, each belonging to a given category. The problem is to map unseen input values into discrete categories.

In this article, we shall discuss **Regression** problems in machine learning, the definition of the **cost function**, and the need to minimize it.

## What is Regression? What scenarios require the regression method of problem solving in machine learning?

Regression is a mathematical problem-solving method in which we try to formulate a function that predicts an unknown variable whose value depends on the values of known variables.

Assume that we have the problem of predicting the monetary value of a house based on three factors: **Dimensions**, **Number of bedrooms**, and **Age of the house**. One can say that the value of the house increases as the **Dimensions** and **Number of bedrooms** increase. On the other hand, the value of the house decreases as the **Age** increases.

In this problem,

- **Unknown/Dependent variable** ($y$): Value of the house
- **Known/Independent variables** ($x_1$, $x_2$, $x_3$): Dimension, Number of bedrooms, Age
- **Regression function example**: $f(x) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$

Where $x_1$, $x_2$, and $x_3$ represent Dimension, Number of bedrooms, and Age respectively, and $y$ is the correct value of the house. The coefficients $\theta_1$, $\theta_2$, and $\theta_3$ of the independent variables $x_1$, $x_2$, and $x_3$ will change according to the training samples given while formulating the regression function $f(x)$.
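To make the regression function above concrete, here is a minimal Python sketch that evaluates it for one house. The function name `predict_house_value` and the coefficient values are assumptions chosen purely for illustration; they are not learned from any data and do not come from a real housing model.

```python
def predict_house_value(dimension, bedrooms, age, theta):
    """Evaluate f(x) = theta1*x1 + theta2*x2 + theta3*x3 for one house."""
    theta1, theta2, theta3 = theta
    return theta1 * dimension + theta2 * bedrooms + theta3 * age


# Hypothetical coefficients: positive for dimension and bedrooms,
# negative for age, matching the intuition described in the text.
theta = (100.0, 5000.0, -300.0)

value = predict_house_value(dimension=1200, bedrooms=3, age=10, theta=theta)
print(value)  # 132000.0
```

With these made-up coefficients, a larger or newer house gets a higher predicted value, while an older one gets a lower value.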

If we have only one independent variable ($x$), we call this **linear regression** with one variable (also known as simple linear regression). For the sake of simplicity, I will be using linear regression with one variable in the rest of the article.

## Cost of the hypothesis function

Consider the linear regression problem containing only one independent variable. Let's define the linear regression function by:

$$f(x) = \theta_0 + \theta_1 x$$

This function is a hypothesis function: we say that, for certain values of $\theta_0$ and $\theta_1$, given the value of $x$, we get predictions $f(x)$ very close to the actual value.
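As a rough illustration of the hypothesis function $f(x) = \theta_0 + \theta_1 x$, the sketch below evaluates it at a few inputs for two different parameter choices. The name `hypothesis` and all numeric values are hypothetical and serve only to show that each choice of $\theta_0$ and $\theta_1$ gives a different straight line.

```python
def hypothesis(x, theta0, theta1):
    """Evaluate f(x) = theta0 + theta1 * x for a single input x."""
    return theta0 + theta1 * x


inputs = [0.0, 1.0, 2.0]

# Two hypothetical parameter choices, i.e. two different lines.
line_a = [hypothesis(x, theta0=1.0, theta1=2.0) for x in inputs]
line_b = [hypothesis(x, theta0=0.0, theta1=0.5) for x in inputs]

print(line_a)  # [1.0, 3.0, 5.0]
print(line_b)  # [0.0, 0.5, 1.0]
```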

Let us consider two graphs. **Graph 1 contains the plot of output $y$ versus the input values $x$ as dots on a graph**. **Graph 2 contains the plot of the hypothesis function $f(x)$, plotted on the graph as a straight line**. If we merge these two graphs and represent them in a single graph, the distance between the **predicted value** (a point on the hypothesis function) and the **actual value** (training sample value) represents the cost of an individual prediction.

**Let us view this graph:**

**In the above graph:**

- Individual points on the graph represent each training sample, $y$ v/s $x$.
- The line on the graph represents the hypothesis function $f(x)$.
- The distance between the points and the line represents the **cost** for the individual training sample.

**Cost representation**: We have seen the representation of cost graphically. Let us try to derive a mathematical equation out of this graphical representation. We define the following parameters used in the cost function.

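- $J(\theta_i)$ => Individual cost function for the $i^{th}$ training sample (here the subscript $i$ indexes the sample, not a coefficient).
- $f(x_i)$ => Value of the hypothesis function for the $i^{th}$ training sample.
- $y_i$ => $i^{th}$ actual value in the training set.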

The individual costs can be defined as the difference between the value of the hypothesis function and the actual value:

$$J(\theta_1) = f(x_1) - y_1, \quad J(\theta_2) = f(x_2) - y_2$$

The total cost of all the values present in the training set can be represented as:

$$\text{Total cost} = \sum_{i=1}^{m} \left(f(x_i) - y_i\right)$$

where $m$ is the number of samples in the training set.
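The individual and total costs defined above can be sketched in a few lines of Python. The training samples and the parameter values ($\theta_0 = 2$, $\theta_1 = 1$) are made up for illustration, and the helper name `f` simply mirrors the hypothesis function.

```python
# Hypothetical training set: inputs xs and actual values ys.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 6.0, 4.0]


def f(x, theta0=2.0, theta1=1.0):
    """Hypothesis function with assumed parameters theta0=2, theta1=1."""
    return theta0 + theta1 * x


# Individual cost for each sample: predicted value minus actual value.
individual_costs = [f(x) - y for x, y in zip(xs, ys)]

# Total cost: plain sum of the individual (signed) costs.
total_cost = sum(individual_costs)

print(individual_costs)  # [1.0, -2.0, 1.0]
print(total_cost)        # 0.0
```

Note that the total comes out as zero here even though none of the points lies on the line, because the positive and negative differences offset one another.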

## Minimizing cost function: We don't wanna spend too much now, do we?

The final goal of linear regression is to find the hypothesis function, using training samples, such that the total cost of the hypothesis function is minimal. Let us derive the equation that represents the cost function to be minimized.

- We know that the total cost of the hypothesis function, given a training set, can be defined as: Total cost = $\sum_{i=1}^{m} (f(x_i) - y_i)$
- We want the cost to be minimum; in other words, the difference between $f(x_i)$ and $y_i$ should be as small as possible. Note that these differences can be positive or negative, so in a plain sum they can cancel each other out and hide large errors. Squaring each difference makes every term non-negative and penalizes larger errors more heavily: Total cost = $\sum_{i=1}^{m} (f(x_i) - y_i)^2$
- Let us consider that the training set has $m$ values in it. The average cost can be represented as: $\frac{1}{m} \sum_{i=1}^{m} (f(x_i) - y_i)^2$

**Hence the average cost function to be minimized can be represented as:**

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2$$

If we substitute the hypothesis function $f(x_i) = \theta_0 + \theta_1 x_i$, we get the cost function as:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left((\theta_0 + \theta_1 x_i) - y_i\right)^2$$
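Here is a minimal Python sketch of the average cost function above for the one-variable hypothesis. The function name `average_cost` and the sample data are assumptions made for illustration only.

```python
def average_cost(theta0, theta1, xs, ys):
    """Average squared difference between theta0 + theta1*x and y."""
    m = len(xs)
    squared_errors = [((theta0 + theta1 * x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / m


# Hypothetical training set.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 6.0, 4.0]

cost_a = average_cost(2.0, 1.0, xs, ys)  # line f(x) = 2 + x
cost_b = average_cost(0.0, 2.0, xs, ys)  # line f(x) = 2x

print(cost_a)           # 2.0
print(cost_a < cost_b)  # True
```

On this toy data, the line $f(x) = 2 + x$ has a lower average cost than $f(x) = 2x$, so it is the better of the two hypotheses.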

There are many algorithms that can be implemented to minimize this cost function. **Gradient descent** is one such commonly used algorithm; however, note that there is more than one way to reduce the cost function. I hope this article gave some insight into how the average cost function is derived from the hypothesis function in linear regression.
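For readers curious about the gradient descent mentioned above, the following is a rough sketch of how it might minimize the average cost for the one-variable hypothesis. It assumes the gradients obtained by differentiating $J(\theta)$ with respect to $\theta_0$ and $\theta_1$, which this article does not derive. The function name `gradient_descent`, the learning rate, the step count, and the data are all hypothetical choices.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Repeatedly step theta0 and theta1 against the gradient of the average cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        # Signed errors f(x_i) - y_i under the current parameters.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1/m) * sum(errors^2).
        grad0 = (2.0 / m) * sum(errors)
        grad1 = (2.0 / m) * sum(e * x for e, x in zip(errors, xs))
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Hypothetical training set.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 6.0, 4.0]

theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 4), round(theta1, 4))
```

On this small data set the parameters should settle close to $\theta_0 = 2$ and $\theta_1 = 1$; a learning rate that is too large for a given data set can make the updates diverge instead.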

Written by **Aparna Joshi** who works as a software engineer in Bangalore. Aparna is also a technology enthusiast, writer, and artist. She has an immense passion and curiosity towards psychology and its implications on human behavior. Her links: **Blog,** **Twitter,** **Email,** **Newsletter**