Overfitting occurs when a model becomes over-optimized to its training data. Such a model may show high accuracy on the training data set, but fail to successfully predict data that were not part of it.

The following are examples of overfitting.

Assuming that we have a sample data set like the one in the graph on the right, we will make a prediction through linear regression.

Here is a graph that shows a prediction made with a straight line, using a linear function.

Increasing the graph's degree can produce more accurate predictions. However, if the degree is increased too much, it may become overly optimized for the training data set and lose generality.
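The effect of the polynomial degree can be illustrated with a short sketch. The data here are hypothetical noisy samples standing in for the sample data set above; a high-degree polynomial fits the training points much more closely, which is exactly how it can lose generality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: a noisy quadratic trend with 15 points.
x = np.linspace(0.0, 1.0, 15)
y = 1.5 * x**2 + 0.2 * rng.standard_normal(x.size)

def train_error(degree):
    """Mean squared error of a degree-`degree` polynomial fit on the training points."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# A high-degree fit hugs the training points far more tightly than a line,
# even though the underlying trend is simple.
print(train_error(1), train_error(9))
```

The near-zero training error of the degree-9 fit is not a sign of a better model; on fresh samples from the same quadratic trend it would typically predict worse than the straight line.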

There are various solutions available that could be used to prevent overfitting:

- Regularization
- Dropout
- Data Augmentation

**3.1 Regularization**

As the weights of certain variables in a model increase, the likelihood of overfitting generally also increases. To prevent certain variables from becoming too heavy in a model, a limit can be placed on their weights. This method of constraining the cost function is called **model regularization**. Model regularization methods can be divided into **L1** and **L2 regularization**.

The following is the formula for **L2 Regularization**:

Cost = Cost₀ + λ Σ wᵢ²

In L2 regularization, we add to the cost function the sum of the squares of the weights, multiplied by a parameter **λ** representing the strength of the regularization. Here, the parameter **λ** has a value between 0 and 1, which can be adjusted to control the strength of the regularization on the model. As **λ** gets closer to 1, the regularization becomes stricter and the graph of the model becomes closer to a straight line.
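The penalty term above can be written as a small sketch. The base cost value and weights below are made-up numbers used only to show the arithmetic:

```python
import numpy as np

def l2_cost(base_cost, weights, lam):
    """Cost with an L2 penalty: base_cost + lam * sum(w_i^2).
    `lam` controls the strength of the regularization."""
    return base_cost + lam * np.sum(np.square(weights))

w = np.array([0.5, -2.0, 1.0])      # hypothetical model weights
print(l2_cost(1.0, w, lam=0.1))     # 1.0 + 0.1 * (0.25 + 4.0 + 1.0) = 1.525
```

Because the penalty grows with the square of each weight, large weights are punished disproportionately, which pushes the optimizer toward small, evenly distributed weights.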

The following is the formula for **L1 Regularization**:

Cost = Cost₀ + λ Σ |wᵢ|

L1 regularization prevents overfitting by adding the sum of the absolute values of the weights to the cost function. As in L2 regularization, the larger the **λ**, the stronger the regularization on the model. What distinguishes it from L2 regularization, which only shrinks the weights of non-critical variables, is that L1 regularization **can drive the weights of non-critical variables to exactly zero**. So L1 regularization is used when you want to keep only the important variables and mute all others, whereas L2 regularization is used to consider the overall features.
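The zeroing behavior of L1 can be sketched with the soft-thresholding step that appears in proximal-gradient methods for L1 penalties (the weights here are hypothetical):

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal step for the L1 penalty: shrinks every weight toward zero
    and sets any weight with |w| <= lam exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.05, -2.0, 0.01])    # two small "non-critical" weights, one large
print(soft_threshold(w, 0.1))       # the small weights become exactly 0.0
```

This is why L1 produces sparse models: small weights do not merely shrink, they are cut off entirely, while an L2 penalty would leave them small but nonzero.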

**3.2 Dropout**

Dropout is a technique that randomly excludes nodes during neural network training. In each training iteration, the model randomly selects nodes to ignore and trains without them. Since the excluded nodes change randomly every time, the model is effectively trained as a different neural network in each iteration, which helps prevent overfitting.
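A minimal sketch of a dropout layer, using the common "inverted dropout" variant in which the surviving activations are rescaled during training so that nothing needs to change at inference time (the activations and drop probability are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, drop_prob, training=True):
    """Inverted dropout: during training, randomly zero a fraction
    `drop_prob` of the nodes and scale the survivors by 1/(1 - drop_prob)
    so the expected activation is unchanged. At inference, do nothing."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= drop_prob
    return activations * mask / (1.0 - drop_prob)

a = np.ones(10)                       # hypothetical layer activations
print(dropout(a, drop_prob=0.5))      # a different random subset is zeroed each call
print(dropout(a, drop_prob=0.5, training=False))  # inference: unchanged
```

Each call draws a fresh mask, which is what makes every training pass behave like a slightly different network.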

**3.3 Data Augmentation**

Data Augmentation is a method of increasing the amount of data by adding new data that is slightly transformed from the existing data. This method is used to prevent overfitting that results from a small training data set. For example, you can rotate existing image data or change the contrast to create similar but different images. Since the generated images are recognized as new data, distinct from the existing ones, the amount of training data can be artificially increased, effectively preventing overfitting.
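The transformations mentioned above can be sketched directly with numpy array operations. The 4×4 array below is a stand-in for a real image:

```python
import numpy as np

def augment(image):
    """Generate slightly transformed copies of one image; each copy
    counts as an additional training example."""
    variants = [
        np.rot90(image),                # 90-degree rotation
        np.fliplr(image),               # horizontal flip
        np.clip(image * 1.2, 0, 255),   # simple contrast/brightness change
    ]
    return variants

img = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a real image
print(1 + len(augment(img)))  # one original image -> 4 training examples
```

In practice, libraries such as image-processing or deep-learning toolkits offer many more transforms (crops, shifts, noise), but the principle is the same: cheap variations of existing samples expand the effective training set.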