Overfitting can be thought of as fitting the model to the noise, while under-fitting is failing to fit the model to the signal. An overfitted model will reproduce the noise in its predictions; an under-fitted model will just generate something close to the mean.
Overfitting: Training: good vs. Test: bad
Under-fitting: Training: bad vs. Test: bad
One would expect more shrinkage (a larger difference between training and test results) for an overfitted model.
Under-fitting means the model is missing parameters that are important for explaining a relationship or making a prediction. It can also take the form of an inappropriate specification: a linear model, for example, will always under-fit a non-linear relationship.
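Here is a minimal sketch of that point, assuming numpy and scikit-learn are available: a straight-line fit to data generated from a quadratic signal misses the curvature no matter how much data you have.

```python
# A linear model under-fitting a non-linear (quadratic) relationship.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=0.5, size=x.size)  # non-linear signal plus noise

linear = LinearRegression().fit(x.reshape(-1, 1), y)

# The straight-line fit leaves large, systematic residuals: the signature
# of under-fitting, regardless of how much training data is used.
print("R^2 of linear fit on quadratic data:", linear.score(x.reshape(-1, 1), y))
```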
Training error will decrease as more features are added, which is good, but as with many things, too much of a good thing has adverse consequences. Validation error should also decline as features are added, but there is a limit to the improvement. If validation error starts to increase while training error continues to decline, the model is overfitting.
A modeler should always keep the trade-off graph between complexity and error in the back of their mind. As complexity increases, training error goes down, but past some point test error will be higher. For simple models, training error is higher, but test error may be lower. The same idea can be shown in a bias-variance trade-off graph.
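A rough sketch of that trade-off, again assuming numpy/scikit-learn (the sine signal and the specific polynomial degrees are just illustrative choices): training error keeps falling as the polynomial degree grows, while validation error bottoms out and then rises once the model starts fitting noise.

```python
# Training vs. validation error as model complexity (polynomial degree) grows.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 80).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=x.shape[0])

x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.5, random_state=0
)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    val_err = mean_squared_error(y_val, model.predict(x_val))
    # A widening gap between training and validation error flags overfitting.
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```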