Overfitting: capturing spurious patterns that won't recur in the future, leading to less accurate predictions.
Underfitting: failing to capture relevant patterns, leading to less accurate predictions.
Examples with the decision tree model
- Of the decision tree's many options, the most important ones determine the tree's depth.
- As the tree gets deeper, the dataset is sliced into more leaves, each holding fewer data points (every extra level doubles the number of groups: 2 -> 4 -> 8 -> 16 ...)
- → Leaves with very few data points make predictions that are quite close to those points' actual values, but they may make very unreliable predictions for new data.
≫ Overfitting: a model matches the training data almost perfectly, but performs poorly on validation data and other new data.
- If a tree divides the data into only 2 or 4 groups, each group still contains a wide variety of data points.
- → The resulting predictions may be far off for most data points, even in the training data.
≫ Underfitting: a model fails to capture important distinctions and patterns in the data, so it performs poorly even on the training data.
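The two failure modes described above can be seen side by side by comparing training and validation error for trees of different sizes. The following is a minimal sketch on synthetic data, not the data used in this post: the noisy sine curve, the 600-row sample size, the helper name `train_and_val_mae`, and the leaf counts 2, 20, and 450 are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=600)

# Default split: 450 training rows, 150 validation rows.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

def train_and_val_mae(max_leaf_nodes):
    """Fit a tree and return (training MAE, validation MAE)."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    train_mae = mean_absolute_error(train_y, model.predict(train_X))
    val_mae = mean_absolute_error(val_y, model.predict(val_X))
    return train_mae, val_mae

# Very shallow tree, moderate tree, and a tree with about one leaf per training row.
results = {n: train_and_val_mae(n) for n in [2, 20, 450]}
for n, (train_mae, val_mae) in results.items():
    print(f"leaves={n:4d}  train MAE={train_mae:.3f}  val MAE={val_mae:.3f}")
```

With this setup one would expect the 2-leaf tree to show high error on both sets (underfitting), while the 450-leaf tree should reach near-zero training error but a worse validation error than the 20-leaf tree (overfitting).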
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    # Fit a tree capped at max_leaf_nodes leaves and return its validation MAE.
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    mae = mean_absolute_error(val_y, preds_val)
    return mae

# Compare validation MAE across tree sizes.
# train_X, val_X, train_y, val_y are assumed to come from an earlier train/validation split.
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean absolute error: %d" % (max_leaf_nodes, my_mae))
# Max leaf nodes: 5 Mean absolute error: 347380
# Max leaf nodes: 50 Mean absolute error: 258171
# Max leaf nodes: 500 Mean absolute error: 243495
# Max leaf nodes: 5000 Mean absolute error: 254983
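Reading the output above, the validation error falls as leaves are added, bottoms out at 500, and rises again at 5000. Picking the best size can be automated; the sketch below simply reuses the four reported MAE values, and the names `scores` and `best_tree_size` are illustrative. In practice the dictionary would more likely be built by calling `get_mae` once per candidate size.

```python
# Validation MAE per max_leaf_nodes, copied from the output reported above.
scores = {5: 347380, 50: 258171, 500: 243495, 5000: 254983}

# min() iterates over the keys and ranks them by their MAE,
# so it returns the tree size with the lowest validation error.
best_tree_size = min(scores, key=scores.get)
print(best_tree_size)  # 500
```

Sizes below the best value lean toward underfitting, and sizes above it lean toward overfitting.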
Models can suffer from either:
- Overfitting: capturing spurious patterns that won't recur in the future, leading to less accurate predictions.
- Underfitting: failing to capture relevant patterns, again leading to less accurate predictions.