I’ve decided to start a small series of knowledge dumps, where I post my notes from study sessions I have in my free time. Below you’ll find my thought process when learning new information, particularly around machine learning. I find myself asking the same questions over and over, so I write my answers down. Let me know if this has helped you as well!
NOTE: Oftentimes I copy + paste directly from other websites, so if you find an entire paragraph on something, it likely came from another source.
Model evaluation procedures
Comparing cross-validation to train/test split
Advantages of cross-validation:
- More accurate estimate of out-of-sample accuracy
- More “efficient” use of data (every observation is used for both training and testing)
Advantages of train/test split:
- Runs K times faster than K-fold cross-validation
- Simpler to examine the detailed results of the testing process
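A quick sketch of both procedures side by side. The dataset and model here (iris + KNN) are just placeholders; swap in your own:

```python
# Comparing a single train/test split to K-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# Train/test split: one fast estimate of out-of-sample accuracy
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=4)
split_score = knn.fit(X_train, y_train).score(X_test, y_test)

# 10-fold CV: every observation is used for both training and testing,
# at the cost of fitting the model K times
cv_scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
print(split_score, cv_scores.mean())
```

Note that `cross_val_score` fits the model once per fold, which is exactly why it runs ~K times slower than the single split.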
Which evaluation metrics should I use?
Sklearn ships with a ton of evaluation and scoring metrics out of the box. Start here.
Model evaluation metrics for regression
Evaluation metrics for classification problems, such as accuracy, are not useful for regression problems. Instead, we need evaluation metrics designed for comparing continuous values.
Three common evaluation metrics for regression problems:
- Mean Absolute Error (MAE) is the mean of the absolute value of the errors.
- Easiest to understand, because it’s the average error.
- Mean Squared Error (MSE) is the mean of the squared errors.
- More popular than MAE, because MSE “punishes” larger errors.
- Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors.
- Even more popular than MSE, because RMSE is interpretable in the “y” (base) units. It is recommended that RMSE be used as the primary metric to interpret your model.
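All three are one-liners with sklearn and numpy. Toy numbers here, chosen so the arithmetic is easy to check by hand:

```python
# Computing MAE, MSE, and RMSE on toy predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([100, 50, 30, 20])
y_pred = np.array([90, 50, 50, 30])

# Errors are 10, 0, 20, 10
mae = mean_absolute_error(y_true, y_pred)   # (10+0+20+10)/4 = 10.0
mse = mean_squared_error(y_true, y_pred)    # (100+0+400+100)/4 = 150.0
rmse = np.sqrt(mse)                         # ~12.25, back in "y" units
```

Notice how the single error of 20 drags MSE up much more than MAE, which is exactly the “punishes larger errors” behavior.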
What is recall?
- [precision & recall in 100 seconds]
- # relevant found / # relevant
What is R-squared?
- AKA: coefficient of determination
- Fraction of total variation in Y that is captured by the model
- How well does your line follow the variation happening? (dots)
- Ranges from 0 – 1 on the training data (it can go negative on unseen data)
- 0: none of the variance is captured
- 1: all of the variance is captured
- A high R-squared simply means your curve fits your training data well; it may not be a good predictor
- R-squared is invalid for nonlinear regression: the number can still be computed, but it no longer represents the fraction of variance explained
The regression model on the left accounts for 38.0% of the variance while the one on the right accounts for 87.4%. The more variance that is accounted for by the regression model the closer the data points will fall to the fitted regression line. Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values and, therefore, all the data points would fall on the fitted regression line.
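A minimal sketch of computing R-squared with sklearn’s `r2_score` (toy numbers, not the fits from the figures above):

```python
# R-squared: fraction of variance in y explained by the predictions.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

r2 = r2_score(y_true, y_pred)   # close to 1: good fit

# A "model" that always predicts the mean of y scores exactly 0 --
# it captures none of the variation around the mean
r2_mean = r2_score(y_true, np.full(4, y_true.mean()))
```

The mean-predictor baseline is a useful sanity check: anything below 0 is doing worse than just guessing the average.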
Fitted linear regression coefficients as (feature, coefficient) pairs: `[('TV', 0.046564567874150281), ('Radio', 0.17915812245088836), ('Newspaper', 0.0034504647111804347)]`
- For a given amount of Radio and Newspaper ad spending, a “unit” increase in TV ad spending is associated with a 0.0466 “unit” increase in Sales.
- Or more clearly: For a given amount of Radio and Newspaper ad spending, **an additional $1,000 spent on TV ads** is associated with an **increase in sales of 46.6 items**.
- This is a statement of association, not causation.
- If an increase in TV ad spending was associated with a **decrease** in sales, $\beta_1$ would be **negative**.
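The coefficient list above comes from zipping feature names with a fitted `LinearRegression`’s `coef_`. Here’s a sketch on synthetic data (the actual Advertising dataset isn’t reproduced here, so the numbers won’t match the output above):

```python
# Producing (feature, coefficient) pairs from a fitted linear model.
# Synthetic stand-in for the Advertising data: true effects of roughly
# 0.05 (TV), 0.18 (Radio), and 0.0 (Newspaper), plus noise.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
feature_cols = ["TV", "Radio", "Newspaper"]
X = rng.uniform(0, 300, size=(200, 3))
y = 3 + 0.05 * X[:, 0] + 0.18 * X[:, 1] + rng.normal(0, 1, 200)

linreg = LinearRegression().fit(X, y)
coef_pairs = list(zip(feature_cols, linreg.coef_))
print(coef_pairs)
```

Each coefficient is the expected change in the target per unit change in that feature, holding the other features fixed, which is what the interpretation above spells out.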
Metrics computed from a confusion matrix
A confusion matrix gives you a more complete picture of how your classifier is performing & allows you to compute various classification metrics, which help guide model selection. Useful for multi-class problems.
Sensitivity: When the actual value is positive, how often is the prediction correct?
- How “sensitive” is the classifier to detecting positive instances?
- AKA: “True positive rate” or “Recall”
Specificity: When the actual value is negative, how often is the prediction correct?
- How “specific” (selective) is the classifier in predicting positive instances?
False Positive Rate: When the actual value is negative, how often is the prediction incorrect?
Precision: When a positive value is predicted, how often is the prediction correct?
- How “precise” is the classifier when predicting positive instances?
- # relevant found / # found
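All four metrics fall out of the confusion matrix counts. A sketch with made-up labels (sklearn’s `confusion_matrix` returns `[[TN, FP], [FN, TP]]` for binary 0/1 labels):

```python
# Deriving sensitivity, specificity, FPR, and precision from a
# confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall / true positive rate
specificity = tn / (tn + fp)
fpr = fp / (tn + fp)           # = 1 - specificity
precision = tp / (tp + fp)     # "# relevant found / # found"
```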
How do you choose which metrics to focus on?
Depends on business objective.
- Spam Filter: Optimize for precision or specificity because false negatives are more acceptable than false positives.
- Fraudulent transactions: Optimize for sensitivity, because false positives are more acceptable than false negatives.
Changing the threshold from the default value of 0.5 affects sensitivity and specificity: lowering the threshold increases sensitivity but lowers specificity.
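In sklearn, `predict` hard-codes the 0.5 cutoff, so to move the threshold you threshold `predict_proba` yourself. A sketch (synthetic data and logistic regression are just placeholders):

```python
# Adjusting the classification threshold via predict_proba.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]   # P(class = 1) per observation

pred_default = (proba >= 0.5).astype(int)  # same as clf.predict(X_te)
pred_low = (proba >= 0.3).astype(int)      # lower threshold -> more
                                           # positives -> higher sensitivity
```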
Metrics to assist with binary classification
What is an ROC curve?
The most commonly used way to visualize the performance of a binary classifier.
It can help you choose a threshold that balances sensitivity / specificity for your context.
What is AUC?
Perhaps the best way to summarize binary classifier performance in a single number.
The percentage of the ROC plot that is under the curve. It represents the likelihood that your classifier will assign a higher predicted probability to a randomly chosen positive observation than to a randomly chosen negative one.
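Both come straight from sklearn given true labels and predicted probabilities. A tiny four-point sketch:

```python
# ROC curve points and AUC from predicted probabilities.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]   # predicted P(positive)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
# 3 of the 4 (positive, negative) pairs are ranked correctly -> AUC 0.75
# fpr/tpr can be passed to matplotlib to draw the curve itself
```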
What does the metrics argument do in Keras?
Metrics passed to `model.compile(metrics=[...])` are computed and reported during training and evaluation, but (unlike the loss) they are not used to update the model’s weights.
Why score a model?
- Hyperparameter selection
- Feature selection