I’ve decided to start a small series of knowledge dumps, where I post my notes from study sessions I have in my free time. Below you’ll find my thought process when learning new information, particularly around machine learning. I find myself asking the same questions over and over, so I write my answers down. Let me know if this has helped you as well!

**NOTE:** Oftentimes I copy and paste directly from other websites, so if you find an entire paragraph on something, it likely came from another source.

# Model evaluation procedures

[Video + GitHub] Cross-validation for parameter tuning, model selection, and feature selection

Comparing cross-validation to train/test split

Advantages of **cross-validation:**

- More accurate estimate of out-of-sample accuracy
- More “efficient” use of data (every observation is used for both training and testing)

Advantages of **train/test split:**

- Runs K times faster than K-fold cross-validation
- Simpler to examine the detailed results of the testing process
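A minimal sketch of both approaches, assuming a toy dataset and a KNN classifier (the dataset, model, and variable names are purely illustrative):

```python
# Sketch: train/test split vs. 5-fold cross-validation on an illustrative dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# Train/test split: one accuracy estimate, fast
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
knn.fit(X_train, y_train)
print(accuracy_score(y_test, knn.predict(X_test)))

# 5-fold cross-validation: five estimates averaged, slower but more stable
scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')
print(scores.mean())
```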

# Which evaluation metrics should I use?

**Sklearn offers a ton of evaluation and scoring metrics out of the box. Start here.**

**Model evaluation metrics for regression**

Evaluation metrics for classification problems, such as **accuracy**, are not useful for regression problems. Instead, we need evaluation metrics designed for comparing continuous values.

**Three common evaluation metrics** for regression problems:

**Mean Absolute Error** (MAE) is the mean of the absolute value of the errors.

- Easiest to understand, because it’s the average error.

**Mean Squared Error** (MSE) is the mean of the squared errors.

- More popular than MAE, because MSE “punishes” larger errors.

**Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors.

- Even more popular than MSE, because RMSE is interpretable in the “y” (base) units.

**It is recommended that RMSE be used as the primary metric to interpret your model.**
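A quick sketch of all three with sklearn, using made-up `y_true` / `y_pred` values for illustration:

```python
# Sketch: MAE, MSE, and RMSE on illustrative values
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # back in the units of y
print(mae, mse, rmse)
```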

# Metrics

[How to measure metrics in scikit-learn]

**What is recall?**

**[precision & recall in 100 seconds]**

- Recall = # relevant found / # relevant
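A tiny illustration with sklearn (the labels are made up):

```python
# Sketch: recall = # relevant found / # relevant, on illustrative labels
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 0, 0, 1]   # 4 relevant (positive) instances
y_pred = [1, 0, 1, 0, 1, 1]   # the classifier found 3 of them
print(recall_score(y_true, y_pred))  # 3 / 4 = 0.75
```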

**What is R-squared?**

- AKA: *coefficient of determination*
- Fraction of total variation in Y that is captured by the model
- *How well does your line follow the variation happening? (dots)*
- Ranges from 0 – 1
  - 0: none of the variance is captured
  - 1: all of it is captured
- A high R-squared simply means your curve fits your *training data* well; it may not be a good predictor
- R-squared is not a valid goodness-of-fit measure for nonlinear regression

For example, a regression model that accounts for only 38.0% of the variance fits the data much more loosely than one that accounts for 87.4%. The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line. Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values and, therefore, all the data points would fall on the fitted regression line.
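A minimal sketch with sklearn’s `r2_score`, on made-up values (predictions close to the observed values give an R-squared near 1):

```python
# Sketch: R-squared (coefficient of determination) on illustrative values
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 7.1, 8.9]
print(r2_score(y_true, y_pred))  # ~0.99: the model captures most of the variance
```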

**Coefficients**

How do we interpret the TV coefficient (0.0466)?

`linreg.intercept_`, `linreg.coef_`

`[('TV', 0.046564567874150281), ('Radio', 0.17915812245088836), ('Newspaper', 0.0034504647111804347)]`

- For a given amount of Radio and Newspaper ad spending, **a “unit” increase in TV ad spending** is associated with a 0.0466 **“unit” increase in Sales.**

- Or more clearly: For a given amount of Radio and Newspaper ad spending, **an additional $1,000 spent on TV ads** is associated with an **increase in sales of 46.6 items**.
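A sketch of where numbers like these come from, assuming the ISLR Advertising data (the file path and column names are assumptions):

```python
# Sketch: fit the regression and pair feature names with coefficients.
# 'Advertising.csv' is a hypothetical path; columns assumed: TV, Radio, Newspaper, Sales.
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.read_csv('Advertising.csv')
feature_cols = ['TV', 'Radio', 'Newspaper']
X = data[feature_cols]
y = data['Sales']

linreg = LinearRegression().fit(X, y)
print(linreg.intercept_)
print(list(zip(feature_cols, linreg.coef_)))  # e.g. [('TV', 0.0466), ('Radio', 0.179), ...]
```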

**Important notes:**

- This is a statement of **association**, not **causation**.
- If an increase in TV ad spending was associated with a **decrease** in sales, $\beta_1$ would be **negative**.

# Metrics computed from a confusion matrix

**[How to evaluate a classifier in scikit-learn]**

A **confusion matrix** gives you a more complete picture of how your classifier is performing & allows you to compute various classification metrics, which help guide model selection. Useful for **multi-class problems.**

**Sensitivity:** When the actual value is positive, how often is the prediction correct?

- How *“sensitive”* is the classifier to detecting positive instances?
- AKA: *“True Positive Rate”* or *“Recall”*

**Specificity:** When the actual value is negative, how often is the prediction correct?

- How *“specific”* (selective) is the classifier in predicting positive instances?

**False Positive Rate:** When the actual value is *negative*, how often is the prediction incorrect?

**Precision:** When a positive value is predicted, how often is the prediction correct?

- How *“precise”* is the classifier when predicting positive instances?
- Precision = # relevant found / # found
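A sketch tying all four metrics back to the confusion matrix counts (the labels are made up):

```python
# Sketch: sensitivity, specificity, FPR, and precision from a binary confusion matrix
from sklearn.metrics import confusion_matrix

y_test = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)          # recall / true positive rate
specificity = tn / (tn + fp)
false_positive_rate = fp / (tn + fp)  # 1 - specificity
precision = tp / (tp + fp)
print(sensitivity, specificity, false_positive_rate, precision)
```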

**How do you choose which metrics to focus on?**

It depends on your business objective.

- **Spam filter:** Optimize for **precision or specificity**, because false negatives (spam that slips into the inbox) are more acceptable than false positives (legitimate mail caught by the filter).
- **Fraudulent transactions:** Optimize for **sensitivity**, because false positives (normal transactions flagged for review) are more acceptable than false negatives (fraud that goes undetected).

Changing the threshold from the default value of 0.5 affects **sensitivity** and **specificity**: lowering the threshold **increases sensitivity** but **lowers specificity**.
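A sketch of moving the threshold, assuming a logistic regression on an illustrative dataset:

```python
# Sketch: lowering the classification threshold below the default 0.5
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import recall_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

logreg = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
y_prob = logreg.predict_proba(X_test)[:, 1]       # predicted probability of class 1

# A lower threshold flags more positives: sensitivity rises, specificity falls
for threshold in (0.5, 0.3):
    y_pred = (y_prob >= threshold).astype(int)
    print(threshold, recall_score(y_test, y_pred))  # recall == sensitivity
```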

# Metrics to assist with binary classification

**What is an ROC curve?**

The most commonly used way to visualize the performance of a binary classifier

Can help **choose a threshold** that balances sensitivity / specificity for your context.

**What is AUC?**

[The area under the ROC curve]

Perhaps the best way to summarize binary classifier performance in a single number.

The **percentage** of the ROC plot that is under the curve. It represents the likelihood that your classifier will assign a higher predicted probability to a randomly chosen positive observation than to a randomly chosen negative one.
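A minimal sketch with `roc_curve` and `roc_auc_score` (the dataset and model are illustrative):

```python
# Sketch: ROC curve and AUC for a binary classifier
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_prob)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_test, y_prob))              # area under that curve

plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate (1 - specificity)')
plt.ylabel('True Positive Rate (sensitivity)')
plt.show()
```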

# What does the metrics argument do in Keras?

Jason Brownlee – How to Use Metrics for Deep Learning with Keras in Python
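A sketch of passing metrics to `model.compile()`: the metrics are reported during training and evaluation but do not change the loss being optimized (the architecture below is purely illustrative):

```python
# Sketch: the metrics argument in Keras' compile()
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])  # reported each epoch alongside the loss
# model.fit(...) and model.evaluate(...) will now report 'accuracy' as well
```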

# Why score a model?

- Hyperparameter selection
- Feature selection
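For example, a scoring metric is what drives a hyperparameter search; a minimal `GridSearchCV` sketch (the dataset and parameter grid are illustrative):

```python
# Sketch: using a score (accuracy) to pick a hyperparameter value
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': list(range(1, 31))},
                    cv=5, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```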

@DaveVoyles