We have a training set of 2,000 digits and a test set of 1,000 digits.
The digits are 28×28 pixels in size, giving us 784 variables.
There is strong correlation between neighbouring pixel variables, but the algorithms presented here will largely ignore this structure.
The \(k\)-nearest neighbours algorithm is a lazy algorithm: it does not learn a model from the training data. Instead, it classifies each new data point by comparing it directly with the stored training examples.
The \(k\)-nearest neighbours algorithm has two main choices:

- the number of neighbours \(k\);
- the distance metric used to find the neighbours (Euclidean distance in the implementation below).
library(class)
# Classify the test set directly from the training data (k = 3)
knn_model <- knn(train = MNIST_train[, -1],
                 test = MNIST_test[, -1],
                 cl = MNIST_train$label,
                 k = 3)
# Compute the accuracy
sum(knn_model == MNIST_test$label) / length(knn_model)
## [1] 0.868
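The value of \(k\) can also be tuned by cross-validation. The output below is in the format produced by the caret package; a minimal sketch of a call that would produce output of this form, with the 20-fold setting and the grid \(k = 1, \dots, 10\) read off the output itself:

library(caret)

# Tune k over 1, ..., 10 using 20-fold cross-validation
knn_tuned <- train(x = MNIST_train[, -1],
                   y = as.factor(MNIST_train$label),
                   method = "knn",
                   trControl = trainControl(method = "cv", number = 20),
                   tuneGrid = data.frame(k = 1:10))
knn_tuned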
## k-Nearest Neighbors
##
## 2000 samples
## 784 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 1900, 1900, 1902, 1898, 1900, 1901, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.9124143 0.9025703
## 2 0.8979975 0.8865268
## 3 0.9075030 0.8971012
## 4 0.9020126 0.8909903
## 5 0.9009925 0.8898518
## 6 0.8959968 0.8842878
## 7 0.8930119 0.8809638
## 8 0.8939970 0.8820677
## 9 0.8914812 0.8792735
## 10 0.8940216 0.8820880
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
Repeating the cross-validated tuning on a smaller data set (202 samples, two predictors, four classes) gives:

## k-Nearest Neighbors
##
## 202 samples
## 2 predictor
## 4 classes: '1', '2', '3', '4'
##
## No pre-processing
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 191, 191, 193, 193, 191, 193, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.6426894 0.5198390
## 2 0.6090530 0.4715523
## 3 0.6311237 0.5020650
## 4 0.5773232 0.4303253
## 5 0.5945076 0.4547095
## 6 0.5940530 0.4522104
## 7 0.5987753 0.4593627
## 8 0.6131944 0.4783842
## 9 0.6088258 0.4702154
## 10 0.6002399 0.4614984
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
To classify a new observation, the Naive Bayes classifier asks two questions:

- In the absence of any other information, what proportion of the data belongs to each class? (the prior)
- If the observation did belong to class A, how probable is it that the explanatory variables would look like this? (the likelihood)
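Bayes' theorem combines these two quantities: the posterior probability of a class is proportional to the prior multiplied by the likelihood, \[ P(\text{class} \mid x) \propto P(\text{class}) \, P(x \mid \text{class}), \] and the "naive" part of the name refers to the assumption that the explanatory variables are conditionally independent given the class, so that \( P(x \mid \text{class}) = \prod_j P(x_j \mid \text{class}) \).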
The Naive Bayes classifier is an eager algorithm: it learns a model from the training data before seeing any new observations. It is also probabilistic in nature, although most implementations return only a class label.
The Naive Bayes classifier has one main choice:

- the distribution used to model the likelihood of each explanatory variable (for example, a normal distribution for continuous variables and a multinomial distribution for categorical ones, as in the worked example below).
\(x_1\) | \(x_2\) | \(y\) |
---|---|---|
1.5 | A | Positive |
1.7 | B | Positive |
1.3 | A | Positive |
1.9 | B | Positive |
2.1 | A | Positive |
2.3 | B | Negative |
2.5 | A | Negative |
2.7 | B | Negative |
1.9 | A | Negative |
We have estimated class prior probabilities of \(P(\text{Positive}) = 5/9\) and \(P(\text{Negative}) = 4/9\).
We have also estimated the likelihoods of the variables as follows, writing \(N(\mu, \sigma^2)\) for a normal distribution with mean \(\mu\) and variance \(\sigma^2\): \[ \begin{aligned} x_1 \mid \text{Positive} &\sim N(1.7, 0.1),\\ x_1 \mid \text{Negative} &\sim N(2.35, 0.1),\\ x_2 \mid \text{Positive} &\sim \text{Multinomial}(P_A = 0.6, P_B = 0.4),\\ x_2 \mid \text{Negative} &\sim \text{Multinomial}(P_A = 0.5, P_B = 0.5). \end{aligned} \]
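These estimates can be reproduced in base R from the nine training rows above; a minimal sketch:

# The nine training observations from the table above
x1 <- c(1.5, 1.7, 1.3, 1.9, 2.1, 2.3, 2.5, 2.7, 1.9)
x2 <- c("A", "B", "A", "B", "A", "B", "A", "B", "A")
y  <- c(rep("Positive", 5), rep("Negative", 4))

# Class priors: proportion of each class (5/9 and 4/9)
table(y) / length(y)

# Gaussian likelihood parameters for x1, per class
tapply(x1, y, mean)  # means: 2.35 (Negative), 1.70 (Positive)
tapply(x1, y, var)   # sample variances: roughly 0.1 in each class

# Multinomial likelihood for x2: P(A) and P(B) within each class
prop.table(table(y, x2), margin = 1)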
We can now estimate the probability of each new data point falling into the positive class.
\(x_1\) | \(x_2\) | \(P(\text{Positive}|x)\) |
---|---|---|
1.8 | A | 0.87 |
2.2 | B | 0.24 |
2.0 | B | 0.54 |
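These posteriors can be checked with a short base R sketch, plugging in the priors and likelihood parameters estimated above (note that dnorm takes a standard deviation, hence sqrt(0.1)):

# Priors and likelihood parameters estimated above
prior_pos <- 5 / 9
prior_neg <- 4 / 9
p_A_pos <- 0.6; p_A_neg <- 0.5  # P(x2 = A | class)

posterior_positive <- function(x1, x2) {
  p_x2_pos <- if (x2 == "A") p_A_pos else 1 - p_A_pos
  p_x2_neg <- if (x2 == "A") p_A_neg else 1 - p_A_neg
  # Unnormalised numerators: prior * Gaussian(x1) * Multinomial(x2)
  num_pos <- prior_pos * dnorm(x1, mean = 1.7,  sd = sqrt(0.1)) * p_x2_pos
  num_neg <- prior_neg * dnorm(x1, mean = 2.35, sd = sqrt(0.1)) * p_x2_neg
  num_pos / (num_pos + num_neg)  # normalise over the two classes
}

posterior_positive(1.8, "A")  # 0.87
posterior_positive(2.2, "B")  # 0.24
posterior_positive(2.0, "B")  # 0.54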
library(e1071)
# Train the model
nb_model <- naiveBayes(x = MNIST_train[, -1],
                       y = as.factor(MNIST_train$label))
# Inspect the class counts underlying the estimated priors
nb_model$apriori
## as.factor(MNIST_train$label)
## 0 1 2 3 4 5 6 7 8 9
## 191 220 198 191 214 180 200 224 172 210
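The test-set accuracy and the confusion matrix below can be produced along these lines (a sketch, assuming the same MNIST_test split as for \(k\)-nearest neighbours):

# Predict test-set labels from the fitted model
nb_predict <- predict(nb_model, MNIST_test[, -1])

# Accuracy on the test set
sum(nb_predict == MNIST_test$label) / length(nb_predict)

# Confusion matrix: true labels in rows, predictions in columns
table(MNIST_test$label, nb_predict)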
## [1] 0.498
## nb_predict
## 0 1 2 3 4 5 6 7 8 9
## 0 74 1 0 0 0 1 2 0 7 0
## 1 0 125 0 0 0 0 1 0 0 0
## 2 6 22 20 5 0 4 18 0 41 0
## 3 9 20 1 26 0 5 7 2 32 5
## 4 5 10 0 0 18 0 12 1 26 38
## 5 10 13 1 0 2 1 4 0 54 2
## 6 4 5 1 0 0 0 72 0 5 0
## 7 2 24 0 1 2 1 1 29 13 26
## 8 1 19 1 1 2 2 0 0 60 3
## 9 1 13 0 0 1 0 1 0 5 73
For reference, the class counts in the smaller four-class data set from the tuning output above are close to balanced:

## X$Class
## 1 2 3 4
## 50 50 50 52