K-Nearest Neighbors (KNN)

August 29, 2024

An easy-to-understand approach for regression and classification is K-Nearest Neighbors (KNN). A data point is classed according to the classification of its neighbors.

KNN looks at the ‘K’ closest points (neighbors) to a data point and classifies it based on the majority class of these neighbors. For regression, it takes the average of the ‘K’ nearest points.

Evaluation Metrics

Classification: Accuracy, Precision, Recall, F1 Score.
Regression: Mean Squared Error (MSE), R-squared.

Applying with Sci-kit Learn

We’ll use the Wine dataset again but this time with KNN. We’ll train the KNN model to classify the types of wine and evaluate its performance with classification metrics. Here are the steps we’ll follow.

1. Create and Train the KNN Model:

A K-Nearest Neighbors (KNN) model is created with n_neighbors=3. This means the model looks at the three nearest neighbors of a data point to make a prediction.
The model is trained (fitted) with the training data. During training, it doesn’t build a traditional model but memorizes the dataset.

2. Predict:

The trained KNN model is then used to predict the class labels (types of wine) of the test data. The model determines the most common class among these neighbors for each point in the test set by examining the three nearest points in the training set.

3. Evaluate:

The model’s predictions are evaluated against the actual labels of the test set.

Here is the code.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the KNN model
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)

# Predicting the test set results
y_pred_knn = knn_model.predict(X_test)

# Evaluating the model
accuracy_knn = accuracy_score(y_test, y_pred_knn)
precision_knn = precision_score(y_test, y_pred_knn, average='macro')
recall_knn = recall_score(y_test, y_pred_knn, average='macro')
f1_knn = f1_score(y_test, y_pred_knn, average='macro')

# Print the results
print("Accuracy:", accuracy_knn)
print("Precision:", precision_knn)
print("Recall:", recall_knn)
print("F1 Score:", f1_knn)

Here is the output.

These results indicate that the KNN model performs exceptionally well on this dataset. The high scores across all metrics show that the model is not only accurate overall but also maintains a good balance between precision and recall, effectively classifying the wine types.

Search This Blog

Bharath_Writes