A Beginner's Guide to Probabilistic Classifiers
Naive Bayes classifiers are a family of simple probabilistic classifiers that apply Bayes' theorem with strong (naive) independence assumptions between the features. They are especially popular in text classification.
The model estimates the prior probability of each class and the conditional probability of each feature value given each class. For a new sample, these probabilities are combined via Bayes' theorem, and the sample is assigned to the class with the highest posterior probability.
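To make the idea concrete, here is a minimal sketch of the decision rule with made-up priors and likelihoods (the class names, words, and numbers are all hypothetical, purely for illustration). It scores each class by multiplying its prior with per-feature likelihoods, which is exactly the naive independence assumption at work.

# Toy sketch of the Naive Bayes decision rule (hypothetical numbers)
# posterior(class | features) is proportional to prior(class) * product of likelihood(feature | class)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.02, "meeting": 0.20},
}

def score(cls, words):
    # Start from the class prior, then multiply in each word's likelihood
    s = priors[cls]
    for w in words:
        s *= likelihoods[cls][w]
    return s

words = ["free", "meeting"]
scores = {cls: score(cls, words) for cls in priors}
print(max(scores, key=scores.get))  # prints "spam", the class with the highest score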
Evaluation Metrics:
- Accuracy: Measures the overall correctness of the model, i.e., the fraction of predictions that are right.
- Precision, Recall, and F1 Score: Especially important when the class distribution is imbalanced, since accuracy alone can be misleading there. (A short example of computing all four follows this list.)
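As a quick sketch of how these metrics are computed with scikit-learn (the labels below are made up purely for illustration):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels for a three-class problem
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]

print("Accuracy:", accuracy_score(y_true, y_pred))                     # fraction of correct predictions
print("Precision:", precision_score(y_true, y_pred, average='macro'))  # per-class precision, averaged
print("Recall:", recall_score(y_true, y_pred, average='macro'))        # per-class recall, averaged
print("F1 Score:", f1_score(y_true, y_pred, average='macro'))          # harmonic mean of precision and recall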
Applying Naive Bayes with scikit-learn
We’ll use the Digits dataset, which involves classifying images of handwritten digits (0–9). This is a multi-class classification problem. We’ll train the Naive Bayes model, predict digit classes, and evaluate using classification metrics. Here are the steps we’ll follow.
1. Load the Digits Dataset:
- The Digits dataset consists of 8x8 pixel images of handwritten digits (from 0 to 9). Each image is represented as a feature vector of 64 values (8x8 pixels), each representing the grayscale intensity of a pixel.
2. Split the Dataset:
- Similar to previous examples, the dataset is divided into training and testing sets. We use 80% of the data for training and 20% for testing. This helps in training the model on a large portion of the data and then evaluating its performance on a separate set that it hasn’t seen before.
3. Create and Train the Naive Bayes Model:
- A Gaussian Naive Bayes classifier is created. This variant of Naive Bayes assumes that the continuous values associated with each feature follow a Gaussian (normal) distribution. (A minimal sketch of the quantities this model fits appears right after this list.)
- The model is then trained (fitted) on the training data. It learns to associate the input features (pixel values) with the target values (digit classes).
4. Predict and Evaluate:
- After training, the model is used to predict the class labels of the test data, and the predictions are compared against the true labels using accuracy, precision, recall, and F1 score.
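Before the full example, here is a rough sketch of what a Gaussian Naive Bayes model fits internally: one prior per class, plus a per-feature mean and variance, combined at prediction time into a Gaussian log-likelihood. This is a simplified illustration of the idea, not scikit-learn's actual implementation (which, among other things, smooths the variances).

import numpy as np

def fit_gnb(X, y):
    # Estimate a prior, a per-feature mean, and a per-feature variance for each class
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    variances = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # epsilon avoids division by zero
    return classes, priors, means, variances

def predict_gnb(X, model):
    classes, priors, means, variances = model
    log_posteriors = []
    for c in classes:
        var = variances[c]
        # log prior + sum of per-feature Gaussian log-densities (the naive independence assumption)
        log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - means[c]) ** 2 / var, axis=1)
        log_posteriors.append(np.log(priors[c]) + log_likelihood)
    # Pick the class with the highest log-posterior for each sample
    return classes[np.argmax(np.column_stack(log_posteriors), axis=1)]

# Usage sketch: model = fit_gnb(X_train, y_train); y_hat = predict_gnb(X_test, model)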
Here is the code.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load the Digits dataset
digits = load_digits()
X, y = digits.data, digits.target
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Creating and training the Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)
# Predicting the test set results
y_pred = model.predict(X_test)
# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
# Print the results
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
Running the code prints the accuracy along with the macro-averaged precision, recall, and F1 score for the test set.
These results show that the Naive Bayes model performs well on this dataset, with fairly balanced precision and recall. The model is quite effective at classifying handwritten digits, though there is room for improvement, particularly in accuracy and F1 score.
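One way to see where that improvement might come from is a per-class breakdown. As a follow-up sketch (reusing y_test and y_pred from the code above), scikit-learn's confusion_matrix and classification_report show which digits get confused with which:

from sklearn.metrics import confusion_matrix, classification_report

# Rows are true digits, columns are predicted digits; off-diagonal counts are misclassifications
print(confusion_matrix(y_test, y_pred))
# Per-digit precision, recall, and F1, plus overall averages
print(classification_report(y_test, y_pred))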