An easier way to learn logistic regression

August 20, 2024

Logistic Regression

Logistic Regression is used for classification problems. It predicts the probability that a given data point belongs to a certain class, like yes/no or 0/1. It uses a logistic function to output a value between 0 and 1. This value is then mapped to a specific class based on a threshold (usually 0.5).
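
To make the probability-then-threshold idea concrete, here is a minimal sketch in plain Python. The function names `sigmoid` and `classify` and the example scores are assumptions chosen for illustration, not part of scikit-learn; a real model computes the score from learned weights.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z: float, threshold: float = 0.5) -> int:
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical linear scores (weighted sum of features plus bias) for three points
scores = [-2.0, 0.0, 3.0]
for z in scores:
    print(f"score={z:+.1f}  probability={sigmoid(z):.3f}  class={classify(z)}")
```

Raising the threshold above 0.5 makes the model more reluctant to predict class 1, and lowering it does the opposite.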

Evaluation Metrics

Accuracy: Accuracy is the ratio of correctly predicted observations to total observations.
Precision and Recall: Precision is the ratio of correctly predicted positive observations to all observations predicted as positive. Recall is the ratio of correctly predicted positive observations to all observations that are actually positive.
F1 Score: The harmonic mean of precision and recall, which balances the two in a single number.
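
As a rough illustration of these definitions, the sketch below computes all four metrics by hand from counts of true/false positives and negatives. The helper `binary_metrics` and the two label lists are made up for this example; in practice scikit-learn's metric functions do the same job.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, precision, recall, f1


# Made-up labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
```

Here the model finds 3 of the 4 actual positives (recall 0.75) but 2 of its 5 positive predictions are wrong (precision 0.6), so the two metrics tell different stories about the same predictions.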

Applying with scikit-learn

We’ll use the Breast Cancer dataset, another preloaded dataset in scikit-learn. It’s a binary classification dataset, which makes it suitable for Logistic Regression.

Here are the steps we’ll follow to apply logistic regression.

Load the Breast Cancer Dataset: This dataset contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and the goal is to classify them as benign or malignant.
Split the Dataset: Divide it into training and testing sets.
Create and Train the Logistic Regression Model: Build the model using the training set.
Predict and Evaluate: Use the test set to make predictions and then evaluate the model using Accuracy, Precision, Recall, and F1 Score.

Let’s see the code.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the Breast Cancer dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the Logistic Regression model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predicting the test set results
y_pred = model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the results
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Here is the output.

[Image: output of the code above]

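A high recall means the model misses few cases of the positive class, which is crucial in medical diagnostics when the positive class is the malignant one. Note that in scikit-learn’s copy of this dataset, label 0 is malignant and label 1 is benign, so recall_score as called above (default pos_label=1) measures recall on benign cases; pass pos_label=0 to measure how well the model identifies malignant cases.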
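
To check recall on malignant cases specifically, something like the following sketch could be used. It assumes the same split and model settings as the code above and relies on the `pos_label` parameter of `recall_score`, since this dataset encodes malignant as 0 and benign as 1. The variable names `recall_benign` and `recall_malignant` are chosen here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
# Index in target_names is the label: 0 -> 'malignant', 1 -> 'benign'
print("Classes:", list(data.target_names))

# Same split and model settings as in the main example
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Default pos_label=1 treats benign as the positive class
recall_benign = recall_score(y_test, y_pred)
# pos_label=0 treats malignant as the positive class
recall_malignant = recall_score(y_test, y_pred, pos_label=0)

print("Recall (benign):", recall_benign)
print("Recall (malignant):", recall_malignant)
```

The malignant recall is the share of actual malignant tumors that the model flags, which is usually the number that matters most in a screening setting.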

Bharath_Writes