Most Used Machine Learning Algorithms for Beginner Data Scientists

 Let’s explore the machine learning algorithms perfect for beginners in data science. We’ll explain each one and show you how to use them effectively.


Machine learning has become an important part of the data scientist’s toolkit, and its high-profile applications over the last decade have made it a widely known concept.

To effectively harness the power of machine learning, it’s crucial to understand both the underlying concepts and their practical applications.

In this article, we will explore the top 10 machine learning algorithms that are particularly well-suited for those starting their journey in data science, and how to apply them. Let’s start!

1. Linear Regression

Image by author

Linear Regression predicts a continuous output by establishing a linear relationship between input variables and the output. Imagine drawing a straight line through a set of points on a graph.

The model makes predictions by finding the line that best fits the data points. This line is determined by minimizing the sum of the squared differences (errors) between the actual values and the values predicted by the line.
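To make the idea of a "best-fit line" concrete, here is a minimal sketch for a single input variable. It uses the closed-form least-squares formulas in plain Python; the toy data points and the helper `sum_squared_error` are made up for illustration and are not part of the article's dataset.

```python
# Toy data (hypothetical): y is roughly 2*x + 1 with a little noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Closed-form least-squares estimates for one input variable
slope = (
    sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    / sum((x - x_mean) ** 2 for x in xs)
)
intercept = y_mean - slope * x_mean


def sum_squared_error(m, b):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


best_sse = sum_squared_error(slope, intercept)
nudged_sse = sum_squared_error(slope + 0.1, intercept)

print("slope:", slope, "intercept:", intercept)
print("SSE of fitted line:", best_sse)
print("SSE of a slightly different line:", nudged_sse)
```

Any other line, such as the one with a slightly larger slope, gives a larger sum of squared errors than the fitted line, which is what "best fit" means here.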

Evaluation Metrics

Mean Squared Error (MSE): Measures the average of the squares of the errors. Lower values are better.

R-squared: Represents the proportion of the dependent variable’s variance that can be explained by the independent variables. Closer to 1 is better.
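Both metrics can be computed by hand, which may help clarify what they measure. The following is a minimal sketch in plain Python; the values in `y_true` and `y_hat` are invented for illustration only.

```python
# Hypothetical actual values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 10.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_hat)]

# MSE: average of the squared errors
mse = sum(e ** 2 for e in errors) / n

# R-squared: 1 - (residual sum of squares / total sum of squares)
y_mean = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - y_mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print("MSE:", mse)
print("R2:", r2)
```

An R-squared of 1 would mean the residual sum of squares is zero, that is, every prediction matches its actual value exactly.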

Applying with scikit-learn

Since we’re discussing Linear Regression first, we’ll use the Diabetes dataset, a preloaded dataset in scikit-learn, ideal for regression tasks.

Here are the steps we’ll follow in the code below:

  1. Load the Diabetes Dataset: This dataset contains ten baseline variables for diabetes patients: age, sex, BMI, average blood pressure, and six blood serum measurements.
  2. Split the Dataset: Divide it into training and testing sets.
  3. Create and Train the Linear Regression Model: Build the model using the training set.
  4. Predict and Evaluate: Use the test set to make predictions and then evaluate the model using MSE and R-squared.

Now let’s start!

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the Diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting the test set results
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("MSE is:", mse)
print("R2 score is:", r2)

Here is the output.


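These results indicate that our Linear Regression model explains about 45% of the variance in the diabetes dataset. The MSE of about 2900 is expressed in squared units of the target, so it should not be read as an average distance; its square root (the RMSE), about 54, gives a typical prediction error on the target’s own scale.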
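To put an MSE on the same scale as the target, one can take its square root. A minimal sketch, assuming the rounded MSE of 2900 quoted in the text rather than the exact printed output:

```python
import math

mse = 2900.0  # rounded value quoted in the text (assumption, not the exact output)
rmse = math.sqrt(mse)  # root mean squared error, in the target's own units

print("RMSE is approximately:", round(rmse, 1))
```

In the script above, the same conversion would be `math.sqrt(mse)` applied to the `mse` variable returned by `mean_squared_error`.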
