Kickstart Your Machine Learning Journey: A Beginner's Guide to Scikit-Learn
Learn to Build Your First Machine Learning Model with Scikit-Learn in Python
Prerequisites:
Python (version 3.6 or later)
Scikit-learn
NumPy
Pandas
Matplotlib (optional, for visualization)
Scikit-Learn:
Machine learning is transforming industries by enabling data-driven decision-making. Scikit-learn, a powerful open-source Python library, provides a range of simple yet efficient tools for data mining and data analysis. In this guide, we'll walk through the basics of machine learning and how to get started with Scikit-learn.
Installation:
pip install scikit-learn
If NumPy, Pandas, and Matplotlib are not already installed, install them too, for example with pip install numpy pandas matplotlib.
Basic Concepts of Machine Learning:
Before diving into code, let's go over a few concepts:
Supervised Learning: Involves training a model on labeled data, where the input comes with corresponding output labels. The model learns to predict outcomes based on this data. Examples include:
Regression: Predicting continuous values, such as house prices.
Classification: Categorizing data into classes, such as identifying spam emails.
Unsupervised Learning: Unsupervised learning deals with unlabeled data, where the goal is to uncover hidden patterns or structures. Key techniques include:
Clustering: Grouping similar data points together, like customer segmentation.
Dimensionality Reduction: Reducing the number of features while retaining important information, such as Principal Component Analysis (PCA).
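To make the two unsupervised techniques a bit more concrete, here is a minimal sketch. The data is synthetic and made up purely for illustration (two blobs of points in three dimensions), and the parameter choices such as n_clusters=2 are assumptions, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic, made-up data: two well-separated blobs in 3 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 3))
X = np.vstack([blob_a, blob_b])

# Clustering: ask KMeans for two groups; no labels are supplied.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: project the 3 features down to 2 components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("Cluster ids (first 5, last 5):", cluster_ids[:5], cluster_ids[-5:])
print("Shape after PCA:", X_2d.shape)
```

Note that neither step ever sees a label: KMeans groups the points by distance alone, and PCA keeps the directions along which the data varies the most.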
Model Training and Testing: The process involves:
Training: Using a portion of the data (training set) to teach the model how to make predictions.
Testing: Evaluating the model’s performance on a separate portion of the data (testing set) to assess its accuracy and generalizability.
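The training/testing idea can be sketched with scikit-learn's train_test_split helper. The data below is synthetic and made up for illustration, and the 80/20 split and random_state value are assumptions you can change freely.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, made-up data: y is roughly 3*x + 2 with a little noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Training: the model learns from the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

# Testing: evaluate on rows the model never saw during training.
test_r2 = model.score(X_test, y_test)
print(f"Rows: train={len(X_train)}, test={len(X_test)}")
print(f"R^2 on the test set: {test_r2:.3f}")
```

Scoring on held-out rows gives a more honest picture of how the model generalizes than scoring on the same rows it was trained on.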
Hands-on example of building a machine learning model!
Here, we will train a model to predict a house's price from its area.
Import Libraries
Start by importing the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # Visualizing data
from sklearn.linear_model import LinearRegression  # Linear Regression model
Loading the dataset
Download the dataset from here
# Loading dataset into a pandas DataFrame
df = pd.read_csv('homeprices.csv')
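If you don't have the CSV file at hand, you can still follow along with a small stand-in DataFrame. The rest of this guide only relies on two numeric columns named area and price. The values below are hypothetical and are not the data from homeprices.csv, so your numbers will differ from the ones shown in this guide.

```python
import pandas as pd

# Hypothetical stand-in values, NOT the data from the original homeprices.csv.
df = pd.DataFrame(
    {
        "area": [1000, 1500, 2000, 2500, 3000],             # e.g. square feet
        "price": [200000, 290000, 410000, 500000, 600000],  # made-up prices
    }
)

# Optionally write it out so pd.read_csv('homeprices.csv') works as in the guide.
# df.to_csv("homeprices.csv", index=False)

print(df.head())
print(df.dtypes)
```

Printing the first rows and the column types is a quick way to confirm that the data loaded the way you expect before you start plotting or training.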
Plotting the dataset
# Magic function to display the graph inline (Jupyter/IPython only)
%matplotlib inline
# On x-axis = area, y-axis = price
plt.xlabel("Area")
plt.ylabel("Price")
plt.scatter(df.area, df.price, color='red', marker='+')
You'll see a scatter plot of price against area. The points lie close to a straight line, which is why we use Linear Regression.
Training the model
# Creating an object of the model class
reg = LinearRegression()
# Training the model with the dataset
reg.fit(df[['area']], df.price)
Predicting the price
reg.predict([[3300]])
# This is the price predicted by our model based on area = 3300
# array([628715.75342466])
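It helps to see what predict is doing under the hood. A fitted linear regression stores a slope (coef_) and an intercept (intercept_), and a prediction is just slope * area + intercept. The sketch below checks this on hypothetical stand-in data (not the original homeprices.csv), so the numbers it prints will differ from the output above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data (not the original homeprices.csv).
df = pd.DataFrame(
    {
        "area": [1000, 1500, 2000, 2500, 3000],
        "price": [200000, 290000, 410000, 500000, 600000],
    }
)

reg = LinearRegression()
reg.fit(df[["area"]], df["price"])

slope = reg.coef_[0]        # price change per unit of area
intercept = reg.intercept_  # predicted price at area = 0

# Predict with a DataFrame that uses the same column name as during fit.
new_home = pd.DataFrame({"area": [3300]})
predicted = reg.predict(new_home)[0]

# The same prediction computed by hand from the line equation.
manual = slope * 3300 + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predict(): {predicted:.2f}   by hand: {manual:.2f}")
```

Passing a DataFrame with an area column, rather than a bare nested list, keeps the input consistent with what the model saw during training.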
Calculating score
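The score measures how well the model fits the data. For LinearRegression, score returns the coefficient of determination (R²), where 1.0 means a perfect fit. Like fit, it expects the features as a 2D structure, so pass df[['area']] rather than df.area. Keep in mind that here we are scoring on the same data we trained on.
reg.score(df[['area']], df.price)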
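For a linear regression, score returns the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual prices from their mean. Here is a small sketch that computes it by hand and compares it with score, again on hypothetical stand-in data rather than the original homeprices.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data (not the original homeprices.csv).
df = pd.DataFrame(
    {
        "area": [1000, 1500, 2000, 2500, 3000],
        "price": [200000, 290000, 410000, 500000, 600000],
    }
)

X = df[["area"]]  # 2D: one row per house, one column per feature
y = df["price"]

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# R^2 by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = reg.score(X, y)
print(f"R^2 by hand: {r2_manual:.4f}   reg.score: {r2_sklearn:.4f}")
```

An R² close to 1 means the line explains most of the variation in price; a value near 0 means it does little better than always predicting the average price.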
So, you trained your first model! 🎉
Conclusion
Congratulations! You've just built and evaluated a basic machine learning model using Scikit-learn! This is just the beginning. Scikit-learn offers many more features and techniques for solving all kinds of machine learning problems. As you get more comfortable, you can dive into more complex models, fine-tune hyperparameters, and experiment with different types of data.
Happy learning! 🎉