Linear regression is one of the most fundamental and widely used statistical techniques in machine learning and data science. It models the relationship between a dependent variable and one or more independent variables. In this article, we explore the concept of linear regression and walk through example Python code that shows how it works.

What is Linear Regression?

Linear regression is a statistical technique that aims to find the linear relationship between a dependent variable and one or more independent variables. The dependent variable is also called the response variable, while the independent variables are called the predictors or explanatory variables.

The linear regression model assumes that there is a linear relationship between the response variable and the predictors. In other words, each one-unit change in a predictor shifts the response by the same constant amount, whatever the predictor's starting value, so the change in the response is proportional to the change in the predictor. The equation for a simple linear regression model is:

y = mx + b

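where:
- y is the dependent (response) variable
- x is the independent (predictor) variable
- m is the slope: the change in y for a one-unit increase in x
- b is the intercept: the value of y when x = 0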

Types of Linear Regression

There are two types of linear regression: simple linear regression and multiple linear regression.

Simple Linear Regression

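Simple linear regression is used when there is only one predictor variable. Its model is the straight line y = mx + b given above, so only two parameters have to be estimated: the slope m and the intercept b. The standard approach, ordinary least squares, picks the m and b that minimize the sum of squared differences between the observed values of y and the values predicted by the line.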
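For a single predictor, the least-squares estimates have a closed form: m is the covariance of x and y divided by the variance of x, and b = mean(y) - m * mean(x), which makes the fitted line pass through the point of means. The sketch below implements those two formulas with NumPy; the helper name fit_simple_linear_regression and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Hypothetical helper: least-squares slope m and intercept b for y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x (the 1/n factors cancel)
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b = y_mean - m * x_mean
    return m, b


# Hypothetical data lying exactly on the line y = 2x + 1
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
m, b = fit_simple_linear_regression(x, y)
print(m, b)  # 2.0 1.0
```

As a cross-check, np.polyfit(x, y, 1) fits the same degree-1 polynomial and returns the coefficients highest degree first, that is [slope, intercept].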

Multiple Linear Regression

Multiple linear regression is used when there are two or more predictor variables. The equation for a multiple linear regression model can be represented as:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

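where:
- $y$ is the response variable
- $x_1, x_2, \dots, x_n$ are the $n$ predictor variables
- $b_0$ is the intercept: the predicted value of $y$ when every predictor is zero
- $b_1, \dots, b_n$ are the coefficients: each $b_i$ is the change in $y$ for a one-unit increase in $x_i$ with the other predictors held fixed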
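With several predictors the same least-squares idea applies: prepend a column of ones to the predictor matrix so that the intercept b0 is estimated alongside b1, …, bn, then solve the resulting least-squares problem. Here is a minimal NumPy sketch using np.linalg.lstsq; the synthetic data, the "true" coefficient values, and the sample size are arbitrary assumptions made for illustration. Because the data is noiseless, the fit should recover the coefficients exactly (up to floating-point error).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (hypothetical) data: 100 samples, 3 predictors x1, x2, x3
n_samples, n_features = 100, 3
X = rng.normal(size=(n_samples, n_features))
true_b0 = 4.0
true_coefs = np.array([1.5, -2.0, 0.5])  # b1, b2, b3
y = true_b0 + X @ true_coefs  # no noise added

# Design matrix: a leading column of ones carries the intercept b0
X_design = np.column_stack([np.ones(n_samples), X])

# Solve min ||X_design @ b - y||^2 for b = [b0, b1, b2, b3]
b, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(b, 6))
```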

Linear Regression Example Code in Python

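To demonstrate how linear regression works, we will use a dataset of house prices. Many older tutorials use the Boston housing dataset, but scikit-learn deprecated load_boston in version 1.0 and removed it in version 1.2, citing ethical problems with one of its features. We therefore use the California housing dataset, which scikit-learn, a popular machine learning library in Python, downloads through fetch_california_housing (an internet connection is needed the first time; the data is cached locally afterwards). Its target is the median house value of a California district, expressed in units of $100,000.

First, we need to import the necessary libraries and load the dataset.

from sklearn.datasets import fetch_california_housing
import pandas as pd

# Download (or load from the local cache) the California housing data
housing = fetch_california_housing()

# Put the eight predictor columns in a DataFrame and add the target as PRICE
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['PRICE'] = housing.target

Next, we will split the dataset into training and testing sets, holding out 20% of the rows for testing.

from sklearn.model_selection import train_test_split

# Predictors are the feature columns, the target is PRICE; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(df[housing.feature_names], df['PRICE'], test_size=0.2, random_state=0)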
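To see what test_size=0.2 and random_state=0 do, here is a quick check on a small synthetic array (hypothetical data, independent of the housing dataset): 20% of the rows are held out for testing, the two sets share no rows, and repeating the call with the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 rows with 2 features each, plus a target
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_tr.shape, X_te.shape)  # (40, 2) (10, 2)

# The same random_state yields the identical split
X_tr2, X_te2, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
print(np.array_equal(X_te, X_te2))  # True
```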

Then, we will create a linear regression model using scikit-learn’s LinearRegression class.

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
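After fitting, LinearRegression stores the estimated intercept in intercept_ and one coefficient per predictor in coef_; these correspond to b0 and b1, …, bn in the multiple linear regression equation. The sketch below fits a separate model (named demo_model so it does not overwrite the housing model) on hypothetical noiseless data generated from y = 3 + 2*x1 - x2, so the estimates should match those known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Hypothetical data: y = 3 + 2*x1 - 1*x2, with no noise
X = rng.uniform(-5, 5, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

demo_model = LinearRegression()
demo_model.fit(X, y)

print("Intercept (b0):", demo_model.intercept_)
print("Coefficients (b1, b2):", demo_model.coef_)
```

For the housing model, pd.Series(model.coef_, index=X_train.columns) pairs each coefficient with its feature name, since X_train is a DataFrame.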

Finally, we will evaluate the model’s performance on the testing set by calculating the mean squared error (MSE).

from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
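MSE is the mean of the squared differences between observed and predicted values, so it is expressed in squared units of the target. The sketch below uses small hypothetical arrays to compute MSE by hand, cross-check it against mean_squared_error, and add two common companions: RMSE (the square root of MSE, in the target's own units) and R² from sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical observed and predicted values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

# RMSE: back in the same units as the target
rmse = np.sqrt(mse_manual)

# R^2: 1 - (residual sum of squares / total sum of squares)
r2 = r2_score(y_true, y_hat)

print(mse_manual, mse_sklearn, rmse, r2)
```

Applied to the housing model, r2_score(y_test, y_pred), or equivalently model.score(X_test, y_test), reports the share of the variance in test-set prices that the model explains.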

Conclusion

Linear regression is a simple yet powerful statistical technique for modeling the relationship between a dependent variable and one or more independent variables. In this article, we gave an overview of linear regression, discussed its two types, and walked through example Python code that fits and evaluates a model. We hope this article has been helpful in understanding the basics of linear regression.
