In this section, we will walk through the steps to perform hierarchical clustering in Python.
Step 1: Importing the Required Libraries
We will start by importing the required libraries: NumPy, pandas, and scikit-learn.
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
Step 2: Loading the Dataset
In this step, we will load the dataset. For this article, we will use the iris dataset, which is available in the Scikit-Learn library.
from sklearn.datasets import load_iris
iris = load_iris()
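Before preprocessing, it can help to look at what was loaded. The short sketch below only prints the shape of the feature matrix and the feature and class names; it assumes nothing beyond the load_iris call shown above.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix dimensions: (n_samples, n_features)
print(iris.data.shape)

# Names of the measured features and of the three species
print(iris.feature_names)
print(iris.target_names)
```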
Step 3: Preprocessing the Data
In this step, we will preprocess the data by converting it into a pandas DataFrame and standardizing the features (subtracting each feature's mean and dividing by its standard deviation). Standardizing puts all features on the same scale, which prevents any single feature with a large numeric range from dominating the distance calculations.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df = (df - df.mean()) / df.std()
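As a cross-check on the manual standardization above, the following sketch compares it with scikit-learn's StandardScaler. The variable names raw, manual, and scaled are illustrative only. One detail worth knowing: pandas' std() uses the sample standard deviation (ddof=1) by default, while StandardScaler uses the population standard deviation (ddof=0), so the two results differ by a constant factor.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
raw = pd.DataFrame(iris.data, columns=iris.feature_names)

# Manual z-score as in the text; pandas' std() defaults to ddof=1
manual = (raw - raw.mean()) / raw.std()

# StandardScaler divides by the population standard deviation (ddof=0)
scaled = pd.DataFrame(StandardScaler().fit_transform(raw), columns=raw.columns)

# The two versions differ only by the constant factor sqrt(n / (n - 1))
n = len(raw)
ratio = np.sqrt(n / (n - 1))
print(np.allclose(scaled.values, manual.values * ratio))
```

For a dataset of this size the factor is very close to 1, so either approach should work for clustering purposes.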
Step 4: Creating the Model
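In this step, we will create the hierarchical clustering model using the AgglomerativeClustering class from the sklearn.cluster module. We will also specify the number of clusters we want to form, which is 3 in this case. Since we do not set the linkage parameter, the model uses its default, Ward linkage, which merges the pair of clusters that gives the smallest increase in within-cluster variance.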
model = AgglomerativeClustering(n_clusters=3)
model.fit(df)
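If the number of clusters is not known in advance, AgglomerativeClustering can instead cut the hierarchy at a linkage distance. The sketch below is one way to do that; the threshold of 10.0 is an arbitrary illustrative value, not a recommendation, and tree_model and tree_labels are names chosen only for this example.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df = (df - df.mean()) / df.std()

# n_clusters must be None when distance_threshold is set.
# 10.0 is an arbitrary value picked for illustration only.
tree_model = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0)
tree_labels = tree_model.fit_predict(df)

# n_clusters_ reports how many clusters the cut produced
print("Clusters found:", tree_model.n_clusters_)

# distances_ holds the linkage distance of every merge in the full tree
print("Merges recorded:", len(tree_model.distances_))
```

Lowering the threshold yields more, smaller clusters, while raising it yields fewer, larger ones.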
Step 5: Predicting the Clusters
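In this step, we will retrieve the cluster assigned to each data point. AgglomerativeClustering has no predict method; after fitting, the assignments are stored in the labels_ attribute (note the trailing underscore).
labels = model.labels_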
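Fitting and label retrieval can also be done in one call with fit_predict. Because the Iris dataset happens to include the true species, the sketch below also cross-tabulates clusters against species and computes the adjusted Rand index as a sanity check. This comparison is an addition for illustration; on genuinely unlabeled data it would not be available.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df = (df - df.mean()) / df.std()

# fit_predict fits the model and returns the same array stored in labels_
model = AgglomerativeClustering(n_clusters=3)
labels = model.fit_predict(df)

# Rows are the true species, columns are the cluster ids assigned by the model
species = pd.Series(iris.target_names[iris.target], name="species")
table = pd.crosstab(species, pd.Series(labels, name="cluster"))
print(table)

# Adjusted Rand index: 1.0 means the clusters match the species exactly
ari = adjusted_rand_score(iris.target, labels)
print("Adjusted Rand index:", ari)
```

Cluster ids are arbitrary, so a cluster numbered 0 need not correspond to the first species; the table is read by looking at where each row's counts concentrate.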
Step 6: Evaluating the Clusters
In this step, we will evaluate the clusters by computing the silhouette score, which measures how similar each data point is to its own cluster compared with the nearest other cluster. The score ranges from -1 to 1: values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that points may have been assigned to the wrong cluster.
from sklearn.metrics import silhouette_score
score = silhouette_score(df, labels)
print("Silhouette Score:", score)
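The silhouette score can also be used to compare different choices for the number of clusters. The sketch below repeats the steps above for k from 2 to 6 and reports the score for each; the range is an arbitrary choice for illustration, and scores and best_k are names introduced only here.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df = (df - df.mean()) / df.std()

# Fit one model per candidate k and record its silhouette score
scores = {}
for k in range(2, 7):
    labels_k = AgglomerativeClustering(n_clusters=k).fit_predict(df)
    scores[k] = silhouette_score(df, labels_k)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")

# The k with the highest score is a candidate, not a guaranteed best choice
best_k = max(scores, key=scores.get)
print("Highest silhouette at k =", best_k)
```

The highest-scoring k is a useful hint rather than a definitive answer, since the silhouette score tends to favor compact, roughly convex clusters.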
Conclusion
Hierarchical clustering is a powerful unsupervised learning technique for grouping data into clusters. Its results are easy to interpret, and the algorithm itself does not need the number of clusters in advance: it builds a full hierarchy of merges that can be cut at any level, although in this walkthrough we fixed n_clusters=3 for convenience. However, it can be computationally expensive, because time and memory costs grow at least quadratically with the number of samples, which makes it a poor fit for very large datasets and real-time applications. Like other distance-based methods, it also tends to degrade on high-dimensional data. In this article, we walked through performing hierarchical clustering in Python using the AgglomerativeClustering class from the sklearn.cluster module. We hope that this article has been helpful in understanding the concepts and implementation of hierarchical clustering in Python.