🌸 Clustering Iris Species with K-Means and Hierarchical Methods


Why Clustering?

Clustering is a foundational technique in unsupervised learning, used to uncover patterns in data without predefined labels. This project explores how two popular clustering algorithms—K-Means and Agglomerative Hierarchical Clustering—perform on the well-known Iris flower dataset, aiming to group samples by species based solely on their morphological features.


About the Dataset

The Iris dataset contains 150 samples from three species: setosa, versicolor, and virginica. Each sample includes four features: sepal length, sepal width, petal length, and petal width, all measured in centimeters.

While setosa is linearly separable, versicolor and virginica overlap significantly, making this dataset ideal for testing clustering algorithms.
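For readers who want to follow along, here is a minimal sketch of loading the dataset with scikit-learn; the notebook itself may load or preprocess it differently.

```python
from sklearn.datasets import load_iris
import pandas as pd

# Load the 150-sample Iris dataset bundled with scikit-learn
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)  # the four morphological features
y = iris.target                                          # 0 = setosa, 1 = versicolor, 2 = virginica

print(X.describe())
print(pd.Series(y).value_counts())  # 50 samples per species
```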


What Was Explored

The analysis focused on dimensionality reduction with PCA, choosing the optimal number of clusters, tuning the parameters of Agglomerative Clustering, comparing the two algorithms' performance, and examining where each one went wrong.


Key Experiments & Findings

Dimensionality Reduction with PCA

Optimal Cluster Count

Parameter Tuning for Agglomerative Clustering

Performance Comparison

| Algorithm | Accuracy | Silhouette Score | ARI |
|---|---|---|---|
| K-Means | 83.3% | 0.46 | 0.62 |
| Agglomerative (default) | 82.7% | 0.45 | 0.61 |
| Agglomerative (best) | 88.7% | 0.45 | 0.72 |
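Since cluster IDs are arbitrary, "accuracy" here means accuracy under the best one-to-one mapping of clusters to species. A sketch of how the three metrics in the table could be computed, continuing from the snippets above; the notebook may map clusters to species differently.

```python
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix, silhouette_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best cluster-to-species assignment (Hungarian algorithm)."""
    cm = confusion_matrix(y_true, y_pred)
    row, col = linear_sum_assignment(-cm)  # maximize the number of matched samples
    return cm[row, col].sum() / cm.sum()

km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)
print("accuracy  ", clustering_accuracy(y, km_labels))
print("silhouette", silhouette_score(X_scaled, km_labels))
print("ARI       ", adjusted_rand_score(y, km_labels))
```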

Error Analysis
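One way to inspect which samples land in the "wrong" cluster is to map each cluster to its best-matching species and list the disagreements. This sketch reuses X, y, iris, and km_labels from the snippets above; the notebook's own error analysis may slice the data differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

# Map each cluster ID to its best-matching species, then list the samples that disagree
cm = confusion_matrix(y, km_labels)
row, col = linear_sum_assignment(-cm)
cluster_to_species = {c: r for r, c in zip(row, col)}

mapped = np.array([cluster_to_species[c] for c in km_labels])
errors = np.where(mapped != y)[0]
print(f"{len(errors)} samples assigned to a different cluster than their species:")
print(X.iloc[errors].assign(true_species=[iris.target_names[i] for i in y[errors]]))
```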


Final Thoughts

While Agglomerative Clustering achieved the highest accuracy with tuned parameters, its sensitivity to configuration and instability in cluster composition make it less reliable for real-world applications without labeled data.

K-Means, despite slightly lower accuracy, offered more balanced results and greater stability, making it a safer choice for practical clustering tasks.


Future Work


The full notebook with code and visualizations is embedded below 👇

You can also view the notebook in a separate page, or check it on GitHub.