
Unsupervised Learning

So far in this course, we’ve focused on supervised learning, where the model learns from labeled data. Think of predicting house prices or classifying spam emails: you always had examples with the correct answers (prices or labels).

But what happens when you don’t have labels? Imagine you’re handed a dataset of thousands of customer transactions with no categories, or millions of medical scans with no diagnosis labels. You still want to find patterns, groups, anomalies, or ways to simplify the data.

That’s where unsupervised learning shines.

Years ago, I worked with a retail company that wanted to segment customers. The problem? They had no pre-labeled groups like “budget shoppers” or “luxury shoppers.” Using clustering, we discovered natural groups in their data. This insight allowed marketing teams to design tailored campaigns, which doubled customer engagement.

In this lesson, we’ll explore three key unsupervised techniques:

  • Clustering (K-Means, Hierarchical)
  • Dimensionality Reduction (PCA)
  • Anomaly Detection basics

K-Means and Hierarchical Clustering

K-Means begins by placing cluster centers randomly. Each data point is assigned to the nearest center, then the centers are recalculated as the average of their assigned points. This process of assignment and updating repeats until the centers no longer move, forming stable clusters.
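
To make the assign-and-update loop concrete, here is a minimal from-scratch sketch using NumPy. The points array and the choice of k are made-up toy values for illustration only; in practice you would use scikit-learn’s KMeans, as shown below.

python
import numpy as np

# Toy 2D points (hypothetical values, chosen only to illustrate the loop)
points = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
                   [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])
k = 2

# Pick k random data points as the starting centers
rng = np.random.default_rng(42)
centers = points[rng.choice(len(points), size=k, replace=False)]

for _ in range(100):
    # Assignment step: each point joins the cluster of its nearest center
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)

    # Update step: each center moves to the mean of its assigned points
    new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])

    # Stop when the centers no longer move
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print("Final centers:\n", centers)
print("Cluster labels:", labels)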

K-Means Clustering (Customer Segmentation)

K-Means Clustering

Idea: Split data into k groups (clusters), where each group has a center (mean). Points are assigned to the nearest center, and the centers keep updating until stable.

python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Sample dataset: customers [annual income, spending score]
X = np.array([
    [15, 39], [16, 81], [17, 6], [18, 77], [19, 40], 
    [20, 76], [21, 6], [22, 94], [23, 40], [24, 73]
])

# Apply K-Means with 2 clusters
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Plot clusters
plt.scatter(X[:,0], X[:,1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], c='red', marker='x')
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation with K-Means")
plt.show()

Explanation with impact

  • KMeans(n_clusters=2) → tells the algorithm we want 2 groups. Choosing k is important: too few and you miss details, too many and it gets messy. A common way to pick k, the elbow method, is sketched after this list.
  • fit(X) → algorithm finds cluster centers by repeatedly adjusting them.
  • labels_ → gives the cluster assignment for each customer.
  • Plotting shows two clear groups: one spends less, the other spends more.
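
A common heuristic for choosing k is the elbow method: run K-Means for several values of k and plot the inertia (the sum of squared distances from each point to its nearest center). The point where the curve stops dropping sharply is a reasonable choice. This is a minimal sketch that reuses the X array and imports from the example above; the range of k values tried is an arbitrary choice for illustration.

python
# Elbow method: try several values of k and look for the "elbow" in the inertia curve
inertias = []
k_values = range(1, 6)
for k in k_values:
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    model.fit(X)
    inertias.append(model.inertia_)

plt.plot(list(k_values), inertias, marker='o')
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia (within-cluster sum of squares)")
plt.title("Elbow Method for Choosing k")
plt.show()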

Real-world note: A bank I worked with used K-Means on customer spending patterns. They discovered a hidden segment of customers who rarely took loans but frequently used credit cards — a new opportunity for targeted products.

Hierarchical Clustering

Hierarchical clustering starts with each point as its own cluster. It then repeatedly merges the two closest clusters based on distance until only one large cluster remains. The dendrogram visualizes this process, and cutting it at a certain height reveals the natural grouping structure.

Hierarchical Clustering Dendrogram

Here is the code example:

python
from scipy.cluster.hierarchy import dendrogram, linkage

# Perform hierarchical clustering on the same customer data X
# (X and plt are reused from the K-Means example above)
Z = linkage(X, method='ward')

# Plot dendrogram
plt.figure(figsize=(6, 4))
dendrogram(Z)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Customers")
plt.ylabel("Distance")
plt.show()

Explanation with impact

  • linkage(X, method='ward') → computes cluster distances using Ward’s method (minimizes variance).
  • dendrogram(Z) → visual tree that shows how clusters merge.
  • You can choose a “cut” point on the tree to decide the number of clusters; a sketch of cutting the tree with fcluster follows this list.
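
To turn the tree into actual cluster labels, SciPy’s fcluster can cut the linkage matrix at a chosen number of clusters. A minimal sketch reusing the Z matrix computed above (the choice of 2 clusters is only for illustration):

python
from scipy.cluster.hierarchy import fcluster

# Cut the tree so that at most 2 clusters remain
labels_hier = fcluster(Z, t=2, criterion='maxclust')
print("Hierarchical cluster labels:", labels_hier)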

Experience: In healthcare data, hierarchical clustering was invaluable because we didn’t know the number of patient subgroups beforehand. The dendrogram revealed natural splits we wouldn’t have guessed.

Dimensionality Reduction: PCA

Principal Component Analysis (PCA) finds the directions in data where variance is highest. It then rotates the dataset onto these directions (principal components) and keeps only the most important ones. This reduces the number of dimensions while retaining most of the original information.

Datasets often have many features (dozens or hundreds). High dimensionality makes data:

  • Hard to visualize.
  • Slower to process.
  • More prone to overfitting.

PCA Dimensionality Reduction

Dimensionality Reduction simplifies data while preserving the most important patterns.

PCA (Principal Component Analysis)

PCA rotates data into new “principal components” that capture the most variance.

python
from sklearn.decomposition import PCA

# Apply PCA to reduce 2D data to 1D
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)

Explanation with impact

  • PCA(n_components=1) → tells PCA to keep only 1 dimension.
  • fit_transform(X) → finds the direction with maximum variance and projects data onto it.
  • Result: Data is now 1D but still preserves the main structure; a quick variance check is sketched after this list.
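
To see how much of the original variance the projection keeps, PCA exposes explained_variance_ratio_, and inverse_transform maps the reduced data back to the original space. A short sketch continuing the example above:

python
# Fraction of the original variance captured by the kept component
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Mapping the 1D data back to 2D shows what the approximation looks like
X_approx = pca.inverse_transform(X_reduced)
print("Reconstructed points:\n", X_approx)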

Note: In a face recognition project, raw images had thousands of pixels. PCA reduced them to fewer components, speeding up training and still keeping enough detail to distinguish faces.

Anomaly Detection Basics

Anomalies are “outliers”: data points that don’t fit normal patterns. Think of fraud detection: most transactions are normal, but rare fraudulent ones stick out. A simple way to find them is the Z-score, which measures how many standard deviations a value lies from the dataset’s mean. Values far from the mean (e.g., 2 or 3 standard deviations away) are considered unusual, and these outliers often represent fraud, defects, or rare events.

Anomaly Detection in Transactions

Example with Z-Score

We can flag anomalies by checking how far a point is from the mean (measured in standard deviations).

python
import numpy as np
from scipy import stats

# Example dataset: transaction amounts
transactions = np.array([50, 52, 49, 51, 48, 1000, 47, 53, 50, 49])

# Z-score method
z_scores = np.abs(stats.zscore(transactions))
anomalies = np.where(z_scores > 2)

print("Transaction amounts:", transactions)
print("Anomalies:", transactions[anomalies])

Explanation with impact

  • stats.zscore → computes how many standard deviations each value is from the mean; the manual calculation behind it is sketched after this list.
  • > 2 → marks values more than 2 standard deviations away as anomalies.
  • The 1000 transaction stands out as abnormal.
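
For reference, here is the same calculation written out by hand with NumPy; this is essentially what stats.zscore does (by default it uses the population standard deviation):

python
# Manual z-score: z = (x - mean) / std
mean = transactions.mean()
std = transactions.std()  # population standard deviation (ddof=0), matching stats.zscore's default
z_manual = np.abs((transactions - mean) / std)

print("Manual z-scores:", np.round(z_manual, 2))
print("Flagged transactions:", transactions[z_manual > 2])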

Example: A payments company I advised used anomaly detection to catch fraudulent transactions. Even a basic Z-score approach flagged unusual spending spikes before more advanced ML models were deployed.

Key takeaways from this module:

  • Clustering helps find groups in unlabeled data (e.g., customer segments, patient subgroups).
  • PCA reduces dimensions for easier visualization and faster training.
  • Anomaly Detection identifies rare and critical outliers (fraud, defects, intrusions).

When I first learned unsupervised learning, I expected messy results since we don’t “teach” the algorithm with labels. But time and again, I’ve seen businesses uncover hidden patterns — new customer groups, unusual fraud rings, even hidden genetic patterns in biology — that no human could have labeled manually.

Frequently Asked Questions

What is unsupervised learning?
Unsupervised learning is a machine learning approach where the model works with unlabeled data to discover hidden patterns, clusters, or anomalies.

How does K-Means clustering work?
K-Means groups data by repeatedly assigning points to the nearest cluster center and updating the centers until they stabilize.

What is hierarchical clustering?
Hierarchical clustering builds a tree-like structure by progressively merging the closest clusters. The dendrogram shows how groups form at different levels.

What does PCA do?
Principal Component Analysis (PCA) reduces high-dimensional data into fewer dimensions by keeping the directions of maximum variance, making data easier to analyze and visualize.

What is anomaly detection used for?
Anomaly detection identifies unusual patterns or outliers in data. It is commonly used in fraud detection, cybersecurity, medical diagnosis, and quality control.
