Hands-on Project: Supervised and Unsupervised Learning

Theory builds understanding, but projects build confidence. In this lesson we will complete two practical builds:

  • Customer Segmentation using K-Means to discover natural groups in customer behavior
  • Handwritten Digit Recognition with a simple neural network that learns from images

We will use Python, Scikit-learn, TensorFlow, Matplotlib, and Seaborn. I will explain every syntax choice so you always know why a line exists and what impact it has.

The first time I did these two projects end-to-end, I stopped feeling like I was “learning ML” and started feeling like I could solve problems. I want the same shift for you.

Project 1: Customer Segmentation using K-Means

Companies rarely have labels like “value shopper” or “premium buyer.” K-Means helps you discover segments directly from the data so marketing and product teams can tailor experiences. Typical wins include targeted promotions and smarter retention strategies.

Step 1: Import libraries and load a tiny dataset

python
import pandas as pd              # tables and data handling
import numpy as np               # numerical arrays and quick math
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # higher level plots built on matplotlib

from sklearn.cluster import KMeans           # K-Means algorithm
from sklearn.preprocessing import StandardScaler  # feature scaling

# Example dataset: Mall customers
data = {
    "CustomerID": range(1, 11),
    "Annual Income (k$)": [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    "Spending Score":      [39, 81,  6, 77, 40, 76,  6, 94, 40, 73]
}
df = pd.DataFrame(data)
print(df.head())

Syntax and impact

  • import pandas as pd and import numpy as np - standard abbreviations used across the Python ecosystem. This keeps code readable and consistent with most tutorials and docs.
  • matplotlib.pyplot as plt and seaborn as sns - Matplotlib is the plotting foundation. Seaborn adds nicer defaults. We will use both.
  • from sklearn.cluster import KMeans - imports the specific estimator we need.
  • from sklearn.preprocessing import StandardScaler - K-Means uses Euclidean distance, so features measured on large scales can dominate. Standardizing puts every feature on a comparable footing.
  • pd.DataFrame(data) - converts a Python dict to a table you can inspect and slice.
  • print(df.head()) - quick sanity check that columns loaded correctly.

Tip for real work
Replace the synthetic dict with a CSV read:

python
df = pd.read_csv("customers.csv")

Handle missingness before clustering, for example with median imputation.
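For example, a minimal sketch of median imputation, assuming the file name and the two numeric columns from this demo:

python
import pandas as pd

df = pd.read_csv("customers.csv")
num_cols = ["Annual Income (k$)", "Spending Score"]

# Fill missing numeric values with each column's median before clustering
df[num_cols] = df[num_cols].fillna(df[num_cols].median())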

Step 2: Select features and scale them

python
X = df[["Annual Income (k$)", "Spending Score"]]  # choose features for clustering

scaler = StandardScaler()         # create the scaler object
X_scaled = scaler.fit_transform(X)  # fit on X then transform to standardized values

Syntax and impact

  • df[["col1", "col2"]] - column subset with a list keeps it a DataFrame. You could also use .values later to convert to a NumPy array.
  • StandardScaler() - centers each feature at mean 0 and scales to standard deviation 1. This makes distance calculations fair across features.
  • fit_transform - learns scaling parameters from the data, then applies them. For production pipelines, you would fit on training data and transform on new data to avoid leakage.

Why scaling matters
If income ranges 0 to 200 and spending score ranges 0 to 100, distance will be driven mostly by income. Customers that differ slightly in income can appear far apart even if their spending profiles are similar. Standardization fixes this imbalance.
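To make the leakage point concrete, here is a minimal sketch of the production pattern: fit the scaler on training rows only, then reuse it for rows that arrive later. The numbers are made up for illustration.

python
from sklearn.preprocessing import StandardScaler
import numpy as np

X_train = np.array([[15, 39], [16, 81], [17, 6]], dtype=float)  # training rows: income, spending score
X_new   = np.array([[30, 50]], dtype=float)                     # a new customer seen later

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data only
X_new_scaled = scaler.transform(X_new)          # reuse those parameters; never refit on new data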

Step 2.5: Choose k with the elbow and silhouette scores

Small datasets do not show perfect elbows, but here is the pattern:

python
from sklearn.metrics import silhouette_score

inertias = []
sil_scores = []
K = range(2, 6)

for k in K:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X_scaled)
    inertias.append(km.inertia_)  # within-cluster sum of squares
    sil_scores.append(silhouette_score(X_scaled, labels))

print("Inertias:", inertias)
print("Silhouette:", sil_scores)

  • km.inertia_ drops as k increases. Look for a bend point; plotting inertia against k (sketched below) makes the bend easier to spot.
  • silhouette_score ranges from −1 to 1. Higher is better. It measures how separated and compact clusters are.
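One way to eyeball both curves, reusing the K, inertias, and sil_scores computed above:

python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(list(K), inertias, marker="o")
ax1.set_xlabel("k"); ax1.set_ylabel("Inertia"); ax1.set_title("Elbow curve")
ax2.plot(list(K), sil_scores, marker="o")
ax2.set_xlabel("k"); ax2.set_ylabel("Silhouette"); ax2.set_title("Silhouette by k")
plt.tight_layout(); plt.show()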

For this demo we will proceed with 2 clusters.

Step 3: Run K-Means and attach labels

python
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
df["Cluster"] = kmeans.fit_predict(X_scaled)
print(df)

Syntax and impact

  • n_clusters=2 - number of groups to find. This is the key hyperparameter in K-Means.
  • random_state=42 - sets the seed for centroid initialization. This makes your results reproducible for teaching and debugging.
  • n_init=10 - number of random centroid initializations. K-Means can get stuck in poor local minima. Multiple starts pick the best of several tries.
  • fit_predict - runs K-Means on the data and returns cluster labels in one step. You could also call fit then predict separately, as sketched after this list.
  • df["Cluster"] = ... - stores the cluster assignment back in your table. Business partners like to see IDs with segment labels.

Peek at centroids and distances

python
print("Centroids (scaled space):\n", kmeans.cluster_centers_)
print("Within-cluster SSE (inertia):", kmeans.inertia_)

  • cluster_centers_ - means of each cluster in standardized space.
  • inertia_ - smaller is tighter. Only compare inertia across models with the same data and scaling.

Step 4: Visualize clusters

python
plt.figure(figsize=(6, 4))
sns.scatterplot(
    x="Annual Income (k$)", y="Spending Score",
    hue="Cluster", data=df, palette="viridis", s=100
)
plt.title("Customer Segmentation with K-Means")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score")
plt.tight_layout()
plt.show()

Syntax and impact

  • plt.figure(figsize=(6, 4)) - makes the plot readable on most screens.
  • sns.scatterplot(..., hue="Cluster") - colors by cluster label so groups are visible at a glance.
  • palette="viridis" - color map with good contrast.
  • s=100 - point size for clarity.
  • tight_layout() - avoids clipped labels.
  • show() - renders the figure.

At a retail client, a similar plot revealed a small cluster of loyal frequent buyers with high spending scores but moderate income. Targeted loyalty benefits for that group lifted revenue 20 percent in a quarter.

Project 2: Handwritten Digit Recognition with a Neural Network

Why this project

Digit recognition is a practical on-ramp to computer vision. The dataset is clean, the task is intuitive, and the payoff is satisfying. The same pipeline scales to scanned forms, invoice fields, and quality checks in manufacturing.

Step 1: Import libraries and load MNIST

python
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load dataset (downloads if not cached)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("Training samples:", X_train.shape)  # (60000, 28, 28)
print("Test samples:", X_test.shape)      # (10000, 28, 28)

# Visualize one example to build intuition
plt.imshow(X_train[0], cmap="gray")
plt.title(f"Label: {y_train[0]}")
plt.axis("off")
plt.show()

Syntax and impact

  • mnist.load_data() - returns two (images, labels) tuples, one for training and one for test. Images are 28 by 28 grayscale. Labels are integers 0 to 9.
  • print(X_train.shape) - always confirm shapes before modeling. Shape mismatches are the most common beginner error.
  • plt.imshow(..., cmap="gray") - renders the image. Visual checks prevent silly mistakes like training on the wrong axis order.

Step 2: Normalize images

python
# Convert to float and scale to 0-1 for stable training
X_train = X_train.astype("float32") / 255.0
X_test  = X_test.astype("float32") / 255.0

Syntax and impact

  • astype("float32") - Keras prefers float32 tensors. This avoids implicit casting and speed issues.
  • / 255.0 - pixel values are 0 to 255. Neural nets train faster and more stably with small inputs.

Step 3: Build a simple neural network

python
model = Sequential([
    Flatten(input_shape=(28, 28)),   # 28x28 -> 784-length vector
    Dense(128, activation="relu"),   # hidden layer learns nonlinear patterns
    Dense(10, activation="softmax")  # 10 probabilities that sum to 1
])

model.compile(
    optimizer="adam",                       # adaptive gradient method
    loss="sparse_categorical_crossentropy", # integer labels 0..9
    metrics=["accuracy"]                    # report accuracy each epoch
)

model.summary()

Syntax and impact

  • Sequential([...]) - a straight stack of layers is perfect for this task.
  • Flatten(input_shape=(28, 28)) - converts each image to a 1D vector so a Dense layer can process it.
  • Dense(128, activation="relu") - 128 neurons is a good small default. ReLU helps gradients flow and learns nonlinear features.
  • Dense(10, activation="softmax") - one output per class. Softmax converts raw scores to probabilities.
  • optimizer="adam" - reasonable default for most beginners. It adapts learning rates per parameter.
  • loss="sparse_categorical_crossentropy" - correct loss when labels are integers rather than one-hot vectors.
  • model.summary() - prints parameter counts and output shapes. Use it whenever you modify the architecture.
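If you ever one-hot encode the labels instead of keeping them as integers, switch to the non-sparse loss. A minimal sketch of that alternative (the rest of this lesson keeps integer labels):

python
from tensorflow.keras.utils import to_categorical

y_train_onehot = to_categorical(y_train, num_classes=10)  # e.g. 5 -> [0,0,0,0,0,1,0,0,0,0]
y_test_onehot  = to_categorical(y_test,  num_classes=10)

# With one-hot labels you would compile with the non-sparse loss:
# model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])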

Step 4: Train and monitor

python
history = model.fit(
    X_train, y_train,
    epochs=5,                 # 5 passes over the training set
    batch_size=128,           # one weight update per 128 images
    validation_split=0.1,     # hold out 10 percent for validation
    verbose=1
)

Syntax and impact

  • epochs=5 - small yet adequate to reach strong accuracy on MNIST.
  • batch_size=128 - balances GPU or CPU efficiency and generalization. You can try 64 or 256.
  • validation_split=0.1 - keeps a validation set for early warning of overfitting.
  • history - stores loss and accuracy per epoch. Plotting it helps diagnose training issues.

Optional visualization:

python
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.legend(); plt.title("Training curves"); plt.show()

Step 5: Evaluate and inspect predictions

python
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print("Test accuracy:", round(test_acc, 4))

# Predict first 9 digits to sanity check outputs
probs = model.predict(X_test[:9])
preds = probs.argmax(axis=1)

fig, axes = plt.subplots(3, 3, figsize=(6, 6))
for ax, img, pred, true in zip(axes.ravel(), X_test[:9], preds, y_test[:9]):
    ax.imshow(img, cmap="gray"); ax.axis("off")
    ax.set_title(f"pred {pred} - true {true}")
plt.tight_layout(); plt.show()

Syntax and impact

  • model.evaluate - uses the final weights to score performance on unseen data.
  • model.predict - returns a probability vector per image. argmax converts it to a class id.
  • The grid of images with predicted and true labels is an immediate reality check.

Common extensions for real work

  • Add a Dropout(0.2) layer after the Dense(128) to reduce overfitting on harder image datasets.
  • Switch to a small CNN when you are comfortable. Convolutional layers exploit spatial structure and outperform Dense-only models on images. Both extensions are sketched below.
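A minimal sketch of both extensions, assuming the same normalized MNIST arrays as above. The layer sizes are illustrative defaults, and the CNN needs an explicit channel dimension:

python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Reshape

# Dense model with Dropout to reduce overfitting
model_dropout = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation="relu"),
    Dropout(0.2),                     # randomly zero 20 percent of activations during training
    Dense(10, activation="softmax")
])

# Small CNN: convolutions exploit the 2D structure of the images
model_cnn = Sequential([
    Reshape((28, 28, 1), input_shape=(28, 28)),  # add the channel dimension
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(10, activation="softmax")
])

for m in (model_dropout, model_cnn):
    m.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])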

Lessons Learned

  • K-Means discovers natural structure in customer data and gives you a segment per customer you can act on. Scaling features is crucial, and choosing k deserves care. Visualizations turn numbers into strategy.
  • A basic neural network can already reach strong accuracy on MNIST. Correct shapes, correct loss, and correct output activation are the three most common stumbling blocks. Fix those and your training will fly.
  • Matplotlib and Seaborn are not decoration. They are diagnostic tools that help you validate pipeline steps and communicate results clearly.
  • Preprocessing choices like scaling and normalization matter as much as the model. I have seen more failures from poor preprocessing than from model choice.

Frequently Asked Questions

What will you learn in this lesson?
You’ll learn how to apply K-Means clustering for customer segmentation and build a simple neural network in TensorFlow to recognize handwritten digits.

How does K-Means help with customer segmentation?
K-Means groups customers based on features like income and spending score, helping businesses discover hidden segments for targeted marketing.

How does the neural network recognize handwritten digits?
The neural network flattens 28×28 pixel images, learns patterns through hidden layers, and outputs probabilities for digits 0–9 using softmax.

Which tools and libraries are used?
We use Scikit-learn for clustering, TensorFlow/Keras for neural networks, and Matplotlib/Seaborn for visualization.

Do I need prior machine learning experience?
No. The lesson is beginner-friendly, with step-by-step explanations of code, syntax, and reasoning so even newcomers can follow along.
