
Recommender Systems in ML

Every time Netflix suggests a movie, Amazon recommends a product, or Spotify curates your playlist, you’re experiencing a recommender system in action.

Recommender systems have become a cornerstone of modern businesses because they increase engagement, sales, and satisfaction. In fact, Netflix once reported that over 80% of hours watched came from recommendations.

My story:
When I first worked on an e-learning platform, students complained about being overwhelmed by too many courses. After we introduced a recommender system that suggested courses based on each student’s learning history, completion rates jumped by 30%. It wasn’t about offering more options; it was about showing the right ones.

In this lesson, we’ll cover three main approaches:

  1. Collaborative filtering: learns from user behavior.
  2. Content-based recommendation: learns from item features.
  3. Hybrid systems: combine the best of both worlds.

Section 1: Collaborative Filtering

Collaborative filtering is based on the idea that users with similar tastes will like similar things.

Example: If Alice and Bob both liked three of the same books, and Alice also liked a fourth book, then Bob will probably like it too.

There are two main types:

  • User-based CF: Find similar users.
  • Item-based CF: Find similar items.
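The user-based idea can be sketched in plain Python without any library. This is a minimal illustration under stated assumptions: the ratings are made up, similarity is cosine computed over co-rated items only, and a prediction is the similarity-weighted average of the top-k neighbors who rated the item. The names ratings, cosine, and predict are hypothetical and are not part of Surprise’s API.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "Alice": {"Book1": 5, "Book2": 4, "Book3": 4, "Book4": 5},
    "Bob":   {"Book1": 5, "Book2": 4, "Book3": 5},
    "Carol": {"Book1": 1, "Book2": 2, "Book4": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, k=2):
    """Similarity-weighted average rating of the k most similar users who rated `item`."""
    user_ratings = ratings.get(user, {})
    neighbors = [
        (cosine(user_ratings, ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbors: the cold start problem
    return sum(sim * r for sim, r in neighbors) / total_sim

print(predict("Bob", "Book4"))
```

Because 1-to-5 ratings are all positive, raw cosine similarity stays positive too, so Carol, who rates everything low, still gets a weight of about 0.9 and pulls Bob’s predicted rating for Book4 down. Mean-centering each user’s ratings (which is what Pearson correlation does) is a common refinement. A user with no ratings gets None, which is the cold start problem in miniature.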

Code Example: User-based Collaborative Filtering

We’ll use Surprise (pip package scikit-surprise), a Python library for building and evaluating recommender systems.

```python
import pandas as pd
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split
from surprise import accuracy

# Sample ratings dataset: user, item, rating (1-5)
ratings_dict = {
    "userID": ["A", "A", "B", "B", "C", "C", "D"],
    "itemID": ["Book1", "Book2", "Book2", "Book3", "Book1", "Book3", "Book2"],
    "rating": [5, 3, 4, 2, 4, 5, 3],
}
df = pd.DataFrame(ratings_dict)

# Surprise needs a Reader that declares the rating scale
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[["userID", "itemID", "rating"]], reader)

# Train-test split (fixed seed so results are reproducible)
trainset, testset = train_test_split(data, test_size=0.3, random_state=42)

# User-based collaborative filtering with cosine similarity
sim_options = {"name": "cosine", "user_based": True}
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)

# Predict ratings for the test set. On a dataset this small, some test users
# or items never appear in training, so Surprise falls back to the global mean.
predictions = algo.test(testset)

# accuracy.rmse prints its own result by default; silence it and print once
rmse = accuracy.rmse(predictions, verbose=False)
print("RMSE:", rmse)

# Predict a single rating: how would user D rate Book1?
print("Predicted rating of Book1 for D:", algo.predict("D", "Book1").est)
```

Explanation with impact

  • ratings_dict → tiny dataset of user ratings. In real systems, this could be millions of rows.
  • Reader(rating_scale=(1, 5)) → tells Surprise the scale of ratings.
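  • KNNBasic with user_based=True → finds similar users using cosine similarity and predicts a rating as the similarity-weighted average of the ratings given by the most similar users who rated that item.
  • accuracy.rmse → computes the root mean squared error between predicted and actual ratings; lower is better.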
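RMSE itself is a short formula: the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it by hand on hypothetical numbers; rmse here is an illustrative helper, not Surprise’s function, though it follows the same definition.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((predicted - actual) ** 2))."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true ratings vs. model predictions
print(rmse([4, 2, 5], [3.5, 2.5, 4.0]))
```

Squaring the errors means one prediction that is off by a full star (here 5 vs. 4.0) costs four times as much as one that is off by half a star.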

Example: Amazon’s early recommenders used item-based collaborative filtering at scale. It was simple yet extremely effective, powering “Customers who bought this also bought.”
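A “customers who bought this also bought” list can be approximated by counting how often two items appear in the same customer’s history. This is a deliberately simplified item-to-item sketch on hypothetical purchase data, not Amazon’s production algorithm; baskets and also_bought are illustrative names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: customer -> set of items bought
baskets = {
    "c1": {"laptop", "mouse", "laptop bag"},
    "c2": {"laptop", "mouse"},
    "c3": {"laptop", "laptop bag"},
    "c4": {"mouse", "mousepad"},
    "c5": {"laptop", "mouse"},
}

# Count how often each ordered pair of items shows up in the same history
co_counts = Counter()
for items in baskets.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def also_bought(item, n=2):
    """Items most often bought by customers who also bought `item`."""
    scores = Counter({b: c for (a, b), c in co_counts.items() if a == item})
    return [other for other, _ in scores.most_common(n)]

print(also_bought("laptop"))
```

Raw co-occurrence counts favor popular items. Normalizing, for example with cosine similarity between item vectors, is a common refinement.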

Limitations

  • Needs many ratings: it struggles with new users or new items that have little or no rating history (the cold start problem).
  • Doesn’t use item information (e.g., genre, category).

Section 2: Content-Based Recommendation

Content-based recommenders use item features (keywords, categories, descriptions) to suggest items similar to what the user already liked.

Example: If you liked a movie tagged “sci-fi” and “space,” you’ll get more space sci-fi movies.
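The tag idea can be sketched directly: score each unseen movie by how much its tag set overlaps with the movies the user liked. This sketch assumes a small hypothetical catalog and uses Jaccard similarity (shared tags divided by all tags) as a simple stand-in for the TF-IDF approach that follows; catalog, jaccard, and recommend are illustrative names.

```python
# Hypothetical catalog: movie -> set of tags
catalog = {
    "Interstellar": {"sci-fi", "space", "drama"},
    "Gravity": {"sci-fi", "space", "thriller"},
    "Inception": {"sci-fi", "thriller", "heist"},
    "Notting Hill": {"romance", "comedy"},
}

def jaccard(a, b):
    """Share of tags two items have in common: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(liked, n=2):
    """Rank unseen movies by their best tag overlap with any liked movie."""
    candidates = [m for m in catalog if m not in liked]
    scored = [
        (max(jaccard(catalog[m], catalog[l]) for l in liked), m)
        for m in candidates
    ]
    # Highest score first; break ties alphabetically
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [m for _, m in scored[:n]]

print(recommend({"Interstellar"}))
```

Gravity shares two of four distinct tags with Interstellar, while Inception shares only “sci-fi”, so Gravity ranks first.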

Code Example: TF-IDF Movie Recommendation

We’ll build a simple movie recommender using plot descriptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Example movie dataset. TF-IDF only sees overlap in shared words, so the
# descriptions need some common vocabulary (here, "space") to be comparable.
movies = pd.DataFrame({
    "title": ["Interstellar", "The Martian", "Gravity", "Inception"],
    "description": [
        "Space travel and wormholes to save humanity",
        "Astronaut stranded on Mars, a space survival story",
        "Astronauts trapped in space orbit after a disaster",
        "Dream within a dream, sci-fi thriller",
    ],
})

# Convert descriptions to TF-IDF vectors (rows are L2-normalized by default)
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["description"])

# On L2-normalized vectors, the linear kernel (dot product) equals cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Recommend movies similar to "Interstellar", skipping the movie itself
idx = movies[movies["title"] == "Interstellar"].index[0]
sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
top = [i for i, score in sim_scores if i != idx][:2]
recommendations = movies["title"].iloc[top].tolist()

print("Because you watched Interstellar, you may like:", recommendations)
```

Explanation with impact

  • TfidfVectorizer → turns text into vectors where important words (like “space” or “dream”) have higher weights.
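  • linear_kernel → computes the dot product between every pair of movies. TfidfVectorizer L2-normalizes each row by default, so this dot product equals cosine similarity.
  • Sorting the scores and skipping the movie itself gives the most similar titles.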
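To see what TfidfVectorizer and linear_kernel compute, here is a from-scratch sketch that follows scikit-learn’s default settings: raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, then L2 normalization of each vector. Tokenization and stop-word removal are done by hand here as an assumption; scikit-learn’s tokenizer and stop-word list differ in details, so the numbers will not match the library’s exactly. The helpers tfidf and dot are illustrative.

```python
from collections import Counter
from math import log, sqrt

# Hypothetical, pre-tokenized documents (stop words already removed)
docs = [
    ["space", "travel", "wormholes", "save", "humanity"],
    ["astronaut", "stranded", "mars", "space", "survival"],
    ["dream", "dream", "sci", "fi", "thriller"],
]

n_docs = len(docs)
# Document frequency: in how many documents each term appears
doc_freq = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    """Raw counts times smoothed idf, then L2-normalized to unit length."""
    counts = Counter(doc)
    vec = {
        t: c * (log((1 + n_docs) / (1 + doc_freq[t])) + 1)
        for t, c in counts.items()
    }
    norm = sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}

def dot(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

vectors = [tfidf(d) for d in docs]
print(dot(vectors[0], vectors[1]), dot(vectors[0], vectors[2]))
```

“space” appears in two of the three documents, so it gets a lower idf than the words unique to one document; it is still the only term linking the first two documents, which is why their similarity is small but nonzero, while the third document scores exactly zero.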

My story: I once built a course recommender using course descriptions. A student who finished “Introduction to Python” would get “Data Analysis with Python” next, because they shared keywords like “Python,” “data,” and “analysis.”

Limitations

  • Tends to recommend items very similar to what the user already liked, so there is little discovery.
  • Relies heavily on good metadata. If descriptions are poor, results suffer.

Section 3: Hybrid Systems

Hybrid systems combine collaborative filtering and content-based methods.

  • Collaborative filtering captures crowd wisdom.
  • Content-based methods can recommend new items before anyone has rated them.
  • Together, they offset each other’s weaknesses.
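One simple way to get that complementarity is a switching hybrid: blend both signals when an item has enough ratings, and fall back to the content score alone for cold-start items. This is a sketch of one possible design, assuming both scores are already on a 0-to-1 scale; hybrid_score, min_ratings, and cf_weight are hypothetical names and tuning knobs, not standard values.

```python
def hybrid_score(cf_score, cb_score, n_ratings, min_ratings=5, cf_weight=0.7):
    """Blend CF and content scores (both on 0-1), falling back to content only
    when the item has too few ratings for CF to be reliable.

    min_ratings and cf_weight are hypothetical tuning knobs.
    """
    if cf_score is None or n_ratings < min_ratings:
        return cb_score  # cold-start item: content signal only
    return cf_weight * cf_score + (1 - cf_weight) * cb_score

# A brand-new item with no ratings still gets a score from its content
print(hybrid_score(cf_score=None, cb_score=0.9, n_ratings=0))
# An established item blends both signals
print(hybrid_score(cf_score=0.8, cb_score=0.85, n_ratings=120))
```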

Netflix famously uses hybrid recommenders, blending user viewing behavior with movie features like genre, actors, and directors.

Simple Hybrid Example

```python
# Suppose CF predicts Bob will rate "Inception" 4.2 on a 1-5 scale
cf_score = 4.2

# Content-based gives "Inception" a cosine similarity of 0.85 (0-1 scale)
cb_score = 0.85

# Rescale the CF prediction to 0-1 so both signals are comparable
cf_norm = (cf_score - 1) / (5 - 1)

# Weighted hybrid: 70% CF, 30% CB
final_score = 0.7 * cf_norm + 0.3 * cb_score
print("Final recommendation score:", round(final_score, 3))
```

Explanation with impact

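  • cf_score comes from collaborative filtering predictions (here, a predicted rating on a 1-to-5 scale).
  • cb_score comes from content similarity (a cosine score between 0 and 1).
  • Weighted averaging combines the two. Because they use different scales, rescale the CF rating to 0-to-1 before blending; otherwise the CF term outweighs the content term far more than the 70/30 split suggests.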
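In practice the blend is applied to a whole candidate list and the results are sorted into a ranking. The sketch below assumes hypothetical candidates for Bob, each with a CF rating on the 1-to-5 scale and a content similarity on 0-to-1; blend and candidates are illustrative names.

```python
# Hypothetical candidates for Bob: (CF predicted rating 1-5, content similarity 0-1)
candidates = {
    "Inception": (4.2, 0.85),
    "Gravity": (3.0, 0.95),
    "Notting Hill": (4.8, 0.05),
}

def blend(cf_rating, cb_sim, w_cf=0.7, low=1, high=5):
    """Rescale the CF rating to 0-1, then take a weighted average with the CB score."""
    cf_norm = (cf_rating - low) / (high - low)
    return w_cf * cf_norm + (1 - w_cf) * cb_sim

# Highest blended score first
ranked = sorted(candidates, key=lambda title: blend(*candidates[title]), reverse=True)
print(ranked)
```

The weight is a real design choice: with w_cf=0.7, the crowd’s enthusiasm lifts Notting Hill above Gravity, while with w_cf=0.3 Gravity’s strong content match moves it ahead.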

💡 Industry insight: Spotify blends collaborative signals (people who listen to similar tracks) with content signals (audio features like tempo, key, and rhythm). This blend helps Spotify balance personalization with discovery.

Lessons Learned

From this module:

  • Collaborative filtering learns from user-item interactions; it is powerful but suffers from the cold start problem.
  • Content-based recommendation relies on item features; it is great for personalization but can be narrow.
  • Hybrid systems combine the strengths of both and are used by companies like Netflix, Amazon, and Spotify.

Final thought:
When I build recommenders, I start small, often with item-based CF, to prove value. Then I layer in content data, and finally I build hybrids when scale demands it. The magic isn’t the algorithm alone but how well it aligns with business goals and user trust.

Frequently Asked Questions

What is a recommender system?
A recommender system suggests items (like movies, books, or products) to users based on behavior, preferences, or item features.

What is collaborative filtering?
Collaborative filtering uses the behavior of similar users or items to make recommendations. For example, “Users like you also liked…”

What is content-based recommendation?
Content-based methods use item features such as keywords, genres, or descriptions to recommend items similar to what a user already liked.

What are hybrid recommender systems?
Hybrid systems combine collaborative filtering and content-based approaches to overcome weaknesses like cold start and narrow recommendations.

Where are recommender systems used?
They power recommendations on Netflix, Amazon, Spotify, YouTube, LinkedIn, and most platforms that personalize user experiences.
