Supervised Learning: Regression
When I built my very first machine learning model, it was a humble linear regression that predicted the prices of used laptops based on their specifications. Seeing the predictions match reality felt like a lightbulb moment.
In this module, we’ll walk through supervised learning using linear regression to predict daily coffee sales at our neighborhood café. I’ll share the exact thought process I use in real projects, point out common mistakes, and explain each concept in plain language so there’s no room for confusion.
By the end, you’ll know how to:
- Understand and apply supervised learning for numeric predictions.
- Train a linear regression model in Python.
- Interpret model results to make real-world decisions.
- Evaluate model performance using reliable metrics.
Supervised Learning Explained
Supervised learning means teaching a model with the help of labeled examples. Think of it like a barista trainee who is shown every ingredient and the final drink result over and over until they can prepare it on their own.
In our café case:
- Features (Inputs): Temperature, foot traffic, day of the week, promotions.
- Label (Output): Number of coffees sold that day.
Why it matters:
- It’s the foundation of most predictive models.
- It works well when you have historical data with correct answers.
- It lets you move from gut feeling to data-driven planning.
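To make the inputs and output concrete, here's what a tiny labeled café dataset might look like. The column names and numbers are made up for illustration:

```python
import pandas as pd

# Each row is one day: the features (inputs) plus the label (output) we want to predict
df = pd.DataFrame({
    "temperature": [18, 25, 12, 30],       # °C that day
    "foot_traffic": [320, 410, 150, 500],  # passersby counted
    "day_of_week": ["Mon", "Sat", "Wed", "Sun"],
    "promotion": [0, 1, 0, 1],             # 1 = promotion running
    "coffees_sold": [95, 140, 60, 180],    # the label
})
print(df)
```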
💡 Personal insight: In a retail project, I once saw management invest heavily in “weekend promotions” because they assumed weekends drove sales. The model revealed rainy weekdays were their real moneymaker. Without supervised learning, they might have wasted thousands.
Why We’re Using Regression
Regression is about predicting continuous numbers. Since our target is “cups sold” (a number, not a category), regression is the right fit.
Benefits for our café:
- Estimate how many cups we're likely to sell on a given day.
- Adjust staffing and inventory with confidence.
- Identify which factors push sales up or down.
Insight: I’ve used regression in multiple industries, from forecasting energy demand to estimating construction costs. In every case, the ability to put a number on the prediction made planning far more precise.
How Linear Regression Works
Linear regression tries to find the best-fitting straight line through your data:

predicted_coffees = w₁ × temperature + w₂ × foot_traffic + … + wₙ × promotion + b

where each weight wᵢ measures how strongly its feature pushes sales up or down, and b is the intercept: the baseline when every feature is zero.

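As a quick worked example with made-up (not fitted) weights, a prediction is just that weighted sum plus the intercept:

```python
# Hypothetical weights for illustration only; a real model learns these from data
w_temperature = 1.5   # cups per °C
w_traffic = 0.2       # cups per passerby
w_promotion = 12.0    # cups added when a promotion runs
intercept = 20.0      # baseline cups

# Predict sales for a 22 °C day with 300 passersby and a promotion running
predicted = w_temperature * 22 + w_traffic * 300 + w_promotion * 1 + intercept
print(predicted)  # 33 + 60 + 12 + 20 = 125.0 cups
```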
In a beverage company analysis, we discovered temperature’s coefficient was five times larger than discounts. They shifted budget from discounts to seasonal marketing, which boosted profits noticeably.
The Cost Function and Gradient Descent
Cost Function (MSE)
- Measures how wrong the model is.
- Penalizes big mistakes more than small ones.
Gradient Descent
- Adjusts weights step-by-step to reduce the cost.
- Think of it like walking downhill blindfolded, taking the smallest safe steps until you can’t go any lower.
Common mistake I’ve seen: Setting the “learning rate” too high makes the model jump around without improving. A slower learning rate may take longer, but it leads to stability.
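To make the mechanics concrete, here's a minimal from-scratch sketch of gradient descent fitting a one-feature line to toy data. This isn't how scikit-learn fits LinearRegression (it uses a direct least-squares solver), but it shows the step-by-step descent, including why the learning rate matters:

```python
import numpy as np

# Toy data: one feature (temperature) and one target (cups sold),
# generated so the true relationship is y = 3x + 40
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([70.0, 85.0, 100.0, 115.0, 130.0])

w, b = 0.0, 0.0          # start with a flat line
learning_rate = 0.001    # set this too high and the updates overshoot and diverge

for _ in range(100_000):
    y_hat = w * x + b                # current predictions
    error = y_hat - y
    grad_w = 2 * np.mean(error * x)  # gradient of MSE with respect to w
    grad_b = 2 * np.mean(error)      # gradient of MSE with respect to b
    w -= learning_rate * grad_w      # step downhill
    b -= learning_rate * grad_b

print(w, b)  # converges toward 3 and 40
```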
Step 1: Preparing the Data
In any machine learning project, data preparation is like cleaning your kitchen before cooking. If your kitchen is messy, even the best recipe will turn out poorly. The same goes for ML: if your data isn't prepared correctly, the model's predictions will be unreliable.
Here’s the code we’re working with:
```python
from sklearn.model_selection import train_test_split

# Target variable (the thing we want to predict)
y = df_encoded["coffees_sold"]

# Features (the information we use to make the prediction)
X = df_encoded.drop(columns=["coffees_sold", "date"])

# Split into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```
Breaking it down:
- Target variable y: what we're trying to predict; in our café example, the number of coffees sold per day.
- Features X: the factors we think influence coffee sales: temperature, number of people walking by, day of the week, promotions, etc. We drop "coffees_sold" (the target) and "date" (not useful as a raw value for this prediction).
- Training and testing split:
- Training set (70%): Where the model learns patterns.
- Testing set (30%): Used to check if the model works on unseen data.
Why this step matters:
- It keeps our evaluation honest. The model can’t “cheat” by seeing the answers ahead of time.
- It prevents data leakage, which happens when information from the test set accidentally influences the training set, giving unrealistically good results.
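One thing the snippet above takes for granted is that df_encoded already exists. A plausible way to build it from a raw df like the one described earlier, assuming a categorical day_of_week column, is one-hot encoding:

```python
import pandas as pd

# Turn the categorical day_of_week column into 0/1 indicator columns,
# because linear regression can only work with numeric inputs.
# drop_first=True removes one redundant indicator so the remaining
# columns aren't perfectly correlated with each other.
df_encoded = pd.get_dummies(df, columns=["day_of_week"], drop_first=True)
```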
Experience sharing: In a retail sales forecasting project, a junior analyst once trained the model on all the data, then tested it on the same data. The accuracy was 99%, and everyone celebrated, until the first real sales week hit and the predictions were completely wrong. That’s when I learned the importance of proper train-test splits.
Step 2: Training the Model
Now that our data is prepared, we can teach our model to find patterns. This is like teaching a barista to recognize how weather and promotions affect coffee demand.
The code:
```python
from sklearn.linear_model import LinearRegression

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# View learned parameters
print("Feature Weights:", model.coef_)
print("Intercept:", model.intercept_)
```
Breaking it down:
- LinearRegression() creates a model that tries to find the best-fitting straight line through our data.
- .fit(X_train, y_train) teaches the model how features relate to the target.
- Feature weights (coefficients): Show how much each factor changes coffee sales. For example, if “foot traffic” has a weight of 0.8, then every extra passerby adds about 0.8 cups to the prediction, holding the other features constant.
- Intercept: The base number of coffees sold when all features are zero — your “starting point” for predictions.
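Since model.coef_ is just an unlabeled array, I find it helps to pair each weight with its feature name before interpreting anything:

```python
import pandas as pd

# Label each learned weight with the feature it belongs to,
# sorted so the strongest positive drivers appear first
weights = pd.Series(model.coef_, index=X_train.columns)
print(weights.sort_values(ascending=False))
```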
Example: While building a hotel booking model, I found the intercept meant the hotel could expect 35% occupancy even without promotions or special events. This helped management set realistic “no-promotion” revenue expectations.
Step 3: Making Predictions
Once trained, our model can make predictions on unseen data: the test set.
```python
y_pred = model.predict(X_test)

# Compare predictions to actual results
print("Predicted:", y_pred[:5])
print("Actual:", y_test.values[:5])
```
Breaking it down:
- model.predict(X_test) gives predicted sales for each day in our test set.
- We compare the predicted values to the actual sales to see how close we got.
Why it’s important:
- Predictions are just numbers unless you compare them to reality.
- A side-by-side view shows whether the model is generally correct or way off.
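A simple way to get that side-by-side view is a small comparison table with the per-day error:

```python
import pandas as pd

# Line up predictions against actual sales for each test day
comparison = pd.DataFrame({
    "actual": y_test.values,
    "predicted": y_pred.round(1),
})
comparison["error"] = comparison["predicted"] - comparison["actual"]
print(comparison.head())
```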
In our café case: if the model predicts 112 cups and actual sales were 120, we're off by only 8 cups, which is close enough for ordering milk and coffee beans.
Lesson learned: In a logistics project, we found that even being off by 5% in shipment volume predictions led to trucks being underfilled, costing thousands in efficiency loss. That’s why precision matters.
Step 4: Evaluating the Model
Just like you wouldn’t trust a chef without tasting their food, you shouldn’t trust a model without checking its accuracy.
We’ll use three metrics: MSE, RMSE, and R² Score.
```python
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("MSE:", mse)
print("RMSE:", rmse)
print("R² Score:", r2)
```
What each metric means:
- MSE (Mean Squared Error): Measures the average squared difference between predicted and actual values. Big errors are punished more.
- RMSE (Root Mean Squared Error): Same as MSE but in the same units as sales, making it easier to understand.
- R² Score: Tells us how much of the variation in sales is explained by our model. A score of 1 means perfect prediction; 0 means the model does no better than always predicting the average.
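If the formulas feel abstract, computing the same metrics by hand with NumPy shows exactly what each one does; the results should match the scikit-learn output above:

```python
import numpy as np

errors = y_test.values - y_pred

mse_manual = np.mean(errors ** 2)   # average squared miss
rmse_manual = np.sqrt(mse_manual)   # back in "cups sold" units

ss_res = np.sum(errors ** 2)                                    # unexplained variation
ss_tot = np.sum((y_test.values - np.mean(y_test.values)) ** 2)  # total variation
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, rmse_manual, r2_manual)
```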
Insight: In a supply chain model I built, our R² was 0.65. At first glance, it seemed low, but the predictions were good enough to plan warehouse stocking, which saved the company millions by avoiding overstocking.
From this café regression example, we’ve learned:
- Proper data splitting prevents false confidence.
- Feature weights reveal what really drives results.
- Predictions need to be checked against reality, not just taken at face value.
- Even simple models like linear regression can produce actionable insights.
Final thought from experience: I’ve worked on projects where teams insisted on using deep learning for a basic sales forecast, but a simple linear regression not only matched its accuracy but also explained why the predictions were made. Never underestimate simple, interpretable models.
Frequently Asked Questions
What is regression in machine learning?
Regression is a supervised learning method that predicts continuous values, such as sales, prices, or temperatures, based on input features.

Why use linear regression instead of a more complex model?
Linear regression is simple, interpretable, and often performs as well as complex models for many business problems while being faster to train.

What is the difference between MSE, RMSE, and R²?
MSE measures the average squared prediction error; RMSE is the square root of MSE, which puts the result back in the target's units; and R² tells you how much of the variation in the target the model explains.

How do I know if my model is good enough?
A good model has low error (MSE/RMSE) and a reasonably high R², but what counts as "good" depends on the business context and the acceptable error range.

Can regression predict categories?
No. Regression is designed for predicting numerical values. For categories (e.g., spam vs. not spam), you'd use classification models.