Capstone Project: Fraud Detection System
Fraud detection is one of the most important applications of machine learning today. Banks, e-commerce companies, and payment providers rely on ML models to spot suspicious transactions in real time.
Why is this problem challenging?
- Fraud is rare (often <1% of transactions).
- Fraudsters adapt quickly, so data patterns drift.
- False positives are costly (blocking good customers hurts business).
I once worked with a fintech startup that had 0.5% fraud in their dataset. Their first model showed 99% accuracy. Sounds great? Not really. The model simply predicted “not fraud” every time, which on that data is right about 99.5% of the time while catching zero fraud. The team had to learn that in fraud detection, accuracy alone is misleading; metrics like precision, recall, and ROC AUC are what matter.
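To make the anecdote concrete, here is a minimal sketch with made-up labels (1,000 transactions at a 0.5% fraud rate) showing how an always-"not fraud" predictor scores on accuracy versus recall. The numbers are illustrative, not the startup's data.

```python
# 1,000 transactions with 0.5% fraud: 5 fraudulent, 995 legitimate (made-up data)
y_true = [1] * 5 + [0] * 995

# A "model" that always predicts "not fraud"
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of actual fraud cases that were flagged
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.995
print(f"recall   = {recall:.3f}")    # recall   = 0.000
```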
In this capstone, you’ll build your own Fraud Detection System and deploy it as an API that can score incoming transactions.
Step 1: Problem Framing
Given a financial transaction, predict whether it is fraudulent (1) or legitimate (0).
Input features could include:
- Transaction amount.
- User behavior features (time since last login, number of transactions in last hour).
- Merchant info (merchant ID, country, category).
- Device info (device age, browser, IP address).
Output:
- Probability of fraud (between 0 and 1).
- Label: Fraud = 1, Not Fraud = 0.
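Behavior features such as "number of transactions in last hour" usually have to be derived from raw logs. The sketch below shows one way to do that with a pandas time-based rolling window. The column names (`user_id`, `timestamp`, `amount`) and the sample rows are assumptions for illustration, not part of this project's dataset.

```python
import pandas as pd

# Hypothetical raw transaction log
logs = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "a"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:20", "2024-01-01 10:50",
        "2024-01-01 10:55", "2024-01-01 11:30",
    ]),
    "amount": [20.0, 35.0, 500.0, 12.0, 80.0],
})

# Time-based rolling windows need a sorted datetime index
logs = logs.sort_values("timestamp")

# For each transaction, count that user's transactions in the preceding
# 60 minutes (the current transaction is included in the count)
features = (
    logs.set_index("timestamp")
        .groupby("user_id")["amount"]
        .rolling("60min")
        .count()
        .astype(int)
        .rename("txn_count_last_hour")
        .reset_index()
        .sort_values(["user_id", "timestamp"], ignore_index=True)
)

print(features)
```

User "a" reaches a count of 3 at 10:50 and drops back to 2 at 11:30, because the 10:00 and 10:20 transactions have fallen out of the window by then.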
Step 2: Data Preparation
For simplicity, we’ll simulate a dataset. In real life, you would clean raw transaction logs, join with user metadata, and create features.
    import pandas as pd
    import numpy as np

    # Create a synthetic dataset
    np.random.seed(42)
    n_samples = 5000

    data = {
        "amount": np.random.exponential(100, n_samples),  # transaction amounts
        "device_age_hours": np.random.randint(1, 1000, n_samples),
        "hour_of_day": np.random.randint(0, 24, n_samples),
        "country": np.random.choice(["US", "UK", "IN", "NG"], n_samples),
        # About 3% fraud. Labels are drawn independently of the features,
        # so this data exercises the pipeline but carries no learnable signal.
        "is_fraud": np.random.choice([0, 1], n_samples, p=[0.97, 0.03]),
    }

    df = pd.DataFrame(data)
    print(df.head())
Key lesson: Fraud data is imbalanced. Only a small fraction of rows are fraud. You must handle this carefully.
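One common way to handle the imbalance is to weight the rare class more heavily during training. The sketch below computes "balanced" weights with NumPy, using the same formula scikit-learn applies for `class_weight="balanced"` (`n_samples / (n_classes * count_per_class)`). The toy label array is an assumption chosen to mirror the 3% fraud rate.

```python
import numpy as np

# Toy labels with a 3% fraud rate: 970 legitimate, 30 fraudulent
y = np.array([0] * 970 + [1] * 30)

counts = np.bincount(y, minlength=2)
fraud_rate = counts[1] / counts.sum()
print(f"fraud rate: {fraud_rate:.1%}")  # fraud rate: 3.0%

# "Balanced" heuristic: weight_c = n_samples / (n_classes * count_c)
class_weight = len(y) / (2 * counts)

# One weight per row, looked up from its class
sample_weight = class_weight[y]

# After weighting, both classes contribute the same total weight
print(sample_weight[y == 0].sum(), sample_weight[y == 1].sum())
```

`GradientBoostingClassifier` has no `class_weight` parameter, but its `fit` method accepts `sample_weight`. Inside a `Pipeline` whose final step is named `clf`, that would be passed as `pipe.fit(X_train, y_train, clf__sample_weight=weights)`.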
Step 3: Train-Test Split and Preprocessing
We’ll split the data and preprocess categorical and numerical features.
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.ensemble import GradientBoostingClassifier

    X = df.drop(columns=["is_fraud"])
    y = df["is_fraud"]

    # Stratify so train and test keep the same fraud rate
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Columns
    num_cols = ["amount", "device_age_hours", "hour_of_day"]
    cat_cols = ["country"]

    # Preprocessing: scale numeric features, one-hot encode categorical ones
    preprocessor = ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)
    ])
Why this matters:
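- Scaling puts numeric features on a comparable scale. Tree-based models do not need it, but it keeps the pipeline reusable for scale-sensitive models such as logistic regression.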
- One-hot encoding handles categorical features like country.
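The `handle_unknown="ignore"` setting matters in production, where a transaction may arrive from a country the encoder never saw during training. A small sketch of that behavior, using "FR" as an arbitrary unseen value:

```python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit([["US"], ["UK"], ["IN"], ["NG"]])

# Categories are stored in sorted order: IN, NG, UK, US
print(encoder.categories_[0])

# A known country sets exactly one column to 1
known = encoder.transform([["US"]]).toarray()
print(known)   # [[0. 0. 0. 1.]]

# An unseen country is encoded as all zeros instead of raising an error
unseen = encoder.transform([["FR"]]).toarray()
print(unseen)  # [[0. 0. 0. 0.]]
```

Without `handle_unknown="ignore"`, the default behavior is to raise an error on unseen categories, which would make the scoring API fail on such requests.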
Step 4: Train a Baseline Model
We’ll use Gradient Boosting, a strong baseline for fraud detection on tabular data.
    clf = GradientBoostingClassifier(random_state=42)

    pipe = Pipeline([
        ("pre", preprocessor),
        ("clf", clf)
    ])

    pipe.fit(X_train, y_train)
Why Gradient Boosting?
Tree ensembles like Gradient Boosting, XGBoost, or LightGBM are excellent for tabular data with mixed numeric/categorical features.
Step 5: Evaluate the Model
In fraud detection, don’t trust accuracy. Use ROC AUC, precision, recall, and F1 instead.
    from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix

    y_pred = pipe.predict(X_test)
    y_probs = pipe.predict_proba(X_test)[:, 1]  # probability of the fraud class

    print("ROC AUC:", roc_auc_score(y_test, y_probs))
    print("\nClassification Report:\n", classification_report(y_test, y_pred))
    print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
What these mean:
- ROC AUC → how well the model separates fraud vs non-fraud.
- Precision → of transactions flagged fraud, how many were correct.
- Recall → of all frauds, how many did we catch.
- F1 → balance between precision and recall.
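The definitions above reduce to simple arithmetic on confusion-matrix counts. Here is a minimal sketch; the counts (40 true positives, 10 false positives, 20 false negatives) are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up counts: 40 frauds caught, 10 false alarms, 20 frauds missed
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.667 f1=0.727
```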
💡 Personal insight:
In fraud detection, I usually prioritize recall (catch more fraud) but still balance with precision (to avoid too many false alarms). Business leaders often choose thresholds based on cost-benefit analysis.
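A cost-benefit analysis of this kind can be sketched as picking the threshold with the lowest total cost on labeled data. The cost figures, scores, and candidate thresholds below are hypothetical; real values would come from the business.

```python
import numpy as np

# Hypothetical costs: a missed fraud is far more expensive than a false alarm
COST_FALSE_NEGATIVE = 500.0
COST_FALSE_POSITIVE = 5.0

def total_cost(y_true, y_prob, threshold):
    """Total cost of errors when flagging every score >= threshold."""
    y_pred = y_prob >= threshold
    false_negatives = np.sum((y_true == 1) & ~y_pred)
    false_positives = np.sum((y_true == 0) & y_pred)
    return float(false_negatives * COST_FALSE_NEGATIVE
                 + false_positives * COST_FALSE_POSITIVE)

def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the lowest total cost."""
    costs = [total_cost(y_true, y_prob, t) for t in candidates]
    return float(candidates[int(np.argmin(costs))])

# Toy labels and model scores (made up)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.12, 0.18, 0.22, 0.31, 0.42, 0.35, 0.61, 0.72, 0.91])

candidates = np.linspace(0.1, 0.9, 9)
chosen = best_threshold(y_true, y_prob, candidates)
print(f"cost at 0.5: {total_cost(y_true, y_prob, 0.5)}")  # cost at 0.5: 505.0
print(f"best threshold: {chosen:.1f}")                    # best threshold: 0.3
```

With these costs, the default 0.5 misses one fraud (500) and raises one false alarm (5), while 0.3 catches every fraud at the price of three false alarms (15).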
Step 6: Threshold Tuning
By default, classification uses a 0.5 probability threshold. For fraud, you often lower it to catch more fraud, at the cost of more false alarms.
    from sklearn.metrics import precision_recall_curve

    prec, rec, thr = precision_recall_curve(y_test, y_probs)

    # rec has one more entry than thr, and recall falls as the threshold rises.
    # Example: choose the highest threshold that still keeps recall >= 0.9.
    target_idx = np.where(rec[:-1] >= 0.9)[0][-1]
    chosen_threshold = thr[target_idx]
    print("Chosen threshold:", chosen_threshold)
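You’ll adjust this based on business needs. In practice, pick the threshold on a separate validation set rather than the test set, so the test metrics remain an unbiased estimate.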
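To see what moving the threshold does, the sketch below applies two thresholds to the same scores and compares precision and recall. The labels and scores are made-up toy values, not output from the pipeline above.

```python
import numpy as np

# Toy labels and model scores (made up)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.12, 0.18, 0.22, 0.31, 0.42, 0.35, 0.61, 0.72, 0.91])

def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when flagging every score >= threshold."""
    y_pred = y_prob >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

for t in (0.5, 0.3):
    p, r = precision_recall_at(y_true, y_prob, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.67, recall=0.67
# threshold=0.3: precision=0.50, recall=1.00
```

Lowering the threshold from 0.5 to 0.3 catches the one fraud that was missed, but two more legitimate transactions get flagged.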
Step 7: Deploy the Model as an API
Finally, make the model available as a web service with FastAPI.
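    # --- In the training script: save the pipeline together with the threshold ---
    import joblib

    joblib.dump(
        {"model": pipe, "threshold": float(chosen_threshold)},
        "fraud_model.pkl",
    )

    # --- app.py: load the saved artifact and serve predictions ---
    import joblib
    import pandas as pd
    from fastapi import FastAPI

    artifact = joblib.load("fraud_model.pkl")
    model = artifact["model"]
    threshold = artifact["threshold"]

    app = FastAPI()

    @app.post("/score")
    def score(payload: dict):
        # One-row DataFrame; keys must match the training column names
        X = pd.DataFrame([payload])
        prob = float(model.predict_proba(X)[0, 1])
        label = int(prob >= threshold)
        return {"probability": prob, "label": label}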
Run the server (assuming the API code is saved as app.py):

    uvicorn app:app --reload
Now you can send JSON input:
    {
      "amount": 250.75,
      "device_age_hours": 12,
      "hour_of_day": 23,
      "country": "US"
    }
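One way to send this JSON from Python is sketched below, using only the standard library. It assumes the server is running locally on uvicorn's default host and port (`http://127.0.0.1:8000`); the helper names `build_score_request` and `score_transaction` are made up for this example.

```python
import json
import urllib.request

# Assumption: uvicorn is running locally with its default host and port
API_URL = "http://127.0.0.1:8000/score"

def build_score_request(payload, url=API_URL):
    """Build a POST request carrying the transaction as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def score_transaction(payload, url=API_URL, timeout=5.0):
    """Send the transaction to the API and return the parsed JSON response."""
    request = build_score_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

transaction = {
    "amount": 250.75,
    "device_age_hours": 12,
    "hour_of_day": 23,
    "country": "US",
}
# With the server running, this returns the probability and label:
# result = score_transaction(transaction)
```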
Example output (values are illustrative):

    {
      "probability": 0.87,
      "label": 1
    }
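Because the `/score` endpoint accepts an arbitrary dict, a malformed request would only fail once it reaches the model. The sketch below shows the kind of checks worth running first, in plain Python. The field list mirrors the training columns, while `validate_payload` and the specific range rules are assumptions for illustration. In FastAPI, constraints like these are usually declared on a Pydantic model instead.

```python
# Expected fields and their accepted Python types (mirrors the training columns)
EXPECTED_FIELDS = {
    "amount": (int, float),
    "device_age_hours": int,
    "hour_of_day": int,
    "country": str,
}

def validate_payload(payload):
    """Return the payload unchanged if it is well-formed, else raise ValueError."""
    missing = sorted(set(EXPECTED_FIELDS) - set(payload))
    extra = sorted(set(payload) - set(EXPECTED_FIELDS))
    if missing or extra:
        raise ValueError(f"missing fields: {missing}, unexpected fields: {extra}")

    for name, expected_type in EXPECTED_FIELDS.items():
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} has the wrong type")

    if payload["amount"] < 0:
        raise ValueError("amount must be non-negative")
    if not 0 <= payload["hour_of_day"] <= 23:
        raise ValueError("hour_of_day must be between 0 and 23")
    return payload

ok = validate_payload(
    {"amount": 250.75, "device_age_hours": 12, "hour_of_day": 23, "country": "US"}
)
print(ok["country"])  # US
```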
Lessons Learned
From this project, you now know how to:
- Frame a real-world fraud detection problem.
- Handle imbalanced data with the right metrics.
- Build a fraud classifier using Gradient Boosting.
- Tune thresholds to balance recall vs precision.
- Deploy your model as an API for production.
Final thought:
Fraud detection projects teach one of the hardest ML lessons: the goal isn’t just accuracy, it’s aligning model behavior with business impact. That’s the mark of a real ML engineer.
Frequently Asked Questions
The goal is to build and deploy a fraud detection system that predicts whether a transaction is fraudulent, using real-world ML best practices.
Fraud detection is critical in banking, e-commerce, and fintech. It protects businesses and customers from losses and requires handling highly imbalanced data.
Tree-based models like Gradient Boosting, XGBoost, or LightGBM perform well on tabular fraud datasets. Logistic Regression is often used as a baseline.
Instead of accuracy, focus on ROC AUC, precision, recall, and F1-score, since fraud is rare and imbalanced. Threshold tuning is also essential.
The model can be deployed using Flask or FastAPI as an API endpoint that accepts JSON transaction data and returns a fraud probability and label.