Capstone Project: Fraud Detection System

Fraud detection is one of the most important applications of machine learning today. Banks, e-commerce companies, and payment providers rely on ML models to spot suspicious transactions in real time.

Why is this problem challenging?

  • Fraud is rare (often <1% of transactions).
  • Fraudsters adapt quickly, so data patterns drift.
  • False positives are costly (blocking good customers hurts business).

I once worked with a fintech startup that had 0.5% fraud in their dataset. Their first model showed 99% accuracy. Sounds great? Not really. The model simply predicted “not fraud” every time. The team had to learn that in fraud detection, accuracy is meaningless; metrics like precision, recall, and ROC AUC are what matter.
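
You can reproduce that trap in a few lines. The numbers below are hypothetical, chosen to match the 0.5% fraud rate from the story:

python
import numpy as np

# Hypothetical dataset: 10,000 transactions, 0.5% fraud
y_true = np.zeros(10_000, dtype=int)
y_true[:50] = 1  # 50 fraudulent transactions

# A "model" that always predicts not-fraud
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
recall = y_pred[y_true == 1].mean()  # share of fraud actually caught

print(f"Accuracy: {accuracy:.1%}")  # 99.5% -- looks impressive
print(f"Recall:   {recall:.1%}")    # 0.0%  -- catches no fraud at all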

In this capstone, you’ll build your own Fraud Detection System and deploy it as an API that can score incoming transactions.

Step 1: Problem Framing

Given a financial transaction, predict whether it is fraudulent (1) or legitimate (0).

Input features could include:

  • Transaction amount.
  • User behavior features (time since last login, number of transactions in last hour).
  • Merchant info (merchant ID, country, category).
  • Device info (device age, browser, IP address).

Output:

  • Probability of fraud (between 0 and 1).
  • Label: Fraud = 1, Not Fraud = 0.

Step 2: Data Preparation

For simplicity, we’ll simulate a dataset. In real life, you would clean raw transaction logs, join them with user metadata, and engineer features. Note that because the fraud labels below are drawn at random, independent of the features, the model’s metrics later will hover near chance; the point of this capstone is the workflow, not the scores.

python
import pandas as pd
import numpy as np

# Create synthetic dataset
np.random.seed(42)
n_samples = 5000

data = {
    "amount": np.random.exponential(100, n_samples),   # transaction amounts
    "device_age_hours": np.random.randint(1, 1000, n_samples),
    "hour_of_day": np.random.randint(0, 24, n_samples),
    "country": np.random.choice(["US", "UK", "IN", "NG"], n_samples),
    "is_fraud": np.random.choice([0, 1], n_samples, p=[0.97, 0.03])  # 3% fraud
}

df = pd.DataFrame(data)
print(df.head())

Key lesson: Fraud data is imbalanced. Only a small fraction of rows are fraud. You must handle this carefully.
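
Before training anything, make that skew visible. A quick check on the df built above:

python
# Confirm the class imbalance before modeling
print(df["is_fraud"].value_counts())
print(f"Fraud rate: {df['is_fraud'].mean():.2%}")  # roughly 3%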

Step 3: Train-Test Split and Preprocessing

We’ll split the data and preprocess categorical and numerical features.

python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier

X = df.drop(columns=["is_fraud"])
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Columns
num_cols = ["amount", "device_age_hours", "hour_of_day"]
cat_cols = ["country"]

# Preprocessing: scale numeric features, one-hot encode categoricals
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)
])

Why this matters:

  • Scaling puts numeric features on a comparable range (tree ensembles don’t strictly need it, but it keeps the pipeline reusable with scale-sensitive models like logistic regression).
  • One-hot encoding turns a categorical feature like country into one binary column per category; the sanity check below shows the resulting columns.
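
A quick sanity check, using the objects defined above, confirms what the preprocessor produces: three scaled numeric columns plus one binary column per country.

python
# Fit the preprocessor alone and inspect the transformed output
Xt = preprocessor.fit_transform(X_train)
print(Xt.shape)                              # (4000, 7): 3 numeric + 4 one-hot
print(preprocessor.get_feature_names_out())  # column names after transformation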

Step 4: Train a Baseline Model

We’ll use Gradient Boosting, a strong baseline for fraud detection on tabular data.

python
clf = GradientBoostingClassifier(random_state=42)

pipe = Pipeline([
    ("pre", preprocessor),
    ("clf", clf)
])

pipe.fit(X_train, y_train)

Why Gradient Boosting?
Tree ensembles like Gradient Boosting, XGBoost, or LightGBM are excellent for tabular data with mixed numeric/categorical features.
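
One caveat: GradientBoostingClassifier has no class_weight option, so if the imbalance hurts recall, a common remedy (sketched below, not the only one) is to upweight the rare class with sample weights routed through the pipeline:

python
from sklearn.utils.class_weight import compute_sample_weight

# "balanced" weights each class inversely to its frequency,
# so the rare fraud rows count for more during fitting
weights = compute_sample_weight(class_weight="balanced", y=y_train)

# The "clf__" prefix routes the keyword to the pipeline's "clf" step
pipe.fit(X_train, y_train, clf__sample_weight=weights)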

Step 5: Evaluate the Model

In fraud detection, don’t trust accuracy! Use ROC AUC, precision, recall, and F1 instead.

python
from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix

y_pred = pipe.predict(X_test)
y_probs = pipe.predict_proba(X_test)[:,1]

print("ROC AUC:", roc_auc_score(y_test, y_probs))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

What these mean:

  • ROC AUC → how well the model separates fraud vs non-fraud.
  • Precision → of transactions flagged fraud, how many were correct.
  • Recall → of all frauds, how many did we catch.
  • F1 → balance between precision and recall.
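
To make these concrete, here is the same arithmetic done by hand from the confusion matrix computed above:

python
# Unpack the 2x2 confusion matrix (rows = actual, columns = predicted)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# Guard against division by zero when the model flags nothing
precision = tp / (tp + fp) if (tp + fp) else 0.0  # flagged -> actually fraud
recall    = tp / (tp + fn) if (tp + fn) else 0.0  # actual fraud -> caught
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")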

💡 Personal insight:
In fraud detection, I usually prioritize recall (catch more fraud) but still balance with precision (to avoid too many false alarms). Business leaders often choose thresholds based on cost-benefit analysis.
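
Here is one way that cost-benefit analysis can look in code. The dollar figures are hypothetical placeholders; plug in your own review cost and average fraud loss.

python
# Hypothetical business costs -- replace with real figures
COST_FP = 5.0    # reviewing or blocking a legitimate transaction
COST_FN = 200.0  # letting a fraudulent transaction through

y_actual = y_test.to_numpy()
best_thr, best_cost = 0.5, float("inf")
for t in np.linspace(0.01, 0.99, 99):
    pred = (y_probs >= t).astype(int)
    fp = ((pred == 1) & (y_actual == 0)).sum()
    fn = ((pred == 0) & (y_actual == 1)).sum()
    cost = fp * COST_FP + fn * COST_FN
    if cost < best_cost:
        best_thr, best_cost = t, cost

print(f"Cost-minimizing threshold: {best_thr:.2f} (total cost ${best_cost:,.0f})")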

Step 6: Threshold Tuning

Classifiers use a 0.5 probability threshold by default. For fraud, you often lower it to catch more fraud, accepting more false alarms in return.

python
from sklearn.metrics import precision_recall_curve

prec, rec, thr = precision_recall_curve(y_test, y_probs)

# Example: highest threshold that still gives recall >= 0.9.
# rec is sorted in decreasing order and has one more element than thr,
# so look at rec[:-1] and take the LAST index where recall >= 0.9
target_idx = np.where(rec[:-1] >= 0.9)[0][-1]
chosen_threshold = thr[target_idx]

print("Chosen threshold:", chosen_threshold)

You’ll adjust this based on business needs.
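
Since the API in the next step needs this threshold too, one option (a sketch, with fraud_bundle.pkl as an assumed filename) is to persist it alongside the model so serving and training can’t drift apart:

python
import joblib

# Save the fitted pipeline and the tuned threshold as a single artifact
joblib.dump({"model": pipe, "threshold": float(chosen_threshold)}, "fraud_bundle.pkl")

# Later, in serving code:
bundle = joblib.load("fraud_bundle.pkl")
model, threshold = bundle["model"], bundle["threshold"]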

Step 7: Deploy the Model as an API

Finally, make the model available as a web service with FastAPI.

python
import joblib
from fastapi import FastAPI
import pandas as pd

# Save the trained pipeline (run once, in the training script)
joblib.dump(pipe, "fraud_model.pkl")

# In app.py: load the model and set the decision threshold from Step 6
model = joblib.load("fraud_model.pkl")
CHOSEN_THRESHOLD = 0.5  # replace with the Step 6 value (or load it from a saved bundle)

app = FastAPI()

@app.post("/score")
def score(payload: dict):
    # Wrap the incoming JSON in a one-row DataFrame for the pipeline
    X = pd.DataFrame([payload])
    prob = float(model.predict_proba(X)[0, 1])
    label = int(prob >= CHOSEN_THRESHOLD)
    return {"probability": prob, "label": label}

Run:

uvicorn app:app --reload

Now you can send JSON input:

json
{
  "amount": 250.75,
  "device_age_hours": 12,
  "hour_of_day": 23,
  "country": "US"
}

Output:

json
{
  "probability": 0.87,
  "label": 1
}
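
To test the endpoint end to end, you can also post a transaction from Python (a sketch assuming the server is running locally on the default port):

python
import requests

transaction = {
    "amount": 250.75,
    "device_age_hours": 12,
    "hour_of_day": 23,
    "country": "US",
}

# Score one transaction against the running API
resp = requests.post("http://127.0.0.1:8000/score", json=transaction)
print(resp.json())  # {"probability": ..., "label": ...}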

Lessons Learned

From this project, you now know how to:

  • Frame a real-world fraud detection problem.
  • Handle imbalanced data with the right metrics.
  • Build a fraud classifier using Gradient Boosting.
  • Tune thresholds to balance recall vs precision.
  • Deploy your model as an API for production.

Final thought:
Fraud detection projects teach one of the hardest ML lessons: the goal isn’t accuracy; it’s aligning model behavior with business impact. That’s the mark of a real ML engineer.

Frequently Asked Questions

What is the goal of this capstone project?
The goal is to build and deploy a fraud detection system that predicts whether a transaction is fraudulent, using real-world ML best practices.

Why does fraud detection matter?
Fraud detection is critical in banking, e-commerce, and fintech. It protects businesses and customers from losses and requires handling highly imbalanced data.

Which models work best for fraud detection?
Tree-based models like Gradient Boosting, XGBoost, or LightGBM perform well on tabular fraud datasets. Logistic Regression is often used as a baseline.

Which metrics should you use to evaluate the model?
Instead of accuracy, focus on ROC AUC, precision, recall, and F1-score, since fraud is rare and imbalanced. Threshold tuning is also essential.

How can the model be deployed?
The model can be deployed using Flask or FastAPI as an API endpoint that accepts JSON transaction data and returns a fraud probability and label.
