Fraud Detection Intelligence

Bank Transaction
Analytics Report

Exploratory data analysis of a bank transaction dataset of 50,000 records, aimed at uncovering patterns and anomaly signals

Total Transactions

50,000

Avg $297.87 per transaction • $14.8M volume

Accounts

495

~101 transactions per account

Date Range

6 years

Jan 2020 — Dec 2025


Transaction Locations

43

cities across the United States

00 — Executive Summary

Pipeline at a Glance

End-to-end unsupervised fraud detection — from raw data to actionable risk scores

Transactions Analyzed

50,000

$14.8M total volume

ML Models

3

ISO Forest + DBSCAN + LOF

Expert Rules

7

94.7% recall on ML anomalies

Critical Alerts

24

0.05% highest risk

Hybrid Risk Distribution

Final Output

Top Risk Factors

Key Drivers

01 — Data Quality

Data Quality Overview

Dataset structure and completeness assessment

✅

0

Missing Values

✅

0

Duplicate Rows

📊

15

Columns

📅

2020–2025

Date Range

Transaction Basics

6 Fields

TransactionID	Unique Key
AccountID	495 unique
TransactionAmount	Numeric
TransactionDate	Datetime
TransactionType	2 types
AccountBalance	Numeric

Behavioral & Technical

7 Fields

TransactionDuration	Seconds
Channel	3 types
Location	43 cities
DeviceID	681 unique
IP Address	Text
MerchantID	100 unique
LoginAttempts	1–5

Customer Demographics

2 Fields

CustomerAge	18–80 yrs
CustomerOccupation	4 types

Data Quality Verdict

No missing values or duplicates detected. The dataset is clean and ready for ML pipeline processing.
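
The two checks behind this verdict can be sketched as follows. This is a minimal illustration, assuming the rows are available as Python dictionaries; the helper name `quality_summary` and the sample rows are hypothetical and not taken from the report's pipeline.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

def quality_summary(rows: List[Row]) -> Dict[str, int]:
    """Count empty cells and fully duplicated rows."""
    missing = sum(
        1 for row in rows for value in row.values() if value in (None, "")
    )
    seen = set()
    duplicates = 0
    for row in rows:
        # Keys are unique within a row, so sorting never compares the values.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"missing_values": missing, "duplicate_rows": duplicates}

# Illustrative rows only; the real dataset has 15 columns.
sample = [
    {"TransactionID": "TX000001", "AccountID": "AC00128", "TransactionAmount": "14.09"},
    {"TransactionID": "TX000002", "AccountID": "AC00455", "TransactionAmount": "376.24"},
]
summary = quality_summary(sample)
```

A clean dataset, as reported here, yields zero for both counters.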

02 — Transaction Overview

Transaction Patterns

Distribution and composition of banking transactions

Transaction Amount Distribution

Key Metric

Average

$297.87

Median

$209.36

Min

$0.24

Std Dev

$292.82

Max

$2,060

Insight

The mean ($297.87) exceeds the median ($209.36), which indicates a right-skewed distribution: high-value transactions pull the average up. A standard deviation of $292.82, nearly as large as the mean, shows significant dispersion.
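
The skew check can be reproduced with the standard library. The amounts below are made-up values chosen only to show the effect of one large transaction; they are not rows from the dataset.

```python
import statistics

# Hypothetical amounts: mostly small, with one large transaction.
amounts = [12.5, 48.0, 95.0, 180.0, 210.0, 260.0, 410.0, 1900.0]

mean_amount = statistics.mean(amounts)
median_amount = statistics.median(amounts)

# A mean above the median is a quick sign of a right-skewed distribution.
is_right_skewed = mean_amount > median_amount
```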

Transaction Type Breakdown

Imbalanced

Debit 77.5%

Credit 22.5%

Debit : Credit Ratio

3.4 : 1

Insight

Debit transactions outnumber Credit by 3.4x. Fraud typically targets debit-side operations, making this imbalance critical for model design.

Channel Distribution

Balanced

Branch 17,278 (34.6%)

ATM 16,552 (33.1%)

Online 16,170 (32.3%)

Insight

All three channels are evenly distributed (~33% each). Fraud monitoring must cover all channels equally.

Customer Occupation

Demographics

Student 13,059 (26.1%)

Doctor 12,578 (25.2%)

Engineer 12,491 (25.0%)

Retired 11,872 (23.7%)

03 — Behavioral Analysis

Behavioral Patterns

Time patterns, device usage, and variable relationships

Transactions by Day of Week

Time Pattern

Time Distribution Warning

Data Issue

95%

of transactions recorded at Hour = 0 (midnight)

47,488

Hour = 0

2,512

Other Hours

Warning

Most records default to 00:00 — likely date-only entries without timestamps. Hour-based features should be used with caution in modeling.
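
A quick way to measure this issue is to count how many timestamps fall in hour 0. The sketch below assumes timestamps are strings in `YYYY-MM-DD HH:MM:SS` format, which the report does not confirm; `midnight_share` is a hypothetical helper name.

```python
from datetime import datetime

def midnight_share(timestamps):
    """Fraction of timestamps whose hour component is 0."""
    parsed = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in timestamps]
    return sum(1 for t in parsed if t.hour == 0) / len(parsed)

# Illustrative values only.
sample = [
    "2023-04-11 00:00:00",
    "2023-06-27 00:00:00",
    "2023-07-10 00:00:00",
    "2023-05-05 16:32:11",
]
share = midnight_share(sample)

# The same ratio using the counts quoted in the report.
reported_share = 47_488 / 50_000
```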

Age vs Balance Relationship

Demographics

Average Balance

$5,122

Max Balance

$14,978

Correlation Matrix

Relationships

	Amount	Balance	Duration	Login	Age
Amount	1.00	-0.02	0.01	-0.02	-0.02
Balance	-0.02	1.00	0.01	0.01	0.32
Duration	0.01	0.01	1.00	0.03	-0.02
Login	-0.02	0.01	0.03	1.00	0.01
Age	-0.02	0.32	-0.02	0.01	1.00

Key Finding

Age vs Balance (r=0.32) is the only meaningful correlation. Older customers tend to have higher balances. Low inter-feature correlation is beneficial for ML — each feature contributes unique information.
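
The report does not say which correlation coefficient was used; the sketch below assumes Pearson's r, the usual default. The ages and balances are made-up values for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical customers: age in years, balance in dollars.
ages = [22, 35, 47, 58, 70]
balances = [1200.0, 6400.0, 3100.0, 9800.0, 7500.0]
r = pearson_r(ages, balances)
```

Values near 0, like most cells in the matrix above, mean the two features carry largely independent linear information.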

04 — Anomaly Detection

Fraud Risk Indicators

Signals that may indicate fraudulent transaction activity

⚠

High Amount Outliers

2,375 transactions (4.75%) exceed IQR upper bound of $899.72. Maximum amount reaches $2,060.
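
The report does not state how the $899.72 bound was derived. A common convention, assumed here, is Tukey's upper fence, Q3 + 1.5 × IQR. The amounts are illustrative.

```python
import statistics

def iqr_upper_bound(values, k=1.5):
    """Tukey upper fence: Q3 + k * (Q3 - Q1). k = 1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

# Hypothetical amounts with one obvious outlier.
amounts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 1000]
upper_bound = iqr_upper_bound(amounts)
outliers = [a for a in amounts if a > upper_bound]
```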

🔒

Suspicious Login Attempts

1,924 transactions (3.85%) required 3–5 login attempts. Counts rise with the number of attempts (619 → 645 → 660), a potential automated attack pattern.

📱

Device Sharing — Critical

609 devices are shared across multiple accounts. One device serves up to 9 distinct accounts — strong indicator of account takeover.
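
Device sharing can be measured by counting distinct accounts per device. The sketch below is one way to do it, assuming `(AccountID, DeviceID)` pairs; the sample pairs are hypothetical.

```python
from collections import defaultdict

def accounts_per_device(pairs):
    """Map each DeviceID to the number of distinct AccountIDs seen on it."""
    accounts = defaultdict(set)
    for account_id, device_id in pairs:
        accounts[device_id].add(account_id)
    return {device: len(ids) for device, ids in accounts.items()}

# Illustrative (AccountID, DeviceID) pairs.
sample = [
    ("AC001", "D01"), ("AC002", "D01"), ("AC003", "D01"),
    ("AC001", "D02"),
    ("AC004", "D03"), ("AC004", "D03"),
]
counts = accounts_per_device(sample)
shared_devices = {device for device, n in counts.items() if n > 1}

# The shared-device ratio using the counts quoted in the report.
reported_ratio = 609 / 681
```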

🌎

Multi-Location Activity

266 accounts operate across 5+ cities. 428 accounts use 3+ devices — potential compromised account indicators.

Login Attempts Distribution

Security

Anomalous Pattern

95.1% succeed on the first attempt. Unusually, the transaction count rises from 3 to 5 attempts (619 → 645 → 660) instead of falling, which may indicate automated brute-force behavior.
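
The figures quoted above can be cross-checked with a few lines of arithmetic. The counts come from the report; only the variable names are new.

```python
# Transactions per login-attempt count, as quoted in the report.
attempt_counts = {3: 619, 4: 645, 5: 660}
total_transactions = 50_000

high_attempt_total = sum(attempt_counts.values())
high_attempt_share = high_attempt_total / total_transactions

# True when each extra attempt has more transactions than the previous one.
is_increasing = all(
    attempt_counts[k] < attempt_counts[k + 1] for k in (3, 4)
)
```

The total matches the 1,924 transactions (3.85%) reported for 3–5 attempts.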

Risk Summary Dashboard

Risk Matrix

Device Sharing 89.4% of devices

Timestamp Quality 95% recorded at 00:00

Multi-Location Accounts 53.7% of accounts

Amount Outliers 4.75%

High Login Attempts (3+) 3.85%

Overall Assessment

Multiple fraud signals detected across dimensions. Device sharing is the most severe risk (89.4%). A composite risk score combining all signals is recommended.

05 — Model Evaluation

Anomaly Detection Results

3 Models evaluated — Isolation Forest, DBSCAN, LOF — with composite risk scoring

Consensus Anomalies

190

flagged by 2+ models (0.4%)

All 3 Agree

18

highest confidence anomalies

High + Critical

129

risk score > 0.5

Accounts at Risk

15

of 495 accounts have High+ txns

Internal Validation Metrics

Unsupervised

Model	Anomalies	Silhouette	CH Index	DBI
Isolation Forest	2,500 (5.0%)	0.3923	2,208	2.70
DBSCAN	24 (0.05%)	0.5900	114	1.23
LOF	2,500 (5.0%)	0.0113	29	17.06

Best Model

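Isolation Forest scores highest on Calinski-Harabasz (2,208) and second on Silhouette (0.3923), giving a clear Normal/Anomaly separation at a 5% flag rate. DBSCAN flags far fewer points but does so precisely: it has the highest Silhouette (0.5900) and the lowest DBI (1.23).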

Model Agreement

Consensus

0 models (Normal) 45,184 (90.4%)

1 model only 4,626 (9.3%)

2 models agree 172 (0.3%)

All 3 models agree 18 (0.04%)
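
The agreement breakdown above amounts to counting votes per transaction. A minimal sketch, assuming each model yields a 0/1 anomaly flag per row; the flag lists are made up.

```python
from collections import Counter

def agreement_counts(*flag_lists):
    """Count transactions by how many models flagged them (0/1 flags per model)."""
    votes = [sum(flags) for flags in zip(*flag_lists)]
    return Counter(votes)

# Hypothetical flags for six transactions.
iso_flags = [1, 0, 0, 1, 0, 1]
dbscan_flags = [0, 0, 0, 1, 0, 0]
lof_flags = [1, 0, 1, 1, 0, 0]

counts = agreement_counts(iso_flags, dbscan_flags, lof_flags)
consensus = sum(n for votes, n in counts.items() if votes >= 2)

# Consensus anomalies from the report's own breakdown: 2 models + all 3 models.
reported_consensus = 172 + 18
```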

Overlap

Jaccard similarity is low (ISO∩LOF = 0.038): each model catches a different kind of anomaly, so combining them in an ensemble is effective.
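
Jaccard similarity is the size of the intersection of two flagged sets divided by the size of their union. The transaction IDs below are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical IDs flagged by each model.
iso_flagged = {1, 2, 3, 4}
lof_flagged = {3, 4, 5, 6, 7, 8}
score = jaccard(iso_flagged, lof_flagged)  # 2 shared IDs out of 8 distinct
```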

Key Risk Drivers

Impact

Factor	Risk Premium	Detail
Login Attempts 3+	+164.3%	Strongest single risk indicator
ATM Channel	Highest risk	Avg risk score 0.0851
Student Occupation	Highest risk	Avg risk score 0.0893
Amount-to-Balance	+527%	Anomaly ratio 6x higher than normal

Anomaly vs Normal Profile

Statistical (p<0.001)

Transaction Amount +99.7% higher

Daily Txn Count +79.0% higher

Login Attempts +67.5% higher

Amount Z-Score +3,704% higher

Unique Devices +13.8% higher

All Significant

Every feature differs significantly between anomalies and normal transactions (Mann-Whitney U test, p<0.001)
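
The test itself can be sketched in pure Python. This simplified version uses average ranks for ties and a plain normal approximation, with no tie or continuity correction, so its p-values will differ slightly from a library routine such as `scipy.stats.mannwhitneyu`. The amounts are made up.

```python
import math

def mann_whitney_u(xs, ys):
    """Return (U statistic for xs, two-sided p-value by normal approximation)."""
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(xs), len(ys)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Hypothetical transaction amounts for the two groups.
anomaly_amounts = [620.0, 845.5, 910.0, 1320.0, 1980.0]
normal_amounts = [14.1, 88.0, 120.5, 209.4, 297.9, 310.0]
u_stat, p_value = mann_whitney_u(anomaly_amounts, normal_amounts)
```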

06 — Rule Engine & Hybrid Scoring

Expert Rules + ML Fusion

7 expert fraud rules combined with ML scores — Hybrid Risk Score captures 94.7% of ML-detected anomalies

Rule Recall

94.7%

rules catch ML-detected anomalies

Critical Risk

24

0.05% — highest risk transactions

High Risk

3,028

6.1% of all transactions

Rules Triggered

61.9%

transactions trigger ≥ 1 rule
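
Rule recall, as used here, can be read as the share of ML-detected anomalies that also trigger at least one rule. The sketch below follows that reading, which is an assumption about the report's exact definition; the IDs are hypothetical.

```python
def rule_recall(ml_anomaly_ids, rule_triggered_ids):
    """Share of ML-detected anomalies that also trigger at least one rule."""
    ml = set(ml_anomaly_ids)
    if not ml:
        return 0.0
    return len(ml & set(rule_triggered_ids)) / len(ml)

# Hypothetical transaction IDs.
ml_ids = {"T1", "T2", "T3", "T4"}
rule_ids = {"T1", "T2", "T3", "T9", "T10"}
recall = rule_recall(ml_ids, rule_ids)  # 3 of the 4 ML anomalies are caught
```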

7 Expert Fraud Rules

Rule Engine

Rule	Trigger Rate	Severity
Login Attempts ≥ 3	3.85%	Critical
Device Shared ≥ 5 Accounts	45.92%	Critical
Amount Z-Score > 2	3.39%	High
Amount / Balance > 0.8	5.62%	High
Multi-Location (8+ Cities)	18.79%	High
Rapid Transaction (< 1hr)	2.41%	Medium
High Velocity (3+ txn/day)	0.12%	Medium
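
The table above can be sketched as a list of predicates. The thresholds and severities come from the table; the feature names (`login_attempts`, `device_account_count`, and so on) are hypothetical stand-ins for the pipeline's derived features.

```python
from typing import Callable, Dict, List, Tuple

Txn = Dict[str, float]

# (rule name, severity, predicate). Feature names are assumed, not from the report.
RULES: List[Tuple[str, str, Callable[[Txn], bool]]] = [
    ("Login Attempts >= 3", "Critical", lambda t: t["login_attempts"] >= 3),
    ("Device Shared >= 5 Accounts", "Critical", lambda t: t["device_account_count"] >= 5),
    ("Amount Z-Score > 2", "High", lambda t: t["amount_zscore"] > 2),
    # Guard against a zero balance before dividing.
    ("Amount / Balance > 0.8", "High",
     lambda t: t["balance"] > 0 and t["amount"] / t["balance"] > 0.8),
    ("Multi-Location (8+ Cities)", "High", lambda t: t["account_city_count"] >= 8),
    ("Rapid Transaction (< 1hr)", "Medium", lambda t: t["minutes_since_prev"] < 60),
    ("High Velocity (3+ txn/day)", "Medium", lambda t: t["daily_txn_count"] >= 3),
]

def triggered_rules(txn: Txn) -> List[Tuple[str, str]]:
    """Return (name, severity) for every rule the transaction triggers."""
    return [(name, severity) for name, severity, check in RULES if check(txn)]

# Illustrative transaction that trips only the login-attempt rule.
example = {
    "login_attempts": 4, "device_account_count": 2, "amount_zscore": 0.3,
    "amount": 120.0, "balance": 5000.0, "account_city_count": 3,
    "minutes_since_prev": 1440, "daily_txn_count": 1,
}
hits = triggered_rules(example)
```

Keeping each rule as a named predicate preserves the interpretability that the rule engine is meant to add on top of the ML scores.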

Top Trigger

Device Sharing triggers most often (45.92%): a single device used by several accounts is a clear fraud signal.

Hybrid Risk Distribution

ML + Rules

Hybrid Score = 0.5 × ML Score + 0.5 × Rule Score

Low 44.7%

Medium 49.2%

High 6.1%

Critical 0.05%

Hybrid Approach

The Hybrid Score combines ML with rules, reaching 94.7% recall on consensus anomalies while adding interpretability from rule-based reasoning.
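
The scoring formula above is a simple weighted average. The sketch below assumes both inputs are already normalised to the range 0 to 1; the report does not describe how the rule score is computed or where the Low/Medium/High/Critical cut-offs lie, so neither is modelled here.

```python
def hybrid_score(ml_score: float, rule_score: float, ml_weight: float = 0.5) -> float:
    """Weighted blend of ML and rule scores; the report uses equal 0.5 weights."""
    for value in (ml_score, rule_score, ml_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores and weight must lie in [0, 1]")
    return ml_weight * ml_score + (1.0 - ml_weight) * rule_score

# Illustrative inputs.
score = hybrid_score(0.8, 0.4)
```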

Full Pipeline Architecture

End-to-End

Raw Data (50K Transactions) → EDA (Data Quality) → Features (~30 Derived) → ML Models (ISO + DBSCAN + LOF) → Rule Engine (7 Expert Rules) → Hybrid Score (ML + Rules)

07 — Next Steps

Roadmap & Recommendations

Completed phases and strategic action plan for the fraud detection pipeline

✓

Feature Engineering

~30 derived features: z-scores, velocity, device sharing, amount ratios, rapid txn flags

✓

Anomaly Detection

3 models deployed: Isolation Forest, DBSCAN, LOF with composite risk scoring

✓

Model Evaluation

Internal metrics, sensitivity analysis, feature importance, statistical significance tests

✓

Rule Engine + Dashboard

7 expert rules, Hybrid Score (ML+Rules), Streamlit dashboard, executive report

Business Recommendations

Strategic

Pain Point	Evidence	Recommended Action	Priority
Device Sharing	609 devices shared, max 9 accounts	Implement device fingerprinting & alerts	Critical
Brute Force Login	3.85% with 3+ attempts, increasing pattern	Rate limiting + CAPTCHA after 2 failures	Critical
Amount Anomalies	4.75% outlier transactions above $899	ML-based dynamic thresholds per account	High
Geo Anomalies	266 accounts active in 5+ cities	Location velocity check (impossible travel)	High
Data Quality	95% of timestamps default to 00:00	Improve data pipeline for time field capture	Medium
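
The location velocity check recommended above could look roughly like this. It is a sketch under stated assumptions: the coordinates are approximate, the 900 km/h speed limit is an arbitrary illustrative threshold, and the helper names are hypothetical. It also depends on real timestamps, so it would only become useful once the time-capture issue in the last row is fixed.

```python
import math
from datetime import datetime

# Approximate coordinates (latitude, longitude) for two example cities.
CITY_COORDS = {
    "New York": (40.7128, -74.0060),
    "Los Angeles": (34.0522, -118.2437),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def is_impossible_travel(prev, curr, max_speed_kmh=900.0):
    """Flag two consecutive transactions whose implied speed exceeds the limit."""
    distance = haversine_km(*CITY_COORDS[prev["city"]], *CITY_COORDS[curr["city"]])
    hours = (curr["time"] - prev["time"]).total_seconds() / 3600
    if hours <= 0:
        return distance > 0
    return distance / hours > max_speed_kmh

first = {"city": "New York", "time": datetime(2024, 3, 1, 9, 0)}
one_hour_later = {"city": "Los Angeles", "time": datetime(2024, 3, 1, 10, 0)}
next_day = {"city": "Los Angeles", "time": datetime(2024, 3, 2, 9, 0)}

flag_fast = is_impossible_travel(first, one_hour_later)
flag_slow = is_impossible_travel(first, next_day)
```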
