AI/ML Roadmap & Current Status
Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST
Current Implementation Status
Phase 1: Rule-Driven Detection (COMPLETE)
The MVP implementation uses a rule-based detection engine with hooks for ML integration. This was a deliberate design choice to:
- Deliver value faster - Rules can be tuned immediately without training data
- Ensure interpretability - Every decision has explainable reasons
- Establish infrastructure - Feature pipeline, evidence capture, and policy engine ready for ML
What Is Implemented
| Component | Status | Description |
|---|---|---|
| Feature Engine | Complete | Redis velocity counters with sliding windows |
| Detection Engine | Complete | 5 detector types (card testing, velocity, geo, bot, friendly) |
| Risk Scoring | Complete | Rule-based combination of detector signals |
| Policy Engine | Complete | YAML configuration with hot-reload |
| Evidence Vault | Complete | Immutable storage with feature snapshots |
| Metrics Pipeline | Complete | Prometheus metrics for all components |
| Load Testing | Complete | Validated 1000+ RPS at 106ms P99 |
Detection Logic (Current)
```python
# Simplified scoring formula (rule-based)
criminal_score = max(
    card_testing.confidence * 0.9,    # Card testing patterns
    velocity.confidence * 0.8,        # Velocity rule triggers
    geo_anomaly.confidence * 0.7,     # Geographic issues
    bot_detection.confidence * 0.95,  # Automation signals
)

friendly_score = friendly_fraud.confidence * 0.6

# Policy thresholds (configurable)
if criminal_score >= 0.85 or friendly_score >= 0.95:
    return BLOCK
elif criminal_score >= 0.60 or friendly_score >= 0.70:
    return FRICTION
elif criminal_score >= 0.40 or friendly_score >= 0.50:
    return REVIEW
else:
    return ALLOW
```
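The scoring and threshold logic above can be exercised as a small, self-contained sketch. The `Decision` enum and the detector-confidence dict are illustrative stand-ins, not the production interfaces:

```python
from enum import Enum


class Decision(Enum):
    BLOCK = "block"
    FRICTION = "friction"
    REVIEW = "review"
    ALLOW = "allow"


def decide(confidences: dict) -> Decision:
    """Apply the weighted-max criminal score and policy thresholds."""
    criminal_score = max(
        confidences.get("card_testing", 0.0) * 0.9,
        confidences.get("velocity", 0.0) * 0.8,
        confidences.get("geo_anomaly", 0.0) * 0.7,
        confidences.get("bot_detection", 0.0) * 0.95,
    )
    friendly_score = confidences.get("friendly_fraud", 0.0) * 0.6

    if criminal_score >= 0.85 or friendly_score >= 0.95:
        return Decision.BLOCK
    elif criminal_score >= 0.60 or friendly_score >= 0.70:
        return Decision.FRICTION
    elif criminal_score >= 0.40 or friendly_score >= 0.50:
        return Decision.REVIEW
    return Decision.ALLOW
```

A velocity confidence of 0.8 yields a criminal score of 0.64, which lands in the FRICTION band; a bot-detection confidence of 1.0 scores 0.95 and is blocked outright.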
Phase 2: Hybrid ML + Rules
ML Model Specification
Criminal Fraud Model
| Attribute | Specification |
|---|---|
| Algorithm | XGBoost (primary), LightGBM (challenger) |
| Objective | Binary classification (is_criminal_fraud) |
| Training Window | 90 days of transactions with 120-day label maturity |
| Retraining Frequency | Weekly (automated pipeline) |
| Feature Count | 25+ features |
| Target AUC | >0.85 |
| Latency Budget | <25ms P99 |
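The target AUC gate can be understood without any ML library: AUC is the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The rank-based sketch below is for intuition only; the production pipeline would use a library implementation:

```python
def auc(labels: list, scores: list) -> float:
    """Probability a random positive outranks a random negative
    (equivalent to ROC AUC, counting score ties as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A model meeting the >0.85 target ranks fraud above legitimate traffic in more than 85% of such pairs.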
Feature List
Velocity Features (Real-time from Redis):
| Feature | Description | Window |
|---|---|---|
| card_attempts_10m | Transaction attempts on card | 10 min |
| card_attempts_1h | Transaction attempts on card | 1 hour |
| card_attempts_24h | Transaction attempts on card | 24 hours |
| device_distinct_cards_1h | Unique cards on device | 1 hour |
| device_distinct_cards_24h | Unique cards on device | 24 hours |
| ip_distinct_cards_1h | Unique cards from IP | 1 hour |
| user_total_amount_24h | Total spend by user | 24 hours |
| card_decline_rate_1h | Decline rate for card | 1 hour |
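The production counters live in Redis; a common pattern is a sorted set per key (`ZADD` an event timestamp, `ZREMRANGEBYSCORE` to trim the window, `ZCARD` to count). The in-memory sketch below mirrors those sliding-window semantics without a Redis dependency, purely for illustration:

```python
import time
from collections import defaultdict
from typing import Optional


class VelocityCounter:
    """In-memory sketch of a sliding-window velocity counter.

    Mirrors the Redis sorted-set pattern: record timestamps per key,
    drop entries older than the window, count what remains.
    """

    def __init__(self):
        self._events = defaultdict(list)  # key -> list of event timestamps

    def record(self, key: str, now: Optional[float] = None) -> None:
        self._events[key].append(time.time() if now is None else now)

    def count(self, key: str, window_seconds: float,
              now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        cutoff = now - window_seconds
        # Trim events that have aged out of the window, then count
        self._events[key] = [t for t in self._events[key] if t > cutoff]
        return len(self._events[key])
```

`card_attempts_10m` would be `count("card:<id>", 600)` under this scheme, with one key per entity/window pair.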
Entity Features (From profiles):
| Feature | Description | Source |
|---|---|---|
| card_age_hours | Time since card first seen | Redis |
| device_age_hours | Time since device first seen | Redis |
| user_account_age_days | Account creation age | Profile |
| user_chargeback_count_lifetime | Historical chargebacks | Profile |
| user_chargeback_rate_90d | Recent chargeback rate | Profile |
| card_distinct_devices_30d | Devices using this card | Redis |
| card_distinct_users_30d | Users using this card | Redis |
Transaction Features (From event):
| Feature | Description | Computation |
|---|---|---|
| amount_usd | Transaction amount | Direct |
| amount_zscore | Amount vs user average | (amount - avg) / std |
| is_new_card_for_user | First time card used | Boolean |
| is_new_device_for_user | First time device used | Boolean |
| hour_of_day | Local time hour | Timezone adjusted |
| is_weekend | Weekend flag | Boolean |
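The `amount_zscore` computation in the table is simple but has one edge case worth pinning down: a user with no spend variance yet. A minimal sketch, with the zero-variance handling as an assumption rather than a documented decision:

```python
def amount_zscore(amount: float, user_avg: float, user_std: float) -> float:
    """How unusual this amount is for the user, in standard deviations."""
    if user_std == 0:
        # No variance history yet; treat the amount as not anomalous.
        return 0.0
    return (amount - user_avg) / user_std
```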
Device/Network Features:
| Feature | Description | Source |
|---|---|---|
| is_emulator | Device is emulated | Fingerprint |
| is_rooted | Device is rooted/jailbroken | Fingerprint |
| is_datacenter_ip | IP from cloud provider | IP intelligence |
| is_vpn | VPN detected | IP intelligence |
| is_tor | Tor exit node | IP intelligence |
| ip_risk_score | Third-party IP score | External API |
Label Source
Positive Labels (Fraud):
- Confirmed fraud chargebacks (reason codes 10.1-10.5)
- TC40/SAFE issuer alerts with fraud type indicators
- Manual review disposition: "confirmed fraud"
Negative Labels (Legitimate):
- Authorizations without chargebacks or fraud alerts after 120-day aging
- Manual review disposition: "legitimate"
Exclusions (Not Used for Training):
- Partial refunds and service complaints
- Ambiguous disputes (friendly fraud candidates)
- Transactions with incomplete feature capture
Label Maturity: 120 days from transaction
- Reason: Chargebacks can arrive up to 120 days post-transaction
- Consequence: Training data is always 4 months behind
- Mitigation: Use issuer alerts (TC40/SAFE) for earlier signal
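A labeling function consistent with the rules above might look like the following sketch. All field names are hypothetical; the real pipeline's schema may differ:

```python
from typing import Optional

# Fraud chargeback reason codes cited above
FRAUD_REASON_CODES = {"10.1", "10.2", "10.3", "10.4", "10.5"}
LABEL_MATURITY_DAYS = 120


def label_transaction(txn: dict) -> Optional[int]:
    """Return 1 (fraud), 0 (legitimate), or None (excluded from training)."""
    # Exclusions: ambiguous disputes and incomplete feature capture
    if txn.get("friendly_fraud_candidate") or not txn.get("features_complete", True):
        return None
    # Positive labels: confirmed fraud chargebacks, issuer alerts, review outcome
    if txn.get("chargeback_reason") in FRAUD_REASON_CODES:
        return 1
    if txn.get("tc40_alert") or txn.get("review_disposition") == "confirmed fraud":
        return 1
    # Negative labels only after the 120-day maturity window has elapsed
    if txn.get("age_days", 0) >= LABEL_MATURITY_DAYS:
        return 0
    return None  # too young to safely label as legitimate
```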
Retraining Strategy
Weekly Pipeline (Automated):
1. Extract transactions from T-120d to T-30d
2. Join with chargeback outcomes
3. Retrieve point-in-time features from evidence vault
4. Train new model version
5. Validate against holdout (last 7 days)
6. If AUC drop < 2%: Register as challenger
7. If AUC drop >= 2%: Alert DS team, use previous model
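Steps 6-7 reduce to a single gate at the end of the weekly pipeline. This sketch interprets the "2%" drop as absolute AUC points, which is an assumption; a relative interpretation would divide by the champion AUC first:

```python
MAX_AUC_DROP = 0.02  # interpreted here as absolute AUC points


def registration_decision(previous_auc: float, new_auc: float) -> str:
    """Weekly pipeline gate: register the freshly trained model as
    challenger only if holdout AUC has not dropped by 2% or more."""
    if previous_auc - new_auc < MAX_AUC_DROP:
        return "register_challenger"
    return "alert_ds_team"  # keep serving the previous model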
Monthly Review (Manual):
1. Compare champion vs challenger performance
2. Analyze feature importance drift
3. Review false positive cases
4. Decide: promote challenger or retrain with adjustments
Champion/Challenger Framework
Experiment Architecture
Traffic Routing:
```
              ┌───────────────────────────────┐
              │         Load Balancer         │
              └───────────────┬───────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
  ┌────────────┐       ┌────────────┐       ┌────────────┐
  │  Champion  │       │ Challenger │       │  Holdout   │
  │   (80%)    │       │   (15%)    │       │    (5%)    │
  │  Model A   │       │  Model B   │       │ Rules Only │
  └────────────┘       └────────────┘       └────────────┘
```
Routing: Deterministic hash on auth_id (reproducible)
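Deterministic hashing means the same `auth_id` always lands in the same arm, so experiments are reproducible under replay. A sketch of one way to do it (SHA-256 is an assumption; any stable hash with a uniform distribution works):

```python
import hashlib


def route(auth_id: str) -> str:
    """Deterministically bucket a transaction into champion (80%),
    challenger (15%), or holdout (5%) by hashing auth_id."""
    digest = hashlib.sha256(auth_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable 0-99 bucket per auth_id
    if bucket < 80:
        return "champion"
    if bucket < 95:
        return "challenger"
    return "holdout"
```

Note that language-native `hash()` would not work here: it is salted per process in Python, so routing would change between restarts.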
Experiment Metrics
| Metric | Champion | Challenger | Threshold |
|---|---|---|---|
| Approval Rate | 91.2% | Must not drop by more than 1% (gains allowed) | -1% to +2% |
| Fraud Rate (30d lag) | 1.15% | Must improve | <1.15% |
| P99 Latency | 106ms | Must be within 20% | <127ms |
| False Positive Rate | 12% | Must improve | <12% |
Promotion Criteria
Promote Challenger if ALL true:
1. Running for >= 14 days
2. Sample size >= 100,000 transactions
3. Fraud rate improved by >= 5% (statistically significant)
4. Approval rate within 1% of champion
5. No latency degradation
6. No anomalies in score distribution
Rollback Challenger if ANY true:
1. Fraud rate increased by > 10%
2. Approval rate dropped by > 3%
3. P99 latency exceeded 150ms
4. Error rate exceeded 0.5%
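Both checklists are pure predicates over experiment metrics, which makes them easy to automate and unit-test. A sketch with illustrative field names (the real metrics schema may differ):

```python
def should_promote(exp: dict) -> bool:
    """Promote the challenger only if ALL criteria hold."""
    return (
        exp["days_running"] >= 14
        and exp["sample_size"] >= 100_000
        and exp["fraud_rate_improvement"] >= 0.05  # >= 5% improvement
        and exp["statistically_significant"]
        and abs(exp["approval_rate_delta"]) <= 0.01  # within 1% of champion
        and not exp["latency_degraded"]
        and not exp["score_distribution_anomaly"]
    )


def should_rollback(exp: dict) -> bool:
    """Roll back the challenger if ANY trigger fires."""
    return (
        exp["fraud_rate_increase"] > 0.10
        or exp["approval_rate_drop"] > 0.03
        or exp["p99_latency_ms"] > 150
        or exp["error_rate"] > 0.005
    )
```

The asymmetry is deliberate: promotion is slow and conjunctive, rollback is fast and disjunctive.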
Replay Framework
Purpose
Historical replay enables:
- Threshold simulation - Test new thresholds on historical data
- Model validation - Compare model predictions to known outcomes
- Policy change estimation - Quantify impact before deployment
Implementation
```python
from datetime import datetime
from typing import Optional


async def replay(
    start_date: datetime,
    end_date: datetime,
    policy_config: Optional[dict] = None,
    model_version: Optional[str] = None,
) -> ReplayResults:
    """
    Replay historical transactions with optional config changes.

    Key: uses point-in-time features from the evidence vault,
    NOT current features (which would cause look-ahead bias).
    """
    results = []
    for transaction in get_historical_transactions(start_date, end_date):
        # Get features AS THEY WERE at transaction time
        features = get_features_at_time(
            transaction.auth_id,
            transaction.timestamp,
        )

        # Score with the specified model/policy
        new_decision = score_and_decide(
            transaction,
            features,
            model_version,
            policy_config,
        )

        # Compare to the actual outcome
        actual_fraud = was_transaction_fraud(transaction.auth_id)

        # Record for analysis
        results.append({
            "original_decision": transaction.original_decision,
            "new_decision": new_decision,
            "actual_fraud": actual_fraud,
        })

    return analyze_results(results)
```
Simulation Use Cases
| Use Case | Input | Output |
|---|---|---|
| Threshold change | New threshold values | Approval rate delta, fraud caught delta |
| Model comparison | Model version A vs B | AUC difference, FP rate difference |
| Rule addition | New rule definition | Transactions affected, score changes |
| Seasonal analysis | Date range comparison | Pattern differences by period |
Phase 3: Advanced ML (Future)
Planned Enhancements
| Enhancement | Timeline | Description |
|---|---|---|
| Graph Neural Network | Phase 3 | Detect fraud rings via card-device-user connections |
| Sequence Model | Phase 3 | LSTM/Transformer for transaction sequence patterns |
| Anomaly Detection | Phase 3 | Isolation Forest for unknown attack patterns |
| Real-time Retraining | Phase 3 | Online learning for rapid adaptation |
| External Signals | Phase 3 | TC40/SAFE, Ethoca, BIN intelligence, device reputation |
External Signal Integration (Phase 3)
| Signal | Source | Use Case |
|---|---|---|
| TC40/SAFE | Issuer alerts | Early fraud signal before chargeback |
| Ethoca/Verifi | Network alerts | Pre-dispute notification |
| BIN Intelligence | Card networks | Card risk scoring, country of issuance |
| Device Reputation | Third-party SDK | Known bad devices, emulator detection |
| IP Intelligence | MaxMind/similar | Proxy, VPN, datacenter detection |
| Consortium Data | Industry shared | Cross-merchant fraud patterns |
Guardrails Against Over-Complexity
The roadmap explicitly avoids:
- Deploying highly complex or opaque models without clear interpretability paths
- Adding model variants that materially increase operational risk without proportional fraud/loss benefit
- Creeping scope into generic "AI everywhere" without clear problem statements
- Replacing human judgment in edge cases where model confidence is low
ML remains a tool, not the centerpiece - the platform's value is in robust decisioning, evidence, and economics, with ML reinforcing (not replacing) those pillars.
Graph-Based Fraud Ring Detection
Entity Graph:
Nodes: Cards, Devices, Users, IPs
Edges: Transaction relationships
Fraud Ring Indicators:
- Cluster of cards sharing devices
- Star pattern (one device, many cards)
- Circular payments between accounts
- Velocity spikes in connected subgraph
Implementation:
- Neo4j for graph storage
- Graph embedding for ML features
- Community detection for ring identification
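Before reaching for Neo4j and community detection, the star-pattern indicator above (one device, many cards) reduces to a degree count on the device-card edge list. A minimal sketch of that precursor check, with the `min_cards` threshold as an illustrative assumption:

```python
from collections import defaultdict
from typing import List, Set, Tuple


def find_star_devices(edges: List[Tuple[str, str]],
                      min_cards: int = 5) -> Set[str]:
    """Flag devices linked to many distinct cards (star pattern).

    A simple degree-based precursor to full graph embedding and
    community detection over the card-device-user-IP entity graph.
    """
    cards_per_device = defaultdict(set)
    for device, card in edges:
        cards_per_device[device].add(card)
    return {d for d, cards in cards_per_device.items()
            if len(cards) >= min_cards}
```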
Architecture for ML Integration
Current Architecture (ML-Ready)
```
┌──────────────────────────────────────────────────────────┐
│                        API Layer                         │
│                        (FastAPI)                         │
└────────────────────────────┬─────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
 ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
 │Feature Engine │   │   Detection   │   │ Policy Engine │
 │    (Redis)    │   │    Engine     │   │    (YAML)     │
 └───────┬───────┘   └───────┬───────┘   └───────┬───────┘
         │                   │                   │
         │           ┌───────┴───────┐           │
         │           │   Currently   │           │
         │           │  Rule-Based   │           │
         │           │               │           │
         │           │   [ML HOOK] ◄─┼───────────┘
         │           │    Phase 2    │
         │           └───────────────┘
         │
         ▼
 ┌───────────────┐
 │Evidence Vault │
 │(Feature Store)│
 └───────────────┘
```
Phase 2 Architecture (With ML)
```
┌──────────────────────────────────────────────────────────┐
│                        API Layer                         │
└────────────────────────────┬─────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
 ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
 │Feature Engine │   │    Scoring    │   │ Policy Engine │
 │    (Redis)    │   │    Service    │   │    (YAML)     │
 └───────┬───────┘   └───────┬───────┘   └───────┬───────┘
         │                   │                   │
         │        ┌──────────┼──────────┐        │
         │        ▼          ▼          ▼        │
         │   ┌────────┐ ┌────────┐ ┌──────────┐  │
         │   │  Rule  │ │   ML   │ │ Ensemble │◄─┘
         │   │ Engine │ │ Model  │ │  Layer   │
         │   └────────┘ └───┬────┘ └──────────┘
         │                  │
         │         ┌────────┴─────────┐
         │         ▼                  ▼
         │   ┌──────────┐      ┌────────────┐
         │   │ Champion │      │ Challenger │
         │   │  Model   │      │   Model    │
         │   └──────────┘      └────────────┘
         │
         ▼
 ┌───────────────┐
 │Evidence Vault │
 │+ ML Features  │
 └───────────────┘
```
Ensemble Scoring
```python
class EnsembleScoringService:
    """Combine rule-based and ML scores."""

    def score(self, features: dict) -> RiskScore:
        # Rule-based score (always runs)
        rule_score = self.rule_engine.score(features)

        # ML score (Phase 2+)
        if self.ml_enabled:
            ml_score = self.ml_model.predict(features)
        else:
            ml_score = None

        # Ensemble combination
        if ml_score is not None:
            # Weighted average with rules as safety net
            combined = (
                ml_score * 0.70      # ML carries more weight
                + rule_score * 0.30  # Rules as backstop
            )
        else:
            combined = rule_score

        # Hard overrides (rules always win for certain signals,
        # regardless of whether ML is enabled)
        if features.get("is_emulator"):
            combined = max(combined, 0.95)
        if features.get("blocklist_match"):
            combined = 1.0

        return RiskScore(
            combined=combined,
            rule_score=rule_score,
            ml_score=ml_score,
            model_version=self.model_version,
        )
```
Timeline Summary
| Phase | Scope | Status | Timeline |
|---|---|---|---|
| Phase 1 | Rule-based MVP | Complete | Done |
| Phase 2a | ML model training | Not started | Weeks 1-2 |
| Phase 2b | Champion/challenger | Not started | Weeks 2-3 |
| Phase 2c | ML in production | Not started | Week 4+ |
| Phase 3 | Advanced ML | Future | TBD |
Interview Application
When asked "What is the AI/ML roadmap?":
"The current implementation is rule-based by design - we prioritized getting to market fast with interpretable decisions. But the architecture is ML-ready: features are captured in the evidence vault, the scoring service has a clean interface for model integration, and we have the replay framework for validation.
Phase 2 introduces a simple ML model - XGBoost for criminal fraud - using 25+ features from velocity counters and entity profiles. Labels come from chargebacks with a 120-day maturity window. We will deploy via champion/challenger: 80% to the proven rules, 15% to the ML challenger, 5% holdout.
Key to success is the ensemble approach: ML informs, but rules have hard overrides for signals like emulators or blocklist matches. This keeps the system interpretable for compliance while improving detection accuracy.
Retraining is weekly and automated, but promotion requires 14 days of data and statistically significant improvement before we graduate a challenger to champion."
This document consolidates the AI/ML strategy with explicit current status, avoiding the impression that ML is already deployed when it is planned for Phase 2.