AI/ML Roadmap & Current Status

Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST


Current Implementation Status​

Phase 1: Rule-Driven Detection (COMPLETE)​

The MVP implementation uses a rule-based detection engine with hooks for ML integration. This was a deliberate design choice to:

  1. Deliver value faster - Rules can be tuned immediately without training data
  2. Ensure interpretability - Every decision has explainable reasons
  3. Establish infrastructure - Feature pipeline, evidence capture, and policy engine ready for ML

What Is Implemented​

| Component | Status | Description |
| --- | --- | --- |
| Feature Engine | Complete | Redis velocity counters with sliding windows |
| Detection Engine | Complete | 5 detector types (card testing, velocity, geo, bot, friendly) |
| Risk Scoring | Complete | Rule-based combination of detector signals |
| Policy Engine | Complete | YAML configuration with hot-reload |
| Evidence Vault | Complete | Immutable storage with feature snapshots |
| Metrics Pipeline | Complete | Prometheus metrics for all components |
| Load Testing | Complete | Validated 1000+ RPS at 106ms P99 |
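
The Feature Engine's sliding-window counters can be maintained with Redis sorted sets. A minimal sketch, assuming redis-py and an illustrative key scheme; the production key naming and retention policy are not shown here:

import time
import uuid

import redis

r = redis.Redis()  # connection details are illustrative

def record_attempt(card_id: str) -> None:
    """Add one attempt to a per-card sorted set, scored by timestamp."""
    now = time.time()
    key = f"velocity:card:{card_id}:attempts"        # hypothetical key scheme
    pipe = r.pipeline()
    pipe.zadd(key, {uuid.uuid4().hex: now})          # unique member, score = time
    pipe.zremrangebyscore(key, 0, now - 24 * 3600)   # trim entries beyond the 24h window
    pipe.expire(key, 24 * 3600)
    pipe.execute()

def attempts_in_window(card_id: str, window_seconds: int) -> int:
    """Count attempts in a trailing window, e.g. 600 for card_attempts_10m."""
    now = time.time()
    key = f"velocity:card:{card_id}:attempts"
    return r.zcount(key, now - window_seconds, now)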

Detection Logic (Current)​

# Simplified scoring formula (rule-based)
criminal_score = max(
    card_testing.confidence * 0.9,    # Card testing patterns
    velocity.confidence * 0.8,        # Velocity rule triggers
    geo_anomaly.confidence * 0.7,     # Geographic issues
    bot_detection.confidence * 0.95,  # Automation signals
)

friendly_score = friendly_fraud.confidence * 0.6

# Policy thresholds (configurable)
if criminal_score >= 0.85 or friendly_score >= 0.95:
    return BLOCK
elif criminal_score >= 0.60 or friendly_score >= 0.70:
    return FRICTION
elif criminal_score >= 0.40 or friendly_score >= 0.50:
    return REVIEW
else:
    return ALLOW
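
Because these thresholds live in the Policy Engine's hot-reloaded YAML configuration, the same decision logic can be driven from a loaded policy document. A minimal sketch; the YAML keys and loader shown here are illustrative, not the platform's actual policy schema:

import yaml  # PyYAML

# Illustrative policy document; the real schema is owned by the Policy Engine.
POLICY_YAML = """
thresholds:
  criminal: {block: 0.85, friction: 0.60, review: 0.40}
  friendly: {block: 0.95, friction: 0.70, review: 0.50}
"""

def decide(criminal_score: float, friendly_score: float, policy: dict) -> str:
    """Map scores to an action using configurable thresholds."""
    c = policy["thresholds"]["criminal"]
    f = policy["thresholds"]["friendly"]
    if criminal_score >= c["block"] or friendly_score >= f["block"]:
        return "BLOCK"
    if criminal_score >= c["friction"] or friendly_score >= f["friction"]:
        return "FRICTION"
    if criminal_score >= c["review"] or friendly_score >= f["review"]:
        return "REVIEW"
    return "ALLOW"

policy = yaml.safe_load(POLICY_YAML)
print(decide(0.72, 0.10, policy))  # -> FRICTION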

Phase 2: Hybrid ML + Rules​

ML Model Specification​

Criminal Fraud Model​

| Attribute | Specification |
| --- | --- |
| Algorithm | XGBoost (primary), LightGBM (challenger) |
| Objective | Binary classification (is_criminal_fraud) |
| Training Window | 90 days of transactions with 120-day label maturity |
| Retraining Frequency | Weekly (automated pipeline) |
| Feature Count | 25+ features |
| Target AUC | >0.85 |
| Latency Budget | <25ms P99 |
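
A hedged sketch of what this specification implies for training, using the xgboost scikit-learn wrapper on synthetic stand-in data; the hyperparameters and data shapes are illustrative only:

import numpy as np
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 25+ production features and chargeback-derived labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 25))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) > 2.0).astype(int)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(
    n_estimators=300,      # illustrative hyperparameters
    max_depth=6,
    learning_rate=0.05,
    eval_metric="auc",
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)

auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
print(f"validation AUC: {auc:.3f}")  # target per the spec is > 0.85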

Feature List​

Velocity Features (Real-time from Redis):

| Feature | Description | Window |
| --- | --- | --- |
| card_attempts_10m | Transaction attempts on card | 10 min |
| card_attempts_1h | Transaction attempts on card | 1 hour |
| card_attempts_24h | Transaction attempts on card | 24 hours |
| device_distinct_cards_1h | Unique cards on device | 1 hour |
| device_distinct_cards_24h | Unique cards on device | 24 hours |
| ip_distinct_cards_1h | Unique cards from IP | 1 hour |
| user_total_amount_24h | Total spend by user | 24 hours |
| card_decline_rate_1h | Decline rate for card | 1 hour |

Entity Features (From profiles):

| Feature | Description | Source |
| --- | --- | --- |
| card_age_hours | Time since card first seen | Redis |
| device_age_hours | Time since device first seen | Redis |
| user_account_age_days | Account creation age | Profile |
| user_chargeback_count_lifetime | Historical chargebacks | Profile |
| user_chargeback_rate_90d | Recent chargeback rate | Profile |
| card_distinct_devices_30d | Devices using this card | Redis |
| card_distinct_users_30d | Users using this card | Redis |

Transaction Features (From event):

| Feature | Description | Computation |
| --- | --- | --- |
| amount_usd | Transaction amount | Direct |
| amount_zscore | Amount vs user average | (amount - avg) / std |
| is_new_card_for_user | First time card used | Boolean |
| is_new_device_for_user | First time device used | Boolean |
| hour_of_day | Local time hour | Timezone adjusted |
| is_weekend | Weekend flag | Boolean |

Device/Network Features:

| Feature | Description | Source |
| --- | --- | --- |
| is_emulator | Device is emulated | Fingerprint |
| is_rooted | Device is rooted/jailbroken | Fingerprint |
| is_datacenter_ip | IP from cloud provider | IP intelligence |
| is_vpn | VPN detected | IP intelligence |
| is_tor | Tor exit node | IP intelligence |
| ip_risk_score | Third-party IP score | External API |
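
At scoring time these four groups are merged into one vector. A minimal sketch of that assembly; the input dicts (txn, velocity, profile, device) and their keys are hypothetical stand-ins for the Feature Engine, profile store, and fingerprint/IP lookups:

from datetime import datetime

def build_feature_vector(txn: dict, velocity: dict, profile: dict, device: dict) -> dict:
    """Merge velocity, entity, transaction, and device/network features."""
    avg = profile.get("amount_avg", 0.0)
    std = profile.get("amount_std", 1.0) or 1.0       # guard against divide-by-zero
    ts = datetime.fromisoformat(txn["local_time"])    # assumed timezone-adjusted upstream
    return {
        # Velocity features (Redis counters), e.g. card_attempts_10m, ...
        **velocity,
        # Entity features (profiles)
        "user_account_age_days": profile.get("account_age_days", 0),
        "user_chargeback_count_lifetime": profile.get("chargeback_count", 0),
        # Transaction features (event)
        "amount_usd": txn["amount_usd"],
        "amount_zscore": (txn["amount_usd"] - avg) / std,
        "is_new_card_for_user": txn["card_id"] not in profile.get("known_cards", []),
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        # Device/network features (fingerprint + IP intelligence)
        "is_emulator": device.get("is_emulator", False),
        "is_datacenter_ip": device.get("is_datacenter_ip", False),
        "ip_risk_score": device.get("ip_risk_score", 0.0),
    }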

Label Source​

Positive Labels (Fraud):

  • Confirmed fraud chargebacks (reason codes 10.1-10.5)
  • TC40/SAFE issuer alerts with fraud type indicators
  • Manual review disposition: "confirmed fraud"

Negative Labels (Legitimate):

  • Authorizations without chargebacks or fraud alerts after 120-day aging
  • Manual review disposition: "legitimate"

Exclusions (Not Used for Training):

  • Partial refunds and service complaints
  • Ambiguous disputes (friendly fraud candidates)
  • Transactions with incomplete feature capture

Label Maturity: 120 days from transaction

  • Reason: Chargebacks can arrive up to 120 days post-transaction
  • Consequence: Training data is always 4 months behind
  • Mitigation: Use issuer alerts (TC40/SAFE) for earlier signal
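
A sketch of the label join under these rules, using pandas; the table and column names are illustrative, and the exclusion filters (ambiguous disputes, incomplete feature capture) are omitted for brevity:

import pandas as pd

FRAUD_REASON_CODES = {"10.1", "10.2", "10.3", "10.4", "10.5"}

def build_labels(transactions: pd.DataFrame, chargebacks: pd.DataFrame,
                 as_of: pd.Timestamp) -> pd.DataFrame:
    """Label only transactions that have fully matured (>= 120 days old)."""
    mature = transactions[transactions["timestamp"] <= as_of - pd.Timedelta(days=120)]
    fraud_cbs = chargebacks[chargebacks["reason_code"].isin(FRAUD_REASON_CODES)]
    labeled = mature.merge(
        fraud_cbs[["auth_id"]].assign(is_criminal_fraud=1),
        on="auth_id", how="left",
    )
    # Transactions with no fraud chargeback after aging become negatives.
    labeled["is_criminal_fraud"] = labeled["is_criminal_fraud"].fillna(0).astype(int)
    return labeled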

Retraining Strategy​

Weekly Pipeline (Automated):
1. Extract transactions from T-120d to T-30d
2. Join with chargeback outcomes
3. Retrieve point-in-time features from evidence vault
4. Train new model version
5. Validate against holdout (last 7 days)
6. If AUC drop < 2%: Register as challenger
7. If AUC drop >= 2%: Alert DS team, use previous model
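
The validation gate in steps 6-7 reduces to a relative-AUC check against the current champion. A minimal sketch; the 2% cutoff mirrors the pipeline above and the return values are illustrative labels for the two branches:

def validation_gate(new_auc: float, champion_auc: float,
                    max_relative_drop: float = 0.02) -> str:
    """Gate a freshly trained model on relative AUC drop vs. the champion."""
    auc_drop = (champion_auc - new_auc) / champion_auc
    if auc_drop < max_relative_drop:
        return "register_as_challenger"    # proceed to champion/challenger experiment
    return "alert_ds_and_keep_previous"    # fall back to the prior model version

print(validation_gate(new_auc=0.86, champion_auc=0.87))  # small drop -> challenger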

Monthly Review (Manual):
1. Compare champion vs challenger performance
2. Analyze feature importance drift
3. Review false positive cases
4. Decide: promote challenger or retrain with adjustments

Champion/Challenger Framework​

Experiment Architecture​

Traffic Routing:
                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                 β”‚  Load Balancer  β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
      β–Ό                   β–Ό                   β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Champion  β”‚       β”‚Challenger β”‚       β”‚  Holdout  β”‚
β”‚   (80%)   β”‚       β”‚   (15%)   β”‚       β”‚   (5%)    β”‚
β”‚  Model A  β”‚       β”‚  Model B  β”‚       β”‚Rules Only β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Routing: Deterministic hash on auth_id (reproducible)
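
A sketch of that deterministic assignment; the 80/15/5 split matches the diagram above, while hashing auth_id with SHA-256 into 10,000 buckets is an illustrative implementation choice:

import hashlib

def route(auth_id: str) -> str:
    """Deterministically assign a transaction to champion/challenger/holdout."""
    bucket = int(hashlib.sha256(auth_id.encode()).hexdigest(), 16) % 10_000
    if bucket < 8_000:        # 80% -> champion (Model A)
        return "champion"
    if bucket < 9_500:        # 15% -> challenger (Model B)
        return "challenger"
    return "holdout"          # 5%  -> rules only

assert route("auth_123") == route("auth_123")  # reproducible by construction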

Experiment Metrics​

| Metric | Champion | Challenger | Threshold |
| --- | --- | --- | --- |
| Approval Rate | 91.2% | Must be within 1% | -1% to +2% |
| Fraud Rate (30d lag) | 1.15% | Must improve | <1.15% |
| P99 Latency | 106ms | Must be within 20% | <127ms |
| False Positive Rate | 12% | Must improve | <12% |

Promotion Criteria​

Promote Challenger if ALL true:
1. Running for >= 14 days
2. Sample size >= 100,000 transactions
3. Fraud rate improved by >= 5% (statistically significant)
4. Approval rate within 1% of champion
5. No latency degradation
6. No anomalies in score distribution

Rollback Challenger if ANY true:
1. Fraud rate increased by > 10%
2. Approval rate dropped by > 3%
3. P99 latency exceeded 150ms
4. Error rate exceeded 0.5%
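
Both rule sets above can be evaluated from a single experiment summary. A minimal sketch; the ExperimentStats fields are hypothetical, and the score-distribution anomaly check from the promotion criteria is omitted here:

from dataclasses import dataclass

@dataclass
class ExperimentStats:
    """Hypothetical summary of a running champion/challenger experiment."""
    days_running: int
    sample_size: int
    fraud_rate_delta_pct: float        # negative = challenger improved
    approval_rate_delta_pct: float     # challenger minus champion
    p99_latency_ms: float
    error_rate_pct: float
    fraud_improvement_significant: bool

def should_rollback(s: ExperimentStats) -> bool:
    return (s.fraud_rate_delta_pct > 10
            or s.approval_rate_delta_pct < -3
            or s.p99_latency_ms > 150
            or s.error_rate_pct > 0.5)

def should_promote(s: ExperimentStats) -> bool:
    return (s.days_running >= 14
            and s.sample_size >= 100_000
            and s.fraud_rate_delta_pct <= -5
            and s.fraud_improvement_significant
            and abs(s.approval_rate_delta_pct) <= 1
            and s.p99_latency_ms <= 127)   # within the 20% latency budget above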

Replay Framework​

Purpose​

Historical replay enables:

  1. Threshold simulation - Test new thresholds on historical data
  2. Model validation - Compare model predictions to known outcomes
  3. Policy change estimation - Quantify impact before deployment

Implementation​

async def replay(
    start_date: datetime,
    end_date: datetime,
    policy_config: Optional[dict] = None,
    model_version: Optional[str] = None,
) -> ReplayResults:
    """
    Replay historical transactions with optional config changes.

    Key: Uses point-in-time features from evidence vault,
    NOT current features (which would cause look-ahead bias).
    """
    results = []

    for transaction in get_historical_transactions(start_date, end_date):
        # Get features AS THEY WERE at transaction time
        features = get_features_at_time(
            transaction.auth_id,
            transaction.timestamp
        )

        # Score with specified model/policy
        new_decision = score_and_decide(
            transaction,
            features,
            model_version,
            policy_config
        )

        # Compare to actual outcome
        actual_fraud = was_transaction_fraud(transaction.auth_id)

        # Record for analysis
        results.append({
            "original_decision": transaction.original_decision,
            "new_decision": new_decision,
            "actual_fraud": actual_fraud
        })

    return analyze_results(results)

Simulation Use Cases​

| Use Case | Input | Output |
| --- | --- | --- |
| Threshold change | New threshold values | Approval rate delta, fraud caught delta |
| Model comparison | Model version A vs B | AUC difference, FP rate difference |
| Rule addition | New rule definition | Transactions affected, score changes |
| Seasonal analysis | Date range comparison | Pattern differences by period |
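
For example, the threshold-change row above maps to a replay call with an overridden policy; the date range and policy keys shown here are illustrative:

import asyncio
from datetime import datetime

# Illustrative override: tighten the criminal-fraud BLOCK threshold.
candidate_policy = {
    "thresholds": {"criminal": {"block": 0.80, "friction": 0.60, "review": 0.40}}
}

results = asyncio.run(replay(
    start_date=datetime(2025, 10, 1),
    end_date=datetime(2025, 12, 31),
    policy_config=candidate_policy,
))
# ReplayResults would then expose deltas such as approval rate and fraud caught.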

Phase 3: Advanced ML (Future)​

Planned Enhancements​

| Enhancement | Timeline | Description |
| --- | --- | --- |
| Graph Neural Network | Phase 3 | Detect fraud rings via card-device-user connections |
| Sequence Model | Phase 3 | LSTM/Transformer for transaction sequence patterns |
| Anomaly Detection | Phase 3 | Isolation Forest for unknown attack patterns |
| Real-time Retraining | Phase 3 | Online learning for rapid adaptation |
| External Signals | Phase 3 | TC40/SAFE, Ethoca, BIN intelligence, device reputation |

External Signal Integration (Phase 3)​

| Signal | Source | Use Case |
| --- | --- | --- |
| TC40/SAFE | Issuer alerts | Early fraud signal before chargeback |
| Ethoca/Verifi | Network alerts | Pre-dispute notification |
| BIN Intelligence | Card networks | Card risk scoring, country of issuance |
| Device Reputation | Third-party SDK | Known bad devices, emulator detection |
| IP Intelligence | MaxMind/similar | Proxy, VPN, datacenter detection |
| Consortium Data | Industry shared | Cross-merchant fraud patterns |

Guardrails Against Over-Complexity​

The roadmap explicitly avoids:

  • Deploying highly complex or opaque models without clear interpretability paths
  • Adding model variants that materially increase operational risk without proportional fraud/loss benefit
  • Creeping scope into generic "AI everywhere" without clear problem statements
  • Replacing human judgment in edge cases where model confidence is low

ML remains a tool, not the centerpiece - the platform's value is in robust decisioning, evidence, and economics, with ML reinforcing (not replacing) those pillars.

Graph-Based Fraud Ring Detection​

Entity Graph:
Nodes: Cards, Devices, Users, IPs
Edges: Transaction relationships

Fraud Ring Indicators:
- Cluster of cards sharing devices
- Star pattern (one device, many cards)
- Circular payments between accounts
- Velocity spikes in connected subgraph

Implementation:
- Neo4j for graph storage
- Graph embedding for ML features
- Community detection for ring identification
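
The star-pattern indicator can be sketched quickly with networkx (used here purely as an in-memory illustration; the production choice above is Neo4j with graph embeddings and community detection):

import networkx as nx

# Bipartite entity graph: device and card nodes, edges = observed transactions.
G = nx.Graph()
G.add_edges_from([
    ("device:abc", "card:1111"), ("device:abc", "card:2222"),
    ("device:abc", "card:3333"), ("device:abc", "card:4444"),
    ("device:xyz", "card:5555"),
])

# Star pattern: one device touching many distinct cards.
SUSPICIOUS_CARD_COUNT = 3  # illustrative threshold
for node, degree in G.degree():
    if node.startswith("device:") and degree >= SUSPICIOUS_CARD_COUNT:
        print(f"possible fraud ring hub: {node} linked to {degree} cards")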

Architecture for ML Integration​

Current Architecture (ML-Ready)​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          API Layer                          β”‚
β”‚                          (FastAPI)                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό                      β–Ό                      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Feature Engine β”‚      β”‚   Detection   β”‚      β”‚ Policy Engine β”‚
β”‚    (Redis)    β”‚      β”‚    Engine     β”‚      β”‚    (YAML)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                      β”‚                      β”‚
        β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”              β”‚
        β”‚              β”‚   Currently   β”‚              β”‚
        β”‚              β”‚  Rule-Based   β”‚              β”‚
        β”‚              β”‚               β”‚              β”‚
        β”‚              β”‚   [ML HOOK]   β”‚β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚              β”‚    Phase 2    β”‚
        β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Evidence Vault β”‚
β”‚(Feature Store)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 2 Architecture (With ML)​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          API Layer                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό                      β–Ό                      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Feature Engine β”‚      β”‚    Scoring    β”‚      β”‚ Policy Engine β”‚
β”‚    (Redis)    β”‚      β”‚    Service    β”‚      β”‚    (YAML)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                      β”‚                      β”‚
        β”‚         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
        β”‚         β–Ό            β–Ό            β–Ό         β”‚
        β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
        β”‚    β”‚  Rule   β”‚  β”‚   ML    β”‚  β”‚Ensemble β”‚    β”‚
        β”‚    β”‚ Engine  β”‚  β”‚  Model  β”‚  β”‚  Layer  β”‚β—„β”€β”€β”€β”˜
        β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                      β”‚
        β”‚           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚           β–Ό                     β–Ό
        β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚     β”‚ Champion  β”‚         β”‚Challenger β”‚
        β”‚     β”‚   Model   β”‚         β”‚   Model   β”‚
        β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Evidence Vault β”‚
β”‚+ ML Features  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Ensemble Scoring​

class EnsembleScoringService:
    """Combine rule-based and ML scores."""

    def score(self, features: dict) -> RiskScore:
        # Rule-based score (always runs)
        rule_score = self.rule_engine.score(features)

        # ML score (Phase 2+)
        if self.ml_enabled:
            ml_score = self.ml_model.predict(features)
        else:
            ml_score = None

        # Ensemble combination
        if ml_score is not None:
            # Weighted average with rules as safety net
            combined = (
                ml_score * 0.70 +    # ML carries more weight
                rule_score * 0.30    # Rules as backstop
            )

            # Hard overrides (rules always win for certain signals)
            if features.get("is_emulator"):
                combined = max(combined, 0.95)
            if features.get("blocklist_match"):
                combined = 1.0
        else:
            combined = rule_score

        return RiskScore(
            combined=combined,
            rule_score=rule_score,
            ml_score=ml_score,
            model_version=self.model_version
        )

Timeline Summary​

| Phase | Scope | Status | Timeline |
| --- | --- | --- | --- |
| Phase 1 | Rule-based MVP | Complete | Done |
| Phase 2a | ML model training | Not started | Weeks 1-2 |
| Phase 2b | Champion/challenger | Not started | Weeks 2-3 |
| Phase 2c | ML in production | Not started | Week 4+ |
| Phase 3 | Advanced ML | Future | TBD |

Interview Application​

When asked "What is the AI/ML roadmap?":

"The current implementation is rule-based by design - we prioritized getting to market fast with interpretable decisions. But the architecture is ML-ready: features are captured in the evidence vault, the scoring service has a clean interface for model integration, and we have the replay framework for validation.

Phase 2 introduces a simple ML model - XGBoost for criminal fraud - using 25+ features from velocity counters and entity profiles. Labels come from chargebacks with a 120-day maturity window. We will deploy via champion/challenger: 80% to the proven rules, 15% to the ML challenger, 5% holdout.

The key to success is the ensemble approach: ML informs, but rules have hard overrides for signals like emulators or blocklist matches. This keeps the system interpretable for compliance while improving detection accuracy.

Retraining is weekly and automated, but promotion requires 14 days of data and statistically significant improvement before we graduate a challenger to champion."


This document consolidates the AI/ML strategy with explicit current status, avoiding the impression that ML is already deployed when it is planned for Phase 2.