
Results, Limitations & Personas

Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST


Load Test Results

Test Configuration

| Parameter | Value |
| --- | --- |
| Environment | Local (M-series Mac) |
| API Workers | 4 uvicorn workers |
| Redis | Single node, local Docker |
| PostgreSQL | Single node, local Docker |
| Test Tool | Locust |
| Duration | 2 minutes |
| Users | 50 concurrent |
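
The configuration above can be reproduced with a short Locust script along the following lines; this is a sketch only, and the endpoint path and request payload are assumptions rather than the platform's actual API.

```python
# locustfile.py -- minimal load profile matching the table above.
# Endpoint path and payload schema are assumptions for illustration.
from locust import HttpUser, task, between


class FraudDecisionUser(HttpUser):
    # A short think time keeps 50 users in the neighborhood of the
    # observed ~260 RPS against a local deployment.
    wait_time = between(0.1, 0.3)

    @task
    def score_transaction(self):
        self.client.post(
            "/v1/decisions",  # hypothetical endpoint
            json={
                "transaction_id": "txn-load-test",
                "user_id": "user-123",
                "amount_cents": 4999,
                "currency": "USD",
                "card_fingerprint": "fp-abc",
                "device_id": "dev-xyz",
            },
        )
```

Run headless for the two-minute window with something like `locust -f locustfile.py --headless -u 50 -r 10 -t 2m --host http://localhost:8000`.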

Observed Performance

| Metric | Observed | Target | Status |
| --- | --- | --- | --- |
| Throughput | 260 RPS | - | Baseline |
| P50 Latency | 22ms | 50ms | 56% buffer |
| P99 Latency | 106ms | 200ms | 47% buffer |
| Error Rate | 0.00% | <0.1% | Passing |
| Failures | 0 | 0 | Passing |

Latency Breakdown

Total P99: 106ms
├── Feature computation (Redis): ~50ms (47%)
├── Risk scoring (detection): ~20ms (19%)
├── Policy evaluation: ~10ms (9%)
├── Evidence capture (async): ~20ms (19%)
└── Network/serialization: ~6ms (6%)

Key Insight: Redis velocity lookups dominate latency at 47% of total. At scale, this is the first optimization target.
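
Since the velocity lookups dominate the budget, an early optimization is to collapse the per-key round trips into a single pipelined call; the sketch below uses redis-py, and the `vel:*` key naming is an assumption, not the system's actual schema.

```python
# Sketch: fetch several velocity counters in one pipelined round trip
# instead of one round trip per key. Key names are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def fetch_velocity_features(card_id: str, device_id: str) -> dict:
    windows = ("1m", "5m", "1h")
    keys = [f"vel:card:{card_id}:{w}" for w in windows]
    keys += [f"vel:device:{device_id}:{w}" for w in windows]

    pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC
    for key in keys:
        pipe.get(key)
    values = pipe.execute()

    return {k: int(v) if v is not None else 0 for k, v in zip(keys, values)}
```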

Capacity Projection

| Load Level | Est. RPS | Est. P99 | Bottleneck | Mitigation |
| --- | --- | --- | --- | --- |
| Baseline (50 users) | 260 | 106ms | None | - |
| 2x (100 users) | 500 | 130ms | API workers | Add workers |
| 4x (200 users) | 900 | 160ms | Redis connections | Connection pooling |
| 8x (400 users) | 1,500 | 200ms | Redis throughput | Redis Cluster |
| 16x+ (1000 users) | 3,000+ | >200ms | Architecture limit | Kafka + Flink |

Identified Bottleneck: At ~4,000 RPS in ramp-up testing, PostgreSQL evidence writes saturated the connection pool, increasing tail latency. This confirms the planned scaling path: sharding/replicas and event-sourced evidence ingestion.
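
A first step toward that path, short of full event sourcing, is to keep evidence writes off the request path and flush them in batches from a background task. The sketch below assumes asyncio plus asyncpg and a hypothetical `evidence(transaction_id, payload)` table.

```python
# Sketch: decouple evidence capture from the decision path with a bounded
# queue and a batch writer. Table and column names are assumptions.
import asyncio
import json

import asyncpg

evidence_queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)


def enqueue_evidence(record: dict) -> None:
    # Called from the request handler; never blocks the decision path.
    try:
        evidence_queue.put_nowait(record)
    except asyncio.QueueFull:
        pass  # increment a dropped-evidence counter instead of adding latency


async def evidence_writer(dsn: str, batch_size: int = 200) -> None:
    pool = await asyncpg.create_pool(dsn)
    while True:
        batch = [await evidence_queue.get()]
        while len(batch) < batch_size and not evidence_queue.empty():
            batch.append(evidence_queue.get_nowait())
        async with pool.acquire() as conn:
            await conn.executemany(
                "INSERT INTO evidence (transaction_id, payload) VALUES ($1, $2)",
                [(r["transaction_id"], json.dumps(r)) for r in batch],
            )
```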

Replay Validation

Using synthetic historical data with known fraud labels:

| Scenario | Transactions | Fraud Injected | Detected | False Positives |
| --- | --- | --- | --- | --- |
| Normal traffic | 10,000 | 1% (100) | 72/100 | 180/9,900 |
| Card testing attack | 1,000 | 10% (100) | 94/100 | 45/900 |
| Velocity attack | 500 | 20% (100) | 88/100 | 22/400 |
| Mixed realistic | 15,000 | 2% (300) | 221/300 | 195/14,700 |

Summary:

  • Detection rate: 72-94% depending on attack type
  • False positive rate: 1.3-5.5% depending on scenario
  • Card testing attacks have the highest detection confidence
  • Velocity attacks show strong detection with the rule-based approach
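
The per-scenario rates above follow directly from the labeled replay output; a minimal sketch of the calculation is below, with record fields assumed rather than taken from the actual replay schema.

```python
# Sketch: detection rate and false positive rate from labeled replay
# results. Record fields are assumptions.
from typing import Iterable


def replay_metrics(results: Iterable[dict]) -> dict:
    tp = fp = fn = tn = 0
    for r in results:
        flagged = r["decision"] in {"REVIEW", "BLOCK"}
        is_fraud = r["label"] == "fraud"
        if flagged and is_fraud:
            tp += 1
        elif flagged and not is_fraud:
            fp += 1
        elif is_fraud:
            fn += 1
        else:
            tn += 1
    return {
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

For the card-testing row, for example, this yields 94/100 = 94% detection and 45/900 = 5% false positives.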

Policy Impact Simulation (Replay)

Comparing baseline rules vs platform policy on 1M synthetic transactions:

| Metric | Baseline Rules | Platform Policy | Delta |
| --- | --- | --- | --- |
| Approval rate | 89.0% | 91.5% | +2.5% |
| Criminal fraud caught (recall) | 60% | 78% | +18% |
| Criminal fraud passed | 40% | 22% | -18% |
| Manual review rate | 4.2% | 2.6% | -1.6% |
| Estimated fraud loss | 100% (baseline) | ~62% | -38% |

The replay engine and economics framework give finance and risk partners a quantified view of trade-offs before changes are deployed.
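
Conceptually, the comparison runs the same labeled stream through both policies and diffs the aggregates; a minimal sketch is below, with `evaluate_policy` and the record fields being assumptions rather than the platform's real interfaces.

```python
# Sketch: compare baseline rules vs. platform policy on the same labeled
# stream. evaluate_policy() and record fields are assumptions.
def summarize(decisions: list) -> dict:
    n = len(decisions) or 1
    fraud = [d for d in decisions if d["label"] == "fraud"]
    caught = [d for d in fraud if d["decision"] == "BLOCK"]
    return {
        "approval_rate": sum(d["decision"] == "ALLOW" for d in decisions) / n,
        "review_rate": sum(d["decision"] == "REVIEW" for d in decisions) / n,
        "fraud_recall": len(caught) / len(fraud) if fraud else 0.0,
    }


def compare_policies(transactions, baseline_policy, platform_policy, evaluate_policy):
    base = summarize([evaluate_policy(baseline_policy, t) for t in transactions])
    plat = summarize([evaluate_policy(platform_policy, t) for t in transactions])
    return {metric: plat[metric] - base[metric] for metric in base}
```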


Limitations

Infrastructure Limitations

| Limitation | Impact | Production Path |
| --- | --- | --- |
| Single node architecture | No failover, limited throughput | Deploy Redis Cluster, PostgreSQL replicas |
| Local Docker deployment | Not representative of cloud latency | Deploy to AWS/GCP with network testing |
| No load balancer | Single point of failure | Add ALB/NLB with health checks |
| No auto-scaling | Cannot handle traffic spikes | Implement Kubernetes HPA |
| No multi-region | Geographic latency, DR risk | Deploy to multiple regions |

Data Limitations

| Limitation | Impact | Mitigation Path |
| --- | --- | --- |
| Synthetic test data | May not reflect real attack patterns | Shadow deployment on production traffic |
| No real chargebacks | Cannot validate label accuracy | Integrate with PSP chargeback feed |
| Limited feature diversity | May miss real fraud signals | Add external signals (BIN, device reputation) |
| No historical baseline | Cannot compare to existing system | Run parallel with current fraud system |
| Point-in-time features untested | Replay may have leakage | Validate with known delayed labels |

Model Limitations

| Limitation | Impact | Mitigation Path |
| --- | --- | --- |
| Rule-based only | Lower accuracy than ML | Phase 2 ML integration |
| No adaptive thresholds | Static rules do not evolve | Implement threshold optimization |
| No feedback loop | Decisions do not improve system | Add analyst feedback to training |
| Single model | No redundancy or comparison | Champion/challenger framework |
| No drift detection | Model may degrade silently | Implement PSI monitoring |

Operational Limitations

| Limitation | Impact | Mitigation Path |
| --- | --- | --- |
| No analyst UI | Manual review is cumbersome | Build case management dashboard |
| No bulk operations | Cannot act on patterns efficiently | Add bulk blocklist/threshold tools |
| Limited alerting | May miss issues | Full Alertmanager integration |
| No on-call runbooks | Incident response unclear | Document response procedures |
| No disaster recovery | Single region failure = outage | Multi-region active-passive |

Honest Assessment

What This Proves:
✓ Architecture meets latency requirements
✓ Detection logic catches known fraud patterns
✓ Evidence capture is comprehensive
✓ Policy engine is configurable
✓ System handles expected load

What This Does Not Prove:
✗ Performance under real production traffic
✗ Detection accuracy on real fraud (vs synthetic)
✗ ML model performance (not yet implemented)
✗ Operational readiness (no real incidents yet)
✗ Economic impact (no real financial data)

Personas & Dashboard Usage

Persona 1: Fraud Analyst

Role: Reviews flagged transactions, makes manual decisions, investigates patterns

Primary Dashboard Panels:

| Panel | Purpose | Key Metrics |
| --- | --- | --- |
| Review Queue | Transactions needing manual decision | Count, age, priority |
| Decision Distribution | Current system behavior | ALLOW/FRICTION/REVIEW/BLOCK % |
| Recent High-Risk | Emerging patterns | Transactions with score >70% |
| Triggered Reasons | Why transactions flagged | Top 10 triggered signals |

Workflow:

1. Check Review Queue (see the sketch after this workflow)
└── Sort by priority (HIGH first)
└── Filter by amount (high value first)

2. For each case:
└── View transaction details (decision, scores, detectors fired, policy version)
└── Review triggered signals and feature snapshot
└── Check customer history
└── Make decision: APPROVE / DECLINE / ESCALATE
└── Annotate with disposition (confirmed fraud, friendly fraud, service issue)

3. Bulk actions:
└── Add device to blocklist
└── Add card to blocklist
└── Flag user for enhanced monitoring

4. End of shift:
└── Review queue age metrics
└── Ensure nothing >4h old
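
A minimal sketch of the step-1 ordering and the end-of-shift SLA check, assuming each queue item carries hypothetical `priority`, `amount`, and `queued_at` fields.

```python
# Sketch: review queue ordering (priority first, then amount) and the
# end-of-shift 4h age check. Field names are assumptions.
from datetime import datetime, timedelta, timezone

PRIORITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def order_review_queue(cases: list) -> list:
    return sorted(
        cases,
        key=lambda c: (PRIORITY_RANK.get(c["priority"], 3), -c["amount"]),
    )


def breaching_sla(cases: list, max_age_hours: int = 4) -> list:
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    return [c for c in cases if c["queued_at"] < cutoff]
```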

Key Decisions:

  • Accept/decline individual transactions
  • Add entities to blocklists
  • Escalate suspicious patterns to Risk Lead
  • Annotate cases with dispositions (feeds back into model training labels)

Persona 2: Risk Lead / Fraud Manager

Role: Sets strategy, monitors KPIs, adjusts thresholds, manages team

Primary Dashboard Panels:

| Panel | Purpose | Key Metrics |
| --- | --- | --- |
| Approval Rate (24h) | Customer experience health | Target: >92%, Alert: <90% |
| Block Rate (24h) | Fraud prevention activity | Target: <5%, Alert: >8% |
| Fraud Loss (30d lag) | Actual financial impact | Rolling 30-day $ |
| Dispute Win Rate | Evidence effectiveness | Target: >50% |
| Review Queue SLA | Ops efficiency | % within 4h SLA |
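
A minimal sketch of how the first two KPI checks could be computed over a 24-hour window of decisions; the record fields are assumptions, while the alert thresholds mirror the table above.

```python
# Sketch: 24h approval/block rate checks against the targets above.
# Decision record fields are assumptions; thresholds come from the table.
def risk_lead_kpis(decisions_24h: list) -> dict:
    n = len(decisions_24h) or 1
    approval_rate = sum(d["decision"] == "ALLOW" for d in decisions_24h) / n
    block_rate = sum(d["decision"] == "BLOCK" for d in decisions_24h) / n
    return {
        "approval_rate": approval_rate,
        "approval_alert": approval_rate < 0.90,  # Alert: <90%
        "block_rate": block_rate,
        "block_alert": block_rate > 0.08,        # Alert: >8%
    }
```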

Workflow:

1. Morning Review:
└── Check 24h approval rate
└── Review any after-hours alerts
└── Compare block rate to baseline

2. Weekly Metrics Review:
└── Fraud rate trend (30d lag)
└── False positive estimate
└── Dispute outcomes
└── Threshold performance

3. Threshold Adjustment: (see the sketch after this workflow)
└── Run replay simulation on proposed change
└── Review projected impact
└── If acceptable: Apply via Policy Settings
└── Monitor for 48h post-change

4. Incident Response:
└── Spike in block rate? Check for attack or bug
└── Drop in approval rate? Check threshold misconfiguration
└── Latency spike? Escalate to Engineering
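
A minimal sketch of the step-3 guardrail: simulate the proposed threshold over recent traffic and only apply it if the projected approval rate and fraud recall stay inside agreed bounds. `replay`, `apply_policy`, and the 0.5-point approval tolerance are assumptions.

```python
# Sketch: gate a threshold change behind a replay simulation (step 3).
# replay(), apply_policy(), and the tolerance values are assumptions.
def propose_threshold_change(current_policy: dict, new_block_threshold: float,
                             recent_transactions, replay, apply_policy) -> dict:
    candidate = {**current_policy, "block_threshold": new_block_threshold}
    baseline = replay(current_policy, recent_transactions)
    projected = replay(candidate, recent_transactions)

    acceptable = (
        projected["approval_rate"] >= baseline["approval_rate"] - 0.005  # <=0.5pt approval hit
        and projected["fraud_recall"] >= baseline["fraud_recall"]        # no recall regression
    )
    if acceptable:
        apply_policy(candidate)  # then monitor for 48h post-change
    return {"baseline": baseline, "projected": projected, "applied": acceptable}
```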

Key Decisions:

  • Threshold adjustments (friction/review/block levels)
  • Policy rule additions or modifications
  • Escalation to Engineering or Security
  • Resource allocation (analyst coverage)

Persona 3: SRE / On-Call Engineer

Role: Maintains system reliability, responds to alerts, handles incidents

Primary Dashboard Panels:

| Panel | Purpose | Key Metrics |
| --- | --- | --- |
| P99 Latency | System performance | Target: <200ms, Alert: >150ms |
| Error Rate | System reliability | Target: <0.1%, Alert: >0.5% |
| Safe Mode Status | Fallback state | Normal / SAFE MODE |
| Component Health | Dependency status | Redis, PostgreSQL, API status |
| Throughput | Traffic volume | RPS vs expected baseline |

Workflow:

1. Alert Response:
└── Check alert source and severity
└── Verify via dashboard (not just alert)
└── Follow runbook for specific alert type

2. Latency Spike Response:
└── Check Redis latency panel
└── Check PostgreSQL latency panel
└── Identify bottleneck component
└── Scale or restart as needed

3. Safe Mode Activation: (see the sketch after this workflow)
└── Automatic if error rate >5%
└── Manual if component failure detected
└── Notify Fraud Ops (decisions will be conservative)
└── Document reason and duration

4. Post-Incident:
└── Collect metrics from incident window
└── Write post-mortem
└── Update runbooks if needed
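
A minimal sketch of the automatic trigger in step 3: flip to conservative decisions when the rolling error rate crosses 5%. The window size and the REVIEW fallback are assumptions.

```python
# Sketch: automatic safe-mode trigger (step 3). Window size and the
# conservative REVIEW fallback are assumptions.
from collections import deque


class SafeModeGuard:
    def __init__(self, threshold: float = 0.05, window: int = 1000):
        self.outcomes = deque(maxlen=window)  # True = request errored
        self.threshold = threshold
        self.active = False

    def record(self, errored: bool) -> None:
        self.outcomes.append(errored)
        if len(self.outcomes) == self.outcomes.maxlen:
            error_rate = sum(self.outcomes) / len(self.outcomes)
            self.active = error_rate > self.threshold

    def decide(self, score_fn, transaction) -> str:
        if self.active:
            return "REVIEW"  # conservative fallback while degraded
        return score_fn(transaction)
```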

Key Alerts:

| Alert | Threshold | Response |
| --- | --- | --- |
| FraudDecisionLatencyHigh | P99 >200ms for 2min | Check Redis, scale API |
| FraudErrorRateCritical | >5% for 1min | Safe mode, investigate |
| FraudSafeModeActive | Any | Notify stakeholders, investigate |
| FraudTrafficDrop | <10 RPS for 5min | Check upstream integration |
| FraudTrafficSpike | >2x baseline | Check for attack or event |

Dashboard Mapping

Demo Dashboard (dashboard.py) - Current Implementation

| Tab | Primary Persona | Key Panels |
| --- | --- | --- |
| Transaction Simulator | Engineer/Demo | Test scenarios, attack presets |
| Analytics Dashboard | Risk Lead | Decision distribution, latency charts |
| Decision History | Fraud Analyst | Historical decisions with filters |
| Policy Inspector | Risk Lead | Current rules, thresholds, lists |
| Policy Settings | Risk Lead | Threshold adjustment, rule management |

Production Dashboard Needs (Gap Analysis)

| Need | Demo Has | Production Needs |
| --- | --- | --- |
| Review queue | No | Yes - Priority sorted, age tracking |
| Case management | No | Yes - Assignment, notes, workflow |
| Bulk actions | No | Yes - Multi-select, batch operations |
| Real-time alerts | No | Yes - Integrated alerting |
| Drill-down | Limited | Yes - Click through to transaction |
| Export | No | Yes - CSV/PDF for investigations |
| Role-based access | No | Yes - Analyst vs Admin views |

Interview Application

When asked "How would you present results and limitations?":

"I would be transparent about what we have validated and what we have not. The load test shows we hit 260 RPS at 106ms P99 on local infrastructure - that is 47% headroom to our 200ms SLA. But that is a single-node setup on synthetic data.

Key limitations: we have not proven this works with real production traffic, real fraud patterns, or real chargebacks. The rule-based detection catches 72-94% of synthetic fraud depending on pattern type, but we need shadow deployment on real traffic to validate.

For dashboards, I would map them to personas: Fraud Analysts need the review queue and case tools, Risk Leads need the KPI panels and threshold controls, SRE needs the latency and health panels with alert integration. The demo dashboard covers the Risk Lead use case; we would need to build out the Analyst workflow for production.

The honest answer is: this proves the architecture works, but does not prove it works in production until we deploy it in shadow mode on real traffic."

When asked "What are the known gaps?":

"Three categories: infrastructure, data, and operational.

Infrastructure: single-node everything, no auto-scaling, no multi-region. The path is clear - Kubernetes with HPA, Redis Cluster, PostgreSQL replicas - but it is not built yet.

Data: synthetic test data does not reflect real attack evolution. We need shadow deployment on production traffic and integration with real chargeback feeds to validate detection accuracy.

Operational: no analyst case management UI, no bulk operations, no real on-call runbooks. The demo dashboard shows the concept, but a Fraud Analyst cannot do their job with it in production.

I would present this honestly to stakeholders: MVP proves the concept, but there is a clear list of work before production readiness."


This document provides an honest assessment of what the system proves and does not prove, mapping dashboards to real user personas and their workflows.