Pipeline: Prefect Cloud
Prefect over Airflow: flows are plain decorated Python functions, so there is no separate DAG-definition layer or scheduler boilerplate; workflows can branch dynamically at runtime; and the lighter operational footprint suits a small team. Artifacts track training metrics, and S3 blocks provide cloud-agnostic storage.
Model Serving: R2 as Model Registry
Cloudflare R2 (S3-compatible) stores the serialized models. On app startup, models are loaded into memory; a hot-reload endpoint allows swapping them without a restart. This is a lightweight 'model registry' pattern: simple and effective, with no need for MLflow at this scale.
Integration Pattern
This Python ML service exposes a JSON API; backend services call the scoring endpoint over HTTP. In production this would move behind a message queue (AMQP): a transaction is published to the scoring queue, the ML service consumes it, scores it, and publishes the result, and the backend picks up the risk score.
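The request/response contract can be sketched as a plain function, independent of transport (HTTP handler or queue consumer would both wrap it). Field names, the threshold, and the stand-in scorer are illustrative assumptions:

```python
# Sketch of the scoring contract behind the JSON API. The scorer below
# is a stand-in; a real model.predict_proba call goes in its place.
import time

MODEL_VERSION = "v3"      # assumed version label
RISK_THRESHOLD = 0.8      # assumed flagging threshold


def score_transaction(txn: dict) -> dict:
    start = time.perf_counter()
    # Stand-in scorer: scales amount into [0, 1] for illustration only.
    risk_score = min(txn["amount"] / 10_000, 1.0)
    return {
        "transaction_id": txn["id"],
        "risk_score": risk_score,
        "flagged": risk_score >= RISK_THRESHOLD,
        "model_version": MODEL_VERSION,
        "inference_ms": (time.perf_counter() - start) * 1000,
    }
```

The same function body serves both deployment modes: only the caller changes when moving from HTTP to an AMQP consumer.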
Monitoring & Audit
Every prediction is logged to Postgres with the model version, an input hash (SHA-256 of the features, not the raw values, for privacy), the risk score, the threshold used, the top SHAP features, the LLM narrative, and the inference time. Regulators can reconstruct any flagging decision from its prediction ID.
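The audit row can be sketched as follows; the SHA-256 hash is taken over a canonical JSON encoding of the features so the same inputs always hash identically, and the column names are assumptions:

```python
# Sketch of the per-prediction audit record. Hashing a canonical JSON
# encoding keeps raw feature values out of the log; column names assumed.
import hashlib
import json


def input_hash(features: dict) -> str:
    # sort_keys makes the hash independent of dict insertion order.
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def audit_record(prediction_id: str, features: dict, score: float,
                 threshold: float, shap_top: list, narrative: str,
                 inference_ms: float, model_version: str = "v3") -> dict:
    return {
        "prediction_id": prediction_id,
        "model_version": model_version,
        "input_hash": input_hash(features),
        "risk_score": score,
        "threshold": threshold,
        "top_shap_features": shap_top,
        "llm_narrative": narrative,
        "inference_ms": inference_ms,
    }
```

Storing the hash rather than the features means the log can prove which inputs produced a decision without duplicating sensitive data.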
Scaling Considerations
At 10x volume: add Redis caching for repeated feature patterns and a batch scoring endpoint. At 100x volume: split scoring into a separate service behind a load balancer, process asynchronously via AMQP, share feature computation through a feature store, and serve GPU models via TorchServe or Triton.
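The feature-pattern cache for the 10x case can be sketched like this; a plain dict stands in for Redis here, and in practice the `get`/`set` calls would go to `redis.Redis` with a TTL:

```python
# Sketch of feature-pattern caching. An in-memory dict stands in for
# Redis; the key is a hash of the canonical feature encoding.
import hashlib
import json

_cache: dict[str, float] = {}  # stand-in for Redis


def cache_key(features: dict) -> str:
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return "score:" + hashlib.sha256(canonical.encode()).hexdigest()


def cached_score(features: dict, score_fn) -> float:
    key = cache_key(features)
    if key in _cache:            # cache hit: skip model inference
        return _cache[key]
    score = score_fn(features)   # cache miss: run the model
    _cache[key] = score
    return score
```

This only pays off when identical feature vectors actually recur, so cache hit rate should be measured before keeping it.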