Model Deployment in an MLOps Workflow: The Various Ways
In MLOps pipelines, deployment is the pivotal phase where machine learning models turn from development artifacts into production-ready assets. MLOps Zoomcamp Module 4: Deployment outlines four primary deployment strategies:
1. Web-services: Flask + Docker 🐍
Flask app loads model artifacts from local disk or cloud storage
Containerization ensures identical environments across dev/prod
Key Course Tool: Docker for dependency isolation
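A minimal sketch of this pattern, assuming a pickled artifact that bundles a DictVectorizer with the model; the model.bin path, the bundled vectorizer, and port 9696 are illustrative assumptions, not prescribed by the course:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask("churn-prediction")

# Load artifacts once at startup. "model.bin" is an illustrative path;
# the pickle is assumed to bundle a DictVectorizer with the model.
with open("model.bin", "rb") as f_in:
    dv, model = pickle.load(f_in)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()    # raw feature dict from the client
    X = dv.transform([features])     # vectorize a single record
    pred = model.predict(X)[0]
    return jsonify({"prediction": float(pred)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9696)
```

Dockerizing this app then pins the Python version and dependencies, giving the identical dev/prod environments mentioned above.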
2. Web-services: MLflow Model Registry 📦
Fetch models using mlflow.pyfunc.load_model() with registry URIs (e.g., models:/churn-model/Production)
No hardcoded paths – models update without redeploying the app
Note: MLflow's built-in REST server (mlflow models serve) is an alternative to Flask
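A sketch of registry-based loading, assuming the MLFLOW_TRACKING_URI environment variable points at a running tracking/registry server (the churn-model name comes from the example URI above):

```python
import mlflow.pyfunc
import pandas as pd

# The "models:/" URI resolves the current Production version through the
# registry, so no artifact path is hardcoded in the application code.
model = mlflow.pyfunc.load_model("models:/churn-model/Production")

def predict(features: dict) -> float:
    # pyfunc models accept a pandas DataFrame as input
    preds = model.predict(pd.DataFrame([features]))
    return float(preds[0])
```

Promoting a new version to the Production stage is then picked up the next time the app loads the model, with no code change or image rebuild.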
3. Streaming: AWS Kinesis + Lambda ⚡
Lambda downloads the model from S3 (not directly from the MLflow registry)
Max execution time: 15 minutes (a hard AWS limit, critical for large models)
Course Pattern: Kinesis → Lambda → S3/DynamoDB
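A hedged sketch of a handler following this pattern; the MODEL_BUCKET/MODEL_KEY environment variables are hypothetical, and the model is assumed to be a pickled pipeline that accepts a list of feature dicts:

```python
import base64
import json
import os
import pickle

import boto3

# Download the artifact once per cold start into Lambda's writable /tmp.
# Bucket and key are hypothetical names supplied via environment variables.
s3 = boto3.client("s3")
LOCAL_PATH = "/tmp/model.bin"
s3.download_file(os.environ["MODEL_BUCKET"], os.environ["MODEL_KEY"], LOCAL_PATH)

with open(LOCAL_PATH, "rb") as f_in:
    model = pickle.load(f_in)  # assumed: pipeline accepting feature dicts

def lambda_handler(event, context):
    predictions = []
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded
        payload = base64.b64decode(record["kinesis"]["data"])
        features = json.loads(payload)
        pred = model.predict([features])[0]
        predictions.append({"prediction": float(pred)})
    # In the course pattern, results continue on to S3 or DynamoDB
    return {"predictions": predictions}
```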
4. Batch Scoring with Prefect ⏱️
Model Loading: Retrieve production-ready models directly from cloud storage (S3 in course examples)
Data Chunk Processing: Efficiently score large datasets in parallelizable batches
Result Export: Save predictions to databases, data lakes, or BI systems
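A sketch of these three steps as a Prefect flow; the S3 bucket/key, the CSV input, and the Parquet output are illustrative, and the model is assumed to accept a DataFrame directly:

```python
import pickle

import boto3
import pandas as pd
from prefect import flow, task

@task
def load_model(bucket: str, key: str):
    # Pull the production model straight from cloud storage (S3 here);
    # bucket and key names are illustrative.
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return pickle.loads(body.read())

@task
def score_chunk(model, chunk: pd.DataFrame) -> pd.DataFrame:
    chunk = chunk.copy()
    chunk["prediction"] = model.predict(chunk)  # assumes DataFrame input
    return chunk

@task
def export(results: pd.DataFrame, output_path: str) -> None:
    # Parquet to a data lake path; swap in a database or BI export as needed
    results.to_parquet(output_path, index=False)

@flow
def batch_score(input_path: str, output_path: str, chunksize: int = 100_000):
    model = load_model("my-model-bucket", "models/churn.bin")
    scored = [
        score_chunk(model, chunk)
        for chunk in pd.read_csv(input_path, chunksize=chunksize)
    ]
    export(pd.concat(scored), output_path)

if __name__ == "__main__":
    batch_score("input.csv", "predictions.parquet")
```

Scheduling this flow (daily or weekly, with retries) is handled by Prefect's deployment machinery rather than by the scoring code itself, which is what the advantages below refer to.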
The Prefect Advantage:
🔄 Zero redeploys: Update scoring logic without infrastructure changes
⚡ Elastic scaling: Handles terabyte-scale datasets on your infrastructure
📅 Intelligent scheduling: Daily/weekly runs with built-in retries