Monitoring in MLOps: Reflections on Module 5 of the MLOps Zoomcamp

In today’s fast‑paced ML landscape, deploying a model is only half the story. Continuous monitoring ensures your system stays healthy, accurate, and reliable long after go‑live. I’ve just wrapped up Module 5: Model Monitoring in the DataTalksClub MLOps Zoomcamp, and here’s what I learned—and how you can apply it to your own projects.


Why Model Monitoring Matters

  • Drift Detection: Data distributions evolve. What trained your model yesterday may not reflect today’s reality.

  • Quality Assurance: Spot issues like missing values or unexpected outliers before they impact end‑users.

  • Reliability & Trust: Stakeholders need confidence that your predictions remain valid and service levels remain high.


Core Components of the Monitoring Stack


1. Docker Compose Services

  • PostgreSQL for storing time‑series metrics

  • Adminer for lightweight database management

  • Grafana for rich, interactive dashboards
    Spinning these up with a single docker-compose up command made the setup a breeze (a quick sanity check of the running services is sketched just below).
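
One way to confirm the stack came up is a tiny Python check along the lines below; the ports are the defaults, and the Postgres credentials are placeholders to swap for whatever your docker-compose.yml defines.

    import psycopg
    import requests

    # Grafana exposes a simple health endpoint on its default port 3000
    print(requests.get("http://localhost:3000/api/health").json())

    # Connect to the metrics database; the credentials here are placeholders,
    # so use the values from your own docker-compose.yml
    with psycopg.connect(
        "host=localhost port=5432 user=postgres password=example dbname=test"
    ) as conn:
        print(conn.execute("SELECT version();").fetchone())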


2. Evidently for Data & Concept Drift

  • Compute drift metrics (e.g., column-level drift on input features and on the model's predictions)

  • Monitor data quality (missing values, data ranges, quantiles)

  • Generate ad‑hoc test suites and reports to debug anomalies on demand (a minimal report sketch follows below)
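
To make that concrete, here is a minimal report sketch using the Evidently 0.4-style API (newer releases changed the imports); the file paths and column names are placeholders patterned on the NYC taxi data used in the course.

    import pandas as pd
    from evidently import ColumnMapping
    from evidently.report import Report
    from evidently.metrics import (
        ColumnDriftMetric,
        DatasetDriftMetric,
        DatasetMissingValuesMetric,
        ColumnQuantileMetric,
    )

    # Reference and current data must share a schema; the paths are illustrative
    reference_df = pd.read_parquet("data/reference.parquet")
    current_df = pd.read_parquet("data/current.parquet")

    column_mapping = ColumnMapping(
        prediction="prediction",   # column holding the model's output
        numerical_features=["fare_amount", "trip_distance"],
        categorical_features=["PULocationID", "DOLocationID"],
        target=None,
    )

    report = Report(metrics=[
        ColumnDriftMetric(column_name="prediction"),   # drift on predictions
        DatasetDriftMetric(),                          # share of drifted columns
        DatasetMissingValuesMetric(),                  # missing-value checks
        ColumnQuantileMetric(column_name="fare_amount", quantile=0.5),  # median fare
    ])

    report.run(reference_data=reference_df, current_data=current_df,
               column_mapping=column_mapping)
    result = report.as_dict()   # individual values live under result["metrics"]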


3. Prefect for Automation

  • Automate batch metric collection at regular intervals

  • Orchestrate data loading, metric computation, and database writes in a managed flow
    (In this module, Prefect was used for demonstration—keep in mind it’s optional in some course editions.)
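
A bare-bones version of such a flow might look like the sketch below (Prefect 2.x decorators); the task bodies, start date, and helper names are placeholders for your own loading, Evidently computation, and database writes.

    import datetime
    import time
    from prefect import flow, task

    @task
    def load_batch(day: datetime.date):
        # Load the "current" data for one simulated day (placeholder body)
        ...

    @task
    def compute_metrics(batch):
        # Run the Evidently report on the batch and extract the values to store
        ...

    @task
    def save_metrics(metrics, day: datetime.date):
        # Write one row per batch into the PostgreSQL metrics table
        ...

    @flow
    def batch_monitoring(start: datetime.date = datetime.date(2024, 3, 1),
                         days: int = 30):
        # The start date and number of days are illustrative
        for i in range(days):
            day = start + datetime.timedelta(days=i)
            batch = load_batch(day)
            metrics = compute_metrics(batch)
            save_metrics(metrics, day)
            time.sleep(10)   # 10-second pause standing in for a daily schedule

    if __name__ == "__main__":
        batch_monitoring()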


4. Grafana Dashboards

  • Pre‑configured panels visualize your metrics over time:

    • Missing‑Value Counts

    • Data Drift Scores

    • Quantile Trends (e.g., median fare amount)

  • Dashboards are exported as JSON and saved under 05-monitoring/dashboards/ for version control and easy reloads.


Step‑by‑Step Workflow

1. Prepare Data

  • Train a baseline model and generate a reference dataset.

  • Simulate “current” batches (e.g., sliding daily or monthly windows), as in the pandas sketch below.
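
In pandas terms, that split can be as simple as the sketch below; the file name, date boundary, and pickup column assume one month of green taxi data, so adjust them to your dataset.

    import pandas as pd

    # One month of trips (file name is illustrative)
    df = pd.read_parquet("green_tripdata_2024-03.parquet")

    # Reference window: the slice the baseline model was validated on
    reference = df[df.lpep_pickup_datetime < "2024-03-08"]

    # "Current" batches: one DataFrame per later day, fed to monitoring in turn
    later = df[df.lpep_pickup_datetime >= "2024-03-08"]
    current_batches = [
        batch for _, batch in later.groupby(later.lpep_pickup_datetime.dt.date)
    ]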


2. Compute Metrics

  • Run a Python script that calculates Evidently metrics in a loop (every 10 seconds in the demo, with each iteration standing in for a daily batch).

  • Insert the results into PostgreSQL (the core of that loop is sketched below).
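
Stripped down, and reusing reference, current_batches, report, and column_mapping from the earlier sketches, the core of that script looks roughly like this; the table layout, connection string, and exact dictionary keys depend on your setup and Evidently version.

    import datetime
    import time
    import psycopg

    CREATE_TABLE = """
    CREATE TABLE IF NOT EXISTS metrics (
        run_time TIMESTAMP,
        prediction_drift FLOAT,
        num_drifted_columns INTEGER,
        share_missing_values FLOAT
    );
    """

    with psycopg.connect(
        "host=localhost port=5432 user=postgres password=example dbname=test",
        autocommit=True,
    ) as conn:
        conn.execute(CREATE_TABLE)
        for batch in current_batches:
            report.run(reference_data=reference, current_data=batch,
                       column_mapping=column_mapping)
            m = report.as_dict()["metrics"]
            conn.execute(
                "INSERT INTO metrics VALUES (%s, %s, %s, %s)",
                (datetime.datetime.now(),
                 m[0]["result"]["drift_score"],                          # ColumnDriftMetric
                 m[1]["result"]["number_of_drifted_columns"],            # DatasetDriftMetric
                 m[2]["result"]["current"]["share_of_missing_values"]),  # missing values
            )
            time.sleep(10)   # 10 seconds per iteration stands in for one day per batch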

3. Visualize & Alert

  • Open Grafana at localhost:3000 (default admin/admin credentials).

  • Browse the dashboard list (Home → Dashboards) and open the pre‑built monitoring dashboard.

  • Review panels for drift, data quality, and test failures.

4. Debug on Demand

  • Use the ad‑hoc debugging_nyc_taxi_data.ipynb notebook to run Evidently TestSuites (a minimal suite is sketched after this list).

  • Drill into unexpected metric spikes or failing tests for root‑cause analysis.
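
For reference, a TestSuite can be as small as the sketch below; it again uses 0.4-style Evidently imports, reference and column_mapping come from the earlier sketches, and suspicious_batch stands for whichever batch you want to inspect.

    from evidently.test_suite import TestSuite
    from evidently.tests import (
        TestNumberOfMissingValues,
        TestNumberOfDriftedColumns,
        TestColumnDrift,
    )

    suite = TestSuite(tests=[
        TestNumberOfMissingValues(),                 # too many missing values?
        TestNumberOfDriftedColumns(),                # too many drifted columns?
        TestColumnDrift(column_name="prediction"),   # did the model output drift?
    ])

    suite.run(reference_data=reference, current_data=suspicious_batch,
              column_mapping=column_mapping)
    suite.show(mode="inline")   # pass/fail details rendered in the notebook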



Key Takeaways

  • Modular Architecture: Decouple metric computation (Evidently) from storage (PostgreSQL) and visualization (Grafana).

  • Automation Is Crucial: Even a simple Prefect flow ensures metrics are fresh and consistent.

  • Version‑Controlled Dashboards: Saving dashboard JSON alongside code makes reproducibility and collaboration seamless.

  • Proactive Debugging: Integrating TestSuites lets you catch issues before they cascade into production incidents.
