My Tryst with the Out-of-Memory (OOM) Error: Taming High-Volume ML Pipelines on Limited Hardware

 How I Fixed Memory Bloat in a Prefect-Orchestrated Workflow Without RAM Upgrades

The Breaking Point

There I was—wrapping up Module 3 of DataTalksClub’s #mlopszoomcamp. My mission: orchestrate an ML pipeline for a high-volume dataset using Prefect, with MLflow tracking experiments. Everything ran smoothly until...

The dreaded Out of Memory (OOM) error struck during data transformation. With millions of records and high-cardinality features, my 16GB RAM AWS EC2 instance was gasping. Here’s the twist: I refused to add RAM. Why? To simulate the real-world constraints of local machines and laptops, where hardware upgrades aren’t always possible, and of cloud projects where budget approvals for scaling up compute don’t come readily.


The Hardware Gambit

  • Initial EC2 setup: 16GB RAM + 30GB storage (barely more than a typical laptop)

  • Deliberate constraint: Avoided RAM upgrade to mirror local-dev limitations

  • Storage expansion: Added 20GB storage (total 50GB) to create breathing room

  • Swap strategy: Allocated 30GB of that space to a swap file for virtual memory


2 Battle-Tested Fixes + The Swap Revelation

  1. Surgical Data Loading

     Problem: Loading all columns wasted memory.

     Fix: Select only the essential features upfront:

     df = pd.read_parquet('data.parquet', columns=['required_feature(s)', 'target'])
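
     For a fuller picture, here is a sketch of the same idea taken one step further; the column names are placeholders (not my actual features), and the dtype downcasting is an optional extra squeeze on top of the column pruning:

     import pandas as pd

     # Hypothetical feature names -- substitute the columns your model actually needs
     COLUMNS = ['pickup_datetime', 'PULocationID', 'DOLocationID', 'trip_distance', 'duration']

     # Parquet is columnar, so unselected columns are never pulled into memory
     df = pd.read_parquet('data.parquet', columns=COLUMNS)

     # Optional extra savings: downcast wide numeric dtypes after loading
     for col in df.select_dtypes(include='float64').columns:
         df[col] = pd.to_numeric(df[col], downcast='float')
     for col in df.select_dtypes(include='int64').columns:
         df[col] = pd.to_numeric(df[col], downcast='integer')

     print(f"{df.memory_usage(deep=True).sum() / 1e9:.2f} GB in memory")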

    
  2. Aggressive Memory Cleanup

     Problem: Lingering DataFrames choked RAM.

     Fix: Purge intermediate DataFrames and any other structures that are no longer needed after preprocessing, before vectorizing:

     del df  # Execute before deriving the encoding/vectorization
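
     As a concrete sketch of the pattern (the column names are illustrative, and df is the column-pruned frame from fix #1): build the compact dict records the vectorizer needs, drop the DataFrame, then vectorize.

     import gc
     from sklearn.feature_extraction import DictVectorizer

     # Illustrative column lists -- substitute your own features and target
     CATEGORICAL = ['PULocationID', 'DOLocationID']
     NUMERICAL = ['trip_distance']

     # df is the column-pruned DataFrame from fix #1
     train_dicts = df[CATEGORICAL + NUMERICAL].to_dict(orient='records')
     y_train = df['duration'].values

     # The full DataFrame is no longer needed once the dict records exist,
     # so release it before vectorization allocates its own structures
     del df
     gc.collect()  # nudge CPython to return the freed memory promptly

     dv = DictVectorizer()
     X_train = dv.fit_transform(train_dicts)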

        


  3. Swap File Optimization

     Phase 1: Proof of Concept

     Created a 30GB swap file on the expanded storage:

     sudo fallocate -l 30G /swapfile
     sudo chmod 600 /swapfile
     sudo mkswap /swapfile
     sudo swapon /swapfile

     Result: The pipeline ran successfully (without Prefect) using 16GB RAM + 13GB swap.
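
     Right-sizing in Phase 2 depends on knowing how much swap the run actually uses. One way to capture that from inside the pipeline is a small logging helper like the sketch below; it assumes the psutil package (not something the pipeline itself requires), and the stage names are illustrative:

     import psutil

     def log_memory(stage: str) -> None:
         """Print current RAM and swap usage in GB -- call between heavy pipeline steps."""
         vm = psutil.virtual_memory()
         sw = psutil.swap_memory()
         print(f"[{stage}] RAM: {vm.used / 1e9:.1f}/{vm.total / 1e9:.1f} GB | "
               f"swap: {sw.used / 1e9:.1f}/{sw.total / 1e9:.1f} GB")

     log_memory('after loading')
     # ... transformation / vectorization ...
     log_memory('after vectorizing')
     # ... training ...
     log_memory('after training')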


     Phase 2: Right-Sizing

     Replaced the 30GB swap with a 16GB one to conserve storage:

     sudo swapoff /swapfile && sudo rm /swapfile   # Remove the old swap file
     # Recreate it at 16G with the same fallocate/chmod/mkswap/swapon steps as above
     free -h   # Verify the new swap file is active

     Final Test: The Prefect-orchestrated workflow ran flawlessly with 16GB RAM + 16GB swap.
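
To show how the pieces fit together, here is a minimal, illustrative sketch of how the memory-conscious steps could be wired into a Prefect flow (assuming Prefect 2.x; the column names and model are placeholders, not my exact project code). Keeping the raw DataFrame local to a single task means the flow never holds a reference to it, so the del from fix #2 can genuinely release the memory:

     import gc
     import pandas as pd
     from prefect import flow, task
     from sklearn.feature_extraction import DictVectorizer
     from sklearn.linear_model import LinearRegression

     # Illustrative feature/target names -- substitute your own
     FEATURES = ['PULocationID', 'DOLocationID', 'trip_distance']
     TARGET = 'duration'

     @task
     def load_and_vectorize(path: str):
         # Fix #1: read only the columns the model needs
         df = pd.read_parquet(path, columns=FEATURES + [TARGET])
         dicts = df[FEATURES].to_dict(orient='records')
         y = df[TARGET].values
         # Fix #2: drop the DataFrame before the memory-hungry vectorization
         del df
         gc.collect()
         dv = DictVectorizer()
         return dv.fit_transform(dicts), y

     @task
     def train(X, y):
         # MLflow experiment tracking would wrap this step
         return LinearRegression().fit(X, y)

     @flow(name="memory-constrained-training")
     def training_flow(path: str = "data.parquet"):
         X, y = load_and_vectorize(path)
         train(X, y)

     if __name__ == "__main__":
         training_flow()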


Why This Matters for Local Development   

  1. Real-world simulation:

    • 30GB storage mimics limited laptop SSD space

    • Fixed RAM mirrors consumer hardware constraints

  2. The swap insight:

    • Phase 1 peaked at roughly 13GB of swap, revealing how much virtual memory the pipeline actually needed

    • That measurement made it safe to right-size the swap file down to 16GB
  3. Resource tradeoffs:

    • Used about 30% of storage for swap instead of paying for a RAM upgrade

    • Achieved 32GB of effective memory (16GB RAM + 16GB swap) for the heavy transformation and training steps

     

The Victory Lap

Results:
  • Prefect workflows executed end-to-end with OOM conquered

  • MLflow logged artifacts reliably despite memory pressure

  • Models trained on 3M+ records using storage as memory


Golden Insight:

"When RAM is finite, treat storage as your emergency oxygen tank – but measure exactly how much air you need before diving into production."



Call to Action:

Pushed your hardware to its limits? Share your creative workarounds below!


Resource    | Initial | Optimized    | Purpose
RAM         | 16GB    | 16GB         | Fixed constraint
Storage     | 30GB    | 50GB         | Expanded for swap
Swap File   | None    | 30GB → 16GB  | Right-sized virtual memory







