My Tryst with the Out of Memory (OOM) Error: Taming High-Volume ML Pipelines on Limited Hardware

 How I Fixed Memory Bloat in a Prefect-Orchestrated Workflow Without RAM Upgrades

The Breaking Point

There I was—wrapping up Module 3 of DataTalksClub’s #mlopszoomcamp. My mission: orchestrate an ML pipeline for a high-volume dataset using Prefect, with MLflow tracking experiments. Everything ran smoothly until...

The dreaded Out of Memory (OOM) error struck during data transformation. With millions of records and high-cardinality features, my 16GB RAM AWS EC2 instance was gasping. Here’s the twist: I refused to add RAM. Why? To simulate the real-world constraints of local machines and laptops, where hardware upgrades aren't always possible, and of cloud virtual machines, where budget approval for scaling up compute doesn't come easily.


The Hardware Gambit

  • Initial EC2 setup: 16GB RAM + 30GB storage (barely more than a typical laptop)

  • Deliberate constraint: Avoided RAM upgrade to mirror local-dev limitations

  • Storage expansion: Added 20GB storage (total 50GB) to create breathing room

  • Swap strategy: Allocated 30GB of that storage to a swap file for virtual memory


2 Battle-Tested Fixes + The Swap Revelation

  1. Surgical Data Loading

     Problem: Loading all columns wasted memory.

     Fix: Select only the essential features upfront (a quick footprint check follows below):

        import pandas as pd

        df = pd.read_parquet('data.parquet', columns=['required_feature(s)', 'target'])
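
     To verify the gain on your own data, check the frame's actual footprint right after the surgical load. A minimal sketch, reusing the df from the snippet above:

        # deep=True also counts the bytes held by string/object columns
        ram_gb = df.memory_usage(deep=True).sum() / 1e9
        print(f"Loaded frame holds ~{ram_gb:.2f} GB in RAM")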

    
  2. Aggressive Memory Cleanup

     Problem: Lingering DataFrames choked RAM.

     Fix: Purge intermediates and structures you no longer need after preprocessing but before vectorizing (a fuller sketch follows below):

        del df  # execute before deriving the encoding/vectorization
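
     Worth noting: del only removes the name binding; the memory is reclaimed once nothing else references the object, and an explicit gc.collect() right after helps sweep any cyclic garbage before the next memory-hungry step. A minimal sketch of the pattern, assuming a DictVectorizer-style encoding step (the names are illustrative, not necessarily my exact pipeline):

        import gc
        from sklearn.feature_extraction import DictVectorizer

        # Build the structures the next step needs, then drop the raw frame
        train_dicts = df.to_dict(orient='records')
        del df        # last reference gone, so the DataFrame's memory can be freed
        gc.collect()  # also sweep cyclic garbage before vectorization

        X_train = DictVectorizer().fit_transform(train_dicts)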

        


  3. Swap File Optimization

     Phase 1: Proof of Concept

     Created a 30GB swap file on the expanded storage:

        sudo fallocate -l 30G /swapfile
        sudo chmod 600 /swapfile
        sudo mkswap /swapfile
        sudo swapon /swapfile

     Result: The pipeline ran successfully (without Prefect) using 16GB RAM + 13GB swap (see the measurement sketch below)
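
     Knowing the peak swap usage (13GB here) is what makes the right-sizing in Phase 2 safe. One way to log it is a small sidecar script; a minimal sketch, assuming the psutil package is installed (my addition, not part of the original workflow):

        import time
        import psutil

        # Sample RAM and swap every 30 seconds while the pipeline runs (stop with Ctrl+C)
        peak_swap = 0
        while True:
            swap = psutil.swap_memory()
            peak_swap = max(peak_swap, swap.used)
            print(f"RAM: {psutil.virtual_memory().percent:.0f}% | "
                  f"swap: {swap.used / 1e9:.1f} GB (peak {peak_swap / 1e9:.1f} GB)")
            time.sleep(30)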


     Phase 2: Right-Sizing

     Replaced the 30GB swap with a 16GB one to conserve storage:

        sudo swapoff /swapfile && sudo rm /swapfile    # remove the old 30GB swap
        sudo fallocate -l 16G /swapfile && sudo chmod 600 /swapfile
        sudo mkswap /swapfile && sudo swapon /swapfile
        free -h    # confirm the new 16GB swap is active

    Final Test: Prefect-orchestrated workflow ran flawlessly with 16GB RAM + 16GB swap
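
For orientation, here is roughly where the two memory fixes slot into a Prefect 2-style flow. This is a minimal sketch with illustrative task and column names, not my actual Zoomcamp pipeline:

    import gc

    import pandas as pd
    from prefect import flow, task

    @task
    def prepare_features(path: str):
        # Fix 1: surgical load, read only the columns the pipeline needs
        df = pd.read_parquet(path, columns=['required_feature(s)', 'target'])
        records = df.to_dict(orient='records')
        # Fix 2: purge the raw frame before the memory-hungry encoding/training steps
        del df
        gc.collect()
        return records

    @flow
    def training_pipeline(path: str = 'data.parquet'):
        records = prepare_features(path)
        # ... vectorize, train, and log to MLflow from here

    if __name__ == '__main__':
        training_pipeline()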


Why This Matters for Local Development   

  1. Real-world simulation:

    • 30GB storage mimics limited laptop SSD space

    • Fixed RAM mirrors consumer hardware constraints

  2. The swap insight:

    • Oversize the swap first (30GB) to measure real peak usage, then right-size it; the proof-of-concept run needed only ~13GB

  3. Resource tradeoffs:

    • Used ~30% of the storage for swap instead of paying for a RAM upgrade

    • Achieved 32GB of effective memory (16GB RAM + 16GB swap) for heavy transformations and training

     

The Victory Lap

Results:
  • Prefect workflows executed end-to-end with OOM conquered

  • MLflow logged artifacts reliably despite memory pressure

  • Models trained on 3M+ records using storage as memory


Golden Insight:

"When RAM is finite, treat storage as your emergency oxygen tank – but measure exactly how much air you need before diving into production."



Call to Action:

Pushed your hardware to its limits? Share your creative workarounds below!


Resource     Initial   Optimized     Purpose
RAM          16GB      16GB          Fixed constraint
Storage      30GB      50GB          Expanded for swap
Swap File    None      30GB → 16GB   Right-sized virtual memory







