My midterm project at MLZoomcamp led by Alexey Grigorev for DataTalksClub

November 25, 2024

Predicting Patient No-Shows: A Data-Driven Approach

Hospital no-shows significantly disrupt healthcare systems, wasting resources and delaying care for those in need. My midterm project for the MLZoomcamp, led by Alexey Grigorev and hosted by DataTalksClub, tackles this challenge using machine learning to predict no-show probabilities for appointments in Brazilian hospitals. Here's how I approached the problem:

The Challenge

The dataset, sourced from Kaggle, includes over 110,000 appointments and diverse features such as patient demographics, appointment details, and medical history. However, achieving reliable predictions is complex due to:

Imbalanced Data: About 80% of appointments were attended, while 20% were no-shows.

Dependence on Feature Engineering: Key predictors like patient history (previous/missed appointments) were engineered from the raw data.

Bias Mitigation: Socioeconomic factors, such as neighborhood, required careful handling to ensure fairness.
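To make the imbalance concrete, here is a minimal sketch in plain Python that measures the no-show rate and the negative-to-positive ratio on toy labels. The function name `class_balance` and the toy labels are illustrative assumptions, not code or data from the project. A ratio like this is commonly passed to XGBoost's `scale_pos_weight` parameter, though whether that setting was used here is not something this post states.

```python
from collections import Counter


def class_balance(labels):
    """Return (no_show_rate, negative_to_positive_ratio) for binary labels.

    Convention assumed here: 1 = no-show, 0 = attended.
    """
    counts = Counter(labels)
    positives = counts[1]
    negatives = counts[0]
    if positives == 0:
        raise ValueError("labels contain no no-show examples")
    no_show_rate = positives / (positives + negatives)
    neg_to_pos_ratio = negatives / positives
    return no_show_rate, neg_to_pos_ratio


# Toy labels mirroring the roughly 80/20 split described above (not real data).
labels = [0] * 80 + [1] * 20
rate, ratio = class_balance(labels)
print(f"no-show rate: {rate:.2f}, negative/positive ratio: {ratio:.1f}")

# A model that always predicts "attended" would already be right 80% of the
# time, which is why plain accuracy is a weak yardstick on data like this.
baseline_accuracy = 1 - rate
print(f"always-attended baseline accuracy: {baseline_accuracy:.2f}")
```

With an 80/20 split the ratio comes out at 4, meaning each no-show would need about four times the weight of an attended appointment to balance the two classes.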

The Solution

Through rigorous data cleaning, feature engineering, and hyperparameter tuning, I built and evaluated several models, including Logistic Regression, Random Forest, and XGBoost. Here’s why XGBoost emerged as the best:

Performance: It achieved an AUC of 0.748, outperforming the other models at ranking no-shows above attended appointments.

Fairness: I ensured the model emphasized medical and appointment-specific factors over potentially biased features.
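For readers unfamiliar with the AUC metric quoted above, the sketch below computes it from scratch: AUC is the probability that a randomly chosen no-show receives a higher score than a randomly chosen attended appointment. The helper `roc_auc` and the toy scores are illustrative assumptions. In practice a library routine such as scikit-learn's `roc_auc_score` would do this job, and the pairwise loop here is only meant to show the definition.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This is O(n_pos * n_neg), fine for a demonstration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = no-show) and model scores, invented for illustration.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
auc = roc_auc(y_true, scores)
print(f"AUC: {auc:.3f}")  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a score such as 0.748 indicates a useful but far from perfect ordering.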

Key Insights

Lead Time Matters: The days between scheduling and appointment strongly correlate with no-shows.
Patient History is Key: Patterns of previous and missed appointments enhance prediction accuracy.
Fairness-Driven Engineering: Assessing features for unintended bias was crucial for a balanced model.
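The first two insights can be turned into features along these lines. This is a minimal sketch in plain Python, not the code from the repository: the record keys (`patient_id`, `scheduled`, `appointment`, `no_show`), the function `add_features`, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def add_features(appointments):
    """Return copies of the records with three extra fields.

    lead_days          -- days between scheduling and the appointment
    prior_appointments -- earlier appointments by the same patient
    prior_no_shows     -- earlier missed appointments by the same patient
    """
    seen = defaultdict(int)    # appointments so far, per patient
    missed = defaultdict(int)  # no-shows so far, per patient
    rows = []
    # Walk through appointments chronologically so that the history
    # features only ever look at the past.
    for appt in sorted(appointments, key=lambda a: a["appointment"]):
        pid = appt["patient_id"]
        row = dict(appt)
        row["lead_days"] = (appt["appointment"] - appt["scheduled"]).days
        row["prior_appointments"] = seen[pid]
        row["prior_no_shows"] = missed[pid]
        rows.append(row)
        # Update the counters only after the row is built, so the current
        # outcome never leaks into its own features.
        seen[pid] += 1
        missed[pid] += appt["no_show"]
    return rows


# Toy records, invented for illustration.
appointments = [
    {"patient_id": "A", "scheduled": date(2024, 5, 2),
     "appointment": date(2024, 5, 10), "no_show": 1},
    {"patient_id": "A", "scheduled": date(2024, 5, 20),
     "appointment": date(2024, 5, 20), "no_show": 0},
    {"patient_id": "B", "scheduled": date(2024, 5, 1),
     "appointment": date(2024, 5, 4), "no_show": 0},
]
features = add_features(appointments)
for row in features:
    print(row["patient_id"], row["lead_days"],
          row["prior_appointments"], row["prior_no_shows"])
```

The design point worth noting is the order of operations: the counters are updated after each row is written, which keeps the label of the current appointment out of its own history features.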

Practical Utility

This model can serve as a resource for hospitals to:

Identify high-risk appointments early.
Optimize resources by planning around predicted no-show probabilities.
Enhance patient care with targeted reminders or interventions.
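One way a hospital might act on the predictions is sketched below: rank appointments by predicted no-show probability, keep those above a cut-off, and cap the list at the number of reminder calls staff can make. The function `flag_high_risk`, the threshold of 0.4, the capacity of 2, and the probabilities are all hypothetical values chosen for illustration, not outputs of the actual model.

```python
def flag_high_risk(predictions, threshold=0.5, top_k=None):
    """Return (appointment_id, probability) pairs worth following up.

    predictions -- iterable of (appointment_id, no_show_probability)
    threshold   -- minimum probability to be flagged
    top_k       -- optional cap, e.g. how many reminder calls are feasible
    """
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    flagged = [(appt_id, prob) for appt_id, prob in ranked if prob >= threshold]
    if top_k is not None:
        flagged = flagged[:top_k]
    return flagged


# Toy model outputs, invented for illustration.
predictions = [("a1", 0.12), ("a2", 0.71), ("a3", 0.45), ("a4", 0.64)]
to_remind = flag_high_risk(predictions, threshold=0.4, top_k=2)
print(to_remind)
```

The threshold trades missed no-shows against unnecessary reminders, so in a real deployment it would be tuned against the cost of each rather than fixed in advance.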

My journey through this project reflects the potential of machine learning to solve real-world problems. If you're intrigued, feel free to explore the code and methodology in my GitHub repository.
