Loan Eligibility Prediction

This project is part of the MLOps Zoomcamp offered by DataTalksClub (2024 cohort).

Introduction

In today's competitive financial landscape, efficient loan approval processes are crucial. This project aims to develop and deploy a Machine Learning (ML) model to predict loan eligibility. By leveraging MLOps practices, we will build a robust and automated system for loan assessment. This will enable faster loan decisions, improve customer experience, and optimize risk management for the financial institution.

Problem Statement: Loan Eligibility Through Traditional vs. Machine Learning Approach

Traditional Loan Approval Process:

Currently, loan eligibility decisions are made primarily by human underwriters who assess borrower data points such as income, credit score, employment history, debt-to-income ratio, and collateral. This manual process can be time-consuming, prone to bias, and inconsistent, leading to potential delays and dissatisfied customers. It can also be difficult to accurately assess the creditworthiness of non-traditional borrowers who may not have an extensive credit history.

Machine Learning Approach:

This project proposes a Machine Learning (ML) model to automate and enhance the loan eligibility prediction process. The model will learn from historical loan data, identifying patterns differentiating approved and rejected loan applications. This data-driven approach can lead to:

  • Faster Approvals: Automated predictions can significantly reduce processing time, allowing quicker loan decisions.
  • Reduced Bias: When trained on representative data, ML models apply consistent criteria, reducing the influence of subjective human judgment on loan decisions.
  • Improved Efficiency: Streamlined loan assessment frees up underwriters' time for more complex cases.
  • Enhanced Risk Management: The model can identify risk factors and predict potential defaults, allowing lenders to make informed decisions.

Technologies:

  • Machine Learning: Scikit-learn
  • Experiment tracking and model registry: CometML
  • Cloud Infrastructure: Docker, Terraform, AWS (EC2 and S3)
  • Linting and Formatting: Pylint, Flake8, autopep8
  • Testing: Pytest
  • Automation: GitHub Actions (CI/CD Pipeline)
  • Orchestration: Prefect

Complete ML Project Process:

The complete project workflow is outlined below.

  1. Data Ingestion:

  2. Exploratory Data Analysis (EDA):

    • Data visualizations will be used to understand the distribution of loan features, identify potential correlations, and uncover any hidden patterns.
    • Feature importance analysis will assess the influence of each factor on loan eligibility.
  3. Feature Engineering:

    • New features were created based on existing data to improve model performance.
    • Data scaling was applied to ensure all features are on a similar scale.
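The scaling step described above can be sketched as follows (a minimal example using scikit-learn; the column names are illustrative, not taken from the project's dataset):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical loan features (names are illustrative only)
df = pd.DataFrame({
    "applicant_income": [4500, 6200, 2800, 9100],
    "loan_amount": [120, 200, 66, 310],
})

# Fit the scaler on training data only, then reuse it at inference time
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(scaled.round(2))
```

After scaling, each column has zero mean and unit variance, so features with large raw ranges (like income) no longer dominate distance-based or regularized models.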
  4. Feature Selection:

  5. Model Training and Selection:

    • Various ML algorithms, such as Logistic Regression, Random Forest, and Gradient Boosting, will be trained and evaluated on a held-out portion of the data.
    • Model selection will be based on accuracy, precision, and recall metrics.
    • Prefect orchestrated the workflow with the following pipeline. Note: an API key is required to use Prefect Cloud; you can check the Quickstart guide.
    pip install -U prefect --pre
    prefect cloud login -k '<my-api-key>'
    
    • Data ingestion
    • Feature engineering
    • Feature selection
    • Training
    • Model registry

    Prefect orchestration

  6. Experiment Tracking and Version Control:

    • Comet ML was used to track the experiment.
    • You need to set up an API key to use the package in the project.
    • You can check the official Comet documentation.
     ```
     pip install comet_ml
     comet login
     ```

    • Here is an example you can include in your project:
     ```
     # Get started in a few lines of code
     import comet_ml
     comet_ml.login()
     exp = comet_ml.Experiment()
     # Start logging your data with:
     exp.log_parameters({"batch_size": 128})
     exp.log_metrics({"accuracy": 0.82, "loss": 0.012})
     ```
    

    Experiment Tracking

  7. Model registry and Version Control:

    • The models were registered and versioned using CometML.
    • Registering and promoting models through various stages is essential for ensuring the quality and reliability of your machine learning solutions.

      Model registry
  8. Model Testing:

    • The scripts were assessed using Pylint and Flake8 and were formatted using autopep8.

      Linting and Formatting

    • The model with the best performance was deployed using a Flask application.

    • Initially, the application was tested on the host machine.

    Local host
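A minimal version of such a Flask prediction endpoint might look like this (a sketch only; the route name, payload fields, and eligibility rule are assumptions, not the project's actual API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the trained model that would be loaded at startup
def predict_eligibility(features):
    # Hypothetical rule: approve when income comfortably covers the loan
    return int(features.get("income", 0) > 10 * features.get("loan_amount", 0))

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json(force=True)
    return jsonify({"eligible": predict_eligibility(features)})

# To serve locally: app.run(host="0.0.0.0", port=9696)
```

During local testing the endpoint can be exercised with Flask's built-in test client or a simple `curl` POST before the app is containerized.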

  9. Model Deployment:

    • Once the application was tested locally, the Makefile was created to containerize the app.

    To build the image:

    make build
    

    To push the image to the docker hub repo:

    make push
    

    To run the image locally or on the cloud:

    make run
    

    Note: provide Docker credentials in the terminal to push and pull the image. You can check the image in the Docker Hub repo: Docker hub repo

    • The production-ready model was deployed on AWS infrastructure (EC2 and S3), using Terraform as IaC to manage computational resources. From the app directory (src/deployment), run:

      terraform init
      terraform plan
      terraform apply
      

      Executing these commands will perform the following activities:

      • Provision AWS infrastructure
      • Enable TCP traffic (HTTP and HTTPS)
      • Install and enable docker on the EC2 instance
      • Pull the docker image from my docker hub repo
      • Run the image on the EC2 instance
      • Print the public IP address of the Flask app.
    • GitHub Actions automates the CI/CD pipeline. The pipeline has the following steps:

      • Checkout repository
      • Set up Python
      • Set up Terraform
      • Run the Terraform init, plan, and apply tasks
      • Print the public IP address of the Flask app.

    GitHub Actions

    • The Flask app is deployed automatically on the AWS cloud:

    AWS Flask

    Note: The app will be available until the Attempt 2 review is completed.

    Link Loan Eligibility Flask AWS

  10. Monitoring and Continuous Improvement:

    • The deployed model's performance will be continuously monitored through key metrics.
    • Periodic retraining with new data will be conducted to ensure the model stays accurate and adapts to changing market conditions.

By implementing this data-driven approach, the project aims to significantly improve loan eligibility assessment, leading to faster decisions, enhanced customer satisfaction, and optimized risk management for the financial institution.
