This project is part of the MLOps Zoomcamp offered by DataTalks.Club, 2024 cohort.
In today's competitive financial landscape, efficient loan approval processes are crucial. This project aims to develop and deploy a Machine Learning (ML) model to predict loan eligibility. By leveraging MLOps practices, we will build a robust and automated system for loan assessment. This will enable faster loan decisions, improve customer experience, and optimize risk management for the financial institution.
Traditional Loan Approval Process:
Currently, loan eligibility decisions are primarily made by human underwriters who assess various borrower data points like income, credit score, employment history, debt-to-income ratio, and collateral. This manual process can be time-consuming, prone to bias, and inconsistent, leading to potential delays and dissatisfied customers. Additionally, it can be challenging to accurately assess the creditworthiness of non-traditional borrowers who may lack an extensive credit history.
Machine Learning Approach:
This project proposes a Machine Learning (ML) model to automate and enhance the loan eligibility prediction process. The model will learn from historical loan data, identifying patterns differentiating approved and rejected loan applications. This data-driven approach can lead to:
- Faster Approvals: Automated predictions can significantly reduce processing time, allowing quicker loan decisions.
- Reduced Bias: ML models apply the same criteria to every application, reducing the influence of individual human judgment on loan decisions (though they must still be audited for bias inherited from historical data).
- Improved Efficiency: Streamlined loan assessment frees up underwriters' time for more complex cases.
- Enhanced Risk Management: The model can identify risk factors and predict potential defaults, allowing lenders to make informed decisions.
Tech Stack:
- Machine Learning: Scikit-learn
- Experiment tracking and model registry: CometML
- Cloud Infrastructure: Docker, Terraform, AWS (EC2 and S3)
- Linting and Formatting: Pylint, Flake8, autopep8
- Testing: Pytest
- Automation: GitHub Actions (CI/CD Pipeline)
- Orchestration: Prefect
Let's check the complete directory of the project.
Data Ingestion:
- The data was extracted from the Kaggle Loan Eligibility Dataset.
- Let's check the raw data
- Data cleaning procedures will ensure data quality and address missing values or inconsistencies.
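The cleaning step can be sketched as follows. This is a minimal illustration with a toy DataFrame; the column names only mimic the Kaggle Loan Eligibility schema and are assumptions, not the project's exact fields.

```python
import pandas as pd

# Toy sample with gaps, mimicking raw loan data (hypothetical columns).
df = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, None, 2583],
    "LoanAmount": [None, 128.0, 66.0, 120.0],
    "Credit_History": [1.0, 1.0, 1.0, None],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Basic cleaning: impute numeric gaps with the median, drop duplicate rows.
for col in ["ApplicantIncome", "LoanAmount", "Credit_History"]:
    df[col] = df[col].fillna(df[col].median())
df = df.drop_duplicates()

print(df.isna().sum().sum())  # no missing values remain
```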
Exploratory Data Analysis (EDA):
- Data visualizations will be used to understand the distribution of loan features, identify potential correlations, and uncover any hidden patterns.
- Feature importance analysis will assess the influence of each factor on loan eligibility.
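Typical first checks of this kind can be sketched with pandas on toy data (the fields below are hypothetical): class balance of the target and the correlation of each numeric feature with it.

```python
import pandas as pd

# Toy data illustrating the EDA checks; columns are assumptions.
df = pd.DataFrame({
    "Credit_History": [1, 1, 0, 1, 0, 1],
    "ApplicantIncome": [5000, 6000, 2000, 5500, 1800, 7000],
    "Loan_Status": [1, 1, 0, 1, 0, 1],  # 1 = approved
})

# Class balance of the target.
print(df["Loan_Status"].value_counts(normalize=True))

# Correlation of each feature with loan approval.
print(df.corr()["Loan_Status"].sort_values(ascending=False))
```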
- New features were created based on existing data to improve model performance.
- Data scaling was applied to ensure all features are on a similar scale.
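A minimal sketch of both steps, assuming (hypothetically) that a combined-income feature is one of the derived features and that scikit-learn's StandardScaler handles the scaling:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "ApplicantIncome": [5000, 2000, 8000],
    "CoapplicantIncome": [1500, 0, 2500],
    "LoanAmount": [120, 60, 200],
})

# Derived feature (a hypothetical example of "new features from existing data").
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]

# Scale features to zero mean / unit variance so they share a similar scale.
scaler = StandardScaler()
scaled = scaler.fit_transform(df[["TotalIncome", "LoanAmount"]])
print(scaled.mean(axis=0))  # per-column means are ~0 after scaling
```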
- The SelectFromModel method was applied to select the most relevant features.
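SelectFromModel keeps only the features whose importance (as measured by a fitted estimator) clears a threshold. A self-contained sketch on synthetic data, with a random forest as the importance source (the project's actual estimator and threshold may differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the loan dataset: 10 features, only 3 informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Fit a forest and keep features whose importance is above the mean (default).
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_sel = selector.fit_transform(X, y)

print(X_sel.shape[1], "features kept out of", X.shape[1])
```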
- Various ML algorithms, such as Logistic Regression, Random Forest, or Gradient Boosting, will be trained on a training split of the data and evaluated on a held-out test split.
- Model selection will be based on accuracy, precision, and recall metrics.
- Prefect orchestrated the workflow with the following pipeline. Note: you must provide an API key to use Prefect Cloud. You can check the Quickstart guide.
```
pip install -U prefect --pre
prefect cloud login -k '<my-api-key>'
```
- Data ingestion
- Feature engineering
- Feature selection
- Training
- Model registry
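The five stages above can be sketched as plain functions chained in order. In the real project each step would be a Prefect `@task` and `run_pipeline` a `@flow`; the function bodies below are hypothetical placeholders that only illustrate the data handoff between stages.

```python
def ingest():
    # Placeholder for reading the raw loan data.
    return [
        {"income": 5000, "approved": 1},
        {"income": 1200, "approved": 0},
        {"income": 7000, "approved": 1},
    ]

def engineer(rows):
    # Derive a feature (hypothetical threshold).
    for r in rows:
        r["high_income"] = int(r["income"] > 3000)
    return rows

def select_features(rows):
    # Keep only the selected feature and the label.
    return [{k: r[k] for k in ("high_income", "approved")} for r in rows]

def train(rows):
    # Placeholder "model": the majority class of the labels.
    labels = [r["approved"] for r in rows]
    return max(set(labels), key=labels.count)

def register(model):
    # Placeholder for pushing the model to the registry.
    return {"model": model, "version": 1}

def run_pipeline():
    return register(train(select_features(engineer(ingest()))))

print(run_pipeline())
```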
Experiment Tracking and Version Control:
- Comet ML was used to track the experiment.
- You need to set up an API key to use the package in the project.
- You can check the official Comet documentation.
```
pip install comet_ml
comet login
```
- Here is an example you can include in your project:
```python
# Get started in a few lines of code
import comet_ml

comet_ml.login()
exp = comet_ml.Experiment()

# Start logging your data with:
exp.log_parameters({"batch_size": 128})
exp.log_metrics({"accuracy": 0.82, "loss": 0.012})
```
Model registry and Version Control:
Model Testing:
The scripts were linted with Pylint and Flake8 and formatted with autopep8.
The model with the best performance was deployed using a Flask application.
The application was first tested on the host machine.
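A minimal sketch of such a Flask prediction endpoint. The route, payload fields, and the decision rule standing in for the trained model are all assumptions, not the project's actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    # Placeholder rule standing in for the trained model (hypothetical threshold).
    approved = (payload.get("credit_history", 0) == 1
                and payload.get("income", 0) > 3000)
    return jsonify({"eligible": bool(approved)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9696)
```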
Model Deployment:
- Once the application was tested locally, a Makefile was created to containerize the app.
To build the image:
```
make build
```
To push the image to the Docker Hub repo:
```
make push
```
To run the image locally or on the cloud:
```
make run
```
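The three targets can be sketched in a Makefile like the one below; the image name, tag, and port are placeholders, not the project's actual values:

```makefile
# Hypothetical image name; replace with your own Docker Hub repo.
IMAGE := <dockerhub-user>/loan-eligibility:latest

build:
	docker build -t $(IMAGE) .

push:
	docker push $(IMAGE)

run:
	docker run -d -p 9696:9696 $(IMAGE)
```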
Note: provide Docker credentials in the terminal to pull the Docker image. You can check the image on your Docker Hub repo:
The production-ready model was deployed on AWS infrastructure (EC2 and S3), using Terraform as Infrastructure as Code (IaC) to manage computational resources. From the [app directory](src/deployment), run:
```
terraform init
terraform plan
terraform apply
```
Executing these commands will perform the following activities:
- Provision AWS infrastructure
- Enable TCP traffic (HTTP and HTTPS)
- Install and enable Docker on the EC2 instance
- Pull the Docker image from my Docker Hub repo
- Run the image on the EC2 instance
- Print the public IP address of the Flask app.
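A minimal sketch of the kind of Terraform configuration those steps imply. The region, AMI, instance type, user_data commands, and image name are all placeholders/assumptions:

```hcl
provider "aws" {
  region = "us-east-1" # placeholder region
}

# Allow inbound HTTP/HTTPS and all outbound traffic.
resource "aws_security_group" "web" {
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "app" {
  ami                    = "ami-xxxxxxxx" # placeholder AMI
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.web.id]

  # Install Docker and run the app image at boot (hypothetical commands).
  user_data = <<-EOF
    #!/bin/bash
    yum install -y docker
    systemctl enable --now docker
    docker run -d -p 80:9696 <dockerhub-user>/loan-eligibility:latest
  EOF
}

output "public_ip" {
  value = aws_instance.app.public_ip
}
```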
GitHub Actions automates the CI/CD pipeline. The pipeline has the following steps:
- Checkout repository
- Set up Python
- Set up Terraform
- Run the Terraform init, plan, and apply tasks
- Print the public IP address of the Flask app.
- The Flask app is deployed automatically on the AWS cloud:
Note: The app will be available until the Attempt 2 review is completed.
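A workflow mirroring those steps could look like the sketch below; the trigger, Python version, working directory, and secret names are assumptions, not the project's actual workflow file:

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - uses: hashicorp/setup-terraform@v3
      # Init, plan, and apply the Terraform config (hypothetical path).
      - run: terraform -chdir=src/deployment init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - run: terraform -chdir=src/deployment apply -auto-approve
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      # Print the public IP address of the Flask app.
      - run: terraform -chdir=src/deployment output public_ip
```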
Monitoring and Continuous Improvement:
- The deployed model's performance will be continuously monitored through key metrics.
- Periodic retraining with new data will be conducted to ensure the model stays accurate and adapts to changing market conditions.
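One simple form of such a check can be sketched as follows: compare live accuracy on newly labeled loans against the validation accuracy seen at training time, and flag retraining when it drops. The baseline value, threshold, and sample labels below are toy assumptions:

```python
from sklearn.metrics import accuracy_score

# Validation accuracy recorded at training time (hypothetical value).
baseline_accuracy = 0.85

# Recent predictions with their eventual true outcomes (toy data).
y_true_recent = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred_recent = [1, 0, 1, 0, 0, 1, 1, 1]

live_accuracy = accuracy_score(y_true_recent, y_pred_recent)

# Flag retraining if live accuracy falls more than 5 points below baseline.
needs_retraining = live_accuracy < baseline_accuracy - 0.05
print(live_accuracy, needs_retraining)
```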
By implementing this data-driven approach, the project aims to significantly improve loan eligibility assessment, leading to faster decisions, enhanced customer satisfaction, and optimized risk management for the financial institution.