-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathinstructions.txt
57 lines (40 loc) · 2.6 KB
/
instructions.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
instructions:
1- Check whether Python is installed on your system by running the command python3 --version.
If it is not installed, follow the instructions provided in the
official Python documentation at https://www.python.org/downloads/ to install Python.
2- Once Python is installed on your system, create a virtual environment to isolate the project's dependencies.
You can create a virtual environment using the following command: `python -m venv .venv`
Check the documentation for your operating system to see if you need to install any additional packages.
3- Activate the virtual environment using the following command: `source .venv/bin/activate`
Check the documentation for your operating system to see if the command is different.
4- Install the required packages using the following command: `pip install -r requirements.txt`
5- Launch Jupyter Notebook by running the command jupyter-notebook.
6- Navigate to main-script.ipynb and run the code using the "Run All" button.
Make sure that the virtual environment is activated while running the notebook.
I. Introduction
Background on the importance of handling missing data in datasets and test them on using linear regression
Brief overview of the methods used to fill null values
II. Objectives
To compare the performance of interpolation and least square approximation in filling null values
To assess the impact of the filling methods on the performance of a regression model
III. Methodology
A. Data Collection and Preparation
Load fetch_california_housing dataset from sklearn
Add randomly generated null values to the dataset
B. Null Value Filling
Interpolate null values using a linear interpolation method
Use least square approximation with Pseudo Inverse to fill null values
C. Data Preprocessing
Standardize the dataset
D. Regression Model
Run a regression model on the original dataset
Run a regression model on the interpolated dataset
Run a regression model on the dataset filled with least square approximation
E. Performance Metrics
Compute metrics to assess the performance of the regression models for each dataset, including R-squared, mean squared error, and mean absolute error.
IV. Results
Present the results of the regression models and the performance metrics computed for each dataset
V. Conclusion
Summarize the findings of the study and draw conclusions about the effectiveness of the methods used to fill null values in the dataset
VI. Future Work
Discuss potential future research directions, such as exploring other methods for filling null values or examining the impact of different preprocessing techniques on the performance of regression models.