Project description
Median housing value prediction
The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/. The script includes code to download the data. The median house value is modelled on this housing data.
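A minimal sketch of such a download step, using only the standard library. The archive path `datasets/housing/housing.tgz`, the default target directory, and the `fetch_housing_data` helper name are assumptions made for illustration; the actual script may organise this differently.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from this README; the archive path appended to it is an assumption.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path="datasets/housing"):
    """Download the housing archive and extract it into housing_path."""
    target_dir = Path(housing_path)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Save the archive next to the extracted files.
    archive_path = target_dir / "housing.tgz"
    urllib.request.urlretrieve(housing_url, str(archive_path))

    with tarfile.open(archive_path) as archive:
        # Use the safer "data" extraction filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(path=target_dir, filter="data")
        else:
            archive.extractall(path=target_dir)
    return target_dir
```

Calling `fetch_housing_data()` with no arguments downloads from the assumed URL; both the URL and the target directory can be overridden.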
The following techniques have been used:
- Linear regression
- Decision Tree
- Random Forest
Steps performed
- Data Preparation: The housing data is cleaned and prepared. Missing values are checked and imputed.
- Feature Engineering: Features are generated, and variables are checked for correlation.
- Modeling: Multiple sampling techniques are evaluated. The dataset is split into training and testing sets. Various modeling techniques, including linear regression, decision trees, and random forest, are tried and evaluated. The mean squared error (MSE) is used as the evaluation metric.
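For reference, the evaluation metric can be sketched in plain Python as below. The helper names and the sample values are made up for illustration; the project itself may well rely on a library implementation instead.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Made-up values, for illustration only (not project results).
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mse = mean_squared_error(actual, predicted)        # (1 + 0 + 4) / 3
rmse = root_mean_squared_error(actual, predicted)  # sqrt of the value above
```

A lower value means the predictions are closer to the actual median house values, so models can be compared by this number on the same test set.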
How to Run the Code
Follow these steps to run the code:
Environment Setup
- Create the environment:
conda env create -f env.yml
- Activate the environment:
conda activate mle-dev
- Execute the script:
python3 nonstandardcode.py
- Run flake8:
flake8 nonstandardcode.py
Alternatively, name the environment explicitly when creating it:
conda env create --name mle-dev --file env.yml
or install the dependencies manually:
conda install numpy pandas matplotlib scikit-learn
Execute the Scripts
Run the following scripts sequentially:
```
python ingest_data.py
python train.py train.csv
python score.py test.csv
```
Run Tests
Run the unit tests and functional tests:
python test_ingest_data.py
python test_train.py train.csv
python test_score.py test.csv
pytest
Code Formatting
Use black, isort, and flake8 to format and lint the code.
Run black with the --fast option, which skips its post-formatting safety check:
black --fast nonstandardcode.py  # reformats the file, removing extra blank lines
Run isort with the --float-to-top option:
isort --float-to-top nonstandardcode.py  # moves all the imports to the top of the file
Run flake8 with the following options:
flake8 --ignore=F401 --max-line-length=120 nonstandardcode.py
F401 flags imports that are not used, and --max-line-length=120 raises the maximum line length to 120 characters.
Generating Data and Models
To generate training, testing, housing_prepared, and housing_labels files, run:
python ingest_data.py --log-level DEBUG --log-path ../logs/ingest_data.log
- --log-level DEBUG: enables debug mode for the stored logs
- --log-path: the file in which the log data is stored
- --no-console-log: do not print the log to the console
The next command-line argument is the four files that are going to be generated.
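The logging options described above could be wired up with `argparse` and `logging` roughly as follows. This is a sketch under assumptions: the function names, default values, and log format are hypothetical and are not taken from the project's scripts.

```python
import argparse
import logging
from pathlib import Path


def build_parser():
    """Parser exposing the three logging options described in this README."""
    parser = argparse.ArgumentParser(description="Logging options (illustrative).")
    parser.add_argument("--log-level", default="INFO",  # default is an assumption
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    parser.add_argument("--log-path", default=None,
                        help="file in which the log data is stored")
    parser.add_argument("--no-console-log", action="store_true",
                        help="do not print the log to the console")
    return parser


def configure_logging(args, name="ingest_data"):
    """Attach file and/or console handlers according to the parsed options."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, args.log_level))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    if args.log_path:
        # Create the log directory first; FileHandler does not create it.
        Path(args.log_path).parent.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(args.log_path)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    if not args.no_console_log:
        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    return logger
```

A script would then call `configure_logging(build_parser().parse_args())` once at start-up and log through the returned logger.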
To train the machine learning models and generate the pickle files, run train.py. The pickle files are not uploaded because they are very large and GitHub limits files to 100 MB.
python train.py --file-name train.csv --log-path ../logs/train.log --log-level DEBUG
- --log-level DEBUG: enables debug mode for the stored logs
- --log-path: the file in which the log data is stored
- --no-console-log: do not print the log to the console
The next command-line argument is the four files that are going to be generated.
To score all four models based on their RMSE values, run:
python score.py --file-name test.csv --log-path ../logs/score.log --log-level DEBUG
- --log-level DEBUG: enables debug mode for the stored logs
- --log-path: the file in which the log data is stored
- --no-console-log: do not print the log to the console
The result is printed to the console.
To build the package defined in setup.py, run the following commands. This makes all three scripts available as packages that can be imported in the tests:
pip install --upgrade setuptools
pip install --upgrade build
python -m build
List all the dependencies in the setup.py file.
Then run the functional test to check that all the files have been created correctly:
python -m functional_test
Documentation Generation
To generate the Sphinx documentation:
pip install sphinx
sphinx-quickstart
sphinx-build -M html source source
sphinx-apidoc -o source .\src\house_price_prediction\
make html
After running sphinx-quickstart, make the necessary changes in the conf.py file and add the modules to the index.rst file, then run the last two commands.
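The conf.py changes typically amount to making the package importable and enabling autodoc, which the pages generated by sphinx-apidoc rely on. The excerpt below is a sketch that assumes conf.py lives in the source directory and the package sits under src at the project root; adjust the relative path if the layout differs.

```python
# source/conf.py (excerpt)
import os
import sys

# Assumed layout: this file is in source/, the package is in ../src/.
# Adding that directory to sys.path lets Sphinx import the modules to document.
sys.path.insert(0, os.path.abspath(os.path.join("..", "src")))

# autodoc pulls the documentation from the docstrings of the imported modules.
extensions = [
    "sphinx.ext.autodoc",
]
```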
This README provides detailed instructions for setting up the environment, running the code, formatting the code, generating data and models, installing and configuring packages, and generating documentation. Adjust paths and commands as needed based on your project setup.