Merge pull request #5 from janvandermeulen/a1docs
A1docs
MichaelChan20 authored May 6, 2024
2 parents 0c48d6b + b299ec5 commit 038ac67
Showing 3 changed files with 66 additions and 2 deletions.
16 changes: 16 additions & 0 deletions Activity.md
@@ -0,0 +1,16 @@
# A1:
-- Shayan Ramezani --
Created PR: https://github.com/janvandermeulen/REMLA-group10/pull/2
Approved PR: https://github.com/janvandermeulen/REMLA-group10/pull/1

-- Michael Chan --
Created PR: https://github.com/janvandermeulen/REMLA-group10/pull/3
Approved PR: https://github.com/janvandermeulen/REMLA-group10/pull/4

-- Remi Lejeune --
Created PR: https://github.com/janvandermeulen/REMLA-group10/pull/4
Approved PR: https://github.com/janvandermeulen/REMLA-group10/pull/3

-- Jan van der Meulen --
Created PR: https://github.com/janvandermeulen/REMLA-group10/pull/1
Approved PR: https://github.com/janvandermeulen/REMLA-group10/pull/2
17 changes: 15 additions & 2 deletions README.md
@@ -5,12 +5,25 @@ In this assignment we will be transferring a small kaggle model to a professiona
- poetry for dependency management.

### How to run
To run this code you need to have poetry installed.
You can install the packages by running the following commands
(these should be executed in the phishing-detection folder):

- if the lock file is out of date:
- ```poetry lock --no-update```

- ```poetry install```
- ```poetry shell```

To retrieve the data and run the pipeline
(these should be executed in the remla-group10 folder):
- ```dvc fetch```
- ```dvc pull``` (may not work; if it fails, continue with the next command)
- ```dvc repro```

To run the code quality metrics:
(this should be executed in the phishing-detection folder)
- ```pylint ./phishing_detection```
- ```bandit ./ -r```

The project will be restructured in the future so that there is a single root folder from which all scripts can be executed.
35 changes: 35 additions & 0 deletions documents/A1Submission.md
@@ -0,0 +1,35 @@
operation: https://github.com/janvandermeulen/REMLA-group10

## Comments for A1
### Task 1: Organise your training pipeline following machine learning project best practices.
Pull request: https://github.com/janvandermeulen/REMLA-group10/pull/1 and https://github.com/janvandermeulen/REMLA-group10/pull/2
Contributors: Shayan Ramezani and Jan van der Meulen
Reviewers: Jan van der Meulen, Shayan Ramezani, Michael Chan, and Remi Lejeune.

We chose poetry to manage all the packages. Instructions to set up the project have been added to the README. The codebase was written such that DVC can do a step-by-step reproduction.

### Task 2: Enable collaborative development through a pipeline management tool (DVC)
Pull request: https://github.com/janvandermeulen/REMLA-group10/pull/4
Contributor: Remi Lejeune
Reviewer: Michael Chan

We uploaded the data to a remote gdrive cloud bucket using DVC. The data is now versioned and can be accessed by all team members.
Furthermore, we created a reproduction pipeline. We have encountered some issues with dvc pull: it may fail to pull from the cloud, in which case run dvc repro to reproduce the files.
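
The pull-then-reproduce fallback described above could be scripted roughly as follows. This is only a sketch, not part of the repository: the helper names `retrieve_data` and `run_command` are hypothetical, and it assumes the `dvc` CLI is on the PATH and that the script is started from the folder containing the DVC pipeline.

```python
import subprocess
from typing import Callable, Sequence


def retrieve_data(run: Callable[[Sequence[str]], int]) -> str:
    """Try `dvc pull`; if it fails, fall back to `dvc repro`.

    `run` executes a command and returns its exit code (hypothetical helper
    interface). The return value names the step that produced the files.
    """
    if run(["dvc", "pull"]) == 0:
        return "pull"
    # Pulling from the remote failed, so rebuild the outputs locally instead.
    if run(["dvc", "repro"]) == 0:
        return "repro"
    raise RuntimeError("both `dvc pull` and `dvc repro` failed")


def run_command(command: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(command, check=False).returncode


# Usage, from the folder that holds the pipeline:
#     retrieve_data(run_command)
```

Passing the command runner as a parameter keeps the fallback logic testable without a working DVC remote.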

### Task 3: Report metrics using DVC
Pull request: https://github.com/janvandermeulen/REMLA-group10/pull/4
See description of previous task.

### Task 4: Audit code quality
Pull request: https://github.com/janvandermeulen/REMLA-group10/pull/3
Contributor: Michael Chan
Reviewers: Jan van der Meulen, Remi Lejeune, and Shayan Ramezani

We used pylint and bandit to audit the code quality. The README provides instructions on how to run these tools. We fixed all the errors that the tools reported. Some of the configuration settings for pylint and bandit are explained below:

- A regex was added to accept names in which a single capital letter is delimited by "_", as these are common names for matrix variables in data science. Examples of names accepted by the regex: X_train, raw_X_train and X.
- TODO warnings have been suppressed temporarily. As this is still the first version, many possible improvements have been tagged as TODO for now; this should not affect the code quality.
- The allowed number of arguments and local variables has been increased, as it is common in data science to keep data such as the train and test sets in separate local variables, which results in relatively more variables.
- Bandit warning B106 about potentially hardcoded access tokens has been suppressed, as it falsely triggers on the word "token", which is prevalent in data science projects and has nothing to do with password or auth tokens.
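
To make the naming rule concrete, here is a small sketch of a regex with the behaviour described above. The pattern is an assumption written for illustration, not the regex from the project's pylint configuration; `MATRIX_FRIENDLY_NAME` and `is_accepted` are hypothetical names. Pylint exposes naming options such as `variable-rgx` where a pattern like this could be set, but the source does not state which option the project uses.

```python
import re

# Hypothetical pattern, not the project's actual regex: snake_case names in
# which any underscore-separated segment may instead be one capital letter.
MATRIX_FRIENDLY_NAME = re.compile(
    r"^(?:[a-z][a-z0-9]*|[A-Z])(?:_(?:[a-z0-9]+|[A-Z]))*$"
)


def is_accepted(name: str) -> bool:
    """Return True if the whole name matches the illustrative pattern."""
    return MATRIX_FRIENDLY_NAME.fullmatch(name) is not None


# The three names listed above are accepted by this sketch.
for example in ("X_train", "raw_X_train", "X"):
    assert is_accepted(example)

# Names with more than one capital letter in a segment are still rejected.
assert not is_accepted("XTrain")
assert not is_accepted("Raw_X")
```

Restricting capitals to single-letter segments keeps names like X_train valid while leaving the usual snake_case check in force for everything else.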
