GitHub - toledkrw/Aula-DataScience-Trabalho1: An academic project in which I extract Steam Community Market data to plot on a dashboard later

Data Science

First Evaluation Activity

🔰 Getting Started

This project was created to extract data from the Steam Web API using Python.

💾 Tools Used

🤖 Technologies Used

📋 Prerequisites

🐍 Python

💡 Attention

A requirements.txt file lists all of the project's dependencies.

Run the install_requirements.bat script (on Windows) or the install_requirements.sh script (on Linux) to install them.
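If neither script fits your environment, the same step can likely be done with pip directly. The sketch below assumes the scripts essentially wrap "pip install -r requirements.txt" and that requirements.txt sits in the current working directory; both are assumptions, so adjust the path if the file lives elsewhere. The helper names build_install_command and install_requirements are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_install_command(requirements: Path) -> list[str]:
    """Build a pip command that installs every package listed in `requirements`.

    Uses the current interpreter (sys.executable) so the packages land in the
    same environment that will later run main.py.
    """
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements: Path = Path("requirements.txt")) -> None:
    """Run pip; check=True raises CalledProcessError if pip exits non-zero."""
    subprocess.run(build_install_command(requirements), check=True)
```

Calling install_requirements() from the project folder runs pip with the same interpreter that will later execute main.py, which avoids installing into a different Python by accident.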

🎨 Features

The application has the following functionality:

🛠️ Search and Extract Data

Running the program with the -e flag searches the Steam market for data through the Web API. An AppID must be provided with the -a flag; optionally, a search string can be passed with the -q flag to look for specific items.
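The search boils down to paging through market results for one AppID. As a hedged sketch, assuming the unofficial Community Market search endpoint (https://steamcommunity.com/market/search/render/ with its appid, query, start, count and norender parameters), which may not be the endpoint this project actually calls, the page URLs could be built as below. The helpers build_search_url and page_urls are hypothetical; the page size of 100 mirrors the default mentioned in the TODOs.

```python
from urllib.parse import urlencode

# Assumption: unofficial Community Market search endpoint, not necessarily
# the one used by this project.
SEARCH_URL = "https://steamcommunity.com/market/search/render/"


def build_search_url(appid: int, query: str = "", start: int = 0, count: int = 100) -> str:
    """Return the URL for one page of market search results.

    `start` is the offset of the first item, `count` the page size.
    """
    params = {
        "appid": appid,
        "query": query,
        "start": start,
        "count": count,
        "norender": 1,  # request JSON data instead of pre-rendered HTML
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def page_urls(appid: int, total_count: int, query: str = "", count: int = 100) -> list[str]:
    """Return one URL per page needed to cover `total_count` results."""
    return [build_search_url(appid, query, start, count) for start in range(0, total_count, count)]
```

Each page would then be fetched with any HTTP client; the total number of results (and therefore the number of pages) is assumed to come from the first response.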

Basic usage: python -u main.py -e -a <AppID> -q <query>

Example:

python -u main.py -e -a 730 -q AK-47
python -u main.py -e -a 570
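Note that -u belongs to the Python interpreter (unbuffered output), not to main.py. The flags main.py itself receives could be parsed along the lines of the hypothetical argparse sketch below; it is not the project's actual code, and treating -e, -p and -r as mutually exclusive stages (one per run) is an assumption.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags described in this README."""
    parser = argparse.ArgumentParser(description="Steam Community Market data pipeline")
    # Assumption: each run performs exactly one stage.
    stage = parser.add_mutually_exclusive_group(required=True)
    stage.add_argument("-e", action="store_true", help="search and extract RAW data")
    stage.add_argument("-p", action="store_true", help="pre-process extracted data")
    stage.add_argument("-r", action="store_true", help="refine pre-processed data")
    parser.add_argument("-a", type=int, help="Steam AppID (required with -e)")
    parser.add_argument("-q", default="", help="optional item search string")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (sys.argv[1:] when None) and enforce -a for extraction."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.e and args.a is None:
        parser.error("-e requires an AppID via -a")  # exits with status 2
    return args
```

With this parser, "-e -a 730 -q AK-47" yields a namespace with e=True, a=730 and q="AK-47", while -e without -a is rejected.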

Since this is a RAW extraction, the stored data follows this structure.

If main.py is run manually from within the project folder, the data is stored in the data folder, partitioned by AppID.
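Because a relative path such as data resolves against the current working directory, where the files end up depends on where main.py is launched, which appears to be what the first TODO item is about. The sketch below writes one RAW batch into an AppID partition; the save_raw_batch helper and the batch_NNNN.json naming are hypothetical, and only the partitioning by AppID comes from this README.

```python
import json
from pathlib import Path


def save_raw_batch(base_dir: Path, appid: int, batch_index: int, items: list[dict]) -> Path:
    """Write one batch of RAW results to <base_dir>/<appid>/batch_<n>.json.

    Folder layout and file name are hypothetical illustrations.
    """
    partition = base_dir / str(appid)
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / f"batch_{batch_index:04d}.json"
    # Store the items untouched, since this is a RAW extraction.
    path.write_text(json.dumps(items, ensure_ascii=False), encoding="utf-8")
    return path
```

Passing Path(__file__).resolve().parent / "data" as base_dir, instead of Path("data"), would anchor the output to the project folder regardless of the launch directory.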

🧶 Data Pre-Processing

Running the program with the -p flag pre-processes and enriches previously extracted data.

Example:

python -u main.py -p

Currently, it only enriches the data with a timestamp.
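A minimal sketch of that enrichment, assuming the extracted records are JSON objects loaded as Python dicts and using a hypothetical processed_at field name, could look like this:

```python
from datetime import datetime, timezone


def add_timestamp(records: list[dict], field: str = "processed_at") -> list[dict]:
    """Return copies of `records`, each enriched with a UTC ISO-8601 timestamp.

    The field name is hypothetical. One timestamp is taken for the whole
    batch, so every record from a single run carries the same value.
    """
    stamp = datetime.now(timezone.utc).isoformat()
    return [{**record, field: stamp} for record in records]
```

Returning new dicts leaves the RAW input untouched, so the extracted layer stays exactly as it was fetched.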

💎 Data Refinement

Running the program with the -r flag refines previously pre-processed data. The process renames and drops some data fields, but still stores the results as JSON batches for each planned table.

Example:

python -u main.py -r
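The rename-and-drop step can be pictured as a whitelist mapping applied to every record, followed by splitting the rows into batches for one planned table. The field names below are hypothetical (loosely modeled on Steam market search results), as are the refine and to_batches helpers.

```python
# Hypothetical mapping: raw field name -> refined column name.
# Any field not listed here is dropped.
FIELD_MAP = {
    "hash_name": "item_hash_name",
    "sell_listings": "listings_count",
    "sell_price": "price",
    "processed_at": "processed_at",
}


def refine(records: list[dict], field_map: dict[str, str] = FIELD_MAP) -> list[dict]:
    """Keep only mapped fields, renaming them; missing fields are skipped."""
    return [
        {new: record[old] for old, new in field_map.items() if old in record}
        for record in records
    ]


def to_batches(records: list[dict], size: int) -> list[list[dict]]:
    """Split refined rows into fixed-size batches for one table."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```

Each planned table would get its own mapping, and each batch could then be serialized with json.dumps.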

📑 Licenses

Distributed under the MIT License. See LICENSE for more information.

🧻 TODOs

Add functionality to save data relative to the project folder rather than the execution environment
Add an optional parameter for the search module's pagination size (the default is currently 100)
Add TRUSTED layer process
Add REFINED layer process
Add STORED layer process

Folders and files:

.docker
.requirements
.test/utils
.vscode
DataExtractionProcess
DataPreProcessing/handlers
DataRefinementProcess/handlers
DataStoringProcess/handlers
doc/dimensionalModeling
utils
.gitignore
LICENSE
main.py
readme.md
