Functional implementation of a log of all processing, embedded in the dataframe itself #27

Merged
merged 188 commits into from
Mar 25, 2024
5fb4916
:monocle_face: Frequency blocks successfully mapped
ronaldokun Aug 1, 2023
bc68044
Finished extraction of SMP data
ronaldokun Aug 22, 2023
10dc178
Updating the GitHub workflows configuration
ronaldokun Aug 22, 2023
2e751a3
Ran ruff linter
ronaldokun Aug 22, 2023
87dafdc
Finished SMP data exploration
ronaldokun Aug 25, 2023
b28a373
Changed deploy.yml from fastai to use one from quarto actions
ronaldokun Aug 25, 2023
fa06b0d
Fixed error handling for non-numeric #Estação
ronaldokun Sep 6, 2023
bc01e2a
:bug: fixed bug in updates.py:validar_coords
ronaldokun Sep 12, 2023
dd62f05
:notebook: Debugged Rendering of Documentation
ronaldokun Sep 28, 2023
5afcfed
:bug: Reverting deploy.yaml to the native one from the nbdev repo
ronaldokun Sep 28, 2023
8d20a66
:bug: Removed black as dev_requirement because it is blocking quarto wor…
ronaldokun Sep 28, 2023
6e2a824
:bug: Removed dev_requirements because it is blocking quarto workflow
ronaldokun Sep 28, 2023
79cdb43
:recycle: Migrated database connections and data logic to classes
Oct 4, 2023
33dc14c
:sparkles: Reimplementation of the updates.py module
ronaldokun Oct 8, 2023
ad5ad3d
:recycle: Finished validating the extraction of the Anatel databases
ronaldokun Nov 1, 2023
9df985a
:ambulance: Fixed bug returning null frequencies
ronaldokun Nov 6, 2023
7a4184b
Removal of NaN frequencies generated in the SMP Uplink
Nov 6, 2023
ca762ad
Fixed processing bug in the _format function for telecom data
Nov 6, 2023
6eba1a8
Work in progress
Nov 7, 2023
660ddd7
:rewind: Stopped keeping discarded rows because of memory
ronaldokun Nov 7, 2023
072824c
Fixed bug in telecom.py
Nov 7, 2023
b23830a
:sparkles: Added local polygon validation of coordinates
ronaldokun Nov 9, 2023
f29bccd
Updating repo to test coordinate validation
Nov 9, 2023
0521071
:bug: Corrected fixed file paths in icao.py
Nov 9, 2023
f3cb8b0
Local coordinate validation is functional
Nov 9, 2023
fe306b9
Added more filters when querying the MongoDB
Nov 9, 2023
8205a6a
Tidying up the merged data
Nov 9, 2023
864ac54
:bug: Fixed bug in _format in telecom.py
Nov 10, 2023
93c4f08
Fixing the column names in aeronautica.py
Nov 10, 2023
8e149c4
Testing the full workflow. Work in progress
Nov 10, 2023
56a7d34
Committed profiling files by mistake
Nov 10, 2023
9a4d050
Finished extraction and processing logic
Nov 13, 2023
097aa98
Debugging merge_on_frequency
Nov 13, 2023
efb02da
:rocket: New fully refactored version with SMP/Log/Validation
ronaldokun Nov 15, 2023
d4b0537
Merge branch 'dev' of github.com:InovaFiscaliza/EmissorasRF into dev
ronaldokun Nov 15, 2023
bc419ab
:fire: Cleaning repo to release new version
ronaldokun Nov 15, 2023
84caba8
Testing extraction and inspecting data
Nov 15, 2023
cbd5dd9
Finished extraction of propagation data from Mosaico - SRD
Nov 16, 2023
dfbd7d2
Finished extraction of propagation data from Mosaico - SRD
Nov 16, 2023
dfbad83
Formatting of the frequency column
Nov 16, 2023
b5c1355
Added log for the processing of the Designação_Emissão column
Nov 16, 2023
8c6c10b
Running tests for dataframe type conversion
Nov 17, 2023
7fec660
Removal of all casting, dropping and sorting points scattered …
Nov 17, 2023
afae4c6
Defining specific typing for each column in the final dataset
Nov 17, 2023
3ca9ff5
Removed bug in the merge_on_frequency function
Nov 20, 2023
de87671
Fixing merge conflict
Nov 20, 2023
e475eab
Test after debug of merge_on_frequency
Nov 20, 2023
ea29b33
Still testing extraction after debug
Nov 20, 2023
956c9cd
♻️ Added Error Handling when failing to update
Nov 27, 2023
dda0d50
🐛 Reversed df method in Base class to be a cached property
Nov 27, 2023
a0510aa
Forgot to change back df() to df in _update_source
Nov 27, 2023
3a17c2f
🐛 Recreated variables left_cols and right_cols
Nov 27, 2023
3e82163
🚀 Tested and Deployed new refactored version
Nov 27, 2023
57fe2e9
Merge branch 'dev' of github.com:InovaFiscaliza/EmissorasRF into dev
Nov 27, 2023
4de44bd
🐛 I inadvertently reverted the bug fix in merge_on_frequency
Nov 28, 2023
00920a9
🎨 Improve OOP Logic, DRYyed the codebase
Nov 30, 2023
039017a
🎨 Improve OOP Logic, DRYyed the codebase
Nov 30, 2023
d3127ab
Updated Version Extracted
Dec 1, 2023
22c372e
🥅 Added ValueError in the property df when failed to read from file.
Jan 12, 2024
27ada14
🔥 Remove code related to casting types in class Mosaico.
Jan 12, 2024
972bf7b
Eliminated using inplace in drop_duplicates since it creates a copy i…
Jan 12, 2024
9988fbc
⏪ Reverted saving df as numpy_nullable. pyarrow doesn't handle catego…
Jan 12, 2024
057e77b
Simplified nomenclature
Jan 12, 2024
ea9dbe4
🔧 Updating VersionFile and settings.ini
Feb 8, 2024
c060a33
🐛 numpy_nullable is not an engine when saving the parquet
Feb 8, 2024
0da1b86
🐛 Removed Arrow Exception as callable and replaced it with generic Exceptio…
Feb 8, 2024
9bdb5cc
♻️ Minor refactor and removed unused imports
Feb 8, 2024
f0d4634
Removed unused imports
Feb 8, 2024
d0b3cdd
Removed unused imports
Feb 8, 2024
05664f3
⏪ Removed pyarrow typing
Feb 8, 2024
618eb7d
💡 Added docstrings
Feb 8, 2024
3aa8e37
Updated docstrings
Feb 8, 2024
f95359e
💡 Updated and kept in English all docstrings
ronaldokun Feb 9, 2024
6d3e105
🗑️Removed unused and duplicated code
ronaldokun Feb 9, 2024
42242b8
🔧 Updated the conda environment file
ronaldokun Feb 9, 2024
ed419c6
🐛 Removed some typing bugs
ronaldokun Feb 9, 2024
30c3165
🔥 Removed obsolete .md files
ronaldokun Feb 9, 2024
27eb37c
🚧 Refactoring validation of coordinates and city code to separate mod…
ronaldokun Feb 9, 2024
a718a29
🎨Refactored location logic to separate module
ronaldokun Feb 15, 2024
a64693d
🚧 Creating various fillna methods for coordinates
ronaldokun Feb 15, 2024
7fbc3e0
🐛 Turning literal garbage value from Mosaico [] into pd.NA
ronaldokun Feb 16, 2024
ff009e5
🔧 Added conda environment for windows
Feb 19, 2024
17eeaf8
✨ Added call to save raw extracted data before formatting
Feb 19, 2024
3811d2c
🐛 Identified bug when replacing coordinates
ronaldokun Feb 19, 2024
acf1fb6
♻️ Added flag copy=false when casting types
ronaldokun Feb 19, 2024
ddb7897
🎨 Used module Geografy from location.py to substitute coordinates
ronaldokun Feb 19, 2024
c189550
Merge branch 'dev' of github.com:InovaFiscaliza/EmissorasRF into dev
Feb 19, 2024
2fa4d26
⚰️ Changed estacoes.py to stations.py
ronaldokun Feb 19, 2024
8447a1d
🐛 Added casting to string in `Designação_Emissão` before processing
ronaldokun Feb 19, 2024
b6565f8
Merge branch 'dev' of github.com:InovaFiscaliza/EmissorasRF into dev
Feb 19, 2024
8bd52b9
🐛 Casting to string again after exploding the column `Designação_Emis…
Feb 19, 2024
2e82b5e
🐛 Trying to debug split_designacao from Mosaico class
ronaldokun Feb 19, 2024
92b4d56
🐛 Fillna to -1 before casting column `Estacao` to int in exclude_dupl…
ronaldokun Feb 19, 2024
1d3605c
🐛 Debugging validate_channels
ronaldokun Feb 19, 2024
241592a
🚧 Cleaning Notebooks to test extraction
ronaldokun Feb 19, 2024
bd476c1
🐛 Introduced a major bug when removed `$` by mistake on regex call in…
Feb 20, 2024
8fa5f33
🎨 Update call from row[column] to row.loc[column] from df.apply call …
Feb 20, 2024
64846d8
⚰️ Removed redundant casting of columns to strings. Now all columns b…
Feb 20, 2024
29f6b03
🐛 Cast df to string before calling general regex expression
Feb 20, 2024
f0e8450
🐛 Eliminate naming of placeholder in format string
Feb 20, 2024
59f8d63
⚡️ Added `read_cache` flag when instantiating the class
ronaldokun Feb 20, 2024
91061ae
Merge branch 'dev' of https://github.com/InovaFiscaliza/rfdatahub int…
ronaldokun Feb 20, 2024
1566c6f
⚡️ Added read_cache option
ronaldokun Feb 20, 2024
bd1fc29
🔥 Removed files no longer on index
ronaldokun Feb 20, 2024
6628128
🗃️ Removed Nome do Município as a field extracted from MongoDB
ronaldokun Feb 20, 2024
3ae2c4a
✨ Added read_cache flag to Aero, but this cache is the already proces…
ronaldokun Feb 20, 2024
33063da
✨ Integrated read_cache on Estacoes and overloaded method update
ronaldokun Feb 20, 2024
fad87f8
⏪ Put back column Município in the extraction given too many errors i…
Feb 20, 2024
39b3569
⏪ Extracted again the Municipio column
Feb 20, 2024
adbce2f
🚧 Testing full extraction
Feb 20, 2024
b105b2b
🧐 Removed garbage from the Código do Município strings.
ronaldokun Feb 23, 2024
e39572e
✨ Added various methods for correction of bad location data
ronaldokun Feb 24, 2024
b407144
🐛 Cast counting column to int
ronaldokun Feb 24, 2024
48b74cc
🐛 Dictionary keys not changed in all places
ronaldokun Feb 24, 2024
e51a303
🐛 Fixed the dictionary keys in logging
ronaldokun Feb 24, 2024
8d83ee8
Minor casting types
ronaldokun Feb 24, 2024
4d2dd0d
♻️ added 'Log' init to _extract
ronaldokun Feb 26, 2024
04f211f
🐛 Returned to explicit typing on new columns instead of whole df
ronaldokun Feb 26, 2024
9baa318
Enabled lazy copy_on_write on pandas
ronaldokun Feb 26, 2024
410228e
✨ Updated file municipios.csv with proper names, and abbreviation of …
ronaldokun Feb 26, 2024
a6c5f3c
✨ Added geolocation via the geopy API for missing data.
ronaldokun Feb 26, 2024
25cdda2
✨ Implemented log formatting as JSON in a dedicated column.
ronaldokun Feb 28, 2024
1f724ee
🎨 Removed returned object since it's supposed to be altered by refere…
ronaldokun Feb 28, 2024
91dd48c
🎨 Transformed split_designacao into staticmethod
ronaldokun Feb 28, 2024
e438277
🎨 Finished adapting logging calls
ronaldokun Feb 28, 2024
6750a78
⚰️ Commented code that needs to be rewritten
ronaldokun Feb 28, 2024
40ae36a
✨ Generalized logging method
ronaldokun Feb 29, 2024
4ccdc56
🎨 Adjusted logging to newer method
ronaldokun Feb 29, 2024
8903571
✨ Added logging in _format from SRD
ronaldokun Feb 29, 2024
7e80d8d
♻️ Remove small datasets call to progress_apply. Injected pyarrow aga…
ronaldokun Mar 1, 2024
09c676e
♻️ Removed casting types inside base favoring explicit casting in eac…
ronaldokun Mar 1, 2024
478ade4
🐛 Fixed bug in df.explode
ronaldokun Mar 1, 2024
e3480ef
✨ Finished logging in _format from Telecom
ronaldokun Mar 1, 2024
a7b3db2
⚡️ Went back to reading with pyarrow for performance
ronaldokun Mar 6, 2024
4080d02
🔊 Added logging of Radcom
ronaldokun Mar 6, 2024
d269134
🔊 Broke down the Radcom log in more detail
ronaldokun Mar 6, 2024
d42fb51
🔊 Improved the Radcom log
ronaldokun Mar 6, 2024
2a27c56
🔊 Simplified the Mosaico and SRD log
ronaldokun Mar 6, 2024
b6a7cb2
🔊 Simplified logging when the processing is done on whole row
ronaldokun Mar 6, 2024
8c2b5da
✨ Added log for the merged stations dataset.
ronaldokun Mar 7, 2024
fe00594
♻️ Saving all outputs to 'category'
ronaldokun Mar 12, 2024
076a344
♻️ Added common logic of exclude_duplicated to base class Mosaico
ronaldokun Mar 12, 2024
6b0ed49
🐛 forgot to do an assignment
ronaldokun Mar 12, 2024
2b7ec40
🔊 Factored out excluded_duplicated to base class
ronaldokun Mar 12, 2024
c2d6502
♻️ Renamed SMP class to Smp
ronaldokun Mar 12, 2024
a9fee90
🐛 Cast list column to string to prevent unhashable errors
ronaldokun Mar 12, 2024
6537028
🔊 Logged invalid_channels excluded
ronaldokun Mar 12, 2024
f42710a
🔊 Added Uplink and grouping to log
ronaldokun Mar 12, 2024
26707f3
✨ Finished comprehensive logging of all datasources
ronaldokun Mar 12, 2024
3f220e6
🔊 Added logging in `drop_rows_without_location_info`
ronaldokun Mar 13, 2024
2bff183
Add invalid city codes to the log. Null or invalid integers caught even …
ronaldokun Mar 13, 2024
d11c1ec
Merge pull request #26 from InovaFiscaliza/ronaldokun/issue25
ronaldokun Mar 13, 2024
b2c6e6b
🐛 Create log column when there isn't one
ronaldokun Mar 13, 2024
3640e12
🎨 Added some useful checking in `register_log`
ronaldokun Mar 13, 2024
f9a0177
⚡️ Added flag `reprocess_sources` in order to just read already proce…
ronaldokun Mar 13, 2024
cee6b03
🔊 Added logging from `normalize_location_names`
ronaldokun Mar 14, 2024
10d2057
🔊 Logged method `fill_missing_coords`
ronaldokun Mar 14, 2024
367acb9
🐛 replaced `json.loads` with eval to prevent single quote errors in dic…
ronaldokun Mar 14, 2024
cf02fce
🧑‍💻 Added rich printing of log
ronaldokun Mar 14, 2024
6f4ed2f
🔊 Improved logging messages and printing
ronaldokun Mar 14, 2024
b151b5a
🔊 Added logging in `intersec_coordinates_on_poligon`
ronaldokun Mar 14, 2024
cf0700e
🎨 Logging from missing and wrong city info
ronaldokun Mar 14, 2024
a9640d5
✨ Finished functional logging from location.py modules.
ronaldokun Mar 14, 2024
e1235a6
🎨 Filtered the dataframe to get only the used columns
ronaldokun Mar 18, 2024
add42cc
✨ Factored out logic to specific function in order to log column
ronaldokun Mar 18, 2024
c0db38f
♻️ Updated location logic inside Smp Class
ronaldokun Mar 18, 2024
a16aeb5
🔊 Added logging in Estacoes class
ronaldokun Mar 18, 2024
2832a15
♻️ Refactored initialization logic into smaller methods
ronaldokun Mar 18, 2024
00ebcdf
🎨 Improved logic to substitute bad city code from the normalized city…
ronaldokun Mar 18, 2024
9a0a7d4
🐛 Fixed bug when comparing float columns. Added a flag in replace_columns
ronaldokun Mar 18, 2024
5478869
🐛 fill_missing_info had the wrong filter in Codigo_Município: notna i…
ronaldokun Mar 18, 2024
20ff60c
🐛 Spatial join in geopandas was with the coordinates reversed
ronaldokun Mar 18, 2024
c4fd888
🚀 Cleaned notebooks
ronaldokun Mar 18, 2024
c0cfea7
⚰️ Removed deprecated code
ronaldokun Mar 21, 2024
1a70ae4
🔊 Simplified and removed logs, with a focus on the user.
ronaldokun Mar 21, 2024
4ddb40e
🔇 Simplified logging message
ronaldokun Mar 25, 2024
eb4f561
🔇 Removed logging about "Classe" in Radcom
ronaldokun Mar 25, 2024
9a872b1
🥅 Caught cases in which classe is an empty string, returning pd.NA instead
ronaldokun Mar 25, 2024
167837c
🔇 Removed logging and saving of discarded dfs for now
ronaldokun Mar 25, 2024
dc8b355
🔥 Eliminating logic for processing and logging discarded dfs
ronaldokun Mar 25, 2024
245e05a
🔇 Removed logging of the channels created
ronaldokun Mar 25, 2024
a34711e
🐛 The logging of invalid channels was being made in the rows of the v…
ronaldokun Mar 25, 2024
264c596
🔇 Not logging or saving the discarded stuff for now
ronaldokun Mar 25, 2024
547cf9a
🔊 Simplified logging message
ronaldokun Mar 25, 2024
f87d7c6
🐛 Removed bug introduced with regex replace.
ronaldokun Mar 25, 2024
e6646c2
👔 If the polygon doesn't match the original city, the coordinates are…
ronaldokun Mar 25, 2024
3ec96ed
🔇 Greatly simplified the logging logic
ronaldokun Mar 25, 2024
924c798
🔊 Added processing and saving of log
ronaldokun Mar 25, 2024
5 changes: 1 addition & 4 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -1,5 +1,2 @@
*.xlsx filter=lfs diff=lfs merge=lfs -text
*.parquet.gzip filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.ipynb merge=nbdev-merge

61 changes: 30 additions & 31 deletions .github/workflows/codeql-analysis.yml
@@ -13,12 +13,12 @@ name: "CodeQL"

on:
push:
branches: [ master ]
branches: [master, dev]
pull_request:
# The branches below must be a subset of the branches above
branches: [ master ]
branches: [master, dev]
schedule:
- cron: '22 16 * * 1'
- cron: "22 16 * * 1"

jobs:
analyze:
@@ -32,41 +32,40 @@ jobs:
strategy:
fail-fast: false
matrix:
language: [ 'python' ]
language: ["python"]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Checkout repository
uses: actions/checkout@v3

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.

# Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.


# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
# Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality

# ℹ️ Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2

# If the Autobuild fails above, remove it and uncomment the following three lines.
# modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
# ℹ️ Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

# - run: |
# echo "Run, Build Application using script"
# ./location_of_script_within_repo/buildscript.sh
# If the Autobuild fails above, remove it and uncomment the following three lines.
# modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
# - run: |
# echo "Run, Build Application using script"
# ./location_of_script_within_repo/buildscript.sh

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
4 changes: 2 additions & 2 deletions .github/workflows/deploy.yaml
@@ -1,11 +1,11 @@
name: Deploy to GitHub Pages
on:
push:
branches: [master]
branches: [master, dev]
workflow_dispatch:
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: fastai/workflows/quarto-ghp@master
with: { pre: 1 }

2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -11,7 +11,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu, macos]
os: [ubuntu]
version: ["3.10", "3.11"]
runs-on: ${{ matrix.os }}-latest
steps:
10 changes: 10 additions & 0 deletions .gitignore
@@ -147,12 +147,22 @@ checklink/cookies.txt
*.parquet.gzip
*.xlsx
*.html
*.shp
*.shx
*.cpg
*.dbf
*.prj

# output folder
extracao/datasources/arquivos/saida

# VSCODE Files
.virtual_documents

*desktop.ini

# Quarto Files
_docs/
_proc/
index_files/
sidebar.yml
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -2,5 +2,6 @@ include settings.ini
include LICENSE
include CONTRIBUTING.md
include README.md
include extracao/aero/arquivos/*
include extracao/arquivos/*
recursive-exclude * __pycache__