Bug/skip date with output #314
base: master
Commits: f8a6081, b8f564c, 3c72faa, 3d727f1, b2b00fe
@@ -7,7 +7,7 @@
       "filtered_feeds": "filtered_feeds",
       "logs": "logs"
     },
-    "output_file_name_regexp": "^(?P<date_str>[^_]+?)_(?P<type>\\w+)",
+    "output_file_name_regexp": "^(?P<type>\\w+)_(?P<date_str>[^_]+?)",
Comment: Are you sure about this change? Did you consult with @cjer?
Reply: As you can see here, there is no option to change the format of the filenames:
     "output_file_type": "csv.gz"
   },
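To make the effect of swapping the group order concrete, the sketch below runs the old and new patterns against two file names. The file names (a type-first route_stats_2020-01-01.csv.gz and its date-first counterpart) are assumptions for illustration; only the two regexps and the way the code appends the escaped output_file_type ("csv.gz") come from the PR.

```python
import re
from typing import Dict, Optional

# The two output_file_name_regexp values from the diff (JSON "\\w" is regex "\w").
OLD_REGEXP = r"^(?P<date_str>[^_]+?)_(?P<type>\w+)"
NEW_REGEXP = r"^(?P<type>\w+)_(?P<date_str>[^_]+?)"
# Like the PR: escape the dots in output_file_type and append it after "\.".
FILE_TYPE_RE = "csv.gz".replace(".", "\\.")


def parse(name_regexp: str, file_name: str) -> Optional[Dict[str, str]]:
    """Return the named groups for a file name, or None if it does not match."""
    match = re.match(name_regexp + "\\." + FILE_TYPE_RE, file_name)
    return match.groupdict() if match else None


# Hypothetical file names in the two possible orders.
print(parse(OLD_REGEXP, "2020-01-01_route_stats.csv.gz"))  # {'date_str': '2020-01-01', 'type': 'route_stats'}
print(parse(NEW_REGEXP, "route_stats_2020-01-01.csv.gz"))  # {'type': 'route_stats', 'date_str': '2020-01-01'}
print(parse(OLD_REGEXP, "route_stats_2020-01-01.csv.gz"))  # None
print(parse(NEW_REGEXP, "2020-01-01_route_stats.csv.gz"))  # None
```

Under these assumptions each pattern accepts only its own naming order, so which one is correct depends entirely on how the output files are actually named on disk.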
@@ -1,4 +1,5 @@
 import datetime
+import logging
 import re
 from os import listdir
 from os.path import split, join, exists
@@ -18,20 +19,44 @@ def _get_existing_output_files(output_folder: str) -> List[Tuple[datetime.date,
     configuration = load_configuration()
     file_name_re = configuration.files.output_file_name_regexp
     file_type_re = configuration.files.output_file_type.replace('.', '\\.')
-    regexp = file_name_re + '\\.' + file_type_re
+    regexp = re.compile(file_name_re + '\\.' + file_type_re)
 
     existing_output_files = []
 
     for file in listdir(output_folder):
         match = re.match(regexp, file)
         if match:
-            date_str, stats_type = match.groups()
-            file_type = (parse_conf_date_format(date_str), stats_type)
+            file_type = _parse_file_name_regex_match(match)
+            if file_type is None:
+                # return empty list if there was an error in one of the files
Comment: I'd guess we'd like to return only the found files. Why would we give up on all of the files if one failed?
Reply: I thought it indicated that something weird was going on, but maybe you're right.
+                return []
             existing_output_files.append(file_type)
 
     return existing_output_files
 
 
+def _parse_file_name_regex_match(match: re.Match):
+    results = match.groupdict()
+    # validate that the regex used the correct group names
+    if ("type" not in results) or ("date_str" not in results):
+        # assume the order of the fields
+        stats_type, date_str = match.groups()
+        logging.info("The output file regex didn't use the correct group names: (type, date_str), "
+                     "for more information look in the configuration docs. trying unnamed groups")
+    else:
+        # regex has the correct groups
+        stats_type, date_str = results.get("type"), results.get("date_str")
+    try:
+        # try to parse the extracted date
+        parsed_date = parse_conf_date_format(date_str)
+    except ValueError:
+        logging.info(f'failed to parse date from file name, skipping the search. '
+                     f'the date was: {date_str!r}')
+        # skip on first failure
+        return None
+    return parsed_date, stats_type
+
+
 def get_dates_without_output(dates: List[datetime.date], output_folder: str) -> List[datetime.date]:
     """
     List dates without output files in the given folder (currently just route_stats is considered).
Comment: Why did you remove output_file_name_regexp? If you changed it to be optional, specify it in the schema.
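The reviewer's alternative to returning an empty list when one file fails can be sketched as follows: a file whose date does not parse is logged and skipped, and the other matches are kept. This is a minimal, self-contained approximation, not the project's code. parse_date is a hypothetical stand-in for parse_conf_date_format and assumes ISO dates (YYYY-MM-DD), the function takes a list of names instead of calling listdir(output_folder), and the function names are hypothetical. The fallback to unnamed groups in (type, date_str) order mirrors the PR.

```python
import datetime
import logging
import re
from typing import List, Optional, Tuple


def parse_date(date_str: str) -> datetime.date:
    # Hypothetical stand-in for parse_conf_date_format; assumes ISO dates.
    return datetime.datetime.strptime(date_str, "%Y-%m-%d").date()


def parse_file_name(match: re.Match) -> Optional[Tuple[datetime.date, str]]:
    groups = match.groupdict()
    if "type" in groups and "date_str" in groups:
        stats_type, date_str = groups["type"], groups["date_str"]
    else:
        # Like the PR: fall back to unnamed groups, assumed to be (type, date_str).
        stats_type, date_str = match.groups()
    try:
        return parse_date(date_str), stats_type
    except ValueError:
        logging.warning("could not parse date %r from file name, skipping it", date_str)
        return None


def find_existing_output_files(file_names: List[str],
                               regexp: re.Pattern) -> List[Tuple[datetime.date, str]]:
    found = []
    for name in file_names:
        match = regexp.match(name)
        if match:
            parsed = parse_file_name(match)
            # Skip only the file that failed instead of discarding every result.
            if parsed is not None:
                found.append(parsed)
    return found


# Usage with hypothetical file names: one valid, one with an invalid date, one unrelated.
regexp = re.compile(r"^(?P<type>\w+)_(?P<date_str>[^_]+?)\.csv\.gz")
names = ["route_stats_2020-01-01.csv.gz", "route_stats_2020-13-99.csv.gz", "notes.txt"]
print(find_existing_output_files(names, regexp))
# [(datetime.date(2020, 1, 1), 'route_stats')]
```

Logging each skipped name keeps the "something weird is going on" signal the author was concerned about without hiding the dates that did parse; whether a partial list is preferable to an empty one is the open question in the thread.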