-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "taxon_by" param to *_frequencies_advanced()
functions and allow the "period_by" param to specify a column name
#694
base: master
Are you sure you want to change the base?
Conversation
…usage to notebook.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Hi @alimanfoo . Do we know how we want to allow any value in the
def prep_samples_for_cohort_grouping(
*, df_samples, area_by, period_by, taxon_by="taxon"
):
# Take a copy, as we will modify the dataframe.
df_samples = df_samples.copy()
# Fix "intermediate" or "unassigned" taxon values - we only want to build
# cohorts with clean taxon calls, so we set other values to None.
loc_intermediate_taxon = (
df_samples[taxon_by].str.startswith("intermediate").fillna(False)
)
df_samples.loc[loc_intermediate_taxon, taxon_by] = None
loc_unassigned_taxon = (
df_samples[taxon_by].str.startswith("unassigned").fillna(False)
)
df_samples.loc[loc_unassigned_taxon, taxon_by] = None
# Add period column.
if period_by == "year":
make_period = _make_sample_period_year
elif period_by == "quarter":
make_period = _make_sample_period_quarter
elif period_by == "month":
make_period = _make_sample_period_month
else: # pragma: no cover
raise ValueError(
f"Value for period_by parameter must be one of 'year', 'quarter', 'month'; found {period_by!r}."
)
sample_period = df_samples.apply(make_period, axis="columns")
df_samples["period"] = sample_period
# Add area column for consistent output.
df_samples["area"] = df_samples[area_by]
return df_samples with def _make_sample_period_month(row):
year = row.year
month = row.month
if year > 0 and month > 0:
return pd.Period(freq="M", year=year, month=month)
else:
return pd.NaT
def _make_sample_period_quarter(row):
year = row.year
month = row.month
if year > 0 and month > 0:
return pd.Period(freq="Q", year=year, month=month)
else:
return pd.NaT
def _make_sample_period_year(row):
year = row.year
if year > 0:
return pd.Period(freq="Y", year=year)
else:
return pd.NaT |
@cclarkson @ahernank @jonbrenas Any thoughts or knowledge regarding the above question for this issue? |
Your guess is probably as good as mine. Maybe allow to use a column that already exists as "period" or a dictionary of queries? It might be worth asking for details at the next tools meeting. |
Restricting this PR to just adding the "taxon_by" param. I've split the original issue into two sub-issues, in order to try to address the |
To do: add to other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @leehart. Would be good to add this parameter to the other ..._frequencies_advanced() functions.
To do: investigate and resolve potential merge conflict issue around |
…frq_base.py build_cohorts_from_sample_grouping()
*_frequencies_advanced()
functions
…_advanced(). Merge util.py prep_samples_for_cohort_grouping() into frq_base.py.
Now that I have some clarity on the sibling issue #723 I'm tempted to include it under this PR, and then update the PR title accordingly. |
…uping(). Add nb test. Remove old _make_sample_period_...() funcs from util.py.
*_frequencies_advanced()
functions*_frequencies_advanced()
functions and allow the "period_by" param to specify a column name
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #694 +/- ##
==========================================
- Coverage 96.13% 95.97% -0.16%
==========================================
Files 47 47
Lines 4683 4699 +16
==========================================
+ Hits 4502 4510 +8
- Misses 181 189 +8 ☔ View full report in Codecov by Sentry. |
To do: increase test coverage, in light of codecov test failures. |
Re: parent issue #391
Re: issues #722 and #723