Output Transformer's auto join of multiple statistics not working if one statistic is empty. #85

Open
KonradUdoHannes opened this issue May 20, 2020 · 1 comment
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

@KonradUdoHannes
Collaborator

  • Datenguide Python version: 0.2.1
  • Python version: All
  • Operating System: All

Description

The datenguide package lets you query several statistics at once. When the result is converted to a DataFrame, the per-statistic results are automatically joined (outer join). This currently raises an error when one of the statistics is empty. In the spirit of the outer join, and for better usability, the preferable solution would be to add all-NA columns for that statistic to the result.
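The proposed behavior can be sketched in pandas. This is a minimal sketch, not the datenguidepy implementation: `join_statistic_frames` is a hypothetical helper, and it assumes each per-statistic frame consists of the shared join columns plus one value column named after the statistic. Non-empty frames are outer-joined as before, and each empty statistic is then added as an all-NA column.

```python
from functools import reduce
from typing import Dict, List

import pandas as pd


def join_statistic_frames(
    frames: Dict[str, pd.DataFrame], join_columns: List[str]
) -> pd.DataFrame:
    """Outer-join per-statistic frames; empty statistics become all-NA columns.

    Hypothetical helper for illustration only, not part of datenguidepy.
    Assumes each frame holds `join_columns` plus one value column named
    after its statistic (the dict key).
    """
    non_empty = [frame for frame in frames.values() if not frame.empty]
    empty_names = [name for name, frame in frames.items() if frame.empty]
    if not non_empty:
        # Every statistic came back empty: return only the column skeleton.
        return pd.DataFrame(columns=join_columns + list(frames))
    # Outer-join only the frames that actually contain rows.
    result = reduce(
        lambda left, right: left.merge(right, on=join_columns, how="outer"),
        non_empty,
    )
    # Each empty statistic contributes a column that is NA in every row.
    for name in empty_names:
        result[name] = float("nan")
    # Restore the order in which the statistics were requested.
    return result[join_columns + list(frames)]
```

Filling the columns after the merge, rather than merging an empty placeholder frame, sidesteps any dtype mismatch between the join keys of an empty frame and those of the populated frames.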

This was never implemented, but since it causes an internal error it is considered a bug as well. One could explicitly check for this case and deliberately raise a NotImplementedError; the issue would then stop being a bug while remaining a useful enhancement.
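The alternative of failing explicitly could look like the following minimal sketch. The function name and message are hypothetical, not datenguidepy API. It only raises when more than one statistic was requested, because a single empty statistic already works and simply returns an empty result.

```python
from typing import Mapping, Sized


def check_no_empty_statistics(frames: Mapping[str, Sized]) -> None:
    """Raise NotImplementedError if a multi-statistic result has an empty statistic.

    Hypothetical guard for illustration; `frames` maps statistic names to
    per-statistic results (e.g. DataFrames, where len() is the row count).
    """
    if len(frames) < 2:
        # Nothing is joined for a single statistic, so an empty result is fine.
        return
    empty = [name for name, frame in frames.items() if len(frame) == 0]
    if empty:
        raise NotImplementedError(
            "Joining statistics with empty results is not supported yet: "
            + ", ".join(empty)
        )
```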

What I Did

```python
from datenguidepy.query_builder import Query

q_stat1 = Query.region('01')
q_stat1.add_field('BEVSTD')
print(q_stat1.results().iloc[:, :4].head())  # not empty

q_stat2 = Query.region('01')
q_stat2.add_field('TIE003')
print(q_stat2.results().iloc[:, :4].head())  # empty

q_stat3 = Query.region('01')
q_stat3.add_field('WAHL09')
print(q_stat3.results().iloc[:, :4].head())  # not empty

q_comb13 = Query.region('01')
q_comb13.add_field('BEVSTD')
q_comb13.add_field('WAHL09')
print(q_comb13.results().iloc[:, :5].head())  # example of the intended auto-join functionality

q_comb12 = Query.region('01')
q_comb12.add_field('BEVSTD')
q_comb12.add_field('TIE003')
print(q_comb12.results().iloc[:, :5].head())  # causes error
```
KonradUdoHannes added the bug and enhancement labels on May 20, 2020
enryH self-assigned this on May 25, 2020
@pr130

pr130 commented Jun 8, 2020

I also ran into this with this example:

```python
from datenguidepy.query_builder import Query

q = Query.region('09162000')
stats = q.add_field('BEV001')
q.results()  # not empty

q2 = Query.region('09162000')
stats2 = q2.add_field("UMS041")
q2.results()  # empty

# combined
q_combined = Query.region("09162000")
stats = q_combined.add_field('BEV001')
stats2 = q_combined.add_field("UMS041")
q_combined.results()  # error
```


ValueError                                Traceback (most recent call last)
 in 
     11 stats = q_combined.add_field('BEV001')
     12 stats2 = q_combined.add_field("UMS041")
---> 13 q_combined.results()

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/datenguidepy/query_builder.py in results(self, verbose_statistics, verbose_enums, add_units)
    638                 verbose_statistic_names=verbose_statistics,
    639                 verbose_enum_values=verbose_enums,
--> 640                 add_units=add_units
    641             )
    642         else:

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/datenguidepy/output_transformer.py in transform(self, verbose_statistic_names, verbose_enum_values, add_units)
    401         :return: Returns a pandas DataFrame of the queries results.
    402         """
--> 403         output = self._convert_results_to_frame(self.query_response)
    404         if verbose_statistic_names:
    405             output = self._make_verbose_statistic_names(

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/datenguidepy/output_transformer.py in _convert_results_to_frame(executioner_result)
     46                 result_frames.append(
     47                     QueryOutputTransformer._convert_regions_to_frame(
---> 48                         page, single_query_response.meta_data
     49                     )
     50                 )

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/datenguidepy/output_transformer.py in _convert_regions_to_frame(query_page, meta_data)
     72         if "region" in query_page["data"]:
     73             return QueryOutputTransformer._convert_single_results_to_frame(
---> 74                 query_page["data"]["region"], meta_data
     75             )
     76         elif "allRegions" in query_page["data"]:

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/datenguidepy/output_transformer.py in _convert_single_results_to_frame(region_json, meta)
    120 
    121         joined_results, join_cols = QueryOutputTransformer._join_statistic_results(
--> 122             statistic_frames, list(cast(StatMeta, meta["statistics"]).keys())
    123         )
    124         column_order = QueryOutputTransformer._determine_column_order(

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/datenguidepy/output_transformer.py in _join_statistic_results(statistic_results, statistic_names)
    276                     ),
    277                     on=join_columns,
--> 278                     how="outer",
    279                 )
    280             return result, join_columns

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/pandas/core/frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   7295             copy=copy,
   7296             indicator=indicator,
-> 7297             validate=validate,
   7298         )
   7299 

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     86         validate=validate,
     87     )
---> 88     return op.get_result()
     89 
     90 

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/pandas/core/reshape/merge.py in get_result(self)
    641             self.left, self.right = self._indicator_pre_merge(self.left, self.right)
    642 
--> 643         join_index, left_indexer, right_indexer = self._get_join_info()
    644 
    645         ldata, rdata = self.left._data, self.right._data

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/pandas/core/reshape/merge.py in _get_join_info(self)
    860             )
    861         else:
--> 862             (left_indexer, right_indexer) = self._get_join_indexers()
    863 
    864             if self.right_index:

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/pandas/core/reshape/merge.py in _get_join_indexers(self)
    839         """ return the join indexers """
    840         return _get_join_indexers(
--> 841             self.left_join_keys, self.right_join_keys, sort=self.sort, how=self.how
    842         )
    843 

~/dev/correlaid/misc/datenguide-py-test/datenguidepytest/lib/python3.7/site-packages/pandas/core/reshape/merge.py in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
   1310     )
   1311     zipped = zip(*mapped)
-> 1312     llab, rlab, shape = [list(x) for x in zipped]
   1313 
   1314     # get flat i8 keys from label lists

ValueError: not enough values to unpack (expected 3, got 0)
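The last frame of the traceback points at the cause. Below is a minimal pure-Python sketch of the pattern visible at pandas `merge.py` lines 1311 to 1312, with simplified stand-in triples in place of what pandas' key factorization returns. Because one triple is produced per join key, an empty list of join keys leaves nothing to unpack, which yields exactly this ValueError. This suggests, as an inference from the traceback rather than something confirmed in the issue, that `_join_statistic_results` passed an empty `on=join_columns` list to `merge` when one statistic was empty. Newer pandas versions may behave differently.

```python
from typing import List, Tuple


def unpack_join_labels(join_keys: List[Tuple[str, str]]):
    # Simplified mimic of pandas' _get_join_indexers (version in the traceback):
    # one (left_labels, right_labels, shape) triple per join key.
    mapped = ((left, right, 1) for left, right in join_keys)
    zipped = zip(*mapped)
    # With zero join keys, zipped is empty and this unpacking fails.
    llab, rlab, shape = [list(x) for x in zipped]
    return llab, rlab, shape
```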
