feat: stream csv downloads #428
enterprise_data/api/v1/views.py
Outdated
return self.get_paginated_response(serializer.data)

def data_gen(queryset):
    paginator = Paginator(queryset, per_page=10000)
Small suggestion: make the 10000 value come from settings, so we can tune it up or down without a deployment.
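A minimal sketch of what this suggestion might look like. The setting name `ENROLLMENTS_PAGE_SIZE` and the helper `get_page_size` are assumptions for illustration, and a `SimpleNamespace` stands in for `django.conf.settings` so the snippet runs on its own:

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings (assumption: the real view would
# read from the Django settings object instead).
settings = SimpleNamespace(ENROLLMENTS_PAGE_SIZE=5000)

# The value that is hard-coded in the diff above, kept as a fallback.
DEFAULT_PAGE_SIZE = 10000


def get_page_size(settings_obj):
    """Return the configured chunk size, falling back to the default."""
    return getattr(settings_obj, "ENROLLMENTS_PAGE_SIZE", DEFAULT_PAGE_SIZE)
```

With a fallback like this, environments that do not define the setting would keep the current behaviour.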
enterprise_data/api/v1/views.py
Outdated
page = self.paginate_queryset(queryset)
if page is not None:
    serializer = self.get_serializer(page, many=True)
    return self.get_paginated_response(serializer.data)
Do we want this here? Should the block of code that conditionally renders the CSV come first?
def list(self, request, *args, **kwargs):
    """
    Override the list method to handle streaming CSV download.
    """
One idea for rollout: introduce a feature flag, where if the flag is off, this method can probably just return super().list(...). And if it's on, it can do all the new stuff you've introduced.
Also, here's an example from SO that might give you an idea for how to structure this code a little differently: https://stackoverflow.com/a/65564367 It might help you simplify this a little bit.
What do you think about enabling or disabling the old/new functionality based on a query param passed from admin-portal?
Yeah, sure, that's a good idea too.
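The rollout idea discussed in this thread could be sketched roughly as follows. This is an illustration only: `BaseListView` stands in for the DRF base class, `request` is a plain dict of query params, and the `format` key is a hypothetical substitute for the renderer negotiation a real view would use.

```python
class BaseListView:
    """Stand-in for the DRF viewset base class (assumption)."""

    def list(self, request):
        return "paginated-json"


class EnrollmentsView(BaseListView):
    def list(self, request):
        # Opt in to the new behaviour only when the caller asks for it.
        streaming_on = request.get("streaming_csv_enabled") == "true"
        wants_csv = request.get("format") == "csv"
        if streaming_on and wants_csv:
            return "streaming-csv"
        # Flag off, or not a CSV request: keep the existing behaviour.
        return super().list(request)
```

Gating on a query param means the old path stays the default until the frontend starts sending the param.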
class EnrollmentsCSVRenderer(CSVStreamingRenderer):
    header = [
You could probably do something like
header = [field.name for field in EnterpriseLearnerEnrollment._meta.get_fields()]
enterprise_data/api/v1/views.py
Outdated
    serializer = self.get_serializer(enrollments, many=True)
    yield serializer.data

if self.request.query_params.get('data') == 'csv':
Why is this change necessary? Does requesting ...enrollments.csv no longer work?
You might be able to do something like this if you want to switch behavior based on whether .csv is being requested: https://www.django-rest-framework.org/api-guide/renderers/#varying-behavior-by-media-type
> Why is this change necessary? Does requesting ...enrollments.csv no longer work?

Unfortunately, yes. I really want to avoid this. I will give it another look.
Requests like enrollments.csv will now work.
@iloveagent57 Thanks a lot for the feedback. This is a work in progress; I wanted to get your thoughts on the overall approach, and I will address all the feedback. One question: things are clear regarding streaming the CSV, but how about chunking the DB reads?
The code looks good! Have you been able to test this out on a fake/large data set?
enterprise_data/api/v1/views.py
Outdated
if self.request.query_params.get('streaming_csv_enabled') == 'true':
    if request.accepted_renderer.format == 'csv':
        return StreamingHttpResponse(
            EnrollmentsCSVRenderer().render(self._stream_serialized_data()),
            content_type='text/csv'
        )

return super().list(request, *args, **kwargs)

def _stream_serialized_data(self):
    """
    Stream the serialized data.
    """
    queryset = self.filter_queryset(self.get_queryset())
    serializer = self.get_serializer_class()
    paginator = Paginator(queryset, per_page=settings.ENROLLMENTS_PAGE_SIZE)
    for page_number in paginator.page_range:
        yield from serializer(paginator.page(page_number).object_list, many=True).data
Awesome, nice change to clean this up.
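For readers following along, here is a framework-free sketch of the pattern the snippet above relies on: read rows in fixed-size chunks and yield CSV lines one at a time instead of building the whole file in memory. It is not the PR's code. The in-memory list stands in for the queryset, and `Echo`, `iter_chunks`, and `stream_csv` are illustrative names.

```python
import csv


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_chunks(rows, chunk_size):
    """Yield successive slices of `rows`, mimicking paginated DB reads."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def stream_csv(rows, header, chunk_size):
    """Yield CSV lines lazily, pulling `rows` one chunk at a time."""
    writer = csv.writer(Echo())
    # writerow() returns whatever Echo.write() returns, i.e. the CSV line.
    yield writer.writerow(header)
    for chunk in iter_chunks(rows, chunk_size):
        for row in chunk:
            yield writer.writerow([row[name] for name in header])
```

In a Django view, a generator like this could be passed to `StreamingHttpResponse` with `content_type='text/csv'`, so the response starts before all rows have been read.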
Yes, I am using the dataset for https://portal.edx.org/national-university-singapore/admin/learners for local testing. Results when reading chunks of 10000 records from the DB:
With streaming: [screenshot not preserved]
Without streaming: [screenshot not preserved]
JIRA: https://2u-internal.atlassian.net/browse/ENT-8301
Dependency: openedx/frontend-app-admin-portal#1167
Merge checklist:
- requirements/*.txt files)
  - base.in if needed in production but edx-analytics-data-api doesn't install it
  - test-master.in if edx-analytics-data-api pins it, with a matching version
- make upgrade && make requirements have been run to regenerate requirements
- make static has been run to update webpack bundling if any static content was updated
- ./manage.py makemigrations has been run
  - ./manage.py makemigrations in the shell.

Post merge:
- (so basically once your build finishes, after maybe a minute you should see the new version in PyPi automatically (on refresh))
- make upgrade in edx-analytics-data-api will look for the latest version in PyPi.