Memory usage optimization via reuse of SchemaValidator and SchemaSerializer #1616
CodSpeed Performance Report: merging #1616 will not alter performance.
Impressive! I hope you will not stop here.
Seems fine enough! I wonder, is there a way this can be tested? Maybe do something evil like modify the __pydantic_validator__ on a type and confirm that validator is picked up? 🙈
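One way the test suggested above could look, sketched against a toy stand-in rather than pydantic-core itself: swap the `__pydantic_validator__` attribute on a nested type for a sentinel and assert that the build step hands back that exact object. `build_nested_validator` and `FreshValidator` are hypothetical names invented for this sketch; only `__pydantic_validator__` comes from the discussion.

```python
class FreshValidator:
    """Hypothetical stand-in for a validator built from scratch."""

    def __init__(self, cls):
        self.cls = cls


def build_nested_validator(cls):
    """Toy builder: reuse a validator already attached to `cls`, else build one.

    Only the class's own __dict__ is consulted, so a validator inherited
    from a parent class is not mistaken for this class's validator.
    That choice is an assumption of this sketch.
    """
    prebuilt = cls.__dict__.get("__pydantic_validator__")
    if prebuilt is not None:
        return prebuilt
    return FreshValidator(cls)


class Inner:
    pass


def test_modified_validator_is_picked_up():
    sentinel = object()
    # The "evil" step: overwrite the validator attribute on the type.
    Inner.__pydantic_validator__ = sentinel
    try:
        # Identity check: the builder must return the very same object.
        assert build_nested_validator(Inner) is sentinel
    finally:
        del Inner.__pydantic_validator__
    # With nothing attached, the builder falls back to a fresh validator.
    assert isinstance(build_nested_validator(Inner), FreshValidator)


test_modified_validator_is_picked_up()
```

Asserting on identity (`is`) rather than equality is the point here: it confirms a reference was reused, not that an equivalent validator was rebuilt.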
Wondering if I should consolidate the shared "extract prebuilt" logic between the validator and serializers...
I think that would be smart, suggest file to go at
LGTM, feel free to consolidate logic and then merge 👍
The main goal of this PR is to reduce memory usage associated with models by minimizing how much space SchemaValidators and SchemaSerializers consume.
This is done by reusing references to existing validators and serializers when there are nested structures present.
This is allowed for model and (pydantic) dataclass core schemas. Notably, we don't use this reuse pattern for generic dataclasses (see code commentary for more info).
This has been benchmarked against many examples. The examples where this refactor has the most impact are those which have lots of large/nested models.
Two examples where improvements were particularly noticeable were for schema builds with:
Some highlight stats:
aiotdlib, and a small 3-5% improvement for the kubernetes example.
For aiotdlib on main:
With this branch:
Old flamegraph:
[Image: Screenshot 2025-02-04 at 12 17 16 PM]
New flamegraph:
[Image: Screenshot 2025-02-04 at 12 17 36 PM]
For k8s_v2.py on main:
On this branch:
Old flamegraph:
[Image: Screenshot 2025-02-04 at 12 08 17 PM]
New flamegraph:
[Image: Screenshot 2025-02-04 at 12 08 43 PM]
Thanks 🚀
@Viicos for the help finding some examples that were appropriate for benchmarking, and for the idea to skip the core schema modifications for simplicity 👍
@davidhewitt for iterating with me on the appropriate pyo3 tools to use for this :)
@BoxyUwU for your work on #1414 which got us started down this path