Memory usage optimization via reuse of SchemaValidator and SchemaSerializer #1616

Merged
merged 13 commits into from
Feb 5, 2025

@sydney-runkle sydney-runkle commented Jan 31, 2025

The main goal of this PR is to reduce memory usage associated with models by minimizing how much space SchemaValidators and SchemaSerializers consume.

This is done by reusing references to existing validators and serializers when there are nested structures present.

This is allowed for model and (pydantic) dataclass core schemas. Notably, we don't use this reuse pattern for generic dataclasses (see code commentary for more info).
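As a rough illustration of why reuse cuts allocations, here is a minimal pure-Python sketch. Everything in it (`ToyValidator`, `__toy_validator__`, `build_validator`, `count_builds`) is hypothetical and stands in for the real Rust implementation in pydantic-core; it only counts how many validator nodes get built with and without reuse for a three-level nested structure.

```python
from dataclasses import dataclass, field


@dataclass
class ToyValidator:
    """Hypothetical stand-in for a built validator (not pydantic-core's class)."""
    name: str
    children: list = field(default_factory=list)


build_count = 0  # number of validator nodes actually constructed


def build_validator(schema: dict, reuse: bool) -> ToyValidator:
    """Build a validator tree for a toy schema of the form {"cls": ..., "fields": [...]}."""
    global build_count
    cls = schema["cls"]
    # Reuse path: the class already owns a finished validator, so return a
    # reference to it instead of rebuilding the whole subtree.
    prebuilt = cls.__dict__.get("__toy_validator__")
    if reuse and prebuilt is not None:
        return prebuilt
    build_count += 1
    children = [build_validator(s, reuse) for s in schema["fields"]]
    return ToyValidator(cls.__name__, children)


def count_builds(reuse: bool) -> int:
    """Define Inner, Middle and Outer in order and count validator builds."""
    global build_count
    build_count = 0
    inner, middle, outer = (type(n, (), {}) for n in ("Inner", "Middle", "Outer"))
    inner_schema = {"cls": inner, "fields": []}
    middle_schema = {"cls": middle, "fields": [inner_schema, inner_schema]}
    outer_schema = {"cls": outer, "fields": [middle_schema, middle_schema]}
    for schema in (inner_schema, middle_schema, outer_schema):
        # Mimic class creation: build the validator, then attach it to the class.
        setattr(schema["cls"], "__toy_validator__", build_validator(schema, reuse))
    return build_count


print(count_builds(reuse=False), count_builds(reuse=True))
```

In this toy setup the count without reuse grows with every level of nesting, while with reuse each class is built exactly once. The real savings depend on model structure, as the benchmarks in this PR show.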

This has been benchmarked against many examples. The refactor has the most impact on examples with many large or deeply nested models.

Two examples where improvements were particularly noticeable were the schema builds for aiotdlib and the kubernetes models (k8s_v2.py), both detailed below.

Some highlight stats:

  • Extreme reduction in the total number of allocations: we measured up to almost 7x, and this could be even greater depending on model structure
  • Significant reduction in resident memory size (probably the most important metric for users): our experiments showed reductions of 2-4x
  • Reduction in total memory allocated (1.5-2x)
  • Schema build times can also improve: we saw a ~15% build time reduction for aiotdlib and a smaller 3-5% improvement for the kubernetes example

For aiotdlib

| Metric | Before | After | Change | % Change | Reduction Factor |
|---|---|---|---|---|---|
| Resident Memory Size | 884MB | 212MB | -672MB | -76.0% | 4.17× |
| Total Allocations | 5,069,626 | 746,466 | -4,323,160 | -85.3% | 6.79× |
| Total Memory Allocated | 1.317GB | 671MB | -646MB | -49.1% | 1.96× |

`aiotdlib.py` (consolidated models)
  • Total memory allocation is ~50% of what it was previously
  • Total number of allocations has dropped 6.8x
  • Resident memory has reduced by 4x -- this is probably the most valuable stat here!
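
For readers who want to reproduce the derived columns of the table, the arithmetic is sketched below. The helper name `reduction_stats` is made up for illustration; the inputs are the before/after figures reported in this PR.

```python
def reduction_stats(before: float, after: float) -> tuple[float, float, float]:
    """Return (absolute change, percent change, reduction factor)."""
    change = after - before
    percent = 100 * change / before   # negative means a reduction
    factor = before / after           # how many times smaller "after" is
    return change, round(percent, 1), round(factor, 2)


# Resident memory size in MB and total allocation count for aiotdlib
print(reduction_stats(884, 212))
print(reduction_stats(5_069_626, 746_466))
```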

on main:

📏 Total allocations:
        5069626

📦 Total memory allocated:
        1.317GB

📊 Histogram of allocation size:
        min: 1.000B
        ----------------------------------------------
        < 4.000B   :  221354 ▇▇▇▇
        < 18.000B  : 1783931 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 76.000B  : 1600031 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 326.000B :  806918 ▇▇▇▇▇▇▇▇▇▇▇▇
        < 1.354KB  :  531049 ▇▇▇▇▇▇▇▇
        < 5.754KB  :   90030 ▇▇
        < 24.456KB :   34687 ▇
        < 103.938KB:    1495 ▇
        < 441.735KB:      54 ▇
        <=1.833MB  :      77 ▇
        ----------------------------------------------
        max: 1.833MB

📂 Allocator type distribution:
         MALLOC: 4102467
         REALLOC: 913128
         CALLOC: 45971
         MMAP: 8060

🥇 Top 5 largest allocating locations (by size):
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 510.371MB
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 278.361MB
        - __init__:/Users/sydney-runkle/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/typing.py:1035 -> 239.962MB
        - _get_code_from_file:<frozen runpy>:259 -> 66.375MB
        - data:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_namespace_utils.py:91 -> 21.933MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 3347638
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 1559311
        - __init__:/Users/sydney-runkle/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/typing.py:1035 -> 56044
        - _extract_json_schema_info_from_field_info:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_generate_schema.py:258 -> 21452
        - _get_code_from_file:<frozen runpy>:259 -> 17837

On this branch:

📏 Total allocations:
        746466

📦 Total memory allocated:
        670.775MB

📊 Histogram of allocation size:
        min: 1.000B
        ---------------------------------------------
        < 4.000B   :  53493 ▇▇▇▇▇▇
        < 18.000B  : 181024 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 76.000B  : 264162 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 326.000B :  79409 ▇▇▇▇▇▇▇▇
        < 1.354KB  : 105607 ▇▇▇▇▇▇▇▇▇▇
        < 5.754KB  :  33087 ▇▇▇▇
        < 24.456KB :  28484 ▇▇▇
        < 103.938KB:   1111 ▇
        < 441.735KB:     36 ▇
        <=1.833MB  :     53 ▇
        ---------------------------------------------
        max: 1.833MB

📂 Allocator type distribution:
         MALLOC: 601531
         REALLOC: 91757
         CALLOC: 45038
         MMAP: 8140

🥇 Top 5 largest allocating locations (by size):
        - __init__:/Users/sydney-runkle/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/typing.py:1035 -> 240.962MB
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 80.636MB
        - _get_code_from_file:<frozen runpy>:259 -> 66.375MB
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 44.481MB
        - data:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_namespace_utils.py:91 -> 21.933MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 381038
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 204972
        - __init__:/Users/sydney-runkle/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/typing.py:1035 -> 56045
        - _extract_json_schema_info_from_field_info:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_generate_schema.py:258 -> 21452
        - _get_code_from_file:<frozen runpy>:259 -> 17837

Old flamegraph:
Screenshot 2025-02-04 at 12 17 16 PM

New flamegraph:
Screenshot 2025-02-04 at 12 17 36 PM

For k8s_v2.py

| Metric | Before | After | Change | % Change | Reduction Factor |
|---|---|---|---|---|---|
| Resident Memory Size | 563MB | 290MB | -273MB | -48.5% | 1.94× |
| Total Allocations | 1,969,609 | 586,011 | -1,383,598 | -70.2% | 3.36× |
| Total Memory Allocated | 787MB | 519MB | -268MB | -34.1% | 1.52× |

on main:

📏 Total allocations:
        1969609

📦 Total memory allocated:
        786.890MB

📊 Histogram of allocation size:
        min: 1.000B
        ---------------------------------------------
        < 4.000B   : 340813 ▇▇▇▇▇▇▇▇▇▇▇▇
        < 21.000B  : 747070 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 100.000B : 339933 ▇▇▇▇▇▇▇▇▇▇▇▇
        < 466.000B : 120403 ▇▇▇▇▇
        < 2.114KB  : 363496 ▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 9.824KB  :  56606 ▇▇
        < 45.646KB :    991 ▇
        < 212.084KB:    124 ▇
        < 985.395KB:     50 ▇
        <=4.471MB  :    123 ▇
        ---------------------------------------------
        max: 4.471MB

📂 Allocator type distribution:
         MALLOC: 1826508
         REALLOC: 100096
         CALLOC: 42912
         MMAP: 93

🥇 Top 5 largest allocating locations (by size):
        - _get_code_from_file:<frozen runpy>:259 -> 282.632MB
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 187.160MB
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 133.740MB
        - __init__:/Users/sydney-runkle/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/typing.py:1035 -> 21.739MB
        - from_field:/Users/sydney-runkle/Work/oss/pydantic/pydantic/fields.py:279 -> 13.876MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 1241381
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 507059
        - _get_code_from_file:<frozen runpy>:259 -> 74830
        - _apply_annotations:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_generate_schema.py:2098 -> 30023
        - from_field:/Users/sydney-runkle/Work/oss/pydantic/pydantic/fields.py:279 -> 18946

On this branch:

📏 Total allocations:
        586011

📦 Total memory allocated:
        518.610MB

📊 Histogram of allocation size:
        min: 1.000B
        ---------------------------------------------
        < 4.000B   :  90275 ▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 21.000B  : 155729 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 100.000B : 101222 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 466.000B :  21643 ▇▇▇▇
        < 2.114KB  : 178561 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 9.824KB  :  37417 ▇▇▇▇▇▇
        < 45.646KB :    911 ▇
        < 212.084KB:     96 ▇
        < 985.395KB:     42 ▇
        <=4.471MB  :    115 ▇
        ---------------------------------------------
        max: 4.471MB

📂 Allocator type distribution:
         MALLOC: 510537
         CALLOC: 41977
         REALLOC: 33412
         MMAP: 85

🥇 Top 5 largest allocating locations (by size):
        - _get_code_from_file:<frozen runpy>:259 -> 282.632MB
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 36.204MB
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 23.107MB
        - __init__:/Users/sydney-runkle/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/typing.py:1035 -> 21.739MB
        - from_field:/Users/sydney-runkle/Work/oss/pydantic/pydantic/fields.py:279 -> 14.876MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - create_schema_validator:/Users/sydney-runkle/Work/oss/pydantic/pydantic/plugin/_schema_validator.py:51 -> 240515
        - complete_model_class:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_model_construction.py:611 -> 96597
        - _get_code_from_file:<frozen runpy>:259 -> 74830
        - _apply_annotations:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_generate_schema.py:2098 -> 30024
        - _extract_json_schema_info_from_field_info:/Users/sydney-runkle/Work/oss/pydantic/pydantic/_internal/_generate_schema.py:258 -> 30022

Old flamegraph:
Screenshot 2025-02-04 at 12 08 17 PM

New flamegraph:
Screenshot 2025-02-04 at 12 08 43 PM

Thanks 🚀

@Viicos for the help finding examples that were appropriate for benchmarking, and for the idea to skip the core schema modifications for simplicity 👍
@davidhewitt for iterating with me on the appropriate pyo3 tools to use for this :)
@BoxyUwU for your work on #1414 which got us started down this path

codspeed-hq bot commented Jan 31, 2025

CodSpeed Performance Report

Merging #1616 will not alter performance

Comparing prebuilt-variant (99136a5) with main (fdccecd)

Summary

✅ 157 untouched benchmarks

fruitoiz commented Feb 1, 2025

Impressive! I hope you will not stop here.

@sydney-runkle sydney-runkle marked this pull request as ready for review February 4, 2025 16:07
@sydney-runkle sydney-runkle changed the title from "Memory usage optimization - use prebuilt validators and serializers" to "Memory usage optimization via reuse of SchemaValidator and SchemaSerializer" Feb 4, 2025
@davidhewitt davidhewitt left a comment

Seems fine enough! I wonder, is there a way this can be tested? Maybe do something evil like modify the __pydantic_validator__ on a type and confirm that validator is picked up? 🙈
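
The testing idea suggested here could look roughly like the sketch below. It is a toy model, not a test against pydantic itself: `build_validator` is a hypothetical builder that reuses a class's own `__pydantic_validator__` attribute when present, and `Sentinel` plays the role of the "evil" replacement.

```python
class Sentinel:
    """Marker object standing in for a hand-modified validator."""


def build_validator(schema: dict):
    """Hypothetical builder: reuse a class's own prebuilt validator if it has one."""
    cls = schema["cls"]
    prebuilt = cls.__dict__.get("__pydantic_validator__")
    if prebuilt is not None:
        return prebuilt
    nested = [build_validator(s) for s in schema["fields"]]
    return ("built", cls.__name__, nested)


class Child: ...
class Parent: ...


child_schema = {"cls": Child, "fields": []}
Child.__pydantic_validator__ = build_validator(child_schema)

# The "evil" step: swap the child's validator for a sentinel object.
evil = Sentinel()
Child.__pydantic_validator__ = evil

# If reuse works, the parent's validator tree references the sentinel directly.
parent_validator = build_validator({"cls": Parent, "fields": [child_schema]})
picked_up = parent_validator[2][0] is evil
print(picked_up)
```

An identity check (`is`) is the useful assertion here, since it distinguishes a reused reference from a freshly built but equal-looking validator.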

@sydney-runkle

Wondering if I should consolidate the shared "extract prebuilt" logic between the validator and serializers...

@davidhewitt

I think that would be smart, suggest file to go at src/common/prebuilt.rs for the shared logic.
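
The shape of such a shared helper might look like the following Python sketch. The real logic is Rust in pydantic-core, so this only illustrates the idea: `get_prebuilt`, `ToyValidator` and `ToySerializer` are hypothetical names, and looking only at the class's own `__dict__` (so subclasses do not inherit a parent's prebuilt object) is an assumption rather than something stated in this thread.

```python
from typing import Any, Optional


class ToyValidator: ...
class ToySerializer: ...


def get_prebuilt(schema: dict, attr_name: str, expected_type: type) -> Optional[Any]:
    """Return the prebuilt object stored on schema["cls"], or None if unusable."""
    # Per the PR description, reuse applies to model and dataclass schemas only.
    if schema.get("type") not in ("model", "dataclass"):
        return None
    # Assumption: consult the class's own namespace, not inherited attributes.
    candidate = vars(schema["cls"]).get(attr_name)
    return candidate if isinstance(candidate, expected_type) else None


class Model: ...
class SubModel(Model): ...


Model.__pydantic_validator__ = ToyValidator()
Model.__pydantic_serializer__ = ToySerializer()

schema = {"type": "model", "cls": Model}
validator = get_prebuilt(schema, "__pydantic_validator__", ToyValidator)
serializer = get_prebuilt(schema, "__pydantic_serializer__", ToySerializer)
inherited = get_prebuilt({"type": "model", "cls": SubModel}, "__pydantic_validator__", ToyValidator)
```

The validator and serializer paths then differ only in the attribute name and expected type they pass in.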

@davidhewitt davidhewitt left a comment

LGTM, feel free to consolidate logic and then merge 👍

@sydney-runkle sydney-runkle merged commit 164b9ff into main Feb 5, 2025
28 checks passed
@sydney-runkle sydney-runkle deleted the prebuilt-variant branch February 5, 2025 21:24