
MIGRATION.md


Migrating to Pydantic

Motivations

DRY (Don't Repeat Yourself)

A key goal of good Python code is to avoid stating the same thing more than once.

The original ("classic") camdkit parameters, prior to the addition of tracking info, were with one exception POD ("Plain Old Data") types. POD descriptors were re-used across the parameters that needed them. For example, a single descriptor could represent UUIDs and carry the validate(), to_json(), from_json() and make_json_schema() methods, and every parameter whose value was a UUID could be based on that descriptor. Loosely speaking, all of the descriptor logic was in framework.py and all of the parameter logic was in model.py, which made the code very readable.

When the tracking info was introduced, the new parameters were very far from being specific instances of POD types, and two things happened. First, the 'backing' descriptor in framework.py became so specific to one particular parameter that it could not be re-used. Second, in many cases the logic implementing the parameter semantics (i.e. the aforementioned validate(), to_json(), from_json() and make_json_schema() methods) ended up in model.py.

A goal of the Pydantic re-hosting of camdkit was to reduce the number of places where one was "saying the same thing". As the Pydantic re-hosting PR will show, representing our information as a set of nested models allows one to put all the details of a metadatum's representation, range restrictions, etc. in a single place, without repeating oneself.

Don't hand-code what could be automatically generated

Pydantic itself takes care of validation (see "Cannot construct invalid objects..." below), serialization and deserialization, and schema generation. Many small errors (e.g. use of minLength for an array instead of minItems) are thus prevented, because hand-coding is eliminated.
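
As a rough illustration of declaring a constraint once and letting Pydantic derive both the validation and the schema from it, consider the sketch below. It assumes Pydantic v2; the class and field names (SampleSketch, radial, version_component) are hypothetical and are not camdkit's actual model code.

```python
from typing import Annotated, List

from pydantic import BaseModel, Field, ValidationError


class SampleSketch(BaseModel):
    """Hypothetical model; names are illustrative, not camdkit's."""

    # At least one coefficient is required.
    radial: Annotated[List[float], Field(min_length=1)]
    # A single-digit version component.
    version_component: Annotated[int, Field(ge=0, le=9)]


# The schema is generated from the declarations above, not hand-written.
schema = SampleSketch.model_json_schema()
radial_schema = schema["properties"]["radial"]
component_schema = schema["properties"]["version_component"]

# The same declarations drive validation.
try:
    SampleSketch(radial=[], version_component=3)
    empty_list_rejected = False
except ValidationError:
    empty_list_rejected = True
```

In this sketch the array constraint comes out as minItems and the integer bounds as minimum and maximum, without anyone having to remember which keyword applies to which JSON type.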

Two schemas, not one

Serialization schema

This is what most of you think of as our schema: something that expresses the set of valid metadata that can be passed "through the wire".

Validation schema

One thing not expressed in the hand-generated JSON schema in classic camdkit is that None can be a valid value of a Parameter, and that None is the default in many places. One might create a Clip and find that the initial value of (say) lens_serial_number is None; set lens_serial_number to "foo" and read it back to verify that the value is now "foo"; then clear it by explicitly assigning None, and read it back once more to confirm that it is None again.

None of this would be evident from looking at the published schema generated by classic camdkit, because what we publish is a serialization schema.

The Pydantic re-hosting of camdkit produces a serialization schema when Clip.make_json_schema() is called, but could easily produce a validation schema if that were desirable.
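
The difference between what is valid in memory and what goes through the wire can be sketched with a small model. This assumes Pydantic v2; ClipSketch is a hypothetical stand-in for Clip, and exclude_none is used here only to keep the illustration short.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class ClipSketch(BaseModel):
    """Hypothetical stand-in for a clip with one optional parameter."""

    # Re-run validation on every assignment, not only at construction.
    model_config = ConfigDict(validate_assignment=True)

    lens_serial_number: Optional[str] = None


clip = ClipSketch()
initial = clip.lens_serial_number      # None is the default

clip.lens_serial_number = "foo"
after_set = clip.lens_serial_number    # now "foo"

clip.lens_serial_number = None         # explicitly cleared
after_clear = clip.lens_serial_number  # None once more

# Nothing is emitted for a parameter whose in-memory value is None.
wire = clip.model_dump(exclude_none=True)
```

None is a perfectly valid in-memory value at every step, yet it never appears in the serialized form, which is why a serialization schema has no reason to mention it.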

Specific changes

Cannot construct invalid objects, even in temporary expressions

Code such as

import unittest

from camdkit.framework import Timestamp
from camdkit.model import TimingTimestamp

class TempConstructionCases(unittest.TestCase):
  def test_timestamp_is_invalid(self):
    self.assertFalse(TimingTimestamp.validate(Timestamp(-1, 2)))

will fail, because validate() can only be run against something that has already been constructed, and Pydantic won't let you get even that far: the attempt to construct the temporary object Timestamp(-1, 2) raises a ValidationError.
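
One way such a test might be rewritten is to assert that construction itself fails. The sketch below assumes Pydantic v2 and uses a hypothetical TimestampSketch model with assumed field names; it takes keyword arguments because a plain BaseModel does not accept positional ones, whereas camdkit's own Timestamp may differ.

```python
import unittest

from pydantic import BaseModel, Field, ValidationError


class TimestampSketch(BaseModel):
    """Hypothetical stand-in for a timestamp; field names are assumptions."""

    seconds: int = Field(ge=0)
    nanoseconds: int = Field(ge=0)


class TempConstructionCases(unittest.TestCase):
    def test_invalid_timestamp_cannot_be_constructed(self):
        # The error is raised by the constructor, so there is never
        # an invalid object to hand to a validate() call.
        with self.assertRaises(ValidationError):
            TimestampSketch(seconds=-1, nanoseconds=2)
```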

Default values are not serialized

Examples:

  • TimecodeFormat objects contain a frame rate as well as the usual HH:MM:SS:FF, and this frame rate includes a sub_frame component, an integer representing a 0-based index into component pieces of the frame. Metadata parameters associated with the first field of an interlaced frame would have a sub_frame component of 0; the second field, 1. (n.b. the serialized canonical form of sub_frame is subFrame).
    • In classic camdkit, the __init__() method of TimecodeFormat defined in model.py defaulted sub_frame to 0, and the to_json() method of TimingTimecode in model.py always wrote it out.
    • In modified classic camdkit, if it turns out the value of sub_frame is the default, it is not serialized; it is assumed that the deserialization at the other end will reconstitute it from the default.
    • In Pydantic camdkit it is not serialized, because all serialization takes place in CompatibleBaseModel.to_json(), and that method invokes Pydantic's model_dump with exclude_defaults=True.
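
The effect of exclude_defaults=True can be seen in a small sketch. It assumes Pydantic v2; TimecodeFormatSketch is hypothetical, and its frame rate is simplified to a plain integer for brevity.

```python
from pydantic import BaseModel


class TimecodeFormatSketch(BaseModel):
    """Hypothetical stand-in; field names and types are assumptions."""

    frame_rate: int
    sub_frame: int = 0


first_field = TimecodeFormatSketch(frame_rate=30)
second_field = TimecodeFormatSketch(frame_rate=30, sub_frame=1)

# Fields whose value equals the declared default are left out.
first_dump = first_field.model_dump(exclude_defaults=True)
second_dump = second_field.model_dump(exclude_defaults=True)

# The receiving end reconstitutes the missing value from the default.
restored = TimecodeFormatSketch.model_validate(first_dump)
```

Note that the comparison is against the default value, so a sub_frame explicitly set to 0 is omitted just like one that was never set.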

Various deviations from the JSON schema specification are corrected

Examples:

  • The 0-9 range of protocol version number components is now indicated with minimum and maximum, as the prior minValue and maxValue are not JSON Schema keywords.

  • The minimum of one element in arrays such as those used to carry distortion coefficients is now indicated with minItems, as the classic camdkit use of minLength and maxLength was inappropriate (minLength and maxLength are only valid for strings).

New range-restricted real-number parameter types

Examples:

  • StrictlyPositiveRealParameter was added to support nominal focal length and focus distance, two parameters for which zero is disallowed in addition to negative values.

Nominal focal length and focus distance can no longer be zero

These two parameters are now based on StrictlyPositiveRealParameter.
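
In Pydantic terms, a strictly positive real can be expressed with an exclusive lower bound. The sketch below assumes Pydantic v2 and uses a hypothetical LensSketch model; it shows the idea only and is not how camdkit defines StrictlyPositiveRealParameter.

```python
from pydantic import BaseModel, Field, ValidationError


class LensSketch(BaseModel):
    """Hypothetical stand-in; gt=0 rejects zero as well as negatives."""

    nominal_focal_length: float = Field(gt=0)


def is_accepted(value: float) -> bool:
    """Return True if the value passes the strictly-positive constraint."""
    try:
        LensSketch(nominal_focal_length=value)
    except ValidationError:
        return False
    return True


# The generated schema carries an exclusive bound rather than "minimum".
focal_schema = LensSketch.model_json_schema()["properties"]["nominal_focal_length"]
```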

RealParameter and derivative parameters serialize as floats

In classic camdkit a nominal focal length of 13 mm would be serialized as 13. In Pydantic, the model field for a nominal focal length serializes as 13.0. In modified classic camdkit the value is cast to a float before being serialized, thus producing a Pydantic-compatible 13.0.
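
The cast described for modified classic camdkit can be illustrated with the standard library alone; the variable names here are illustrative.

```python
import json

focal_length_mm = 13

# An int is written as an int.
without_cast = json.dumps(focal_length_mm)

# Casting to float first yields the form Pydantic produces for a float field.
with_cast = json.dumps(float(focal_length_mm))

# Both forms read back as numerically equal values.
same_value = json.loads(without_cast) == json.loads(with_cast)
```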

Both name and version components of Protocol are required

In classic camdkit this requirement is not expressed by the make_json_schema() method of Protocol; in the modified classic camdkit, this requirement is made explicit (and matches what Pydantic would produce for the corresponding BaseModel-based object).

Distortion model name must be non-blank and shorter than 1024 characters

This conforms to what we require of other string parameters as well.

PTP leader regex allows lower and upper case hex, and '-' separators

I am having trouble finding my references here, but I believe I've seen normative use of both upper-case and lower-case hex letters (i.e. both A-F and a-f) and both colon and hyphen separators.
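
A pattern with this tolerance might look like the sketch below. It assumes the leader identity is written as six two-digit hex octets, which is an assumption made for illustration; the pattern is not the regex camdkit actually ships.

```python
import re

# Assumed shape: six two-digit hex octets joined by ':' or '-'.
# Illustrative only; camdkit's real pattern may differ.
PTP_LEADER_SKETCH = re.compile(r"(?:[0-9a-fA-F]{2}[:-]){5}[0-9a-fA-F]{2}")


def looks_like_ptp_leader(text: str) -> bool:
    """Return True if the whole string matches the assumed shape."""
    return PTP_LEADER_SKETCH.fullmatch(text) is not None
```

As written, this sketch also accepts a mix of colons and hyphens within one string; a stricter pattern would capture the first separator and require the same one throughout.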

Numerators of rationals are restricted to signed int ranges

Example: the maximum of the numerator for the timecode frame rate was UINT_MAX, but should have been INT_MAX.

Both upper and lower bounds of integer ranges are given

Example: lens raw encoder values previously specified a minimum of 0 and no maximum; now they specify a minimum of 0 and a maximum of UINT_MAX (i.e. the largest unsigned 32-bit integer).

Magic numbers are restricted to the start of source files

Example: "maximum": 2147483647 becomes "maximum": INT_MAX
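
The named limits and their use in schema fragments might be sketched as follows. The constant names follow the text above; the fragment names and shapes are hypothetical.

```python
# Named limits defined once, near the top of the module.
INT_MAX = 2**31 - 1    # largest signed 32-bit integer
UINT_MAX = 2**32 - 1   # largest unsigned 32-bit integer

# Hypothetical fragment for a lens raw encoder value:
# both bounds of the range are stated.
encoder_schema_sketch = {
    "type": "integer",
    "minimum": 0,
    "maximum": UINT_MAX,
}

# Hypothetical fragment for the numerator of a rational:
# the upper bound is the signed limit, not the unsigned one.
numerator_schema_sketch = {
    "type": "integer",
    "maximum": INT_MAX,
}
```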

Minor typos fixed

Example: ...string betwee 0 and 1023 codepoints. becomes ...string between 0 and 1023 codepoints

Many docstrings fit on one line

PEP 257 (referenced by the all-powerful PEP 8) allows single-line docstrings, and PEP 8 says that lines can be up to 79 characters. Docstrings that fit within that limit are now written on one line, which makes the code more readable.