The following binary example comes straight from the PyArrow documentation. We recently added Polars support to PyMongoArrow and used this approach to create an ObjectIdType, whose implementation is almost identical to the example.
In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key.
If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.
See the MongoDB documentation on ObjectIds.
Until Polars supports pyarrow.ExtensionType, we must cast extension arrays to their underlying Arrow storage types before passing them to Polars.
To reproduce the issue:
```python
import pyarrow as pa
import polars as pl
import uuid


class UuidType(pa.ExtensionType):
    """For example, we could define a custom UUID type for 128-bit numbers
    which can be represented as FixedSizeBinary type with 16 bytes
    """

    def __init__(self):
        super().__init__(pa.binary(16), "my_package.uuid")

    def __arrow_ext_serialize__(self):
        # Since we don't have a parameterized type, we don't need extra
        # metadata to be deserialized
        return b''

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        # Sanity checks, not required but illustrate the method signature.
        assert storage_type == pa.binary(16)
        assert serialized == b''
        # Return an instance of this subclass given the serialized
        # metadata.
        return UuidType()


uuid_type = UuidType()
storage_array = pa.array([uuid.uuid4().bytes for _ in range(4)], pa.binary(16))
extension_arr = pa.ExtensionArray.from_storage(uuid_type, storage_array)

print(f"{pl.from_arrow(storage_array) =}")
try:
    print(f"{pl.from_arrow(extension_arr) =}")
except pl.exceptions.ComputeError as exc:
    print(f"{exc=}")
```
Description
Add support for PyArrow Extension Types.
Context / Motivation
Here are some details on extending pyarrow.