Fast conversion between Protocol Buffers and Apache Arrow, using Rust, with Python bindings.
ptars converts directly between the protobuf wire format and Arrow columnar arrays, without creating intermediate message objects.
Serialized protobuf bytes are parsed straight into Arrow builders, and Arrow arrays are encoded directly to the protobuf wire format,
avoiding the overhead of `DynamicMessage` or any per-row object allocation.
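The idea of going straight from wire bytes to columns, and back, can be illustrated in plain Python. The sketch below is a toy written by hand for the three-field `SearchRequest` message used in the example that follows (`query` = field 1, string; `page_number` = field 2, int32; `result_per_page` = field 3, int32). It assumes proto3 semantics, where fields equal to their default are omitted on the wire. It is not how ptars is implemented, and every helper name in it (`read_varint`, `decode_to_columns`, `encode_from_columns`, ...) is hypothetical.

```python
# Toy wire-format <-> columnar conversion for SearchRequest.
# Hypothetical helpers; NOT ptars's implementation.

def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def to_int32(raw: int) -> int:
    """Interpret the low 32 bits of a decoded varint as a signed int32."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def decode_to_columns(payloads: list[bytes]) -> dict[str, list]:
    """Parse payloads straight into one list per field (no message objects)."""
    columns = {"query": [], "page_number": [], "result_per_page": []}
    for buf in payloads:
        # proto3 omits default values, so each row starts from the defaults.
        row = {"query": "", "page_number": 0, "result_per_page": 0}
        pos = 0
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field_number, wire_type = key >> 3, key & 0x7
            if wire_type == 0:  # varint: the two int32 fields
                value, pos = read_varint(buf, pos)
                if field_number == 2:
                    row["page_number"] = to_int32(value)
                elif field_number == 3:
                    row["result_per_page"] = to_int32(value)
            elif wire_type == 2:  # length-delimited: the string field
                length, pos = read_varint(buf, pos)
                data = buf[pos:pos + length]
                pos += length
                if field_number == 1:
                    row["query"] = data.decode("utf-8")
            else:  # 32-/64-bit fixed types are not used by this message
                raise ValueError(f"unsupported wire type {wire_type}")
        for name, value in row.items():
            columns[name].append(value)
    return columns


def encode_from_columns(columns: dict[str, list]) -> list[bytes]:
    """Encode one payload per row, reading values directly from the columns."""
    payloads = []
    rows = zip(columns["query"], columns["page_number"], columns["result_per_page"])
    for query, page, per_page in rows:
        out = bytearray()
        if query:  # default values are skipped, as proto3 serializers do
            data = query.encode("utf-8")
            out += write_varint((1 << 3) | 2) + write_varint(len(data)) + data
        for field_number, value in ((2, page), (3, per_page)):
            if value:
                # Negative int32 values are sign-extended to 64 bits (10 bytes).
                out += write_varint(field_number << 3)
                out += write_varint(value & 0xFFFFFFFFFFFFFFFF)
        payloads.append(bytes(out))
    return payloads


columns = {
    "query": ["protobuf to arrow", "protobuf to arrow"],
    "page_number": [0, 1],
    "result_per_page": [10, 10],
}
payloads = encode_from_columns(columns)
assert decode_to_columns(payloads) == columns  # lossless round trip
```

A real converter would presumably be driven by the message descriptor rather than hard-coded field numbers, and would append into typed Arrow builders instead of Python lists; the sketch only shows why no per-row message object is needed in either direction.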
Take a protobuf message:

```protobuf
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
```

And convert serialized messages directly to a `pyarrow.RecordBatch`:
```python
from ptars import HandlerPool

# Hypothetical module name: use the module protoc generated from the definition above
from search_pb2 import SearchRequest

messages = [
    SearchRequest(
        query="protobuf to arrow",
        page_number=0,
        result_per_page=10,
    ),
    SearchRequest(
        query="protobuf to arrow",
        page_number=1,
        result_per_page=10,
    ),
]
payloads = [message.SerializeToString() for message in messages]

# Register the .proto file, then get a handler for the message type
pool = HandlerPool([SearchRequest.DESCRIPTOR.file])
handler = pool.get_for_message(SearchRequest.DESCRIPTOR)
record_batch = handler.list_to_record_batch(payloads)
```

| query | page_number | result_per_page |
|---|---|---|
| protobuf to arrow | 0 | 10 |
| protobuf to arrow | 1 | 10 |
You can also convert a `pyarrow.RecordBatch` back to serialized protobuf messages:

```python
import pyarrow as pa

# One serialized SearchRequest per row of the record batch
array: pa.BinaryArray = handler.record_batch_to_array(record_batch)
messages_back: list[SearchRequest] = [
    SearchRequest.FromString(s.as_py()) for s in array
]
```

Customize Arrow type mappings with `PtarsConfig`:
```python
from ptars import HandlerPool, PtarsConfig

config = PtarsConfig(
    timestamp_unit="us",  # microseconds instead of nanoseconds
    timestamp_tz="America/New_York",
)
pool = HandlerPool([SearchRequest.DESCRIPTOR.file], config=config)
```

ptars is a Rust implementation of protarrow, which is written in plain Python. By encoding and decoding directly between the protobuf wire format and Arrow arrays, ptars is:
- 7x+ faster when converting from proto to Arrow.
- 30x+ faster when converting from Arrow to proto.
```text
---- benchmark 'to_arrow': 2 tests ----
Name (time in us)        Mean
---------------------------------------
ptars_to_arrow             659 (1.0)
protarrow_to_arrow       5,037 (7.65)
---------------------------------------

---- benchmark 'to_proto': 2 tests -----
Name (time in us)        Mean
----------------------------------------
ptars_to_proto             397 (1.0)
protarrow_to_proto      12,534 (31.61)
----------------------------------------
```
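The number in parentheses is each mean divided by the fastest (ptars) mean. The short sketch below recomputes those ratios from the rounded means printed above; the dictionary layout and the `speedups` helper are illustrative only. It gives 7.64 and 31.57 rather than the reported 7.65 and 31.61, presumably because the benchmark tool computes ratios from unrounded timings. Both are consistent with the "7x+" and "30x+" figures.

```python
# Mean times in microseconds, copied from the benchmark output above.
means_us = {
    "to_arrow": {"ptars": 659, "protarrow": 5037},
    "to_proto": {"ptars": 397, "protarrow": 12534},
}


def speedups(means: dict[str, dict[str, float]]) -> dict[str, float]:
    """Ratio of the protarrow mean to the ptars mean for each benchmark."""
    return {name: m["protarrow"] / m["ptars"] for name, m in means.items()}


for name, ratio in speedups(means_us).items():
    print(f"{name}: {ratio:.2f}x")  # to_arrow: 7.64x, to_proto: 31.57x
```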