bytes fields fail with UnicodeDecodeError for non-UTF8 binary data #779

@abrookins

Description

bytes fields in both HashModel and JsonModel fail with UnicodeDecodeError when storing actual binary data (bytes that are not valid UTF-8). Only byte sequences that happen to be valid UTF-8, such as plain ASCII, work.

Steps to Reproduce

from aredis_om import JsonModel, Field, Migrator
import asyncio

class File(JsonModel, index=True):
    filename: str
    content: bytes

async def main():
    await Migrator().run()
    
    # This works (ASCII-only bytes):
    f1 = File(filename="text.txt", content=b"Hello World")
    await f1.save()  # ✅ OK
    
    # This fails (binary data):
    f2 = File(filename="image.png", content=b"\x89PNG\r\n\x1a\n")
    await f2.save()  # ❌ UnicodeDecodeError

asyncio.run(main())

Error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

Stack Trace

File "aredis_om/model/model.py", line 3130, in save
    data = jsonable_encoder(data)
File "aredis_om/model/encoders.py", line 134, in jsonable_encoder
    encoded_value = jsonable_encoder(
File "aredis_om/model/encoders.py", line 171, in jsonable_encoder
    return ENCODERS_BY_TYPE[type(obj)](obj)
File "pydantic/deprecated/json.py", line 55, in <lambda>
    bytes: lambda o: o.decode(),

Root Cause

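The jsonable_encoder in aredis_om/model/encoders.py dispatches through Pydantic's deprecated ENCODERS_BY_TYPE mapping (pydantic/deprecated/json.py), whose bytes entry is `lambda o: o.decode()`. Calling bytes.decode() with no encoding argument defaults to UTF-8 with strict error handling, so it raises for any bytes that aren't valid UTF-8.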
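
The failing step can be reproduced without redis-om or a Redis server. The sketch below is only an illustration: `decode_like_encoder` is a hypothetical stand-in that copies the `lambda o: o.decode()` entry shown in the stack trace, and is not part of either library.

```python
# Stand-alone reproduction of the decode step from the stack trace.
# `decode_like_encoder` is a hypothetical helper, not a redis-om or Pydantic API.

def decode_like_encoder(value: bytes) -> str:
    """Mimic the encoder entry `bytes: lambda o: o.decode()`."""
    # With no arguments, bytes.decode() uses encoding="utf-8", errors="strict".
    return value.decode()


ascii_text = b"Hello World"
utf8_text = "é".encode("utf-8")      # valid UTF-8 that is not ASCII
png_header = b"\x89PNG\r\n\x1a\n"    # same bytes as in the reproduction steps

# Both of these are valid UTF-8, so they decode without error.
print(decode_like_encoder(ascii_text))
print(decode_like_encoder(utf8_text))

# 0x89 cannot start a UTF-8 sequence, so this raises.
try:
    decode_like_encoder(png_header)
except UnicodeDecodeError as exc:
    print(f"{type(exc).__name__}: {exc}")
```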

Expected Behavior

bytes fields should be able to store arbitrary binary data, not just UTF-8 compatible bytes. Options:

  1. Base64 encode bytes automatically - Store as base64 string, decode on retrieval
  2. Use latin-1 encoding - Can represent any byte value (0-255)
  3. Document the limitation - If intentional, document that bytes must be UTF-8 compatible
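
To make options 1 and 2 concrete, here is a rough sketch of both round trips using only the standard library. The four helper names are hypothetical and nothing here is existing redis-om code; how such helpers would be wired into jsonable_encoder and into model loading is left open.

```python
import base64
import json

# Hypothetical helpers illustrating options 1 and 2; not part of redis-om.

def bytes_to_b64(value: bytes) -> str:
    """Option 1: base64 output is pure ASCII, so decoding it never fails."""
    return base64.b64encode(value).decode("ascii")


def b64_to_bytes(text: str) -> bytes:
    return base64.b64decode(text)


def bytes_to_latin1(value: bytes) -> str:
    """Option 2: latin-1 maps each byte 0-255 to the code point of the same value."""
    return value.decode("latin-1")


def latin1_to_bytes(text: str) -> bytes:
    return text.encode("latin-1")


# Every possible byte value, to check that nothing is lost.
payload = bytes(range(256))

for encode, decode in (
    (bytes_to_b64, b64_to_bytes),
    (bytes_to_latin1, latin1_to_bytes),
):
    as_text = encode(payload)
    stored = json.dumps({"content": as_text})          # what would be persisted
    restored = decode(json.loads(stored)["content"])   # what would be read back
    assert restored == payload
    print(encode.__name__, "stored JSON length:", len(stored))
```

Both approaches survive a JSON round trip. Base64 grows the payload by roughly a third but yields a plain ASCII string; latin-1 keeps one character per byte, but the resulting string contains control characters that JSON has to escape. In either case the loading side has to know the field is bytes in order to reverse the encoding.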

Current Workaround

Store binary data as base64-encoded strings:

import base64

from aredis_om import JsonModel

class File(JsonModel, index=True):
    filename: str
    content_b64: str  # Store as base64 string
    
    @classmethod
    def create(cls, filename: str, content: bytes):
        return cls(filename=filename, content_b64=base64.b64encode(content).decode())
    
    @property
    def content(self) -> bytes:
        return base64.b64decode(self.content_b64)

Environment

  • redis-om version: 1.0.0 (current main branch)
  • Python version: 3.12
  • Pydantic version: 2.x

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)
