Description
Code of Conduct
- I agree to follow Django's Code of Conduct
Feature Description
This proposal suggests extending Django’s QuerySet.iterator() and QuerySet.aiterator() methods with an optional parameter (e.g. batch_yield_size) that allows them to yield lists of model instances in batches, rather than yielding one object at a time.
The goal is to combine the memory efficiency of streaming querysets with the convenience of grouped (batched) processing.
Problem
Currently, QuerySet.iterator() and QuerySet.aiterator() always yield one model instance per iteration. While this is ideal for memory efficiency, it becomes inconvenient or inefficient for common batch-oriented workflows, such as:
- Bulk writes to external systems (APIs, message queues, file exports)
- Batch validation or transformation pipelines
- Chunked background processing
- Network or I/O bound operations where per-object overhead is significant
Developers often work around this by manually buffering results into lists, for example:
```python
buffer = []
for obj in queryset.iterator():
    buffer.append(obj)
    if len(buffer) == 1000:
        process(buffer)
        buffer.clear()
if buffer:  # easy to forget: the final partial batch
    process(buffer)
```
This pattern is repetitive, error-prone (the final partial batch is easily dropped), and obscures intent. Django already supports database-level chunking via chunk_size, but there is no built-in way to express logical batch processing at the iteration level.
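Today this boilerplate can be factored into a small reusable helper. The sketch below wraps any iterable (such as the output of queryset.iterator()) using itertools.islice; the name iter_batches is illustrative and not part of Django:

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable.

    The final list may be shorter; an empty iterable yields nothing.
    """
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Usage with a queryset would look like `for batch in iter_batches(qs.iterator(chunk_size=2000), 500): ...`, but a built-in parameter would make the intent explicit at the call site.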
Request or proposal
proposal
Additional Details
This feature is orthogonal to chunk_size, which controls how many rows are fetched from the database cursor.
It differs from Paginator in that:
- It does not require counting the full queryset.
- It supports true streaming over large datasets.
The feature is especially useful for async workflows, where per-object await overhead is high and batching significantly improves throughput.
The default behavior would remain unchanged; batching would be opt-in.
Implementation Suggestions
Synchronous Example
```python
for batch in MyModel.objects.iterator(
    chunk_size=2000,
    batch_yield_size=500,
):
    send_to_api(batch)
```
Proposed behavior:
- Internally fetch rows using chunk_size
- Yield lists of up to batch_yield_size model instances
- The final batch may be smaller
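This behavior could be sketched as a thin wrapper over the existing iterator. The function below is a hypothetical free-function version (iterator_with_batches is not a real Django API); the actual change would live inside QuerySet.iterator():

```python
def iterator_with_batches(queryset, chunk_size=2000, batch_yield_size=None):
    """Sketch of the proposed semantics.

    With batch_yield_size=None the current one-object-at-a-time
    behavior is preserved; otherwise lists of up to batch_yield_size
    instances are yielded, with a possibly shorter final list.
    """
    rows = queryset.iterator(chunk_size=chunk_size)
    if batch_yield_size is None:
        yield from rows  # unchanged default behavior
        return
    batch = []
    for obj in rows:
        batch.append(obj)
        if len(batch) == batch_yield_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final batch may be smaller
```

Note that chunk_size (database fetch size) and batch_yield_size (yield granularity) stay independent, matching the orthogonality described above.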
Asynchronous Example
```python
async for batch in MyModel.objects.aiterator(
    chunk_size=2000,
    batch_yield_size=500,
):
    await send_to_api(batch)
```
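Until such a parameter exists, the async case can be approximated with a batching wrapper over any async iterable, such as the output of queryset.aiterator(). The name aiter_batches below is hypothetical:

```python
async def aiter_batches(aiterable, batch_size):
    """Yield lists of up to batch_size items from an async iterable.

    Amortizes per-object await overhead by awaiting once per batch
    on the consumer side; the final list may be shorter.
    """
    batch = []
    async for obj in aiterable:
        batch.append(obj)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```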