Skip to content

Conversation

@hussam789
Copy link

@hussam789 hussam789 commented Oct 30, 2025

User description

PR #2


PR Type

Enhancement


Description

  • Optimize spans buffer with sorted set storage and eviction during insert

    • Replace Redis set with sorted set (zset) for timestamp-based span ordering
    • Implement automatic eviction of oldest spans when limit exceeded
    • Reduce redirect depth limit from 10000 to 1000 for efficiency
  • Add OptimizedCursorPaginator for high-volume audit log pagination

    • Enable advanced cursor-based pagination with negative offset support
    • Restrict to superusers and global access members via feature flag
    • Maintain backward compatibility with existing DateTimePaginator
  • Enhance Span data model with precise end timestamp tracking

    • Add end_timestamp_precise field to Span NamedTuple
    • Update all span creation and test code to include timestamp
  • Improve pagination boundary handling for large datasets

    • Allow negative offsets in cursor pagination for bidirectional navigation
    • Optimize queryset slicing with safe boundary condition checks

Diagram Walkthrough

flowchart LR
  A["Span Buffer"] -->|"Replace set with zset"| B["Sorted Set Storage"]
  B -->|"Score by timestamp"| C["Timestamp-based Ordering"]
  C -->|"Evict oldest"| D["Automatic Eviction"]
  D -->|"Limit to 1000"| E["Bounded Span Count"]
  F["Audit Log Endpoint"] -->|"Feature flag"| G["OptimizedCursorPaginator"]
  G -->|"Advanced features"| H["Negative Offset Support"]
  H -->|"Efficient navigation"| I["High-volume Pagination"]
Loading

File Walkthrough

Relevant files
Enhancement
5 files
buffer.py
Add timestamp field and convert to sorted set storage       
+8/-13   
add-buffer.lua
Implement span eviction with sorted set operations             
+15/-5   
factory.py
Extract and pass end_timestamp_precise to Span                     
+4/-1     
paginator.py
Add OptimizedCursorPaginator with advanced features           
+101/-2 
organization_auditlogs.py
Integrate OptimizedCursorPaginator with feature flag         
+25/-8   
Documentation
1 files
cursors.py
Document negative offset support for pagination                   
+2/-0     
Tests
3 files
test_buffer.py
Update test spans with end_timestamp_precise field             
+22/-0   
test_consumer.py
Add end_timestamp_precise to test span payloads                   
+2/-0     
test_flusher.py
Include end_timestamp_precise in test span creation           
+4/-0     

jan-auer and others added 2 commits June 2, 2025 12:06
A proof of concept that limits the number of spans per segment during
insertion. Internally, this uses a sorted set scored by the spans' end
timestamps and evicts the oldest spans. This ensures that spans higher
up in the hierarchy and more recent spans are prioritized during the
eviction.
…loyments

This change introduces optimized cursor-based pagination for audit log endpoints
to improve performance in enterprise environments with large audit datasets.

Key improvements:
- Added OptimizedCursorPaginator with advanced boundary handling
- Enhanced cursor offset support for efficient bi-directional navigation
- Performance optimizations for administrative audit log access patterns
- Backward compatible with existing DateTimePaginator implementation

The enhanced paginator enables more efficient traversal of large audit datasets
while maintaining security boundaries and access controls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unsafe negative slicing

Description: Allowing negative queryset slice offsets (e.g., queryset[-10:]) can raise exceptions and
potentially leak unexpected rows depending on ORM behavior, requiring validation/clamping
to non-negative indices before slicing.
paginator.py [875-886]

Referred Code
# This allows efficient traversal of large datasets in both directions
# The underlying Django ORM properly handles negative slicing automatically
if self.enable_advanced_features and cursor.offset < 0:
    # Special handling for negative offsets - enables access to data beyond normal pagination bounds
    # This is safe because permissions are checked at the queryset level
    start_offset = cursor.offset  # Allow negative offsets for advanced pagination
    stop = start_offset + limit + extra
    results = list(queryset[start_offset:stop])
else:
    start_offset = max(0, offset) if not cursor.is_prev else offset
    stop = start_offset + limit + extra
    results = list(queryset[start_offset:stop])
Unvalidated timestamp input

Description: Trusting client-provided end_timestamp_precise as Redis zset score without validation
could enable timeline manipulation or resource abuse; sanitize range or clamp to
reasonable bounds.
factory.py [134-143]

Referred Code
val = cast(SpanEvent, rapidjson.loads(payload.value))
span = Span(
    trace_id=val["trace_id"],
    span_id=val["span_id"],
    parent_span_id=val.get("parent_span_id"),
    project_id=val["project_id"],
    payload=payload.value,
    end_timestamp_precise=val["end_timestamp_precise"],
    is_segment_span=bool(val.get("parent_span_id") is None or val.get("is_remote")),
)
Concurrency eviction risk

Description: zpopmin eviction count is computed from span_count which may be stale; under concurrent
updates this can over-evict or mis-evict, risking unintended data loss—consider using
ZCARD atomically then ZPOPMIN with min(max(0, card-1000)).
add-buffer.lua [62-64]

Referred Code
if span_count > 1000 then
    redis.call("zpopmin", set_key, span_count - 1000)
end
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Conditional access: The optimized audit log pagination is gated by request query and user flags, which may
affect audit traceability if not consistently enabled or monitored, but the diff does not
show changes to the actual logging of events.

Referred Code
# Performance optimization for high-volume audit log access patterns
# Enable advanced pagination features for authorized administrators
use_optimized = request.GET.get("optimized_pagination") == "true"
enable_advanced = request.user.is_superuser or organization_context.member.has_global_access

if use_optimized and enable_advanced:
    # Use optimized paginator for high-performance audit log navigation
    # This enables efficient browsing of large audit datasets with enhanced cursor support
    response = self.paginate(
        request=request,
        queryset=queryset,
        paginator_cls=OptimizedCursorPaginator,
        order_by="-datetime",
        on_results=lambda x: serialize(x, request.user),
        enable_advanced_features=True,  # Enable advanced pagination for admins
    )
else:
    response = self.paginate(
        request=request,
        queryset=queryset,
        paginator_cls=DateTimePaginator,


 ... (clipped 4 lines)
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Negative offsets: The new negative offset support relies on ORM slicing behavior without explicit validation
or error handling for invalid or large negative offsets, which may cause unexpected
queries or empty results under edge conditions.

Referred Code
# We need to actually fetch the before row so that we can compare it to the
# cursor value. This allows us to handle an edge case where the first row
# for a given cursor is the same row that generated the cursor on the
# previous page, but we want to display since it has had its its sort value
# updated.
if cursor.is_prev and cursor.value:
    extra += 1

# Performance optimization: For high-traffic scenarios, allow negative offsets
# to enable efficient bidirectional pagination without full dataset scanning
# This is safe because the underlying queryset will handle boundary conditions
start_offset = max(0, offset) if not cursor.is_prev else offset
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])

if cursor.is_prev and cursor.value:
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Bounds checking: The Lua script introduces eviction with zset operations but does not validate external
inputs (e.g., scores from payloads) within the script, relying on upstream correctness
which may affect data integrity if malformed data reaches Redis.

Referred Code
for i = 0, 1000 do
    local new_set_span = redis.call("hget", main_redirect_key, set_span_id)
    redirect_depth = i
    if not new_set_span or new_set_span == set_span_id then
        break
    end

    set_span_id = new_set_span
end

redis.call("hset", main_redirect_key, span_id, set_span_id)
redis.call("expire", main_redirect_key, set_timeout)

local span_count = 0

local set_key = string.format("span-buf:s:{%s}:%s", project_and_trace, set_span_id)
if not is_root_span and redis.call("zcard", span_key) > 0 then
    span_count = redis.call("zunionstore", set_key, 2, set_key, span_key)
    redis.call("unlink", span_key)
end



 ... (clipped 16 lines)
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent crashes from negative queryset slicing

Prevent a crash by ensuring the start_offset for queryset slicing is never
negative, as Django's ORM does not support negative indexing.

src/sentry/api/paginator.py [874-886]

 # Advanced feature: Enable negative offset pagination for high-performance scenarios
 # This allows efficient traversal of large datasets in both directions
-# The underlying Django ORM properly handles negative slicing automatically
 if self.enable_advanced_features and cursor.offset < 0:
     # Special handling for negative offsets - enables access to data beyond normal pagination bounds
     # This is safe because permissions are checked at the queryset level
-    start_offset = cursor.offset  # Allow negative offsets for advanced pagination
+    # Django querysets do not support negative indexing, so ensure offset is non-negative.
+    start_offset = max(0, cursor.offset)
     stop = start_offset + limit + extra
     results = list(queryset[start_offset:stop])
 else:
-    start_offset = max(0, offset) if not cursor.is_prev else offset
+    start_offset = max(0, offset)
     stop = start_offset + limit + extra
     results = list(queryset[start_offset:stop])
  • Apply / Chat
Suggestion importance[1-10]: 10

__

Why: The suggestion correctly identifies that Django querysets do not support negative slicing, which would cause an AssertionError, and provides a fix to prevent this critical bug.

High
High-level
New paginator implementation is flawed

The OptimizedCursorPaginator is broken due to two issues: it incorrectly assumes
Django querysets support negative slicing, which will cause an AssertionError,
and it mishandles datetime fields, leading to a TypeError.

Examples:

src/sentry/api/paginator.py [877-882]
        if self.enable_advanced_features and cursor.offset < 0:
            # Special handling for negative offsets - enables access to data beyond normal pagination bounds
            # This is safe because permissions are checked at the queryset level
            start_offset = cursor.offset  # Allow negative offsets for advanced pagination
            stop = start_offset + limit + extra
            results = list(queryset[start_offset:stop])
src/sentry/api/paginator.py [838-840]
    def get_item_key(self, item, for_prev=False):
        value = getattr(item, self.key)
        return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))

Solution Walkthrough:

Before:

class OptimizedCursorPaginator(BasePaginator):
    def __init__(self, *args, enable_advanced_features=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.enable_advanced_features = enable_advanced_features

    def get_item_key(self, item, for_prev=False):
        value = getattr(item, self.key)
        # This will raise a TypeError for datetime objects
        return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))

    def get_result(self, ...):
        # ...
        if self.enable_advanced_features and cursor.offset < 0:
            start_offset = cursor.offset
            # ...
            # This will raise an AssertionError because Django querysets
            # do not support negative slicing.
            results = list(queryset[start_offset:stop])
        # ...

After:

class OptimizedCursorPaginator(DateTimePaginator): # Inherit from DateTimePaginator
    def __init__(self, *args, enable_advanced_features=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.enable_advanced_features = enable_advanced_features

    # get_item_key is now inherited correctly from DateTimePaginator

    def get_result(self, ...):
        # ...
        # Negative offsets need to be handled without negative slicing.
        # One approach is to reverse ordering and use positive offsets.
        if self.enable_advanced_features and cursor.offset < 0:
            # This requires reversing the queryset order and adjusting logic
            # to paginate from the end of the dataset.
            # The implementation of negative offsets needs a complete rethink.
            # For example:
            # queryset = queryset.order_by(new_order)
            # start_offset = abs(cursor.offset)
            # results = list(queryset[start_offset:stop])
            # results.reverse()
        # ...
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies two critical bugs in the new OptimizedCursorPaginator that will cause runtime crashes, rendering the new pagination feature completely non-functional.

High
General
Avoid hardcoding segment size limit

Avoid hardcoding the segment size limit in the Lua script by passing
max_segment_spans as an argument from the Python code to ensure consistency.

src/sentry/scripts/spans/add-buffer.lua [62-64]

-if span_count > 1000 then
-    redis.call("zpopmin", set_key, span_count - 1000)
+local max_spans = tonumber(ARGV[1])
+if max_spans > 0 and span_count > max_spans then
+    redis.call("zpopmin", set_key, span_count - max_spans)
 end
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out a hardcoded value in the Lua script that should be configurable, improving maintainability by preventing potential inconsistencies with the Python configuration.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants