Fix BatchLogRecordProcessor default schedule_delay_millis to 1000ms per OTel spec#4998
Conversation
pmcollins
left a comment
There was a problem hiding this comment.
Thanks for the change. Can you also update the default value in the docstring for OTEL_BLRP_SCHEDULE_DELAY (in environment_variables/__init__.py)?
|
Hi @pmcollins , I've made both the requested changes — updated the CHANGELOG wording and the OTEL_BLRP_SCHEDULE_DELAY docstring default to 1000ms. Please let me know if anything else is needed! |
themavik
left a comment
There was a problem hiding this comment.
Summary: Aligns BatchLogRecordProcessor default schedule_delay_millis from 5000ms to 1000ms per OTel spec.
Done well: Changelog clearly documents the 5x export frequency change for users who rely on defaults. All three touchpoints updated consistently: _DEFAULT_SCHEDULE_DELAY_MILLIS, env var docs, and the three test_args_* assertions (5→1 seconds). The test values correctly reflect seconds internally.
Note: Users not setting OTEL_BLRP_SCHEDULE_DELAY will see more frequent exports; the changelog handles this. Consider whether any performance or rate-limit docs should mention the new default.
|
Thank you for the detailed review @themavik . Agreed on the performance/rate-limit docs — that feels like a separate issue. Happy to open one if that would be helpful! |
…ub.com/Manvi2402/opentelemetry-python into fix/batch-log-processor-schedule-delay
…er OTel spec (open-telemetry#4998) * Fix BatchLogRecordProcessor default schedule_delay_millis to 1000ms per OTel spec * Update OTEL_BLRP_SCHEDULE_DELAY docstring default to 1000ms and fix CHANGELOG * Apply suggestion from @xrmx * Update CHANGELOG wording to 'may be' for export frequency * Apply suggestion from @xrmx --------- Co-authored-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
The SDK's increment_hit_count uses `hit_count > config.max_hits` for the disable check (instrumentation_manager.py:631), so MaxHits=N emits N snapshots and disables on the (N+1)th hit. The DI Flask and FastAPI hit-limit contract tests expected N-1 snapshots, claiming the check was `>=` in both their docstrings and inline comments — neither matched the SDK. The tests were passing on main by accident: BatchLogRecordProcessor's default schedule_delay_millis was 5000ms, so the snapshot emitted on the 3rd hit was still sitting in the batch buffer when the test's 2-second sleep elapsed and read the collector. core 1.41.0 tightened the default to 1000ms (open-telemetry/opentelemetry-python#4998), shrinking the window enough that the 3rd snapshot now lands before the test reads. Fix the tests (and the docstrings on the application's target function) to match SDK behavior: MaxHits=3 -> 3 snapshots emitted, breakpoint disables on the 4th hit. Send a 4th request to actually probe the disabled state, and widen the disabled-state observation window to 5s so any spurious 4th snapshot has plenty of time to flush before we assert the breakpoint stayed disabled.
The SDK's increment_hit_count uses `hit_count > config.max_hits` for the disable check (instrumentation_manager.py:631), so MaxHits=N emits N snapshots and disables on the (N+1)th hit. The DI Flask, FastAPI, and Django hit-limit contract tests expected N-1 snapshots, claiming the check was `>=` in their docstrings and inline comments — neither matched the SDK. The tests were passing on main by accident: BatchLogRecordProcessor's default schedule_delay_millis was 5000ms, so the snapshot emitted on the 3rd hit was still sitting in the batch buffer when the test's 2-second sleep elapsed and read the collector. core 1.41.0 tightened the default to 1000ms (open-telemetry/opentelemetry-python#4998), shrinking the window enough that the 3rd snapshot now lands before the test reads. Fix the tests (and the docstrings on the application's target functions) to match SDK behavior: MaxHits=3 -> 3 snapshots emitted, breakpoint disables on the 4th hit. Send a 4th request to actually probe the disabled state, and widen the disabled-state observation window to 5s so any spurious 4th snapshot has plenty of time to flush before we assert the breakpoint stayed disabled.
) ## Summary The DI Flask `DIFlaskHitLimitTest` and the matching application target function had `MaxHits` semantics flipped relative to the SDK. With `MaxHits=3`, the test expected 2 snapshots, claiming the disable check was `hit_count >= max_hits`. The SDK's [increment_hit_count](https://github.com/aws-observability/aws-otel-python-instrumentation/blob/main/aws-opentelemetry-distro/src/amazon/opentelemetry/distro/debugger/instrumentation_manager.py#L631) actually uses `hit_count > max_hits`, so it emits 3 snapshots and disables on the 4th hit. ## Why this is surfacing now The test was passing on main by accident. `BatchLogRecordProcessor`'s default `schedule_delay_millis` was 5000ms, so the snapshot emitted on the 3rd hit was still sitting in the batch buffer when the test's `time.sleep(2)` elapsed and read the collector. The 3rd snapshot never reached the mock collector within the test's observation window. OpenTelemetry core 1.41.0 [tightened that default to 1000ms](open-telemetry/opentelemetry-python#4998) to comply with the OTel spec, so the upcoming dependency bump to 1.42.1 / 0.63b1 (#762) now sees the 3rd snapshot consistently and the latent off-by-one assertion fails the test. ## Changes - Update the test class docstring and both test methods so `MaxHits=3` → 3 snapshots, breakpoint disabled on the 4th hit. Send a 4th request to actually probe the disabled state. - Update the `limited_function` docstring in the di-flask sample app to match. ## Test plan - [ ] DI Debugger Contract Tests pass on this branch (currently the only test in the suite that was failing, on the `nightly-dependency-updates` branch) - [ ] No SDK behavior change — purely a test/comment alignment with current SDK semantics
…er OTel spec
Description
The
BatchLogRecordProcessor._DEFAULT_SCHEDULE_DELAY_MILLISwas set to5000ms, but the OTel specification defines
OTEL_BLRP_SCHEDULE_DELAYdefault as 1000ms.
Fixes #4991
Type of change
How Has This Been Tested?
Updated existing unit tests in
opentelemetry-sdk/tests/logs/test_export.pyto reflect the corrected default value of 1000ms instead of 5000ms.
Does This PR Require a Contrib Repo Change?
Checklist: