Skip to content

Fix BatchLogRecordProcessor default schedule_delay_millis to 1000ms per OTel spec#4998

Merged
xrmx merged 8 commits into
open-telemetry:mainfrom
Manvi2402:fix/batch-log-processor-schedule-delay
Apr 3, 2026
Merged

Fix BatchLogRecordProcessor default schedule_delay_millis to 1000ms per OTel spec#4998
xrmx merged 8 commits into
open-telemetry:mainfrom
Manvi2402:fix/batch-log-processor-schedule-delay

Conversation

@Manvi2402

Copy link
Copy Markdown
Contributor

…er OTel spec

Description

The BatchLogRecordProcessor._DEFAULT_SCHEDULE_DELAY_MILLIS was set to
5000ms, but the OTel specification defines OTEL_BLRP_SCHEDULE_DELAY
default as 1000ms.

Fixes #4991

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Updated existing unit tests in opentelemetry-sdk/tests/logs/test_export.py
to reflect the corrected default value of 1000ms instead of 5000ms.

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • [ x ] No.

Checklist:

  • [ x ] Followed the style guidelines of this project
  • [ x ] Changelogs have been updated
  • [ x ] Unit tests have been added
  • [ x ] Documentation has been updated

@pmcollins pmcollins left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change. Can you also update the default value in the docstring for OTEL_BLRP_SCHEDULE_DELAY (in environment_variables/__init__.py)?

Comment thread CHANGELOG.md Outdated
@Manvi2402

Copy link
Copy Markdown
Contributor Author

Hi @pmcollins , I've made both the requested changes — updated the CHANGELOG wording and the OTEL_BLRP_SCHEDULE_DELAY docstring default to 1000ms. Please let me know if anything else is needed!

@themavik themavik left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: Aligns BatchLogRecordProcessor default schedule_delay_millis from 5000ms to 1000ms per OTel spec.

Done well: Changelog clearly documents the 5x export frequency change for users who rely on defaults. All three touchpoints updated consistently: _DEFAULT_SCHEDULE_DELAY_MILLIS, env var docs, and the three test_args_* assertions (5→1 seconds). The test values correctly reflect seconds internally.

Note: Users not setting OTEL_BLRP_SCHEDULE_DELAY will see more frequent exports; the changelog handles this. Consider whether any performance or rate-limit docs should mention the new default.

@Manvi2402

Copy link
Copy Markdown
Contributor Author

Thank you for the detailed review @themavik . Agreed on the performance/rate-limit docs — that feels like a separate issue. Happy to open one if that would be helpful!

Comment thread CHANGELOG.md Outdated
@xrmx xrmx moved this to Approved PRs in Python PR digest Mar 20, 2026
Comment thread CHANGELOG.md Outdated
@xrmx xrmx requested a review from DylanRussell April 1, 2026 15:31
Comment thread CHANGELOG.md Outdated
@xrmx xrmx enabled auto-merge (squash) April 3, 2026 08:28
@xrmx xrmx merged commit d9a8c70 into open-telemetry:main Apr 3, 2026
508 checks passed
@github-project-automation github-project-automation Bot moved this from Approved PRs to Done in Python PR digest Apr 3, 2026
Vitexus pushed a commit to Vitexus/opentelemetry-python that referenced this pull request Apr 30, 2026
…er OTel spec (open-telemetry#4998)

* Fix BatchLogRecordProcessor default schedule_delay_millis to 1000ms per OTel spec

* Update OTEL_BLRP_SCHEDULE_DELAY docstring default to 1000ms and fix CHANGELOG

* Apply suggestion from @xrmx

* Update CHANGELOG wording to 'may be' for export frequency

* Apply suggestion from @xrmx

---------

Co-authored-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
liustve added a commit to liustve/aws-otel-python-instrumentation that referenced this pull request Jun 16, 2026
The SDK's increment_hit_count uses `hit_count > config.max_hits` for the
disable check (instrumentation_manager.py:631), so MaxHits=N emits N
snapshots and disables on the (N+1)th hit. The DI Flask and FastAPI
hit-limit contract tests expected N-1 snapshots, claiming the check was
`>=` in both their docstrings and inline comments — neither matched the SDK.

The tests were passing on main by accident: BatchLogRecordProcessor's
default schedule_delay_millis was 5000ms, so the snapshot emitted on the
3rd hit was still sitting in the batch buffer when the test's 2-second
sleep elapsed and read the collector. core 1.41.0 tightened the default
to 1000ms (open-telemetry/opentelemetry-python#4998), shrinking the
window enough that the 3rd snapshot now lands before the test reads.

Fix the tests (and the docstrings on the application's target function)
to match SDK behavior: MaxHits=3 -> 3 snapshots emitted, breakpoint
disables on the 4th hit. Send a 4th request to actually probe the
disabled state, and widen the disabled-state observation window to 5s
so any spurious 4th snapshot has plenty of time to flush before we
assert the breakpoint stayed disabled.
liustve added a commit to liustve/aws-otel-python-instrumentation that referenced this pull request Jun 16, 2026
The SDK's increment_hit_count uses `hit_count > config.max_hits` for the
disable check (instrumentation_manager.py:631), so MaxHits=N emits N
snapshots and disables on the (N+1)th hit. The DI Flask, FastAPI, and
Django hit-limit contract tests expected N-1 snapshots, claiming the
check was `>=` in their docstrings and inline comments — neither matched
the SDK.

The tests were passing on main by accident: BatchLogRecordProcessor's
default schedule_delay_millis was 5000ms, so the snapshot emitted on the
3rd hit was still sitting in the batch buffer when the test's 2-second
sleep elapsed and read the collector. core 1.41.0 tightened the default
to 1000ms (open-telemetry/opentelemetry-python#4998), shrinking the
window enough that the 3rd snapshot now lands before the test reads.

Fix the tests (and the docstrings on the application's target functions)
to match SDK behavior: MaxHits=3 -> 3 snapshots emitted, breakpoint
disables on the 4th hit. Send a 4th request to actually probe the
disabled state, and widen the disabled-state observation window to 5s
so any spurious 4th snapshot has plenty of time to flush before we
assert the breakpoint stayed disabled.
liustve added a commit to aws-observability/aws-otel-python-instrumentation that referenced this pull request Jun 17, 2026
)

## Summary

The DI Flask `DIFlaskHitLimitTest` and the matching application target
function had `MaxHits` semantics flipped relative to the SDK. With
`MaxHits=3`, the test expected 2 snapshots, claiming the disable check
was `hit_count >= max_hits`. The SDK's
[increment_hit_count](https://github.com/aws-observability/aws-otel-python-instrumentation/blob/main/aws-opentelemetry-distro/src/amazon/opentelemetry/distro/debugger/instrumentation_manager.py#L631)
actually uses `hit_count > max_hits`, so it emits 3 snapshots and
disables on the 4th hit.

## Why this is surfacing now

The test was passing on main by accident. `BatchLogRecordProcessor`'s
default `schedule_delay_millis` was 5000ms, so the snapshot emitted on
the 3rd hit was still sitting in the batch buffer when the test's
`time.sleep(2)` elapsed and read the collector. The 3rd snapshot never
reached the mock collector within the test's observation window.

OpenTelemetry core 1.41.0 [tightened that default to
1000ms](open-telemetry/opentelemetry-python#4998)
to comply with the OTel spec, so the upcoming dependency bump to 1.42.1
/ 0.63b1 (#762) now sees the 3rd snapshot consistently and the latent
off-by-one assertion fails the test.

## Changes

- Update the test class docstring and both test methods so `MaxHits=3` →
3 snapshots, breakpoint disabled on the 4th hit. Send a 4th request to
actually probe the disabled state.
- Update the `limited_function` docstring in the di-flask sample app to
match.

## Test plan

- [ ] DI Debugger Contract Tests pass on this branch (currently the only
test in the suite that was failing, on the `nightly-dependency-updates`
branch)
- [ ] No SDK behavior change — purely a test/comment alignment with
current SDK semantics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

BatchLogRecordProcessor default schedule_delay is 5000ms but spec requires 1000ms

6 participants