fix(jobs): handle long log lines and prevent log fetch timeouts#26

Open
eleboucher wants to merge 1 commit into perfectra1n:main from eleboucher:fix-hanging-jobs

@eleboucher

Job pods were lingering after completion because the controller failed to fetch their logs. The "bufio.Scanner: token too long" error occurred when a log line exceeded the scanner's default 64 KB maximum token size (bufio.MaxScanTokenSize). In addition, log fetching had no timeout, so a large log stream could hang the controller indefinitely and block the job's state transitions.

Solution:

  1. Added the MOVER_LOG_MAX_SCAN_BUFFER_SIZE env var (default: 1024 KB = 1 MB)
    to configure the maximum scanner buffer size, so long log lines can be parsed
  2. Added the MOVER_LOG_FETCH_TIMEOUT env var (default: 120s) to set a
    timeout for log fetching operations
  3. Sized the scanner buffer in FilterLogs() from this configuration
  4. Added a timeout context in updateMoverStatusForJob() so a stalled log
    fetch cannot hang the controller
  5. Added tests for long-line handling and for both configuration options

Changes:

  • internal/controller/utils/podlogs.go: Added buffer size and timeout
    configuration, increased scanner buffer, added timeout context
  • internal/controller/utils/podlogs_test.go: Added tests for long log
    lines, buffer size configuration, and timeout configuration

Fix "bufio.Scanner: token too long" errors when parsing pod logs with
lines exceeding 64 KB. Also add a timeout for log streaming so the
controller cannot hang on large log streams.
