fix: prevent duplicate ECS agent provisioning while JNLP agent is starting, #311, #416 #432

Open
mslipets wants to merge 4 commits into jenkinsci:master from mslipets:fix/jnlp-agent-overprovisioning

@mslipets

ECS agents connect over JNLP (the agent initiates the connection), so they are invisible to Jenkins' connectingExecutors metric while the container is starting up.
This creates a gap: once the PlannedNode future resolves (almost immediately after ProvisioningCallback returns the ECSSlave), the node is removed from pendingLaunches but is not yet online.
Until the agent connects, every subsequent ECSProvisioningStrategy cycle (one fires every 10 seconds) sees excessWorkload > 0 and launches another, duplicate ECS task.

The fix counts offline ECSSlave nodes that match the requested label as already-pending capacity and subtracts their executors from excessWorkload before the provisioning loop runs.
Agents that fail to launch are cleaned up by the catch block in ECSLauncher.launch() (agent.terminate()), so they don't linger as false pending capacity.
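
The accounting described above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: `AgentNode` and `remainingExcessWorkload` are hypothetical names, the node type is a plain record rather than the plugin's ECSSlave, and label matching is simplified to string equality, whereas Jenkins labels can be expressions.

```java
import java.util.List;

final class PendingCapacitySketch {
    // Hypothetical stand-in for an ECS agent node; not the plugin's ECSSlave type.
    record AgentNode(String label, int executors, boolean online) {}

    // Subtracts the executors of offline nodes with a matching label from the
    // excess workload, clamping at zero so over-coverage never goes negative.
    static int remainingExcessWorkload(int excessWorkload, String requestedLabel,
                                       List<AgentNode> nodes) {
        int pendingExecutors = 0;
        for (AgentNode node : nodes) {
            // Assumption: plain string equality stands in for real label matching.
            if (!node.online() && node.label().equals(requestedLabel)) {
                pendingExecutors += node.executors();
            }
        }
        return Math.max(0, excessWorkload - pendingExecutors);
    }

    public static void main(String[] args) {
        AgentNode starting = new AgentNode("ecs", 1, false);
        AgentNode otherLabel = new AgentNode("gpu", 1, false);

        // No pending nodes: the full workload still needs provisioning.
        System.out.println(remainingExcessWorkload(1, "ecs", List.of()));
        // A matching offline node absorbs the workload, so nothing is provisioned.
        System.out.println(remainingExcessWorkload(1, "ecs", List.of(starting)));
        // A node with a different label does not count as pending capacity.
        System.out.println(remainingExcessWorkload(1, "ecs", List.of(otherLabel)));
        // Partial coverage: only the remainder is provisioned.
        System.out.println(remainingExcessWorkload(2, "ecs", List.of(starting)));
    }
}
```

Clamping at zero is a design choice of this sketch: if the pending nodes offer more executors than the queue needs, the provisioning loop simply has nothing left to launch.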

Fixes #311, #416

Testing done

5 automated tests added in ECSProvisioningStrategyTest covering:

| Test | Scenario |
| --- | --- |
| noOfflineECSNodes_provisionsWhenWorkloadExists | Baseline: no pending nodes → provisions normally |
| offlineECSNodeMatchingLabel_suppressesProvisioning | Core fix: offline ECS node with matching label suppresses duplicate provision |
| offlineECSNodeDifferentLabel_provisionsNormally | Label filter: wrong-label offline node does not suppress provisioning |
| offlineECSNodeCoversPartOfWorkload_provisionsRemainder | Partial suppression: pending node covers part of queue, remainder is provisioned |
| multipleOfflineECSNodesCoversWorkload_suppressesProvisioning | Multiple pending nodes fully absorb workload |

All 43 tests pass on Java 21 (BUILD SUCCESS, 3 pre-existing @Ignore skips in ECSServiceTest).

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@mslipets mslipets requested a review from a team as a code owner March 11, 2026 17:21
@mslipets
Author

@Stericson Kindly requesting a review.

Development

Successfully merging this pull request may close these issues.

Very often get two Fargate tasks when only one agent is needed
