chore(ecs): fix failing ECS integration tests #36968

Merged: mergify[bot] merged 6 commits into aws:main from aemada-aws:fix/ecs-integ-tests-remediation on Mar 18, 2026

@aemada-aws commented Feb 12, 2026

Issue # (if applicable)

Reason for this change

10 ECS integration tests were failing for the following reasons:

  • Teardown failures from capacity providers / managed instances still in use during stack deletion
  • Cross-stack export deletion ordering when IntegTest is scoped to the stack
  • t2.micro instance type unavailability in certain AZs
  • Container health check targeting wrong port (8000 instead of 80)
  • Missing NLB-to-service security group ingress rule when the networkLoadBalancerWithSecurityGroupByDefault feature flag is enabled
  • Managed instances tests using incorrect instance profile naming (missing the ecsInstanceRole prefix required by AmazonECSInfrastructureRolePolicyForManagedInstances) and overly restrictive instance requirements (NVIDIA GPU and Intel CPU only)
  • EBS volume initialization rate test requiring an external EBS snapshot that does not exist
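
The health check bullet above can be illustrated with a minimal sketch. The helper name `curlHealthCheck` is hypothetical, and the `CMD-SHELL` form with `curl -f ... || exit 1` is a common ECS container health check pattern that is assumed here, not copied from the tests; the PR text only states that the probed port changed from 8000 to 80.

```typescript
// Hypothetical helper: builds an ECS-style container health check command.
// The CMD-SHELL / `|| exit 1` form is an assumption; the PR only describes the port change.
function curlHealthCheck(port: number): string[] {
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return ['CMD-SHELL', `curl -f http://localhost:${port}/ || exit 1`];
}

// Before the fix: probes a port the sample image does not serve on,
// so the check never succeeds and the service cannot stabilize.
const brokenCheck = curlHealthCheck(8000);

// After the fix: probes port 80, where amazon/amazon-ecs-sample serves.
const fixedCheck = curlHealthCheck(80);

console.log(brokenCheck[1]);
console.log(fixedCheck[1]);
```

Building the command from a single port argument keeps the probed port in one place, which makes a mismatch with the container's listening port easier to spot in review.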

Description of changes

10 tests fixed across the aws-ecs module:

  1. fargate/integ.capacity-providers — Wrapped in IntegTest with destroy.expectError: true (#19275).

  2. external/integ.daemon-service — Changed IntegTest scope from stack to app to fix cross-stack export deletion ordering. When scoped to the stack, the deploy-assert stack holds a reference to the main stack exports, preventing deletion.

  3. ec2/integ.capacity-provider — Wrapped in IntegTest with destroy.expectError: true (#19275).

  4. ec2/integ.pseudo-terminal — Changed instance type from t2.micro to t3.micro. t2.micro is not available in all AZs, causing ASG launch failures.

  5. fargate/integ.exec-command — Fixed container health check from curl localhost:8000 to curl localhost:80. The amazon/amazon-ecs-sample image serves on port 80; port 8000 always fails, preventing service stabilization.

  6. fargate/integ.enable-execute-command — Same health check port fix (8000 → 80).

  7. fargate/integ.nlb-awsvpc-nw — Added service.connections.allowFrom(lb, ec2.Port.tcp(80)). With the networkLoadBalancerWithSecurityGroupByDefault feature flag, the NLB gets a security group but no ingress rules were created on the service SG, so NLB health checks always failed. Also wrapped in IntegTest.

  8. fargate/integ.ebs-volume-initialization-rate — Replaced external SNAPSHOT_ID env var dependency with an in-stack EBS volume + snapshot created via a NodejsFunction-backed custom resource that waits for snapshot completion before returning.

  9. integ.managedinstances-no-default-capacity-provider — Removed custom IAM roles/instance profile with hardcoded names. The AmazonECSInfrastructureRolePolicyForManagedInstances managed policy requires instance profiles prefixed with ecsInstanceRole; the test used InstanceProfile which does not match. Now lets the construct create defaults with the correct prefix. Removed NVIDIA accelerator and Intel CPU manufacturer constraints. Removed hardcoded regions: ['us-west-2'] since FMI is available in all commercial regions. Added destroy.expectError: true (#36071).

  10. integ.managedinstances-capacity-provider — Same fixes as above. Added destroy.expectError: true (#36071).
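
Several of the items above rely on destroy.expectError: true. The following is a minimal sketch of that option shape, assuming it is passed through the `cdkCommandOptions` property of `IntegTest` from `@aws-cdk/integ-tests-alpha`. The interfaces are local stand-ins so the snippet runs without the CDK installed, and `tolerateTeardownFailure` and `SomeIntegTestId` are hypothetical names.

```typescript
// Local stand-ins for the relevant part of the IntegTest props (assumed shape,
// declared here so the sketch runs without aws-cdk-lib installed).
interface CommandOptionsSketch {
  expectError?: boolean;
}

interface CdkCommandsSketch {
  deploy?: CommandOptionsSketch;
  destroy?: CommandOptionsSketch;
}

// Hypothetical helper: options for a test whose stack deletion is expected to
// fail, e.g. because a capacity provider is still in use during teardown.
function tolerateTeardownFailure(): { cdkCommandOptions: CdkCommandsSketch } {
  return { cdkCommandOptions: { destroy: { expectError: true } } };
}

const options = tolerateTeardownFailure();
console.log(JSON.stringify(options));

// In a real integ test the object would be spread into the props, roughly:
//   new IntegTest(app, 'SomeIntegTestId', { testCases: [stack], ...options });
// Passing `app` (not `stack`) as the scope corresponds to the daemon-service
// fix described in item 2.
```

Only the destroy step is marked as allowed to fail, so a deployment failure would still fail the test.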

Describe any new or updated permissions being added

No new IAM permissions. The NLB fix adds a security group ingress rule (port 80 TCP) to the Fargate service security group to allow traffic from the NLB — required for health checks and traffic routing to function correctly.

Description of how you validated changes

All 10 fixed tests were deployed and validated via integ-runner with --update-on-failed across multiple regions (us-east-1, us-west-2, eu-west-1, eu-central-1, ap-northeast-1):

yarn integ \
  test/aws-ecs/test/fargate/integ.capacity-providers.js \
  test/aws-ecs/test/external/integ.daemon-service.js \
  test/aws-ecs/test/ec2/integ.capacity-provider.js \
  test/aws-ecs/test/ec2/integ.pseudo-terminal.js \
  test/aws-ecs/test/fargate/integ.nlb-awsvpc-nw.js \
  test/aws-ecs/test/fargate/integ.exec-command.js \
  test/aws-ecs/test/fargate/integ.enable-execute-command.js \
  test/aws-ecs/test/fargate/integ.ebs-volume-initialization-rate.js \
  test/aws-ecs/test/integ.managedinstances-no-default-capacity-provider.js \
  test/aws-ecs/test/integ.managedinstances-capacity-provider.js \
  --disable-update-workflow \
  --update-on-failed \
  --force \
  --parallel-regions us-east-1 us-west-2

Destructive changes (expected):

  • integ.exec-command and integ.enable-execute-command: TaskDef replaced (health check change)
  • integ.managedinstances-*: IAM roles/instance profiles replaced (removed hardcoded names, switched to construct defaults)
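
The managed instances replacement follows from the naming requirement described earlier. A minimal sketch of that check, assuming the requirement amounts to a plain name prefix match: `REQUIRED_PREFIX` and `isAcceptedInstanceProfileName` are hypothetical names, and the second sample name is made up for illustration.

```typescript
// Prefix named in the PR description as required by
// AmazonECSInfrastructureRolePolicyForManagedInstances.
const REQUIRED_PREFIX = 'ecsInstanceRole';

// Hypothetical check: treats the requirement as a simple prefix match (assumption).
function isAcceptedInstanceProfileName(name: string): boolean {
  return name.slice(0, REQUIRED_PREFIX.length) === REQUIRED_PREFIX;
}

// 'InstanceProfile' is the name the PR says the test used before the fix;
// 'ecsInstanceRole-integ-test' is an illustrative name, not taken from the PR.
const candidates = ['InstanceProfile', 'ecsInstanceRole-integ-test'];

candidates.forEach((name) => {
  const verdict = isAcceptedInstanceProfileName(name) ? 'accepted' : 'rejected';
  console.log(`${name}: ${verdict}`);
});
```

Letting the construct generate the name, as the PR does, avoids having to maintain such a check in the test at all.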

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

- fargate/capacity-providers: wrap in IntegTest with destroy expectError
- external/daemon-service: scope IntegTest to app (fix cross-stack export)
- ec2/capacity-provider: wrap in IntegTest with destroy expectError
- ec2/pseudo-terminal: t2.micro -> t3.micro (AZ availability)
- fargate/exec-command: fix health check port 8000 -> 80
- fargate/enable-execute-command: fix health check port 8000 -> 80
- fargate/nlb-awsvpc-nw: allow NLB traffic to service SG, wrap in IntegTest
- managedinstances-no-default: use construct defaults for IAM, remove GPU constraint
- managedinstances-capacity-provider: same as above
@aws-cdk-automation requested a review from a team on February 12, 2026 07:46
@github-actions bot added the bug and p1 labels on Feb 12, 2026
@mergify bot added the contribution/core label on Feb 12, 2026

github-actions Bot commented Feb 12, 2026

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
Please try merge from main to avoid findings unrelated to the PR.


Security Guardian Results: 264 ran, 262 passed, 2 failed

Failed checks:
  • packages/@aws-cdk-testing/framework-integ/test/aws-ecs/test/fargate/integ.ebs-volume-initialization-rate.js.snapshot/integ-aws-ecs-ebs-volume-initialization-rate.template.json: ec2-ebs-encryption-enabled.guard ❌ failure
  • packages/@aws-cdk-testing/framework-integ/test/aws-ecs/test/integ.managedinstances-capacity-provider.js.snapshot/integ-managedinstances-capacity-provider.template.json: ec2-no-open-security-groups.guard ❌ failure


github-actions Bot commented Feb 12, 2026

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
Please try merge from main to avoid findings unrelated to the PR.


Security Guardian Results with resolved templates: 264 ran, 263 passed, 1 failed

Failed checks:
  • packages/@aws-cdk-testing/framework-integ/test/aws-ecs/test/fargate/integ.ebs-volume-initialization-rate.js.snapshot/integ-aws-ecs-ebs-volume-initialization-rate.template.json: ec2-ebs-encryption-enabled.guard ❌ failure

@aemada-aws added the p2 label and removed the p1 label on Feb 12, 2026
@aemada-aws changed the title from "fix(ecs): remediate 9 failing ECS integration tests" to "chore(ecs): fix failing ECS integration tests" on Feb 12, 2026
@aemada-aws removed the bug label on Feb 12, 2026
…ia Lambda CR

Instead of requiring an external EBS snapshot (SNAPSHOT_ID env var),
create a volume and snapshot within the stack using a Lambda-backed
custom resource that waits for snapshot completion.
- ebs-volume-initialization-rate: NodejsFunction TS asset instead of inline Lambda
- capacity-providers: reference aws#19275 instead of aws#36071
- managedinstances: remove unnecessary comments
@github-actions bot added the bug and p1 labels and removed the p2 label on Feb 12, 2026
@aemada-aws marked this pull request as ready for review on February 16, 2026 17:06
@aws-cdk-automation added the pr/needs-maintainer-review label on Feb 16, 2026
@aemada-aws marked this pull request as draft on February 17, 2026 16:38
@aemada-aws marked this pull request as ready for review on February 17, 2026 16:38
@aemada-aws added the p2 label and removed the bug and p1 labels on Feb 19, 2026
@kumvprat self-assigned this on Mar 2, 2026
@kumvprat added the pr/needs-integration-tests-deployment label on Mar 16, 2026
@aws-cdk-automation removed the pr/needs-maintainer-review label on Mar 16, 2026
@aemada-aws removed the pr/needs-integration-tests-deployment label on Mar 18, 2026

mergify Bot commented Mar 18, 2026

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).


mergify Bot commented Mar 18, 2026

Merge Queue Status

  • Entered queue: 2026-03-18 09:42 UTC · Rule: default-squash
  • Checks passed · in-place
  • Merged: 2026-03-18 10:24 UTC · at 69a35255c121069f9540d7ea90b69b6b273df9b2

This pull request spent 42 minutes 30 seconds in the queue, including 42 minutes 17 seconds running CI.

@mergify mergify Bot merged commit 6cf125f into aws:main Mar 18, 2026
20 of 24 checks passed

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Mar 18, 2026
Labels: contribution/core, p2