fix(eks): add IRSA role for CloudWatch Agent and support custom IAM policies #341

Merged
unni-facets merged 2 commits into main from fix/eks-cloudwatch-agent-irsa
May 8, 2026

@rauneetverma

Summary

  • Adds a dedicated IRSA role for the CloudWatch Agent (amazon-cloudwatch:cloudwatch-agent service account) when Container Insights is enabled
  • Passes the IRSA role ARN to the amazon-cloudwatch-observability addon via service_account_role_arn
  • Adds a configurable cloudwatch_agent_policies spec field (using patternProperties) so users can attach additional IAM policies to the CloudWatch Agent IRSA role
  • Retains the existing node-level CloudWatchAgentServerPolicy attachment on managed node groups to avoid breaking existing customers

Problem

The amazon-cloudwatch-observability EKS addon had service_account_role_arn = null, causing the CloudWatch Agent to fall back to the node IAM role for all AWS API calls. This caused two issues:

  1. Karpenter nodes: CloudWatch Agent pods on Karpenter-provisioned nodes had no CloudWatch/X-Ray permissions since only managed node group roles had CloudWatchAgentServerPolicy attached
  2. X-Ray traces: The addon injects OTEL auto-instrumentation into all pods via a mutating webhook (mpod.kb.io). Traces are sent to the CloudWatch Agent, which forwards them to X-Ray, but AWSXRayDaemonWriteAccess was never attached to any role, causing:
    OTLPExporterError: Internal Server Error
    AccessDeniedException: xray:PutTraceSegments
    

Solution

Create an IRSA role for the CloudWatch Agent (same pattern as the existing EBS CSI driver IRSA), so it gets permissions via the service account regardless of which node type it runs on.

Architecture:

                    Before                                    After
CloudWatch Agent → node IAM role → ❌ (no X-Ray)   CloudWatch Agent → IRSA role → ✅ (CW + X-Ray + custom)
                   (only on managed nodes)                            (works on all nodes)
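
As a rough illustration of the pattern described above, the following Terraform sketch creates a role that only the amazon-cloudwatch:cloudwatch-agent service account can assume and passes its ARN to the addon. It is not the module's actual code: the variable names (cluster_name, oidc_provider_arn, oidc_issuer_host), the role name, and the use of a plain aws_eks_addon resource are assumptions made for this example. Only the service account, the CloudWatchAgentServerPolicy attachment, the cloudwatch_agent_irsa suffix, and service_account_role_arn come from this PR.

```hcl
# Hypothetical inputs; the real module's variable names are not shown in this PR.
variable "cluster_name" {
  type = string
}

variable "oidc_provider_arn" {
  type = string # ARN of the cluster's IAM OIDC provider
}

variable "oidc_issuer_host" {
  type = string # OIDC issuer URL without the https:// prefix
}

# Trust policy: only the CloudWatch Agent service account may assume the role.
data "aws_iam_policy_document" "cloudwatch_agent_irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_issuer_host}:sub"
      values   = ["system:serviceaccount:amazon-cloudwatch:cloudwatch-agent"]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_issuer_host}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

# Role name is an assumption; the PR only states the cloudwatch_agent_irsa suffix
# for resource names.
resource "aws_iam_role" "cloudwatch_agent_irsa" {
  name               = "${var.cluster_name}-cloudwatch-agent-irsa"
  assume_role_policy = data.aws_iam_policy_document.cloudwatch_agent_irsa_trust.json
}

# Base policy attached to the IRSA role, per the commit message.
resource "aws_iam_role_policy_attachment" "cloudwatch_agent_irsa" {
  role       = aws_iam_role.cloudwatch_agent_irsa.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

# The addon receives the role ARN instead of null.
resource "aws_eks_addon" "cloudwatch_observability" {
  cluster_name             = var.cluster_name
  addon_name               = "amazon-cloudwatch-observability"
  service_account_role_arn = aws_iam_role.cloudwatch_agent_irsa.arn
}
```

Because the trust policy is keyed on the service account rather than on a node role, the same permissions apply whether the agent pod lands on a managed node group or a Karpenter-provisioned node.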

Usage - to fix X-Ray errors, add to blueprint:

spec:
  container_insights_enabled: true
  cloudwatch_agent_policies:
    xray:
      arn: "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess"

Files Changed

| File | Change |
| --- | --- |
| main.tf | Added IRSA role for amazon-cloudwatch:cloudwatch-agent SA, policy attachments, and wired service_account_role_arn to addon config. Kept existing node-level policy attachment unchanged. |
| variables.tf | Added cloudwatch_agent_policies as optional(map(object({ arn = string })), {}) |
| facets.yaml | Added cloudwatch_agent_policies with patternProperties and ARN validation, visible when container_insights_enabled is true |
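
One way the cloudwatch_agent_policies map could be turned into policy attachments is sketched below. This is a simplified assumption, not the code from main.tf: in the module the field is an optional attribute inside a larger spec object, whereas here it is flattened into top-level variables, and cloudwatch_agent_irsa_role_name and the cloudwatch_agent_irsa_custom resource name are hypothetical.

```hcl
# Simplified stand-ins for the spec fields named in this PR.
variable "container_insights_enabled" {
  type    = bool
  default = false
}

variable "cloudwatch_agent_policies" {
  type    = map(object({ arn = string }))
  default = {} # an empty map creates no extra attachments
}

# Hypothetical input: name of the IRSA role created for the CloudWatch Agent.
variable "cloudwatch_agent_irsa_role_name" {
  type = string
}

# One attachment per map entry, and none at all when Container Insights is disabled.
resource "aws_iam_role_policy_attachment" "cloudwatch_agent_irsa_custom" {
  for_each = {
    for key, policy in var.cloudwatch_agent_policies :
    key => policy.arn
    if var.container_insights_enabled
  }

  role       = var.cloudwatch_agent_irsa_role_name
  policy_arn = each.value
}
```

With the blueprint from the usage example, this would yield a single attachment keyed "xray" for AWSXRayDaemonWriteAccess. Keying the resources by the map key rather than by ARN keeps the resource addresses stable if an ARN is later changed.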

Backward Compatibility

  • Existing cloudwatch_agent resource (node-level policy attachment) is untouched — same resource name, same logic
  • New IRSA resources use cloudwatch_agent_irsa suffix to avoid state conflicts
  • cloudwatch_agent_policies defaults to {}, so existing customers with no custom policies are unaffected

Test plan

  • Validate module with raptor create iac-module -f modules/kubernetes_cluster/eks_standard/1.0 --dry-run
  • Deploy on a cluster with container_insights_enabled: true and verify CloudWatch Agent uses the IRSA role
  • Add AWSXRayDaemonWriteAccess via cloudwatch_agent_policies and verify OTEL trace export to X-Ray works on both managed and Karpenter nodes
  • Deploy without cloudwatch_agent_policies to confirm backward compatibility
  • Verify existing managed node groups still have CloudWatchAgentServerPolicy on their node roles

🤖 Generated with Claude Code

…olicies

Problem:
The amazon-cloudwatch-observability addon had no IRSA role
(service_account_role_arn was null), so the CloudWatch Agent fell back to
the node IAM role for AWS API calls. This caused failures on
Karpenter-provisioned nodes where the node role lacks CloudWatch/X-Ray
permissions. The OTEL auto-instrumentation injected by the addon's
mutating webhook would fail with:
  AccessDeniedException: xray:PutTraceSegments

Changes:
- Create a dedicated IRSA role for the CloudWatch Agent service account
  (amazon-cloudwatch:cloudwatch-agent) with CloudWatchAgentServerPolicy
- Pass the IRSA role ARN to the addon via service_account_role_arn
- Add configurable cloudwatch_agent_policies spec field so users can
  attach additional IAM policies (e.g., AWSXRayDaemonWriteAccess)
- Retain the existing node-level CloudWatchAgentServerPolicy attachment
  on managed node groups to avoid breaking existing customers

With IRSA, the CloudWatch Agent gets permissions via its service account
regardless of which node it runs on (managed node group or Karpenter),
following the same pattern used for the EBS CSI driver.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rauneetverma requested a review from unni-facets on May 7, 2026 12:00

…policies ARN field

Enables dropdown selection of IAM policy ARNs from blueprint outputs
while still allowing manual entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

unni-facets merged commit a822710 into main on May 8, 2026