
add more debugging logs when emr_eks system tests fail #64817

Open
ferruzzi wants to merge 2 commits into apache:main from aws-mwaa:ferruzzi/system-tests/emr-eks-describe


@ferruzzi ferruzzi commented Apr 6, 2026

This system test has been failing intermittently, and the log output is not very helpful. This PR adds a new step that prints cluster details when (and only when) one of the steps fails.
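For readers skimming the thread, here is a rough sketch of what a failure-only debug step like this might run. It is not the code from this PR: the function names `build_debug_commands` and `run_debug_commands` are hypothetical, and the command list is inferred from the sample output (a kubeconfig update, a wide pod listing, then pod descriptions for one namespace). In a DAG, a step like this would presumably be gated with a trigger rule such as `TriggerRule.ONE_FAILED` so it only runs when an upstream task fails.

```python
import subprocess


def build_debug_commands(cluster_name: str, namespace: str) -> list:
    """Return the argv lists a failure-only debug step might run (hypothetical helper)."""
    return [
        # Register the EKS cluster in the local kubeconfig so kubectl can reach it.
        ["aws", "eks", "update-kubeconfig", "--name", cluster_name],
        # List every pod in the namespace, including node and IP columns.
        ["kubectl", "get", "pods", "--namespace", namespace, "-o", "wide"],
        # Dump the full description (containers, volumes, events) of each pod.
        ["kubectl", "describe", "pods", "--namespace", namespace],
    ]


def run_debug_commands(commands, runner=subprocess.run):
    """Print a banner for each command, then run it without raising on failure."""
    for argv in commands:
        print(f"***** {' '.join(argv)} *****")
        # check=False: a failing debug command should not mask the original test failure.
        runner(argv, check=False)
```

Passing a stand-in `runner` makes the sketch easy to dry-run on a machine without `aws` or `kubectl` installed.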

Example output from the new task is pretty verbose, but it beats not having anything to work from:

2026-04-06T22:45:22.450876Z [info     ] [DAG TEST] running task <TaskInstance: example_emr_eks.describe_emr_eks_pods manual__2026-04-06T22:27:40.072272+00:00 [scheduled]> [airflow.sdk.definitions.dag]
2026-04-06T22:45:24.339500Z [info     ] Task started                   [airflow.api_fastapi.execution_api.routes.task_instances] correlation_id=019d64f8-d5e1-7222-978e-d16b58279830 hostname=1ce4ce1f8e01 previous_state=queued ti_id=019d64e8-a06c-7c03-a074-264da49fd0b0
2026-04-06T22:45:24.341064Z [info     ] Task instance state updated    [airflow.api_fastapi.execution_api.routes.task_instances] correlation_id=019d64f8-d5e1-7222-978e-d16b58279830 rows_affected=1 ti_id=019d64e8-a06c-7c03-a074-264da49fd0b0
2026-04-06T22:45:24.407726Z [info     ] Updating RenderedTaskInstanceFields [airflow.api_fastapi.execution_api.routes.task_instances] correlation_id=019d64f8-dd31-7e48-9808-b9d9e55a018f field_count=3 ti_id=019d64e8-a06c-7c03-a074-264da49fd0b0
Task instance is in running state
 Previous state of the Task instance: TaskInstanceState.QUEUED
Current task name:describe_emr_eks_pods
Dag name:example_emr_eks
Added new context arn:aws:eks:us-east-1:324969868898:cluster/env2a1d4f85-cluster to /files/.kube/config
***** pods in namespace default *****
NAME                                     READY   STATUS      RESTARTS   AGE     IP           NODE                         NOMINATED NODE   READINESS GATES
000000037b4uegnu8q5-vwtjf                2/2     Running     0          5m51s   10.0.1.66    ip-10-0-1-190.ec2.internal   <none>           <none>
spark-000000037b4uegnu8q5-driver-k66f6   0/2     Completed   0          3m31s   10.0.0.168   ip-10-0-0-82.ec2.internal    <none>           <none>
***** pod descriptions *****
Name:             000000037b4uegnu8q5-vwtjf
Namespace:        default
Priority:         0
Service Account:  emr-containers-sa-spark-client-324969868898-aegalcp3g62jtr5sqzohuy77o567t0udifprjbotp9enlc2q
Node:             ip-10-0-1-190.ec2.internal/10.0.1.190
Start Time:       Mon, 06 Apr 2026 22:39:43 +0000
Labels:           batch.kubernetes.io/controller-uid=86b164e1-b30d-4800-b33b-39c75c646e84
                  batch.kubernetes.io/job-name=000000037b4uegnu8q5
                  controller-uid=86b164e1-b30d-4800-b33b-39c75c646e84
                  emr-containers.amazonaws.com/component=job.submitter
                  emr-containers.amazonaws.com/job.id=000000037b4uegnu8q5
                  emr-containers.amazonaws.com/job.release.label=emr-7.0.0-latest
                  emr-containers.amazonaws.com/resource.type=job.run
                  emr-containers.amazonaws.com/virtual-cluster-id=ucm2plcw0b7q1xtf0z9vdxxya
                  job-name=000000037b4uegnu8q5
                  topology.kubernetes.io/region=us-east-1
                  topology.kubernetes.io/zone=us-east-1b
Annotations:      <none>
Status:           Running
IP:               10.0.1.66
IPs:
  IP:           10.0.1.66
Controlled By:  Job/000000037b4uegnu8q5
Containers:
  job-runner:
    Container ID:  containerd://3a72f4a90ae999cd9b1ce2b0b8a6cb90a5f40b32dd25025fd9beaaa7a55f8120
    Image:         755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest
    Image ID:      755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0@sha256:8d0a168622f4127d34be07532b7fe07918bc7e1853320c76b238d59b31209033
    Port:          <none>
    Host Port:     <none>
    Args:
      job
      --master
      k8s://kubernetes.default.svc
      --deploy-mode
      cluster
      --name
      spark-000000037b4uegnu8q5
      --conf
      spark.kubernetes.container.image.pullPolicy=Always
      --conf
      spark.kubernetes.report.interval=10000
      --conf
      spark.kubernetes.client.informer.reSyncInterval=300000
      --conf
      spark.kubernetes.driver.label.emr-containers.amazonaws.com/job.id=000000037b4uegnu8q5
      --conf
      spark.kubernetes.driver.maxAttempts=5
      --conf
      spark.kubernetes.driver.label.emr-containers.amazonaws.com/resource.type=job.run
      --conf
      spark.kubernetes.client.informer.enabled=true
      --conf
      spark.kubernetes.client.informer.internalPodMetadataWritePath=/var/log/spark/pods/pod-metadata
      --conf
      spark.kubernetes.driver.label.emr-containers.amazonaws.com/component=driver
      --conf
      spark.executorEnv.AWS_REGION=us-east-1
      --conf
      spark.app.id=000000037b4uegnu8q5
      --conf
      spark.kubernetes.namespace=default
      --conf
      spark.kubernetes.driver.label.emr-containers.amazonaws.com/virtual-cluster-id=ucm2plcw0b7q1xtf0z9vdxxya
      --conf
      spark.kubernetes.authenticate.executor.serviceAccountName=emr-containers-sa-spark-executor-324969868898-aegalcp3g62jtr5sqzohuy77o567t0udifprjbotp9enlc2q
      --conf
      spark.kubernetes.container.image=755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest
      --conf
      spark.kubernetes.executor.label.emr-containers.amazonaws.com/resource.type=job.run
      --conf
      spark.kubernetes.executor.label.emr-containers.amazonaws.com/job.id=000000037b4uegnu8q5
      --conf
      spark.kubernetes.authenticate.driver.serviceAccountName=emr-containers-sa-spark-driver-324969868898-aegalcp3g62jtr5sqzohuy77o567t0udifprjbotp9enlc2q
      --conf
      spark.kubernetes.driver.pod.name=spark-000000037b4uegnu8q5-driver
      --conf
      spark.kubernetes.executor.label.emr-containers.amazonaws.com/component=executor
      --conf
      spark.kubernetes.executor.label.emr-containers.amazonaws.com/virtual-cluster-id=ucm2plcw0b7q1xtf0z9vdxxya
      --conf
      spark.kubernetes.driverEnv.AWS_REGION=us-east-1
      --conf
      spark.executors.instances=2
      --conf
      spark.executors.memory=2G
      --conf
      spark.executor.cores=2
      --conf
      spark.driver.cores=1
      s3://env2a1d4f85-bucket/pi.py
    State:          Running
      Started:      Mon, 06 Apr 2026 22:41:42 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     800m
      memory:  512Mi
    Environment:
      SPARK_CONTAINER_ID:               000000037b4uegnu8q5-vwtjf (v1:metadata.name)
      CONTAINER_LOG_ROTATION:           DISABLED
      K8S_SPARK_LOG_ERROR_REGEX:        (Error|Exception|Fail)
      AWS_REGION:                       us-east-1
      K8S_SPARK_LOG_URL_STDOUT:         /var/log/spark/user/$(SPARK_CONTAINER_ID)/stdout
      STS_ENDPOINT_URL:                 https://sts.us-east-1.amazonaws.com
      VALIDATION_SCRIPT_PATH:           /usr/bin/pre-validate.py
      JAVA_TOOL_OPTIONS:                -Xmx820m
      SIDECAR_SIGNAL_FILE:              /var/log/fluentd/main-container-terminated
      CW_LOG_GROUP_NAME:                /emr-eks-jobs
      JOB_METADATA_CW_LOG_STREAM_NAME:  airflow/ucm2plcw0b7q1xtf0z9vdxxya/jobs/000000037b4uegnu8q5/job-metadata.log
      K8S_SPARK_LOG_URL_STDERR:         /var/log/spark/user/$(SPARK_CONTAINER_ID)/stderr
      TERMINATION_ERROR_LOG_FILE_PATH:  /var/log/spark/error.log
      AWS_STS_REGIONAL_ENDPOINTS:       regional
      AWS_ROLE_ARN:                     arn:aws:iam::324969868898:role/RoleSysTest_example_emr_eks_job
      AWS_WEB_IDENTITY_TOKEN_FILE:      /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /etc/spark/conf/driver-internal-pod.yaml from podtemplate-000000037b4uegnu8q5 (rw,path="driver")
      /etc/spark/conf/driver-pod-template-container-allowlist.txt from podtemplate-000000037b4uegnu8q5 (rw,path="driver-container-allowlist")
      /etc/spark/conf/driver-pod-template-pod-allowlist.txt from podtemplate-000000037b4uegnu8q5 (rw,path="driver-pod-allowlist")
      /etc/spark/conf/executor-internal-pod.yaml from podtemplate-000000037b4uegnu8q5 (rw,path="executor")
      /etc/spark/conf/executor-pod-template-container-allowlist.txt from podtemplate-000000037b4uegnu8q5 (rw,path="executor-container-allowlist")
      /etc/spark/conf/executor-pod-template-pod-allowlist.txt from podtemplate-000000037b4uegnu8q5 (rw,path="executor-pod-allowlist")
      /home/hadoop from home-dir (rw)
      /mnt from mnt-dir (rw)
      /tmp from temp-data-dir (rw)
      /usr/bin/pre-validate.py from podtemplate-000000037b4uegnu8q5 (rw,path="validation-script")
      /usr/lib/spark/conf/spark-defaults.conf from 000000037b4uegnu8q5-spark-defaults (rw,path="spark-defaults.conf")
      /var/log/fluentd from fluentd-dir (rw)
      /var/log/spark from emr-container-log-dir (rw)
      /var/log/spark/apps from emr-container-spark-apps-log-dir (rw)
      /var/log/spark/pods from emr-container-spark-pods-log-dir (rw)
      /var/log/spark/user from emr-container-spark-user-log-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6jblc (ro)
  emr-container-fluentd:
    Container ID:   containerd://0b01ff7f7fba536aff747b420863034dd6bbb0938c75ede1764d4ce7d9260477
    Image:          755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0:latest
    Image ID:       755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0@sha256:d94ad0dfec137dd3f3a47fa86c5d1bcfb74c9c958e73932f800303f8be500e93
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 06 Apr 2026 22:42:34 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  512Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      SPARK_CONTAINER_ID:           000000037b4uegnu8q5-vwtjf (v1:metadata.name)
      K8S_SPARK_LOG_URL_STDOUT:     /var/log/spark/user/$(SPARK_CONTAINER_ID)/stdout
      AWS_REGION:                   us-east-1
      K8S_SPARK_LOG_URL_STDERR:     /var/log/spark/user/$(SPARK_CONTAINER_ID)/stderr
      FLUENTD_CONF:                 fluent.conf
      SIDECAR_SIGNAL_FILE:          /var/log/fluentd/main-container-terminated
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_ROLE_ARN:                 arn:aws:iam::324969868898:role/RoleSysTest_example_emr_eks_job
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /etc/fluent/fluent.conf from config-volume (rw,path="job")
      /home/hadoop from home-dir (rw)
      /tmp from temp-data-dir (rw)
      /var/log/fluentd from fluentd-dir (rw)
      /var/log/spark from emr-container-log-dir (rw)
      /var/log/spark/apps from emr-container-spark-apps-log-dir (rw)
      /var/log/spark/pods from emr-container-spark-pods-log-dir (rw)
      /var/log/spark/user from emr-container-spark-user-log-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6jblc (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  000000037b4uegnu8q5-spark-defaults:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      000000037b4uegnu8q5-spark-defaults
    Optional:  false
  podtemplate-000000037b4uegnu8q5:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      podtemplate-000000037b4uegnu8q5
    Optional:  false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd-ucm2plcw0b7q1xtf0z9vdxxya-000000037b4uegnu8q5
    Optional:  false
  temp-data-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  home-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  mnt-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  fluentd-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  emr-container-log-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  emr-container-spark-apps-log-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  emr-container-spark-pods-log-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  emr-container-spark-user-log-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-6jblc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  5m56s  default-scheduler  Successfully assigned default/000000037b4uegnu8q5-vwtjf to ip-10-0-1-190.ec2.internal
  Normal  Pulling    5m56s  kubelet            Pulling image "755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest"
  Normal  Pulled     3m57s  kubelet            Successfully pulled image "755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest" in 1m58.401s (1m58.401s including waiting). Image size: 3184975025 bytes.
  Normal  Created    3m57s  kubelet            Container created
  Normal  Started    3m57s  kubelet            Container started
  Normal  Pulling    3m57s  kubelet            Pulling image "755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0:latest"
  Normal  Pulled     3m5s   kubelet            Successfully pulled image "755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0:latest" in 52.032s (52.032s including waiting). Image size: 773303577 bytes.
  Normal  Created    3m5s   kubelet            Container created
  Normal  Started    3m5s   kubelet            Container started


Name:             spark-000000037b4uegnu8q5-driver-k66f6
Namespace:        default
Priority:         0
Service Account:  emr-containers-sa-spark-driver-324969868898-aegalcp3g62jtr5sqzohuy77o567t0udifprjbotp9enlc2q
Node:             ip-10-0-0-82.ec2.internal/10.0.0.82
Start Time:       Mon, 06 Apr 2026 22:42:03 +0000
Labels:           batch.kubernetes.io/controller-uid=386e6d66-6589-4da3-962a-a1178f4f5017
                  batch.kubernetes.io/job-name=spark-000000037b4uegnu8q5-driver
                  controller-uid=386e6d66-6589-4da3-962a-a1178f4f5017
                  eks-subscription.amazonaws.com/emr.internal.id=a8714b1e-a951-4b1d-85ae-01997571cebc
                  emr-containers.amazonaws.com/component=driver
                  emr-containers.amazonaws.com/job.id=000000037b4uegnu8q5
                  emr-containers.amazonaws.com/resource.type=job.run
                  emr-containers.amazonaws.com/virtual-cluster-id=ucm2plcw0b7q1xtf0z9vdxxya
                  job-name=spark-000000037b4uegnu8q5-driver
                  spark-app-name=spark-000000037b4uegnu8q5
                  spark-app-selector=spark-000000037b4uegnu8q5
                  spark-role=driver
                  spark-version=3.5.0-amzn-0
                  topology.kubernetes.io/region=us-east-1
                  topology.kubernetes.io/zone=us-east-1a
Annotations:      <none>
Status:           Succeeded
IP:               10.0.0.168
IPs:
  IP:           10.0.0.168
Controlled By:  Job/spark-000000037b4uegnu8q5-driver
Containers:
  emr-container-fluentd:
    Container ID:   containerd://79be5d58548fa075ee7a448af76b9238ea40bce81994d9228c6b7101de0dbc84
    Image:          755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0:latest
    Image ID:       755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0@sha256:d94ad0dfec137dd3f3a47fa86c5d1bcfb74c9c958e73932f800303f8be500e93
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 06 Apr 2026 22:42:46 +0000
      Finished:     Mon, 06 Apr 2026 22:45:07 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  512Mi
    Requests:
      memory:  512Mi
    Environment:
      SPARK_CONTAINER_ID:                   spark-000000037b4uegnu8q5-driver-k66f6 (v1:metadata.name)
      SPARK_ROLE:                            (v1:metadata.labels['spark-role'])
      K8S_SPARK_LOG_URL_STDERR:             /var/log/spark/user/$(SPARK_CONTAINER_ID)/stderr
      K8S_SPARK_LOG_URL_STDOUT:             /var/log/spark/user/$(SPARK_CONTAINER_ID)/stdout
      K8S_SPARK_LOG_URL_STDOUT_ROT:         /var/log/spark/user/$(SPARK_CONTAINER_ID)/stdout/*
      K8S_SPARK_LOG_URL_STDERR_ROT:         /var/log/spark/user/$(SPARK_CONTAINER_ID)/stderr/*
      SIDECAR_SIGNAL_FILE:                  /var/log/fluentd/main-container-terminated
      FLUENTD_CONF:                         fluent.conf
      K8S_SPARK_EVENT_LOG_DIR:              /var/log/spark/apps
      AWS_REGION:                           us-east-1
      RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR:  1.2
      SPARK_APPLICATION_ID:                 spark-000000037b4uegnu8q5
      AWS_STS_REGIONAL_ENDPOINTS:           regional
      AWS_ROLE_ARN:                         arn:aws:iam::324969868898:role/RoleSysTest_example_emr_eks_job
      AWS_WEB_IDENTITY_TOKEN_FILE:          /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /etc/fluent/fluent.conf from config-volume (rw,path="driver")
      /home/hadoop from home-dir (rw)
      /tmp from temp-data-dir (rw)
      /var/emr-container/s3 from emr-container-s3 (ro)
      /var/log/fluentd from emr-container-communicate (rw)
      /var/log/spark/apps from emr-container-event-log-dir (rw)
      /var/log/spark/user from emr-container-application-log-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vwjbd (ro)
  spark-kubernetes-driver:
    Container ID:  containerd://11f79364660fb3c0cafe8a0b48af108174782e55d030e043b39b99da71bda125
    Image:         755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest
    Image ID:      755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0@sha256:8d0a168622f4127d34be07532b7fe07918bc7e1853320c76b238d59b31209033
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /usr/lib/spark/conf/spark.properties
      --class
      org.apache.spark.deploy.PythonRunner
      s3://env2a1d4f85-bucket/pi.py
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 06 Apr 2026 22:44:42 +0000
      Finished:     Mon, 06 Apr 2026 22:44:52 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1433Mi
    Requests:
      cpu:     1
      memory:  1433Mi
    Environment:
      SPARK_USER:                       hadoop
      SPARK_DRIVER_POD_NAME:            spark-000000037b4uegnu8q5-driver-k66f6 (v1:metadata.name)
      SPARK_APPLICATION_ID:             spark-000000037b4uegnu8q5
      AWS_REGION:                       us-east-1
      SPARK_DRIVER_BIND_ADDRESS:         (v1:status.podIP)
      PYSPARK_PYTHON:                   /usr/bin/python3
      PYSPARK_DRIVER_PYTHON:            /usr/bin/python3
      HADOOP_CONF_DIR:                  /etc/hadoop/conf
      SPARK_LOCAL_DIRS:                 /var/data/spark-3bf31ccc-1c8c-448e-999e-e08148b8ef0b
      SPARK_CONTAINER_ID:               spark-000000037b4uegnu8q5-driver-k66f6 (v1:metadata.name)
      SIDECAR_SIGNAL_FILE:              /var/log/fluentd/main-container-terminated
      CONTAINER_LOG_ROTATION:           DISABLED
      K8S_SPARK_LOG_ERROR_REGEX:        (Error|Exception|Fail)
      K8S_SPARK_LOG_URL_STDOUT:         /var/log/spark/user/$(SPARK_CONTAINER_ID)/stdout
      K8S_SPARK_LOG_URL_STDERR:         /var/log/spark/user/$(SPARK_CONTAINER_ID)/stderr
      TERMINATION_ERROR_LOG_FILE_PATH:  /var/log/spark/error.log
      K8S_SPARK_LOG_URL_STDOUT_ROT:     /var/log/spark/user/$(SPARK_CONTAINER_ID)/stdout/*
      K8S_SPARK_LOG_URL_STDERR_ROT:     /var/log/spark/user/$(SPARK_CONTAINER_ID)/stderr/*
      SPARK_CONF_DIR:                   /usr/lib/spark/conf
      AWS_STS_REGIONAL_ENDPOINTS:       regional
      AWS_ROLE_ARN:                     arn:aws:iam::324969868898:role/RoleSysTest_example_emr_eks_job
      AWS_WEB_IDENTITY_TOKEN_FILE:      /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /etc/hadoop/conf from hadoop-properties (rw)
      /home/hadoop from home-dir (rw)
      /mnt from mnt-dir (rw)
      /opt/spark/pod-template from pod-template-volume (rw)
      /tmp from temp-data-dir (rw)
      /usr/lib/spark/conf/spark-defaults.conf from 000000037b4uegnu8q5-spark-defaults (rw,path="spark-defaults.conf")
      /usr/lib/spark/conf/spark.properties from spark-conf-volume-driver (rw,path="spark.properties")
      /var/data/spark-3bf31ccc-1c8c-448e-999e-e08148b8ef0b from spark-local-dir-1 (rw)
      /var/log/fluentd from emr-container-communicate (rw)
      /var/log/spark/apps from emr-container-event-log-dir (rw)
      /var/log/spark/user from emr-container-application-log-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vwjbd (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  hadoop-properties:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-000000037b4uegnu8q5-0fabc19d64f5b290-hadoop-config
    Optional:  false
  pod-template-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-000000037b4uegnu8q5-0fabc19d64f5b290-driver-podspec-conf-map
    Optional:  false
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  emr-container-communicate:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd-ucm2plcw0b7q1xtf0z9vdxxya-000000037b4uegnu8q5
    Optional:  false
  emr-container-s3:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  emr-containers-s3-ucm2plcw0b7q1xtf0z9vdxxya-000000037b4uegnu8q5
    Optional:    false
  emr-container-application-log-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  emr-container-event-log-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  temp-data-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  mnt-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  home-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  000000037b4uegnu8q5-spark-defaults:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      000000037b4uegnu8q5-spark-defaults
    Optional:  false
  spark-conf-volume-driver:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-drv-06c4b19d64f5c7e4-conf-map
    Optional:  false
  kube-api-access-vwjbd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled    3m36s  default-scheduler  Successfully assigned default/spark-000000037b4uegnu8q5-driver-k66f6 to ip-10-0-0-82.ec2.internal
  Warning  FailedMount  3m35s  kubelet            MountVolume.SetUp failed for volume "hadoop-properties" : configmap "spark-000000037b4uegnu8q5-0fabc19d64f5b290-hadoop-config" not found
  Warning  FailedMount  3m35s  kubelet            MountVolume.SetUp failed for volume "pod-template-volume" : configmap "spark-000000037b4uegnu8q5-0fabc19d64f5b290-driver-podspec-conf-map" not found
  Warning  FailedMount  3m35s  kubelet            MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-06c4b19d64f5c7e4-conf-map" not found
  Normal   Pulling      3m34s  kubelet            Pulling image "755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0:latest"
  Normal   Pulled       2m53s  kubelet            Successfully pulled image "755674844232.dkr.ecr.us-east-1.amazonaws.com/fluentd/emr-7.0.0:latest" in 41.471s (41.471s including waiting). Image size: 773303577 bytes.
  Normal   Created      2m53s  kubelet            Container created
  Normal   Started      2m53s  kubelet            Container started
  Normal   Pulling      2m53s  kubelet            Pulling image "755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest"
  Normal   Pulled       57s    kubelet            Successfully pulled image "755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-7.0.0:latest" in 1m55.292s (1m55.292s including waiting). Image size: 3184975025 bytes.
  Normal   Created      57s    kubelet            Container created
  Normal   Started      57s    kubelet            Container started

2026-04-06T22:45:39.658106Z [info     ] Done. Returned value was: None [airflow.task.operators.airflow.providers.standard.decorators.python._PythonDecoratedOperator]
[] []

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@ferruzzi ferruzzi requested a review from o-nikolas as a code owner April 6, 2026 23:33
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Apr 6, 2026

@o-nikolas o-nikolas left a comment


We already have some shared code to do this in the other EKS tests. If you look at its usage, it's used by every EKS test except the one you updated. Should we not just reuse this existing code?


ferruzzi commented Apr 7, 2026

@o-nikolas That one requires a pod name, which the other EKS tests get from XComs but the EMR one doesn't have; the EMR one uses a namespace instead. I could modify that helper to make pod_name optional, add an optional namespace, and add a bit of logic to assert that one and only one of those is provided.

This felt slightly cleaner than modifying good existing code for a (unique?) edge case, but I can modify the existing helper. I'll change it up in a bit.
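As an illustration of the "one and only one" check mentioned above, a minimal sketch might look like the following. The function name `build_describe_command` and its return shape are hypothetical; the real shared helper's signature is not shown in this thread, so treat this as an assumption about how such validation could be written rather than the actual change.

```python
from __future__ import annotations


def build_describe_command(
    pod_name: str | None = None,
    namespace: str | None = None,
) -> list[str]:
    """Build a kubectl describe command from exactly one of pod_name or namespace."""
    # True when both are missing or both are given; either case is ambiguous.
    if (pod_name is None) == (namespace is None):
        raise ValueError("Provide exactly one of pod_name or namespace.")
    if pod_name is not None:
        # Existing behavior: describe a single, known pod.
        return ["kubectl", "describe", "pod", pod_name]
    # EMR on EKS case: no pod name is available, so describe every pod in the namespace.
    return ["kubectl", "describe", "pods", "--namespace", namespace]
```

Comparing the two `is None` results keeps the mutual-exclusion check to one line and rejects both the "neither" and the "both" cases.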
