
GKE Ingestion Jobs Failing #427

@lgvital

Description

Expected Behavior

Ingestion jobs run successfully and the BigQuery table (feast.customer_project_customer_transactions_v1) is populated with data.

Current Behavior

>>> client.ingest("customer_transactions", customer_features)
Waiting for feature set to be ready for ingestion...
  0%|                                                                                                | 0/155 [04:48<?, ?rows/s]
Ingestion complete!

Ingestion statistics:
Success: 0/155
Removing temporary file(s)...

The BigQuery table is empty. The following query returns 0:

SELECT COUNT(*) FROM `feast.customer_project_customer_transactions_v1`;

Steps to reproduce

Follow the latest GKE setup docs with the following differences:

  • Use my-project as the project ID
  • Use image 0.4.2
  • Use NodePort to expose the services to the local client
  • In the basic example, use $FEAST_CORE_URL and $FEAST_BATCH_SERVING_URL
  • Open NodePorts 32090, 32091, and 32092 in the GCP firewall:
gcloud compute firewall-rules create feast-core-port --allow tcp:32090
gcloud compute firewall-rules create feast-online-port --allow tcp:32091
gcloud compute firewall-rules create feast-batch-port --allow tcp:32092
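
To rule out the NodePort and firewall setup, a quick reachability check from the local client can help. The snippet below is a minimal sketch using only the Python standard library. The helper names `port_is_open` and `check_node_ports` are made up for illustration and are not part of the Feast SDK. A successful TCP connection only shows that the port is reachable. It does not prove that the gRPC service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_node_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether a TCP connection to it could be opened."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `check_node_ports(node_ip, [32090, 32091, 32092])` would report the three ports opened above, where `node_ip` is a placeholder for the external IP of one of the cluster nodes.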

my-feast-values.yaml ends up looking like:

feast-core:
  enabled: true
  image:
    tag: "0.4.2"
  jvmOptions:
  - -Xms1024m
  - -Xmx1024m
  resources:
    requests:
      cpu: 1000m
      memory: 1024Mi
  service:
    type: NodePort
    grpc:
      nodePort: 32090
  gcpServiceAccount:
    useExistingSecret: true
feast-serving-online:
  enabled: true
  redis:
    enabled: true
  image:
    tag: "0.4.2"
  jvmOptions:
  - -Xms1024m
  - -Xmx1024m
  resources:
    requests:
      cpu: 500m
      memory: 1024Mi
  service:
    type: NodePort
    grpc:
      nodePort: 32091
  store.yaml:
    name: redis
    type: REDIS
    redis_config:
      port: 6379
    subscriptions:
    - name: "*"
      project: "*"
      version: "*"
feast-serving-batch:
  enabled: true
  redis:
    enabled: false
  image:
    tag: "0.4.2"
  jvmOptions:
  - -Xms1024m
  - -Xmx1024m
  resources:
    requests:
      cpu: 500m
      memory: 1024Mi
  service:
    type: NodePort
    grpc:
      nodePort: 32092
  gcpServiceAccount:
    useExistingSecret: true
  application.yaml:
    feast:
      jobs:
        staging-location: gs://my-project_feast_bucket/serving/batch
        store-type: REDIS
        store-options:
          host: localhost
          port: 6379
  store.yaml:
    name: bigquery
    type: BIGQUERY
    bigquery_config:
      project_id: my-project
      dataset_id: feast
    subscriptions:
    - name: "*"
      project: "*"
      version: "*"

Specifications

  • Version: 0.4.2, latest master python SDK (2a33f7b)
  • Platform: GKE, installed from a local macOS machine
  • Subsystem: Python 3.7.6, Helm v2.16.1, kubectl v1.17.0

Possible Solution

The default Kafka configuration might need adjusting, or the problem may be related to the NodePort configuration.

The kubectl logs for feast-feast-core appear to show that the Kafka ingestion jobs were successfully created and submitted to the DirectRunner:

22:57:52 [pool-5-thread-1] INFO  feast.ingestion.ImportJob - Starting import job with settings:
Current Settings:
  appName: DirectRunnerJobManager
  blockOnRun: false
  enforceEncodability: true
  enforceImmutability: true
  featureSetJson: [{
  "name": "customer_transactions",
  "version": 1,
  "entities": [{
    "name": "customer_id",
    "valueType": "INT64"
  }],
  "features": [{
    "name": "total_transactions",
    "valueType": "INT64"
  }, {
    "name": "daily_transactions",
    "valueType": "DOUBLE"
  }],
  "maxAge": "86400s",
  "source": {
    "type": "KAFKA",
    "kafkaSourceConfig": {
      "bootstrapServers": "feast-kafka:9092",
      "topic": "feast"
    }
  },
  "project": "customer_project"
}]
  gcsPerformanceMetrics: false
  optionsId: 0
  project:
  runner: class org.apache.beam.runners.direct.DirectRunner
  stableUniqueNames: WARNING
  storeJson: [{
  "name": "bigquery",
  "type": "BIGQUERY",
  "subscriptions": [{
    "name": "*",
    "version": "*",
    "project": "*"
  }],
  "bigqueryConfig": {
    "projectId": "my-project",
    "datasetId": "feast"
  }
}]

22:57:54 [pool-5-thread-1] INFO  feast.ingestion.utils.StoreUtil - Writing to existing BigQuery table 'my-project:feast.customer_project_customer_transactions_v1'
22:57:54 [pool-6-thread-1] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Partitions assigned to split 0 (total 1): feast-0
2020-01-09 22:57:54.984 AUDIT feast-feast-core-dc485b44d-qg75w --- [pool-6-thread-1] f.c.l.AuditLogger                        : {action=STATUS_CHANGE, detail=Job submitted to runner DirectRunner with ext id kafka-to-redis1578610671583., id=kafka-to-redis1578610671583, resource=JOB, timestamp=Thu Jan 09 22:57:54 UTC 2020}
22:57:55 [pool-5-thread-1] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Partitions assigned to split 0 (total 1): feast-0
2020-01-09 22:57:55.373 AUDIT feast-feast-core-dc485b44d-qg75w --- [pool-5-thread-1] f.c.l.AuditLogger                        : {action=STATUS_CHANGE, detail=Job submitted to runner DirectRunner with ext id kafka-to-bigquery1578610671583., id=kafka-to-bigquery1578610671583, resource=JOB, timestamp=Thu Jan 09 22:57:55 UTC 2020}
22:57:55 [pool-2-thread-1] INFO  feast.core.service.JobCoordinatorService - Updating feature set status
22:57:58 [direct-runner-worker] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Reader-0: reading from feast-0 starting at offset 0
22:57:58 [direct-runner-worker] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Reader-0: reading from feast-0 starting at offset 0

A feast topic was also created successfully, according to the kafka-config pod logs:

Waiting for Zookeeper...
Waiting for Kafka...
Applying runtime configuration using confluentinc/cp-kafka:5.0.1
Created topic "feast".
Configs for topic 'feast' are 

However, the feast-kafka-0 logs show topic-related errors:

[2020-01-09 22:57:57,799] INFO [Log partition=__consumer_offsets-2, dir=/opt/kafka/data/logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2020-01-09 22:57:57,804] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition __consumer_offsets-8 at offset 0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-01-09 22:57:57,804] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition __consumer_offsets-35 at offset 0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-01-09 22:57:57,805] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition __consumer_offsets-41 at offset 0 (kafka.server.ReplicaFetcherThread)
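
To see which topics these errors refer to, the broker log can be summarized with a short script. This is a rough sketch, assuming the log lines keep the ReplicaFetcher format shown above. The function name `count_partition_errors` is hypothetical. In the excerpt above, every error concerns the internal __consumer_offsets topic rather than the feast topic. Whether that is connected to the failed ingestion is not established here.

```python
import re
from collections import Counter

# Matches lines such as:
# [...] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0]
#   Error for partition __consumer_offsets-8 at offset 0 (...)
PARTITION_ERROR = re.compile(
    r"ERROR \[ReplicaFetcher replicaId=\d+, leaderId=\d+, fetcherId=\d+\] "
    r"Error for partition (?P<topic>\S+)-(?P<partition>\d+) at offset"
)


def count_partition_errors(log_text: str) -> Counter:
    """Count ReplicaFetcher partition errors per topic in a Kafka broker log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PARTITION_ERROR.search(line)
        if match:
            counts[match.group("topic")] += 1
    return counts
```

The log text could be obtained by saving the output of `kubectl logs feast-kafka-0` to a file and reading it in.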

Happy to share any other relevant logs. I'm admittedly not familiar with Kafka, so I could be off here. I'm just trying to get the GKE Feast guide and the basic example working end to end. Once it works, I'm happy to put up a PR to update the guide for 0.4.X.
