
"IllegalStateException: Recursive update" when converting Hudi table to Delta #466

@lucasmo

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

I’m trying to use XTable to convert a Hudi source to a Delta target and I am receiving the following exception. The table is active and frequently updated, and it is being actively queried as a Hudi table.

Is there any other debug information I can provide to make this more useful?

My git head is 4a96627
OS is Linux/Ubuntu
Java 11
Modified log4j2.xml to set level=trace for org.apache.hudi and org.apache.xtable
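
The log4j2.xml change was only adding trace-level loggers, roughly like this (layout and appender name are approximate, not copied from the bundled file):

<Loggers>
  <Logger name="org.apache.hudi" level="trace"/>
  <Logger name="org.apache.xtable" level="trace"/>
  <!-- Root logger and appender name assumed -->
  <Root level="info">
    <AppenderRef ref="Console"/>
  </Root>
</Loggers>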

Run with stacktrace:

$ java -jar ./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig config.yaml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-06-05 23:22:05 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://hidden-s3-bucket/hidden-prefix/ for following table formats [DELTA]
2024-06-05 23:22:05 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:05 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-06-05 23:22:05 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-06-05 23:22:06 WARN  org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:155 - Loading Active commit timeline for s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__deltacommit__COMPLETED__20240605231917000]}
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 7 ms to read  0 instants, 0 replaced file groups
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.hbase.util.UnsafeAvailChecker (file:/incubator-xtable/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hbase.util.UnsafeAvailChecker
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-06-05 23:22:08 INFO  org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:243 - Creating View Manager with storage type :MEMORY
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:255 - Creating in-memory based Table View
2024-06-05 23:22:11 INFO  org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-06-05 23:22:11 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-06-05 23:22:13 INFO  org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=8eda3e8f-9dae-4d19-ac72-f625b8ccb0c5] Created snapshot InitialSnapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=-1, metadata=Metadata(167f7b26-f82d-4765-97b9-b6e47d9147ec,null,null,Format(parquet,Map()),null,List(),Map(),Some(1717629733296)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-06-05 23:22:13 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-06-05 23:22:13 INFO  org.apache.hudi.common.table.TableSchemaResolver:317 - Reading schema from s3://hidden-s3-bucket/hidden-prefix/op_date=2024-06-05/3b5d27af-ef39-4862-bbd9-d4a010f6056e-0_0-71-375_20240605231837826.parquet
2024-06-05 23:22:14 INFO  org.apache.hudi.metadata.HoodieTableMetadataUtil:927 - Loading latest merged file slices for metadata table partition files
2024-06-05 23:22:14 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 1 ms to read  0 instants, 0 replaced file groups
2024-06-05 23:22:14 INFO  org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-06-05 23:22:14 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building file system view for partition (files)
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:435 - #files found in partition (files) =30, Time taken =40
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.HoodieTableFileSystemView:386 - Adding file-groups for partition :files, #FileGroups=1
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:165 - addFilesToView: NumFiles=30, NumFileGroups=1, FileGroupsCreationTime=15, StoreTimeTaken=1
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:449 - Time to load partition (files) =57
2024-06-05 23:22:14 INFO  org.apache.hudi.metadata.HoodieBackedTableMetadata:451 - Opened metadata base file from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/files/files-0000-0_0-67-1304_20240605210834482001.hfile at instant 20240605210834482001 in 9 ms
2024-06-05 23:22:14 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:14 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3://hidden-s3-bucket/hidden-prefix/
org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:127) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.hudi.HudiDataFileExtractor.getFilesCurrentState(HudiDataFileExtractor.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.hudi.HudiConversionSource.getCurrentSnapshot(HudiConversionSource.java:97) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.IllegalStateException: Recursive update
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1739) ~[?:?]
    at org.apache.avro.util.MapUtil.computeIfAbsent(MapUtil.java:42) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:257) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:508) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:355) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:186) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$10(HoodieBackedTableMetadata.java:412) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) ~[?:?]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:412) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:255) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:145) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:316) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:125) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    ... 6 more
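
For context (my reading of the trace, not a confirmed root cause): ConcurrentHashMap.computeIfAbsent throws this exact exception when the mapping function re-enters computeIfAbsent on the same map, which looks like what happens here inside Avro's SpecificData class cache while Hudi deserializes the rollback metadata. A minimal sketch that reproduces the same IllegalStateException on Java 11 (hypothetical keys and values, just to show the JDK behavior):

import java.util.concurrent.ConcurrentHashMap;

public class RecursiveUpdateDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        // The mapping function re-enters computeIfAbsent on the same map and key,
        // which the JDK detects and rejects with
        // "java.lang.IllegalStateException: Recursive update".
        cache.computeIfAbsent("schema", k ->
                cache.computeIfAbsent("schema", k2 -> "resolved"));
    }
}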

config.yaml:

sourceFormat: HUDI
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: s3://hidden-s3-bucket/hidden-prefix
    tableName: hidden_table
    partitionSpec: op_date:VALUE

hoodie.properties from the table:

hoodie.table.timeline.timezone=LOCAL
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.precombine.field=ts_millis
hoodie.table.version=6
hoodie.database.name=
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.table.checksum=2622850774
hoodie.partition.metafile.use.base.format=false
hoodie.table.cdc.enabled=false
hoodie.archivelog.folder=archived
hoodie.table.name=hidden_table
hoodie.populate.meta.fields=true
hoodie.table.type=COPY_ON_WRITE
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.metadata.partitions=files
hoodie.timeline.layout.version=1
hoodie.table.recordkey.fields=record_id
hoodie.table.partition.fields=op_date

I submitted this to the dev@ mailing list and received no response, so I am filing it as an issue.

Are you willing to submit a PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

  • I agree to follow this project's Code of Conduct
