
[AURON #1638] Support scan ORC data using microsecond precision #1684

Merged
cxzl25 merged 5 commits into apache:master from cxzl25:orc_exec_timestamp_precision on Dec 16, 2025

Conversation

cxzl25 (Contributor) commented Dec 1, 2025

Which issue does this PR close?

Closes #1638

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

cxzl25 marked this pull request as draft December 1, 2025 12:35
cxzl25 marked this pull request as ready for review December 2, 2025 12:01
cxzl25 requested a review from Copilot December 2, 2025 12:01

Copilot AI left a comment

Pull request overview

This PR adds support for configurable microsecond precision when reading ORC timestamp columns, addressing issue #1638. The implementation introduces a new boolean configuration parameter that allows users to opt into microsecond precision instead of the default nanosecond precision when scanning ORC files.

  • Added configuration option spark.auron.orc.timestamp.use.microsecond with default value false
  • Integrated the configuration through the JNI bridge to the Rust native execution layer
  • Applied the timestamp precision setting via the orc-rust library's ArrowReaderBuilder
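A minimal sketch of what the flag changes, assuming the reader decodes each ORC timestamp into whole seconds since the Unix epoch plus a non-negative sub-second nanosecond part (the split shown in the #1638 error message). `TimestampUnit`, `unit_from_conf`, `to_epoch_value`, and `min_whole_second` are hypothetical names for illustration, not the orc-rust or Auron API; the actual change goes through orc-rust's ArrowReaderBuilder. The sketch shows why seconds=-62135798400 overflows an i64 nanosecond count but fits as microseconds.

```rust
/// Hypothetical stand-in for the precision choice; not orc-rust's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimestampUnit {
    Nanosecond,
    Microsecond,
}

impl TimestampUnit {
    /// Number of units in one second.
    fn per_second(self) -> i64 {
        match self {
            TimestampUnit::Nanosecond => 1_000_000_000,
            TimestampUnit::Microsecond => 1_000_000,
        }
    }
}

/// Hypothetical helper: maps spark.auron.orc.timestamp.use.microsecond
/// (default false) to the unit used for the Arrow timestamp column.
fn unit_from_conf(use_microsecond: bool) -> TimestampUnit {
    if use_microsecond {
        TimestampUnit::Microsecond
    } else {
        TimestampUnit::Nanosecond
    }
}

/// Converts (seconds since the Unix epoch, sub-second nanos) into one i64 count
/// of `unit`, returning None instead of overflowing. In microsecond mode the
/// sub-microsecond digits are truncated.
fn to_epoch_value(seconds: i64, nanos: u32, unit: TimestampUnit) -> Option<i64> {
    let sub_second = i64::from(nanos) / (1_000_000_000 / unit.per_second());
    seconds.checked_mul(unit.per_second())?.checked_add(sub_second)
}

/// Earliest whole second whose scaled value still fits in an i64.
fn min_whole_second(unit: TimestampUnit) -> i64 {
    i64::MIN / unit.per_second()
}
```

With i64 nanoseconds the representable range starts in 1677, so earlier values such as year-1 dates can only be read in microsecond mode. The trade-off is dropping sub-microsecond digits, which Spark's TimestampType does not keep anyway because it stores microseconds.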

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| spark-extension/src/main/java/org/apache/spark/sql/auron/AuronConf.java | Adds the ORC_TIMESTAMP_USE_MICROSECOND configuration entry with a descriptive comment |
| native-engine/datafusion-ext-plans/src/orc_exec.rs | Retrieves the configuration value and conditionally applies microsecond precision to the ORC reader builder |
| native-engine/auron-jni-bridge/src/conf.rs | Defines the configuration bridge macro that exposes the setting to Rust code |

ShreyeshArangath (Contributor) left a comment

LGTM. Is it possible to add a test case for this?

github-actions bot added the build label Dec 3, 2025
cxzl25 (Contributor, Author) commented Dec 3, 2025

Quoting the review: "is it possible to add a test case for this?"

I tried to add one, but Spark's unit tests set the JVM default time zone to America/Los_Angeles, and the orc-rust implementation has no way to receive that time zone from the JVM. In a PR to orc-rust, I added an ORC_READER_TIMEZONE environment variable to specify the reader time zone.

From org.apache.spark.SparkFunSuite:

  // Timezone is fixed to America/Los_Angeles for those timezone sensitive tests (timestamp_*)
  TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
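A sketch of the time-zone problem, assuming the native reader resolves its time zone from the ORC_READER_TIMEZONE variable mentioned above and falls back to a default when it is unset or blank. The fallback behavior and the helper names `resolve_reader_timezone` and `reader_timezone_from_env` are hypothetical, not orc-rust's actual code. The point is that native code sees only the process environment, not a zone set on the JVM side with TimeZone.setDefault, so the zone has to be passed explicitly.

```rust
use std::env;

/// Variable name taken from the comment above; the resolution logic is an assumption.
const READER_TZ_VAR: &str = "ORC_READER_TIMEZONE";

/// Resolves the reader time zone through an injectable lookup, so it can be
/// tested without mutating the process environment. Falls back to
/// `default_tz` when the variable is unset or blank.
fn resolve_reader_timezone<F>(lookup: F, default_tz: &str) -> String
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(READER_TZ_VAR) {
        Some(tz) if !tz.trim().is_empty() => tz.trim().to_string(),
        _ => default_tz.to_string(),
    }
}

/// Reads the real process environment. A JVM-side TimeZone.setDefault call
/// does not change what this function sees.
fn reader_timezone_from_env(default_tz: &str) -> String {
    resolve_reader_timezone(|key| env::var(key).ok(), default_tz)
}
```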

cxzl25 force-pushed the orc_exec_timestamp_precision branch from c5b5256 to 047bbaa December 12, 2025 11:17
github-actions bot removed the build label Dec 12, 2025
cxzl25 changed the title [AURON #1638] Scan orc data using microsecond precision [AURON #1638] Support scan ORC data using microsecond precision Dec 16, 2025
cxzl25 merged commit aef1a62 into apache:master Dec 16, 2025
98 checks passed

Development

Successfully merging this pull request may close these issues.

ORC Overflow while decoding timestamp (seconds=-62135798400, nanoseconds=0) to Nanosecond

4 participants