perf: improve performance of timestamp truncate for some formats #2996
andygrove wants to merge 7 commits into apache:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2996 +/- ##
============================================
+ Coverage 56.12% 59.58% +3.45%
- Complexity 976 1368 +392
============================================
Files 119 167 +48
Lines 11743 15497 +3754
Branches 2251 2573 +322
============================================
+ Hits 6591 9234 +2643
- Misses 4012 4966 +954
- Partials 1140 1297 +157
Comet Microbenchmark Results: CometDatetimeExpressionBenchmark
Automated benchmark run by dfbench
The timezone |
…cate
# Conflicts:
# spark/src/test/scala/org/apache/spark/sql/benchmark/CometDatetimeExpressionBenchmark.scala
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hmm, good point, thank you. I think I will close this PR for now and revisit it another time.
I filed #3477.
Which issue does this PR close?
Part of #2995
Rationale for this change
Sub-day truncations (MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR) are timezone-independent: truncating to the minute boundary gives the same result regardless of the timezone used for display. This allows simple integer arithmetic on the raw timestamp value instead of expensive DateTime conversions.
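To illustrate the integer-arithmetic idea, here is a minimal sketch, not the code from this PR. It assumes timestamps are stored as `i64` microseconds since the Unix epoch in UTC; the constant and function names (`trunc_to_unit`, `trunc_with_offset`) are hypothetical. The plain form assumes the zone's UTC offset is a whole multiple of the truncation unit; the second function shows what a fixed, non-aligned offset would require.

```rust
// Hypothetical names for illustration only; not taken from the Comet codebase.
// Timestamps are assumed to be i64 microseconds since the Unix epoch (UTC).
const MICROS_PER_SECOND: i64 = 1_000_000;
const MICROS_PER_MINUTE: i64 = 60 * MICROS_PER_SECOND;
const MICROS_PER_HOUR: i64 = 60 * MICROS_PER_MINUTE;

/// Round `ts_micros` down to the nearest multiple of `unit_micros`.
/// `rem_euclid` keeps the remainder non-negative, so negative (pre-1970)
/// timestamps also round toward the earlier boundary.
fn trunc_to_unit(ts_micros: i64, unit_micros: i64) -> i64 {
    ts_micros - ts_micros.rem_euclid(unit_micros)
}

/// Variant for a zone with a fixed UTC offset: shift to local time,
/// truncate, then shift back to UTC.
fn trunc_with_offset(ts_micros: i64, unit_micros: i64, offset_micros: i64) -> i64 {
    trunc_to_unit(ts_micros + offset_micros, unit_micros) - offset_micros
}
```

For example, `trunc_to_unit(90_500_000, MICROS_PER_MINUTE)` yields `60_000_000`. With a fixed +05:30 offset, `trunc_with_offset` at `MICROS_PER_HOUR` gives a different result from the plain form, so the offset-free shortcut only holds when the offset is aligned to the unit.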
Criterion Benchmarks
Microbenchmarks
What changes are included in this PR?
This PR does not improve performance for YEAR, MONTH, DAY, WEEK, or QUARTER, which all remain very slow.
How are these changes tested?