orca: implement intra-segment parallel table scan support #1398

Open
yjhjstz wants to merge 3 commits into apache:main from yjhjstz:yjhjstz/orca_parallel

Conversation

@yjhjstz (Member) commented Oct 16, 2025

Add comprehensive parallel table scan capability to GPORCA optimizer, enabling worker-level parallelism within segments for improved query performance on large table scans.

Key components:

  • New CPhysicalParallelTableScan operator and CDistributionSpecWorkerRandom distribution specification for worker-level data distribution
  • CXformGet2ParallelTableScan transformation with parallel safety checks (excludes CTEs, dynamic scans, foreign tables, replicated tables, etc.)
  • Cost model integration with parallel_setup_cost and efficiency degradation scaling (logarithmic based on worker count)
  • DXL serialization/deserialization for CDXLPhysicalParallelTableScan
  • Plan translation to PostgreSQL SeqScan nodes with parallel_aware=true
  • Rewindability constraints (parallel scans are non-rewindable)
  • GUC integration: max_parallel_workers_per_gather controls worker count
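The parallel-safety checks above can be pictured roughly as follows. This is a minimal sketch with illustrative names, not the actual CXformGet2ParallelTableScan code; it only shows the shape of the gate the transform applies (GUC disabled, then the excluded table kinds listed in the description).

```cpp
// Hypothetical sketch of the parallel-safety gate described above.
// Field and function names are illustrative, not GPORCA symbols.
struct TableDesc {
    bool is_cte;          // CTE producers/consumers are excluded
    bool is_dynamic_scan; // dynamic (partitioned) scans are excluded
    bool is_foreign;      // foreign tables are excluded
    bool is_replicated;   // replicated tables are excluded
};

bool FParallelScanSafe(const TableDesc &tab, int max_workers_per_gather)
{
    // GUC max_parallel_workers_per_gather = 0 disables parallel scans.
    if (max_workers_per_gather <= 0)
    {
        return false;
    }
    // Exclusions enumerated in the PR description.
    if (tab.is_cte || tab.is_dynamic_scan || tab.is_foreign ||
        tab.is_replicated)
    {
        return false;
    }
    return true;
}
```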

Implements #1316.

TPC-H performance improved by 15%.
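The cost-model shape described above (a fixed parallel_setup_cost plus scan work divided across workers, discounted by a logarithmic efficiency factor) might look like this. The formula and constants are assumptions for illustration; the real symbols live in CCostModelGPDB.cpp.

```cpp
#include <cmath>

// Illustrative sketch of the parallel scan cost curve described above:
// setup overhead plus per-row work split across workers, where effective
// parallelism degrades logarithmically with the worker count so that
// doubling workers never halves the cost. The 0.1 degradation factor is
// a made-up constant, not the model's actual value.
double ParallelScanCost(double rows, double cost_per_row,
                        int num_workers, double parallel_setup_cost)
{
    if (num_workers <= 1)
    {
        return rows * cost_per_row;  // serial scan: no setup, no split
    }
    double efficiency =
        1.0 / (1.0 + std::log2(static_cast<double>(num_workers)) * 0.1);
    double effective_workers = num_workers * efficiency;
    return parallel_setup_cost + (rows * cost_per_row) / effective_workers;
}
```

With a shape like this, the optimizer only picks the parallel alternative when the scan is large enough for the per-worker savings to outweigh parallel_setup_cost.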

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel


@yjhjstz yjhjstz marked this pull request as ready for review October 17, 2025 16:41
@my-ship-it my-ship-it force-pushed the yjhjstz/orca_parallel branch from cde0dec to 05d9edf on October 20, 2025 06:54
@my-ship-it my-ship-it self-requested a review October 20, 2025 07:50
@my-ship-it (Contributor) commented:

Please add more test cases

Comment thread src/backend/gpopt/gpdbwrappers.cpp
Comment thread src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp Outdated
Comment thread src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp
Comment thread src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp
Comment thread src/backend/gporca/libgpdbcost/src/CCostModelGPDB.cpp
Comment thread src/backend/gporca/libgpopt/include/gpopt/base/CRewindabilitySpec.h Outdated
Comment thread src/backend/gporca/libgpopt/src/search/CGroup.cpp
@yjhjstz (Member, Author) commented Oct 21, 2025

> Please add more test cases

See the installcheck-orca-parallel target in src/test/regress.

@yjhjstz yjhjstz force-pushed the yjhjstz/orca_parallel branch from 05d9edf to 8a5bc1e on October 21, 2025 15:24
Comment thread src/test/regress/GNUmakefile
Comment thread src/test/regress/excluded_tests.conf
@my-ship-it (Contributor) left a comment:

LGTM

@avamingli (Contributor) commented:

Add some cases to test the plan?

@yjhjstz (Member, Author) commented Oct 24, 2025

> Add some cases to test the plan?

Maybe after implementing parallel hash join.

@yjhjstz yjhjstz closed this Jan 16, 2026
@yjhjstz yjhjstz reopened this May 8, 2026
@yjhjstz yjhjstz force-pushed the yjhjstz/orca_parallel branch from 8a5bc1e to 319b9f1 on May 8, 2026 16:04
yjhjstz added 3 commits May 9, 2026 01:44
Add comprehensive parallel table scan capability to GPORCA optimizer,
enabling worker-level parallelism within segments for improved query
performance on large table scans.

Key components:
- New CPhysicalParallelTableScan operator and CDistributionSpecWorkerRandom
distribution specification for worker-level data distribution
- CXformGet2ParallelTableScan transformation with parallel safety checks
(excludes CTEs, dynamic scans, foreign tables, replicated tables, etc.)
- Cost model integration with parallel_setup_cost and efficiency degradation
scaling (logarithmic based on worker count)
- DXL serialization/deserialization for CDXLPhysicalParallelTableScan
- Plan translation to PostgreSQL SeqScan nodes with parallel_aware=true
- Rewindability constraints (parallel scans are non-rewindable)
- GUC integration: max_parallel_workers_per_gather controls worker count

Fix two related issues in CDistributionSpecWorkerRandom:
- Remove incorrect EdtRandom branch from Matches(): WorkerRandom should not
  match segment-level Random distribution
- FSatisfies() now returns false for EdtRandom: neither can satisfy the other
  without a Motion (consistent with Random::FSatisfies(WorkerRandom))
- AppendEnforcers(): split EdtRandom and EdtWorkerRandom into separate cases;
  EdtRandom now creates CDistributionSpecRandom (not WorkerRandom) as Motion target

Ported from hashdata-lightning commit 2800e88.
Note: EdtWorkerRandom with CPhysicalMotionHashDistributeWorkers deferred
pending that operator's availability.
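The Matches/FSatisfies semantics this commit fixes can be sketched as below. This is a deliberately simplified model, not the real CDistributionSpec class hierarchy: the point is only that segment-level Random and worker-level Random neither match nor satisfy each other, so the optimizer must insert a Motion to convert between them.

```cpp
// Simplified model of the distribution-matching rules described above.
// Names are illustrative; real GPORCA specs are full classes, not enums.
enum EDistributionType { EdtRandom, EdtWorkerRandom };

struct DistributionSpec {
    EDistributionType type;

    // After the fix: WorkerRandom matches only WorkerRandom; the
    // incorrect EdtRandom branch is gone.
    bool Matches(const DistributionSpec &other) const
    {
        return type == other.type;
    }

    // After the fix: neither Random nor WorkerRandom satisfies the
    // other without a Motion in between.
    bool FSatisfies(const DistributionSpec &required) const
    {
        return type == required.type;
    }
};
```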

In Cloudberry's MPP architecture, segment stats are delivered
asynchronously to the coordinator. The seq_scan counter can be
registered before seq_tup_read arrives from segments, causing
wait_for_stats() to exit prematurely and the subsequent assertion
to fail intermittently in the pax-ic-good-opt-off CI job.

Add an explicit wait condition (updated6) for seq_tup_read reaching
the expected value, and update the comment to reflect Cloudberry's
segment-level async stats delivery rather than parallel workers.
@yjhjstz yjhjstz force-pushed the yjhjstz/orca_parallel branch from 319b9f1 to 6db920d on May 8, 2026 17:44
@yjhjstz yjhjstz requested review from leborchuk, reshke and x4m May 11, 2026 03:11