Distributed scheduler #2002
Merged
DiegoTavares merged 98 commits into AcademySoftwareFoundation:new-scheduler on Dec 12, 2025
- Implement FrameRange and FrameSet structs to parse and represent complex frame range syntaxes, including stepped, inverse stepped, negative-step, and interleaved ranges
- Support chunking FrameSets into compact sub-ranges for dispatching
- Integrate FrameSet chunking in RqdDispatcher for precise frame chunking
- Improve dispatch error handling with distinct error types
- Update host DAO and models to include allocation info for resource checks
- Add .gitignore entry for /sandbox/kafka*
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
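The frame range syntaxes listed above can be illustrated with a small Rust sketch. The grammar is an assumption, not taken from the PR: `1-10x3` is stepped, `1-10y3` is inverse stepped (every frame the step skips), `10-1x-4` uses a negative step, and `1-10:5` is interleaved, assumed here to mean step 5, then 2, then 1, without repeats. `parse_range`, `parse_frame_set`, and `chunk` are hypothetical helpers that return plain `Vec<i64>`/`String` values instead of the PR's `FrameRange` and `FrameSet` structs.

```rust
use std::collections::HashSet;

/// Parses a (possibly negative) frame number or step.
fn num(s: &str) -> Result<i64, String> {
    s.trim().parse::<i64>().map_err(|e| format!("invalid number {s:?}: {e}"))
}

/// Frames from `start` toward `end` (inclusive) in increments of `step`.
fn int_range(start: i64, end: i64, step: i64) -> Vec<i64> {
    let mut out = Vec::new();
    let mut f = start;
    while (step > 0 && f <= end) || (step < 0 && f >= end) {
        out.push(f);
        f += step;
    }
    out
}

/// Parses one range: "5", "1-10", "1-10x2", "1-10y2", "10-1x-2", "1-10:4" (assumed grammar).
pub fn parse_range(spec: &str) -> Result<Vec<i64>, String> {
    let spec = spec.trim();
    // Optional modifier: 'x' = stepped, 'y' = inverse stepped, ':' = interleaved.
    let (bounds, modifier) = match spec.find(|c: char| c == 'x' || c == 'y' || c == ':') {
        Some(i) => (&spec[..i], Some((spec.as_bytes()[i], &spec[i + 1..]))),
        None => (spec, None),
    };
    // The range dash is the first '-' after position 0 (position 0 may be a minus sign).
    let (start, end) = match bounds.get(1..).and_then(|s| s.find('-')) {
        Some(i) => (num(&bounds[..i + 1])?, num(&bounds[i + 2..])?),
        None => {
            let n = num(bounds)?;
            (n, n)
        }
    };
    let default_step = if start <= end { 1 } else { -1 };
    let Some((kind, step_str)) = modifier else {
        return Ok(int_range(start, end, default_step));
    };
    let step = num(step_str)?;
    if step == 0 || (start < end && step < 0) || (start > end && step > 0) {
        return Err(format!("step {step} does not move from {start} toward {end}"));
    }
    Ok(match kind {
        b'x' => int_range(start, end, step),
        b'y' => {
            // Inverse stepped: every frame except the ones the step lands on.
            let skip: HashSet<i64> = int_range(start, end, step).into_iter().collect();
            int_range(start, end, default_step).into_iter().filter(|f| !skip.contains(f)).collect()
        }
        _ => {
            // Interleaved (assumed): step, step/2, ... down to +/-1, skipping repeats.
            let (mut seen, mut out, mut s) = (HashSet::new(), Vec::new(), step);
            while s != 0 {
                out.extend(int_range(start, end, s).into_iter().filter(|f| seen.insert(*f)));
                s /= 2;
            }
            out
        }
    })
}

/// Parses a comma-separated frame set, keeping the first occurrence of each frame.
pub fn parse_frame_set(spec: &str) -> Result<Vec<i64>, String> {
    let (mut seen, mut frames) = (HashSet::new(), Vec::new());
    for part in spec.split(',').filter(|p| !p.trim().is_empty()) {
        frames.extend(parse_range(part)?.into_iter().filter(|f| seen.insert(*f)));
    }
    Ok(frames)
}

/// Splits frames into chunks of `size` and renders each chunk in the syntax above:
/// "1-4" or "1-7x3" when evenly spaced, otherwise a comma list.
pub fn chunk(frames: &[i64], size: usize) -> Vec<String> {
    frames
        .chunks(size.max(1))
        .map(|c| {
            let (first, last) = (c[0], c[c.len() - 1]);
            if c.len() == 1 {
                return first.to_string();
            }
            let step = c[1] - c[0];
            let even = step != 0 && c.windows(2).all(|w| w[1] - w[0] == step);
            match (even, step.abs()) {
                (true, 1) => format!("{first}-{last}"),
                (true, _) => format!("{first}-{last}x{step}"),
                _ => c.iter().map(|f| f.to_string()).collect::<Vec<_>>().join(","),
            }
        })
        .collect()
}
```

For example, `chunk(&parse_frame_set("1-10").unwrap(), 4)` yields `["1-4", "5-8", "9-10"]`. Rendering each chunk in the same syntax the parser accepts keeps the dispatched sub-ranges compact and lets the receiving side parse them back.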
The producer module produces events on Kafka for each pending job. The consumer modules consume these events and book jobs on hosts, still relying on the database.
This version still has an issue when multiple tests run at the same time: the tests share a database instance and rely on it existing in order to work.
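The producer/consumer split described above can be sketched in Rust. This is a simplification under stated assumptions: Kafka is replaced by a `std::sync::mpsc` channel shared by consumer threads, and "booking" is reduced to recording which consumer took which job. `PendingJobEvent` and `run_pipeline` are hypothetical names, and real Kafka consumer groups assign partitions to consumers rather than pulling from one shared queue.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical event emitted once per pending job.
#[derive(Debug, Clone)]
pub struct PendingJobEvent {
    pub job_id: String,
}

/// Runs one producer and `consumers` consumer threads; returns (consumer id, job id) pairs.
pub fn run_pipeline(pending: Vec<String>, consumers: usize) -> Vec<(usize, String)> {
    let (tx, rx) = mpsc::channel::<PendingJobEvent>();
    let rx = Arc::new(Mutex::new(rx));
    let booked = Arc::new(Mutex::new(Vec::new()));

    // Producer: one event per pending job; dropping `tx` at the end closes the channel.
    let producer = thread::spawn(move || {
        for job_id in pending {
            tx.send(PendingJobEvent { job_id }).unwrap();
        }
    });

    // Consumers: take events until the channel is closed and drained.
    let mut handles = Vec::new();
    for id in 0..consumers {
        let rx = Arc::clone(&rx);
        let booked = Arc::clone(&booked);
        handles.push(thread::spawn(move || loop {
            let msg = rx.lock().unwrap().recv();
            match msg {
                // Stand-in for "book this job on a host" (which still hits the database).
                Ok(ev) => booked.lock().unwrap().push((id, ev.job_id)),
                Err(_) => break,
            }
        }));
    }

    producer.join().unwrap();
    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(booked).unwrap().into_inner().unwrap()
}
```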
Optimized async + pgpool interaction, but still far from perfect.
Last commit before giving up on dashmap
HostDao has a protection, based on a database lock, against processing multiple bookings on a single host at the same time. This protection is intended for multiple instances of the scheduler running concurrently. However, the logic was also being triggered with a single instance, which indicated a race condition. The race happens because a host can belong to multiple groups at the same time, so more than one group can try to book the same host concurrently.
Besides that, use host_stats for up-to-date memory information when updating the host cache.
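The text above does not say how the race was fixed. One possible approach, sketched in Rust under that assumption, is an in-process claim set checked before taking the database lock: a group that finds the host already claimed by another group in the same instance skips it, and the claim is released automatically when the booking attempt ends, whether it succeeds or fails. `HostClaims` and `HostClaim` are hypothetical types; the database lock would still be needed across scheduler instances.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical in-process registry of hosts currently being booked by this instance.
#[derive(Clone, Default)]
pub struct HostClaims {
    inner: Arc<Mutex<HashSet<String>>>,
}

/// RAII claim on one host; dropping it releases the host.
pub struct HostClaim {
    claims: HostClaims,
    host: String,
}

impl HostClaims {
    /// Returns a claim if no other task in this process is booking `host`.
    pub fn try_claim(&self, host: &str) -> Option<HostClaim> {
        let mut set = self.inner.lock().unwrap();
        if set.insert(host.to_string()) {
            Some(HostClaim { claims: self.clone(), host: host.to_string() })
        } else {
            None
        }
    }
}

impl Drop for HostClaim {
    fn drop(&mut self) {
        // Release the host once the booking attempt is over.
        if let Ok(mut set) = self.claims.inner.lock() {
            set.remove(&self.host);
        }
    }
}
```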
To simplify testing, these changes are being migrated to a new PR
Entries were migrated to a new PR isolating the feature they were related to
The new option is defined as: ```yaml ```
f0a4ffc to 06ad353
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Collaborator
Author
Sorry for the huge PR. Making incremental stacked PRs starting now until this is merged.
Collaborator
Author
A new PR will be created to handle documenting this new module. Adding docs to this PR would make it too big to be reviewed.
b02b608 into AcademySoftwareFoundation:new-scheduler
22 checks passed
DiegoTavares added a commit that referenced this pull request on Dec 16, 2025
This PR introduces a new module called "scheduler." This module is responsible for the booking aspect of Cuebot and is designed to offload this feature from the central module.
Rationale: Cuebot's booking logic depends on responding to each HostReport with a new task that searches for layers to dispatch to the reporting host. Consequently, each report generates a [BookingQuery](https://github.com/AcademySoftwareFoundation/OpenCue/blob/master/cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatchQuery.java), which places a significant load on the database. As a result, scaling Cuebot is limited by the database's capacity to handle these complex queries. This new module removes the booking workload from Cuebot.
Booking on the Scheduler is not triggered by host reports. Instead, it runs an internal loop that searches for pending jobs and looks for suitable matches in a cached view of the hosts in the database. The scheduler organizes layers and hosts into clusters, where each cluster represents a group of show and allocation combinations. This structure lets multiple scheduler instances share the load without competing for work; such competition is a significant issue in Cuebot.
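The cluster idea can be sketched in Rust. Every name here is hypothetical, and two details are assumptions rather than the PR's design: each (show, allocation) cluster is assigned to exactly one of `count` scheduler instances with a stable FNV-1a hash, and matching is a greedy first fit that reserves cores and memory in the cached host view so a cached host is not overbooked within one pass.

```rust
/// Hypothetical cluster key: one (show, allocation) combination.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ClusterKey {
    pub show: String,
    pub allocation: String,
}

/// Stable 64-bit FNV-1a hash, so every instance computes the same owner for a key.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// True if instance `index` (out of `count`) is responsible for `key`.
pub fn owns(key: &ClusterKey, index: u64, count: u64) -> bool {
    assert!(count > 0, "need at least one scheduler instance");
    let id = format!("{}\0{}", key.show, key.allocation);
    fnv1a(id.as_bytes()) % count == index
}

/// A host as seen in the scheduler's cached view (hypothetical fields).
#[derive(Debug, Clone)]
pub struct CachedHost {
    pub name: String,
    pub idle_cores: u32,
    pub idle_mem_mb: u64,
}

/// A pending layer's resource request (hypothetical fields).
#[derive(Debug, Clone)]
pub struct PendingLayer {
    pub name: String,
    pub cores: u32,
    pub mem_mb: u64,
}

/// Greedy first-fit booking of one cluster's layers onto that cluster's cached hosts.
/// Returns (layer, host) pairs and subtracts the booked resources from the cache.
pub fn book_cluster(layers: &[PendingLayer], hosts: &mut [CachedHost]) -> Vec<(String, String)> {
    let mut bookings = Vec::new();
    for layer in layers {
        if let Some(host) = hosts
            .iter_mut()
            .find(|h| h.idle_cores >= layer.cores && h.idle_mem_mb >= layer.mem_mb)
        {
            host.idle_cores -= layer.cores;
            host.idle_mem_mb -= layer.mem_mb;
            bookings.push((layer.name.clone(), host.name.clone()));
        }
    }
    bookings
}
```

Because every instance computes the same owner for a given key, two instances never loop over the same cluster. A real deployment would also need to reassign clusters when instances join or leave, which this sketch ignores.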
To let Cuebot and the Scheduler run concurrently without competing for work, a new feature was added to Cuebot, as detailed in #2087. It allows configuring an exclusion list of shows and allocations that Cuebot should not book, or halting Cuebot's booking for all shows altogether.
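How such an exclusion setting could partition work can be sketched in Rust. The field names and the choice to store (show, allocation) pairs are assumptions; the actual option introduced in #2087 may be shaped differently. The sketch also assumes the Scheduler books exactly what Cuebot excludes, so for any show and allocation only one of the two is allowed to book.

```rust
use std::collections::HashSet;

/// Hypothetical model of the Cuebot booking-exclusion setting.
#[derive(Debug, Default)]
pub struct BookingExclusions {
    /// When true, Cuebot books nothing and the Scheduler handles every show.
    pub halt_all: bool,
    /// (show, allocation) pairs that Cuebot leaves to the Scheduler.
    pub excluded: HashSet<(String, String)>,
}

impl BookingExclusions {
    fn is_excluded(&self, show: &str, allocation: &str) -> bool {
        self.halt_all || self.excluded.contains(&(show.to_string(), allocation.to_string()))
    }

    /// Cuebot only books combinations that are not excluded.
    pub fn cuebot_may_book(&self, show: &str, allocation: &str) -> bool {
        !self.is_excluded(show, allocation)
    }

    /// Assumption: the Scheduler books exactly the combinations Cuebot excludes.
    pub fn scheduler_may_book(&self, show: &str, allocation: &str) -> bool {
        self.is_excluded(show, allocation)
    }
}
```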