Skip to content

Conversation

@titusfortner
Copy link
Member

@titusfortner titusfortner commented Jan 22, 2026

User description

I've tried several times to improve how we determine which targets to run tests against.

The options are all either inaccurate (just run everything) or slow (so slow they were timing out).
This is the hybrid approach. Do the slow thing 2x a day when scheduled CI runs, and the fast thing every other time. Should be good enough.

The pre-release PR found 2437 matching targets, this PR finds the correct 570.

💥 What does this PR do?

  • Adds Rake task bazel:build_test_index to index test targets based on affected BUILD packages
./go bazel:build_test_index            # --> path: 'build/bazel-test-target-index.json'
./go bazel:build_test_index index.json # --> path: 'index.json'
  • Adds Rake task bazel:affected_targets to provide a list of test targets that apply
./go bazel:affected_targets            # --> HEAD^..HEAD (last commit)
./go bazel:affected_targets abc123     # --> abc123^..abc123 (that commit)
./go bazel:affected_targets abc..def   # --> explicit range
  • When ci.yml event is scheduled it runs the indexing
  • When ci.yml event is anything else it checks the index for matching targets
  • Indexes get zipped and stored in GitHub cache (very small, very quick to zip/unzip)
  • Deleted scripts/github-actions/check-bazel-targets.sh (replaced by Rake tasks)
  • Added cache-name input to bazel.yml workflow
  • Simplified per-binding conditionals in ci.yml

Edge Cases:

  1. Add src file not in index: #find_bazel_package locates the package and runs tests for that package
  2. Delete src file: #find_bazel_package locates the package and runs tests for that package
  3. Add test file not in index: all changed test files get corresponding targets added directly
  4. Delete test file: ignored
  5. Add new package: generates target list from package (slow but infrequent)
  6. .NET tests depend on Java server causing all .NET tests to run each time java server code is changed; since .NET does not have any tests running against a grid, this does a complete filter
  7. If one test file is updated only one test file should run

🔧 Implementation Notes

I've spent way too long on this. I tried the tools, I tried various shortcuts, no good way to get accuracy with any speed.
I went back and forth on shell vs python vs rake, and Rake gives me the most flexibility, and lets us easily do this from local or on CI

💡 Additional Considerations

JavaScript will need to be added later
Rust is ignored in the indexing because we aren't toggling rust tests from this
This doesn't actually use the targets in the ci.yml yet, so we're still just seeing if there are any and running everything.
Actually limiting things is next

🔄 Types of changes

  • New feature (non-breaking change which adds functionality and tests!)

PR Type

Enhancement, Tests


Description

  • Replace slow shell script with fast Rake-based Bazel target lookup using index caching

  • Add bazel:build_test_index task to generate test target index from BUILD packages

  • Add bazel:affected_targets task to find affected targets using index or fallback

  • Simplify CI conditionals by splitting target output into per-binding flags

  • Switch index generation from macOS to Linux for faster, cheaper builds

  • Support workflow dispatch to rebuild index and run all tests


Diagram Walkthrough

flowchart LR
  A["Changed Files"] --> B["affected_targets Task"]
  B --> C{"Index Exists?"}
  C -->|Yes| D["Query Index"]
  C -->|No| E["Directory Fallback"]
  D --> F["Test Targets"]
  E --> F
  G["build_test_index Task"] --> H["Generate Index from BUILD Packages"]
  H --> I["Cache Index"]
  I --> D
Loading

File Walkthrough

Relevant files
Enhancement
check-bazel-targets.sh
Remove slow shell-based target lookup script                         

scripts/github-actions/check-bazel-targets.sh

  • Deleted entire shell script that performed slow Bazel target lookup
  • Functionality replaced by Rake tasks in Rakefile
  • Script used rdeps queries which were timing out
+0/-108 
bazel.yml
Add cache restore support for index files                               

.github/workflows/bazel.yml

  • Added cache-name input parameter for cache restore functionality
  • Added cache restore step using actions/cache/restore@v4
  • Allows workflows to restore pre-built index files
+12/-0   
ci-build-index.yml
Switch index generation to Linux platform                               

.github/workflows/ci-build-index.yml

  • Changed index build OS from macOS to Linux
  • Improves build speed and reduces CI costs
  • Requires trunk fix for macOS-only archive rules
+1/-1     
ci.yml
Refactor CI workflow to use Rake tasks and per-binding flags

.github/workflows/ci.yml

  • Updated build-index job to run on schedule or workflow_dispatch events
  • Replaced shell script invocation with inline Rake task calls
  • Refactored read-targets job to output per-binding flags instead of
    combined targets
  • Simplified per-binding conditionals from complex multi-line
    expressions to simple flag checks
  • Added cache-name parameter to check job for index restoration
+41/-48 
Rakefile
Add Rake tasks for fast Bazel target lookup with index     

Rakefile

  • Commented out lint task implementation (formatting, mypy, ruff, steep
    checks)
  • Added bazel:affected_targets task to find test targets affected by
    file changes
  • Added bazel:build_test_index task to generate index from BUILD
    packages
  • Added helper functions: find_bazel_package,
    affected_targets_with_index, targets_from_lookup, query_package_dep,
    targets_from_tests, affected_targets_fallback
  • Supports index-based lookup with directory fallback for missing
    packages
  • Handles edge cases like test files, library files, and Java/dotnet
    dependencies
+144/-17

@selenium-ci selenium-ci added C-rb Ruby Bindings B-build Includes scripting, bazel and CI integrations labels Jan 22, 2026
@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 22, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Shell command injection

Description: The bazel:affected_targets task builds a shell command using backticks with
base_rev/head_rev derived from the user-supplied commit_range (git diff --name-only
#{base_rev} #{head_rev}), which can enable command injection if a malicious range is
provided (e.g., including shell metacharacters).
Rakefile [1658-1669]

Referred Code
task :affected_targets, %i[commit_range index_file] do |_task, args|
  range = args[:commit_range] || 'HEAD'
  index_file = args[:index_file] || 'build/bazel-test-target-index.json'
  base_rev, head_rev = if range.include?('..')
                         range.split('..', 2)
                       else
                         ["#{range}^", range]
                       end

  puts "Commit range: #{base_rev}..#{head_rev}"

  changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
Bazel query injection

Description: Bazel query strings are constructed by interpolating the repository file path (via rel)
into attr(srcs, '#{rel}', ...), which could permit Bazel query injection or query breakage
if a crafted filename contains quotes/special characters (e.g., foo' ) + kind(".*",
//...), especially when filenames are sourced from git diff.
Rakefile [1793-1806]

Referred Code
query = test_files.filter_map { |f|
  pkg = find_bazel_package(f)
  next if pkg.nil?

  # Bazel srcs often use paths relative to the package, not basenames.
  rel = f.sub(%r{\A#{Regexp.escape(pkg)}/}, '')
  "attr(srcs, '#{rel}', //#{pkg}:*)"
}.join(' + ')
return [] if query.empty?

targets = []
Bazel.execute('query', ['--output=label'], query) do |out|
  targets = out.lines.map(&:strip).select { |l| l.start_with?('//') }
end
Cache path traversal

Description: The workflow uses inputs.cache-name directly to form a filesystem path (${{
inputs.cache-name }}.gz) for cache restore, which could allow unintended file
overwrite/path traversal within the workspace if an untrusted caller supplies a value like
../somefile to a workflow_call-exposed input.
bazel.yml [108-114]

Referred Code
- name: Restore cache
  if: inputs.cache-name != ''
  uses: actions/cache/restore@v4
  with:
    path: ${{ inputs.cache-name }}.gz
    key: ${{ inputs.cache-name }}-
    restore-keys: ${{ inputs.cache-name }}-
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unhandled failures: The new tasks do not robustly handle failures from git/file/JSON/Bazel operations (e.g.,
missing/invalid refs or malformed index) and will raise or silently proceed without
actionable error context.

Referred Code
task :affected_targets, %i[commit_range index_file] do |_task, args|
  range = args[:commit_range] || 'HEAD'
  index_file = args[:index_file] || 'build/bazel-test-target-index.json'
  base_rev, head_rev = if range.include?('..')
                         range.split('..', 2)
                       else
                         ["#{range}^", range]
                       end

  puts "Commit range: #{base_rev}..#{head_rev}"

  changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
  puts "Changed files: #{changed_files.size}"

  targets = if index_file && File.exist?(index_file)
              affected_targets_with_index(changed_files, index_file)
            else
              puts 'No index found, using directory-based fallback'
              affected_targets_fallback(changed_files)
            end



 ... (clipped 130 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Command injection risk: The commit_range argument is interpolated directly into a backtick shell command (git diff
--name-only #{base_rev} #{head_rev}) without validation/escaping, allowing potential
command injection if an attacker can influence the task arguments.

Referred Code
task :affected_targets, %i[commit_range index_file] do |_task, args|
  range = args[:commit_range] || 'HEAD'
  index_file = args[:index_file] || 'build/bazel-test-target-index.json'
  base_rev, head_rev = if range.include?('..')
                         range.split('..', 2)
                       else
                         ["#{range}^", range]
                       end

  puts "Commit range: #{base_rev}..#{head_rev}"

  changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
  puts "Changed files: #{changed_files.size}"

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured logging: The new tasks emit multiple free-form puts logs (including target labels and file counts)
rather than structured logs, and it is unclear from the diff whether this output could
include sensitive repo/internal path information in some environments.

Referred Code
  puts "Commit range: #{base_rev}..#{head_rev}"

  changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
  puts "Changed files: #{changed_files.size}"

  targets = if index_file && File.exist?(index_file)
              affected_targets_with_index(changed_files, index_file)
            else
              puts 'No index found, using directory-based fallback'
              affected_targets_fallback(changed_files)
            end

  if targets.empty?
    puts 'No test targets affected'
    File.write('bazel-targets.txt', '')
  else
    puts "Found #{targets.size} affected test targets"
    File.write('bazel-targets.txt', targets.sort.join(' '))
    targets.sort.each { |t| puts t }
  end
end


 ... (clipped 42 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 22, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Optimize the Bazel index creation

The build_test_index task is inefficient because it runs a bazel query for every
test target. This can be optimized by running a single bazel query with a graph
output format (--output=xml) and parsing the result in memory to build the
index.

Examples:

Rakefile [1692-1729]
  task :build_test_index, [:index_file] do |_task, args|
    output = args[:index_file] || 'build/bazel-test-target-index.json'

    index = {}
    tests = []

    exclude_tags = %w[manual spotbugs ie]
    all_bindings = BINDING_TARGETS.values.join(' + ')
    tag_exclusions = exclude_tags.map { |tag| "except attr(tags, #{tag}, #{all_bindings})" }.join(' ')
    kind = '_test' # do not match test_suite or pytest_runner

 ... (clipped 28 lines)

Solution Walkthrough:

Before:

# Rakefile
task :build_test_index do
  # 1. Find all test targets
  all_tests = bazel_query("kind(_test, //...)")

  # 2. Loop through each test and query its dependencies individually
  all_tests.each do |test|
    # This runs a new bazel query for every single test
    dependencies = bazel_query("deps(#{test})")
    dependencies.each do |dep|
      # build index: index[package_of(dep)] << test
    end
  end
end

After:

# Rakefile
task :build_test_index do
  # 1. Get all tests and their dependencies in one go
  query = "deps(kind(_test, //...))"
  all_deps_xml = bazel_query(query, output: 'xml')

  # 2. Parse the XML output in memory
  graph = parse_xml_dependency_graph(all_deps_xml)

  # 3. Build the index from the parsed graph
  # This avoids running any more bazel commands.
  index = build_index_from_graph(graph)

  File.write(output, JSON.pretty_generate(index))
end
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a major performance bottleneck (N+1 queries) in the build_test_index task and proposes a standard, effective solution using a single bazel query, which would significantly speed up the scheduled index creation job.

High
General
Fix array intersection check

Fix a potential runtime error by replacing the non-standard Array#intersect?
method with the correct Ruby idiom (array1 & array2).any? for checking array
intersection.

Rakefile [1814]

-return BINDING_TARGETS.values if top_level_dirs.intersect?(%w[common rust])
+return BINDING_TARGETS.values if (top_level_dirs & %w[common rust]).any?
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies that Array#intersect? is not a standard Ruby method and would cause a runtime error, proposing a valid fix that prevents the script from crashing.

High
Possible issue
Default outputs for bindings

Modify the check_binding function to always produce an output for each binding
by adding an else branch that sets the output to false when the condition is not
met.

.github/workflows/ci.yml [72-77]

 check_binding() {
   local pattern=$1 tag=$2
   if [[ "$targets" == *"$pattern"* ]] || [[ "$COMMIT_MESSAGES" == *"[$tag]"* ]] || [[ "$PR_TITLE" == *"[$tag]"* ]]; then
     echo "$tag=true" >> "$GITHUB_OUTPUT"
+  else
+    echo "$tag=false" >> "$GITHUB_OUTPUT"
   fi
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: This suggestion correctly identifies that downstream jobs check for non-empty strings, not boolean true. Explicitly setting an output makes the workflow's logic clearer and more robust against future changes.

Medium
Learned
best practice
Avoid mutating input arrays

Don’t mutate the input array with select!; create a filtered copy so callers can
safely reuse their array without side effects.

Rakefile [1789-1808]

 def targets_from_tests(test_files)
-  test_files.select! { |f| File.exist?(f) }
-  return [] if test_files.empty?
+  existing_files = test_files.select { |f| File.exist?(f) }
+  return [] if existing_files.empty?
 
-  query = test_files.filter_map { |f|
+  query = existing_files.filter_map { |f|
     pkg = find_bazel_package(f)
     next if pkg.nil?
 
     # Bazel srcs often use paths relative to the package, not basenames.
     rel = f.sub(%r{\A#{Regexp.escape(pkg)}/}, '')
     "attr(srcs, '#{rel}', //#{pkg}:*)"
   }.join(' + ')
   return [] if query.empty?
 
   targets = []
   Bazel.execute('query', ['--output=label'], query) do |out|
     targets = out.lines.map(&:strip).select { |l| l.start_with?('//') }
   end
   targets
 end
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why:
Relevant best practice - Avoid mutating caller-provided collections; prefer defensive/immutable handling of collection-like state passed into helpers.

Low
  • Update

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a hybrid approach to determining which Bazel test targets to run in CI by introducing an index-based caching mechanism. The system builds a comprehensive index of test targets and their dependencies during scheduled runs, then uses this cached index for faster lookups during regular PR and push events.

Changes:

  • Replaces shell-based target discovery with Ruby Rake tasks for better flexibility and performance
  • Implements index caching via GitHub Actions cache to enable fast target lookups
  • Simplifies CI workflow conditionals by consolidating binding-specific logic

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
scripts/github-actions/check-bazel-targets.sh Deleted shell script that performed slow Bazel queries, replaced by Rake tasks
Rakefile Added two new Rake tasks (bazel:build_test_index and bazel:affected_targets) with supporting helper functions for index creation and target lookup
rb/.rubocop.yml Excluded Rakefile from Lint/RedundantRequireStatement check for the new 'set' requirement
.github/workflows/ci.yml Added build-index job for scheduled runs, updated check job to use cached index, simplified per-binding conditionals
.github/workflows/ci-build-index.yml New workflow for building and caching the test target index
.github/workflows/bazel.yml Added cache-name input parameter to support restoring cached index files

@titusfortner
Copy link
Member Author

I committed 0850cd0 which does the indexing part of this. Once a scheduled CI job has run building the index without issues, I'll merge the rest of this.

@titusfortner titusfortner marked this pull request as draft January 23, 2026 02:04
@titusfortner
Copy link
Member Author

I'm putting in draft until #16985 merges

@titusfortner titusfortner force-pushed the affected_targets branch 5 times, most recently from 8e89e2a to f01506a Compare January 24, 2026 02:57
@titusfortner titusfortner marked this pull request as ready for review January 24, 2026 02:57
@qodo-code-review
Copy link
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Command injection

Description: The new bazel:affected_targets task builds and executes a shell command via backticks
(e.g., git diff --name-only #{base_rev} #{head_rev}), which can enable command injection
if the user-supplied range argument contains shell metacharacters (e.g., HEAD; rm -rf ~)
when run locally or in any environment where arguments are not strictly controlled.
Rakefile [1367-1380]

Referred Code
values = args.to_a
index_file = values.find { |val| File.exist?(val) }
range = (values - [index_file]).first || 'HEAD'
index_file ||= 'build/bazel-test-target-index'
base_rev, head_rev = if range.include?('..')
                       range.split('..', 2)
                     else
                       ["#{range}^", range]
                     end

puts "Commit range: #{base_rev}..#{head_rev}"

changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
puts "Changed files: #{changed_files.size}"
Cache path poisoning

Description: The new cache restore uses a caller-controlled inputs.cache-name for both path and key,
which could allow unintended file placement/overwrite in the workspace (e.g., setting
cache-name to . or another sensitive path) if an untrusted caller can invoke the reusable
workflow, potentially leading to executing poisoned restored content in later steps.
bazel.yml [74-115]

Referred Code
      cache-name:
        description: Name for cache restore (restores {name}.gz with key {name}-)
        required: false
        type: string
        default: ''

jobs:
  bazel:
    name: ${{ inputs.name }}
    runs-on: ${{ contains(inputs.os, '-') && inputs.os || format('{0}-latest', inputs.os) }}
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      SEL_M2_USER: ${{ secrets.SEL_M2_USER }}
      SEL_M2_PASS: ${{ secrets.SEL_M2_PASS }}
      TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
      GEM_HOST_API_KEY: ${{ secrets.GEM_HOST_API_KEY }}
      NUGET_API_KEY: ${{ secrets.NUGET_API_KEY }}
      NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
    steps:
      - name: Checkout source tree
        uses: actions/checkout@v4


 ... (clipped 21 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unhandled command failures: The new bazel:affected_targets task shells out to git diff and calls Bazel queries without
clearly handling non-zero exit cases, which may cause confusing failures depending on the
CI environment and revision availability.

Referred Code
  puts "Commit range: #{base_rev}..#{head_rev}"

  changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
  puts "Changed files: #{changed_files.size}"

  targets = if index_file && File.exist?(index_file)
              affected_targets_with_index(changed_files, index_file)
            else
              puts 'No index found, using directory-based fallback'
              affected_targets_fallback(changed_files)
            end

  if targets.empty?
    puts 'No test targets affected'
    File.write('bazel-targets.txt', '')
  else
    puts "Found #{targets.size} affected test targets"
    File.write('bazel-targets.txt', targets.sort.join(' '))
    targets.sort.each { |t| puts t }
  end
end


 ... (clipped 6 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unvalidated cache path: The new cache-name input is used directly as an actions/cache/restore path, so callers
could supply unexpected paths unless constrained/validated at the workflow boundary.

Referred Code
      cache-name:
        description: Name for cache restore (restores {name}.gz with key {name}-)
        required: false
        type: string
        default: ''

jobs:
  bazel:
    name: ${{ inputs.name }}
    runs-on: ${{ contains(inputs.os, '-') && inputs.os || format('{0}-latest', inputs.os) }}
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      SEL_M2_USER: ${{ secrets.SEL_M2_USER }}
      SEL_M2_PASS: ${{ secrets.SEL_M2_PASS }}
      TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
      GEM_HOST_API_KEY: ${{ secrets.GEM_HOST_API_KEY }}
      NUGET_API_KEY: ${{ secrets.NUGET_API_KEY }}
      NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
    steps:
      - name: Checkout source tree
        uses: actions/checkout@v4


 ... (clipped 20 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@titusfortner
Copy link
Member Author

It should index on linux now.

@qodo-code-review
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent an infinite loop in path traversal

To prevent a potential infinite loop in find_bazel_package, add a check to
terminate the directory traversal when the root directory is reached.

Rakefile [1451-1460]

 def find_bazel_package(filepath)
   path = File.dirname(filepath)
   until path.empty?
     return path if File.exist?(File.join(path, 'BUILD.bazel')) || File.exist?(File.join(path, 'BUILD'))
     return nil if path == '.'
 
+    prev_path = path
     path = File.dirname(path)
+    return nil if path == prev_path # Reached root
   end
   nil
 end
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical bug that could cause an infinite loop when processing files at the filesystem root, and the proposed fix is correct and robust.

High
Correct Python pattern check

In the read-targets job, change the Python binding check from //py: to //py/ to
correctly match affected targets and trigger the Python CI run.

.github/workflows/ci.yml [81-85]

 check_binding "//java/" "java"
-check_binding "//py:" "py"
+check_binding "//py/" "py"
 check_binding "//rb/" "rb"
 check_binding "//dotnet/" "dotnet"
 check_binding "//rust/" "rust"
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a pattern mismatch (//py: instead of //py/) that would prevent Python tests from being triggered correctly, which is a significant correctness issue in the CI logic.

Medium
High-level
Consider the risk of stale indexes

The dependency index can become stale because it's only updated on a schedule.
To prevent missed tests, detect changes in BUILD.bazel files and fall back to a
full dependency analysis or rebuild the index.

Examples:

Rakefile [1462-1483]
def affected_targets_with_index(changed_files, index_file)
  puts "Using index: #{index_file}"
  begin
    index = JSON.parse(File.read(index_file))
  rescue JSON::ParserError => e
    puts "Invalid JSON in index file: #{e.message}"
    puts 'Falling back to directory-based detection'
    return affected_targets_fallback(changed_files)
  end
  affected = Set.new

 ... (clipped 12 lines)

Solution Walkthrough:

Before:

# Rakefile
def affected_targets_with_index(changed_files, index_file)
  index = JSON.parse(File.read(index_file))
  # ...
  lib_files, _ = changed_files.partition(...)

  lib_files.each do |filepath|
    pkg = find_bazel_package(filepath)
    
    # `index[pkg]` is used even if a BUILD file in `pkg` was modified in the PR.
    # This can use stale dependency data from the old index.
    test_targets = index[pkg] || query_package_dep(pkg)
    
    affected.merge(test_targets)
  end
  # ...
end

After:

# Rakefile
def affected_targets_with_index(changed_files, index_file)
  build_files_changed = changed_files.any? { |f| f.end_with?('BUILD.bazel', 'BUILD') }

  if build_files_changed
    puts 'BUILD file changed, falling back to directory-based detection for accuracy.'
    return affected_targets_fallback(changed_files)
  end

  index = JSON.parse(File.read(index_file))
  # ...
  lib_files, _ = changed_files.partition(...)

  lib_files.each do |filepath|
    pkg = find_bazel_package(filepath)
    test_targets = index[pkg] || query_package_dep(pkg)
    affected.merge(test_targets)
  end
  # ...
end
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a critical flaw where changes to BUILD.bazel files can cause the cached index to be stale, leading to missed test runs and undermining the reliability of the new target selection logic.

Medium
Learned
best practice
Avoid mutating input collections

Don’t use in-place filters like select! on an argument; create a filtered local
array instead so callers aren’t unexpectedly modified.

Rakefile [1505-1524]

 def targets_from_tests(test_files)
-  test_files.select! { |f| File.exist?(f) }
-  return [] if test_files.empty?
+  existing = test_files.select { |f| File.exist?(f) }
+  return [] if existing.empty?
 
-  query = test_files.filter_map { |f|
+  query = existing.filter_map { |f|
     pkg = find_bazel_package(f)
     next if pkg.nil?
 
     # Bazel srcs often use paths relative to the package, not basenames.
     rel = f.sub(%r{\A#{Regexp.escape(pkg)}/}, '')
     "attr(srcs, '#{rel}', //#{pkg}:*)"
   }.join(' + ')
   return [] if query.empty?
 
   targets = []
   Bazel.execute('query', ['--output=label'], query) do |out|
     targets = out.lines.map(&:strip).select { |l| l.start_with?('//') }
   end
   targets
 end
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why:
Relevant best practice - When accepting collection-like inputs, avoid mutating caller-provided arrays/hashes; make a defensive copy to prevent hidden side effects.

Low
  • More

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

@titusfortner titusfortner merged commit 5baa84b into trunk Jan 24, 2026
19 checks passed
@titusfortner titusfortner deleted the affected_targets branch January 24, 2026 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-build Includes scripting, bazel and CI integrations C-rb Ruby Bindings Review effort 4/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants