Skip to content

ci(pipelines): migrate TiFlash next-gen integration test to OCI artifacts#4478

Merged
ti-chi-bot[bot] merged 6 commits intomainfrom
fix/tiflash-nextgen-job
Apr 8, 2026
Merged

ci(pipelines): migrate TiFlash next-gen integration test to OCI artifacts#4478
ti-chi-bot[bot] merged 6 commits intomainfrom
fix/tiflash-nextgen-job

Conversation

@wuhuizuo
Copy link
Copy Markdown
Contributor

@wuhuizuo wuhuizuo commented Apr 8, 2026

Replace branch-based dependency resolution with OCI artifact tags. Add new helper function computeArtifactNextGenOciTagFromPR to generate nextgen-specific tags. Update image pull logic to use the new hub-zot registry.

artifacts

Replace branch-based dependency resolution with OCI artifact tags.
Add new helper function `computeArtifactNextGenOciTagFromPR` to generate
nextgen-specific tags. Update image pull logic to use the new hub-zot
registry.
Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary:
This PR migrates the TiFlash next-gen integration test pipeline from branch-based dependency resolution to fetching OCI container images by tags, hosted on a new hub-zot registry. It introduces a helper method to compute next-gen OCI tags and updates the image pull and test execution steps accordingly. The approach is straightforward and aligns well with the stated goal. Overall, the changes are clear and appropriately scoped for the migration.


Critical Issues

  • Incorrect image paths for OCI images
    File: pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy lines ~37-44
    The images for PD, TiKV, and TiDB are pulled from paths like ${OCI_ARTIFACT_HOST}/tikv/pd/image:${OCI_TAG_PD} and ${OCI_ARTIFACT_HOST}/tikv/pd/image:${OCI_TAG_TIKV}, which mix tikv and pd in the path, likely a copy-paste error. This would cause pulling wrong or non-existent images and break the test.
    Suggestion: Fix the image repository paths to match the component. For example:
    withEnv([
      "PD_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/pd/image:${OCI_TAG_PD}",
      "TIKV_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/tikv/image:${OCI_TAG_TIKV}",
      "TIDB_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/tidb/images/tidb-server:${OCI_TAG_TIDB}",
    ]) {
    Confirm actual repository naming conventions and ensure consistency.

Code Improvements

  • Refactor computeArtifactNextGenOciTagFromPR to avoid double nextgen suffix
    File: libraries/tipipeline/vars/component.groovy lines 1-5
    The new helper appends -nextgen if the returned tag does not already contain "nextgen". However, this is a brittle string check and can cause duplicate -nextgen-nextgen if underlying logic changes.
    Suggestion: Instead of string containment, clearly separate the next-gen tagging logic or add a parameter to computeArtifactOciTagFromPR to generate next-gen tags directly. Alternatively, normalize tag construction to avoid surprises.

  • Remove commented-out debug and auth code or add TODOs with context
    File: pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy lines ~50-65
    There are commented-out debug sh steps and commented credentials logic. If these are no longer needed, remove them to keep code clean. Otherwise, add clear TODO comments with timelines and owners for enabling docker auth.

  • Hardcoded branch names in computeArtifactNextGenOciTagFromPR calls
    The default trunk branch "master" is passed explicitly multiple times; consider defining a constant or reading from config to avoid divergence if trunk branch changes.

  • Use consistent naming for environment variables
    The env variables use uppercase with underscores (good), but the image variables inside withEnv are uppercase with underscores but named like PD_IMAGE. Consider aligning these with other parts of the pipeline or documenting their usage.


Best Practices

  • Add documentation/comments for new helper function
    File: libraries/tipipeline/vars/component.groovy line 1
    Add a short Javadoc-style comment to computeArtifactNextGenOciTagFromPR explaining its purpose, parameters, and return value to improve maintainability.

  • Add tests covering new OCI tag computation helper
    There is no indication of unit or integration tests validating computeArtifactNextGenOciTagFromPR. Adding tests for tag generation logic would help prevent regressions.

  • Style: consistent spacing and code formatting
    Minor style note: there is an extra blank line at the start of the PR description and some inconsistent blank lines in the groovy files which can be cleaned.


Summary of action items:

- Fix OCI image repository paths for PD, TiKV, and TiDB in `pull_integration_next_gen.groovy` to avoid pulling incorrect images.
- Add documentation comments for `computeArtifactNextGenOciTagFromPR` in `component.groovy`.
- Refactor tag generation logic to avoid brittle string containment checks for next-gen suffix.
- Remove or clarify commented-out debug and credential code in the pipeline script.
- Add unit tests for helper functions related to OCI tag computation.

Addressing these will significantly improve correctness and maintainability of the migration.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new helper function to compute next-gen OCI tags and updates the TiFlash integration test pipeline to utilize these tags for PD, TiDB, and TiKV components. The changes also migrate the artifact host to a new mirror and update the test execution logic to pull Docker images explicitly. Review feedback identifies a critical copy-paste error in the TiKV image path, a potential regression regarding the default branch for TiKV, and unsafe access to pull request metadata that could cause failures in non-PR contexts.

@wuhuizuo
Copy link
Copy Markdown
Contributor Author

wuhuizuo commented Apr 8, 2026

Jenkins Replay Status

Summary: 1 success, 0 failure, 0 skipped.

Update the OCI artifact host from hub-zot.pingcap.net to
us-docker.pkg.dev.
Enable docker authentication for pulling TiDB, TiKV, and PD images by
uncommenting the withCredentials block and adding proper credential
setup.
Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary
This PR migrates the TiFlash next-gen integration test from branch-based dependency resolution to OCI artifact tags hosted on a new hub-zot registry. It introduces a helper function computeArtifactNextGenOciTagFromPR to generate next-gen specific OCI tags and updates the image pulling and test execution logic accordingly. The changes are focused and improve the pipeline's reliability by moving to a more robust artifact tagging scheme. The overall code quality is good, but there are opportunities to improve clarity, maintainability, and robustness in credential handling and environment variable usage.


Critical Issues

  • Incorrect OCI Image Paths (pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy, lines ~33-49)
    The image paths used for PD_IMAGE, TIKV_IMAGE, and TIDB_IMAGE appear inconsistent:

    "PD_IMAGE=${OCI_ARTIFACT_HOST}/tikv/pd/image:${OCI_TAG_PD}",
    "TIKV_IMAGE=${OCI_ARTIFACT_HOST}/tikv/pd/image:${OCI_TAG_TIKV}",
    "TIDB_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/tidb/images/tidb-server:${OCI_TAG_TIDB}",
    • PD_IMAGE and TIKV_IMAGE both point to ${OCI_ARTIFACT_HOST}/tikv/pd/image, which looks wrong since PD and TiKV are different components.
    • This likely leads to pulling the wrong images for PD and TiKV, breaking downstream tests.

    Suggested fix:
    Confirm the correct repository paths for each component and correct them. For example:

    "PD_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/pd/image:${OCI_TAG_PD}",
    "TIKV_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/tikv/image:${OCI_TAG_TIKV}",
    "TIDB_IMAGE=${OCI_ARTIFACT_HOST}/pingcap/tidb/images/tidb-server:${OCI_TAG_TIDB}",

Code Improvements

  • Use consistent and clear naming for artifact host environment variable (pull_integration_next_gen.groovy, line 33)
    The original host changes from 'us-docker.pkg.dev/pingcap-testing-account/hub' to 'us-docker.pkg.dev/pingcap-testing-account/tidbx'.
    The variable name OCI_ARTIFACT_HOST doesn't convey which repo or registry it points to. If multiple registries may be used, consider splitting into separate variables or clarifying the name, e.g., OCI_ARTIFACT_REGISTRY_URL.

  • Avoid hardcoded paths and cleanup of Docker config (pull_integration_next_gen.groovy, lines 45-59)
    The code copies the docker config to ~/.docker/config.json and then deletes ~/.docker directory:

    mkdir -p ~/.docker
    cp ${DOCKER_CONFIG_JSON} ~/.docker/config.json
    ...
    rm -rf ~/.docker

    This may cause issues if other processes rely on ~/.docker or if multiple parallel jobs run on the same agent.

    Suggested improvement:
    Use a temporary directory with mktemp -d for docker config to avoid side effects:

    export DOCKER_CONFIG=$(mktemp -d)
    cp ${DOCKER_CONFIG_JSON} $DOCKER_CONFIG/config.json
    docker pull ...
    rm -rf $DOCKER_CONFIG
  • Add error handling for docker pull commands
    If any docker pull fails, it will silently continue. Better to fail fast or retry. Use set -e or check return codes explicitly.

  • Refactor environment variable injection and shell usage (pull_integration_next_gen.groovy, lines 43-61)
    Instead of embedding multi-line shell scripts within Groovy strings, consider extracting shell scripts into separate files or pipeline library steps for clarity and maintainability.


Best Practices

  • Documentation for new helper function (component.groovy, lines 1-7)
    The new function computeArtifactNextGenOciTagFromPR lacks comments explaining its logic and the meaning of "nextgen" in the tag. Add a brief docstring:

    /**
     * Computes the OCI artifact tag for the next-gen components based on PR info.
     * Appends '-nextgen' suffix if not already present.
     */
  • Test coverage
    There is no indication of added tests for the new helper function or the updated logic in the pipeline. Consider adding unit tests for computeArtifactNextGenOciTagFromPR and integration tests for the new OCI artifact pulling logic.

  • Remove commented-out debug code (pull_integration_next_gen.groovy, lines 47-55)
    There are commented debug prints and shell commands that should be removed to keep code clean.

  • Consistent use of quotations
    Use consistent quoting style in Groovy, e.g., prefer double quotes "" for interpolated strings and single quotes '' for literals.


Overall, the PR achieves its goal but requires fixes for image path correctness and improvements in configuration management and documentation to ensure reliability and maintainability.

Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary
This PR migrates the TiFlash next-gen integration test pipeline from using branch-based dependency resolution to OCI artifact tags hosted on a new registry (hub-zot). It introduces a helper function computeArtifactNextGenOciTagFromPR to generate nextgen-specific OCI tags and updates the pipeline to pull images from the new OCI registry with proper authentication. The code overall is clear and logically consistent, improving dependency resolution and image pulling for integration tests.


Critical Issues

  • Potential Docker config cleanup race condition
    In pull_integration_next_gen.groovy around lines 37-56, the script copies the Docker config to ~/.docker/config.json and removes the whole ~/.docker directory after pulling images:

    mkdir -p ~/.docker
    cp ${DOCKER_CONFIG_JSON} ~/.docker/config.json
    ...
    rm -rf ~/.docker

    This could cause problems if multiple pipeline steps run concurrently on the same agent, or if ~/.docker contains other important config files. Also, removing the entire folder might be too aggressive.

    Suggestion: Use a unique temporary directory (e.g., in workspace) for Docker config or explicitly delete only the copied config file:

    mkdir -p ${WORKSPACE}/.docker
    cp ${DOCKER_CONFIG_JSON} ${WORKSPACE}/.docker/config.json
    export DOCKER_CONFIG=${WORKSPACE}/.docker
    docker pull ...
    rm -rf ${WORKSPACE}/.docker

    This isolates config and avoids side effects.


Code Improvements

  • Improve helper function naming clarity and consistency
    In component.groovy (lines 1-7), the new function:

    def computeArtifactNextGenOciTagFromPR(String component, String prTargetBranch, String prTitle, String trunkBranch="master") {
        def ret = computeArtifactOciTagFromPR(component, prTargetBranch, prTitle, trunkBranch)
        return ret.contains("nextgen") ? ret : "${ret}-nextgen"
    }

    The function appends -nextgen if not already present. However, this logic might produce unexpected tags if the base tag contains nextgen as a substring in other contexts.

    Suggestion: Make the check more precise, e.g., check suffix only:

    return ret.endsWith("-nextgen") ? ret : "${ret}-nextgen"

    This prevents false positives.

  • Avoid hardcoding component strings repeatedly
    In pull_integration_next_gen.groovy (lines 10-14), you call:

    component.computeArtifactNextGenOciTagFromPR('pd', ...)
    component.computeArtifactNextGenOciTagFromPR('tidb', ...)
    component.computeArtifactNextGenOciTagFromPR('tikv', ...)

    Consider extracting components to a list and looping to reduce duplication and improve maintainability.

  • Remove commented or debug code
    Several lines in pull_integration_next_gen.groovy (around lines 40-50) are commented out debug prints (printenv, docker ps -a, etc.). It’s best to remove or move those to a debug flag section to keep the pipeline clean.


Best Practices

  • Add comments explaining the new OCI tag logic and registry changes
    The new computeArtifactNextGenOciTagFromPR function and the switch to the tidbx OCI registry are critical changes but have no inline comments explaining why or how the tag naming works or why the registry changed. Adding brief comments would help future maintainers:

    // Generate OCI tag with '-nextgen' suffix for next-gen integration tests
  • Test coverage / validation
    There’s no indication of additional tests or dry-run validations to ensure the new OCI tags and image pulls work as expected, especially since this changes dependencies from branch-based to OCI tags. Consider adding or referencing tests that verify this migration.

  • Consistent naming for environment variables and image tags
    The environment variable names like OCI_TAG_PD and image references use a mix of lowercase and uppercase paths:

    OCI_ARTIFACT_HOST/...

    Ensure the paths and tags follow a consistent naming and casing scheme to avoid confusing image references.


Overall, this PR is a solid improvement to the pipeline, but addressing the Docker config isolation and clarifying the tag naming logic will make it more robust and maintainable. Adding comments and cleaning debugging code will improve readability and future troubleshooting.

- Remove unused volume mounts from both containers
- Consolidate temporary storage to single volume-tmp volume
- Update docker container image to new registry location
Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary
This PR migrates the TiFlash next-gen integration tests to use OCI artifact tags instead of branch-based dependency resolution. It introduces a helper function computeArtifactNextGenOciTagFromPR to generate next-gen-specific OCI tags and updates the image pull logic to use a new hub-zot OCI registry. The changes are focused and align with modern container image distribution practices, improving consistency and reliability in CI pipelines. Overall, the implementation is clear and integrates well with existing code, though some improvements around error handling and code clarity are possible.


  1. Critical Issues:
  • Lack of error handling when pulling docker images
    • File: pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy (around line 429)
    • Issue: The docker pull commands inside the shell script do not handle failures explicitly, which could cause subsequent steps to run with missing or outdated images, leading to hard-to-debug test failures.
    • Suggestion: Add error checking or fail-fast behavior on docker pull commands, e.g.,
      docker pull $PD_IMAGE || exit 1
      docker pull $TIKV_IMAGE || exit 1
      docker pull $TIDB_IMAGE || exit 1
      Or run the script with set -e to fail immediately on any error.

  1. Code Improvements:
  • Refactor repeated image path strings into constants

    • File: pull_integration_next_gen.groovy (lines around 420-435)
    • Issue: The full image names are repeated inline in withEnv. This reduces maintainability and clarity.
    • Suggestion: Define constants or variables for image repository prefixes and suffixes to avoid duplication and make future updates easier. For example:
      final IMAGE_REPO_PD = "${OCI_ARTIFACT_HOST}/tikv/pd/image"
      final IMAGE_REPO_TIKV = "${OCI_ARTIFACT_HOST}/tikv/tikv/image"
      final IMAGE_REPO_TIDB = "${OCI_ARTIFACT_HOST}/pingcap/tidb/images/tidb-server"
      
      withEnv([
          "PD_IMAGE=${IMAGE_REPO_PD}:${OCI_TAG_PD}",
          "TIKV_IMAGE=${IMAGE_REPO_TIKV}:${OCI_TAG_TIKV}",
          "TIDB_IMAGE=${IMAGE_REPO_TIDB}:${OCI_TAG_TIDB}",
      ]) {
        ...
      }
  • Improve the helper function naming and logic

    • File: component.groovy (lines 1-7)
    • Issue: computeArtifactNextGenOciTagFromPR adds -nextgen suffix only if missing but relies on computeArtifactOciTagFromPR which strips slashes replacing with dashes. This implicit dependency could confuse readers.
    • Suggestion: Add a short comment explaining the relationship between the two functions and the rationale for the -nextgen suffix, or consider combining logic if appropriate.
  • Use consistent volume naming in pod YAML

    • File: pod-pull_integration_test.yaml (multiple lines)
    • Issue: Volumes were consolidated into a single volume named volume-tmp replacing multiple volumes (volume-0, volume-3, workspace-volume), but the naming and usage are generic and might cause confusion during debugging or future maintenance.
    • Suggestion: Use more descriptive volume names reflecting their purpose or keep original names if they serve distinct roles.
  • Remove commented out debug code if not needed

    • File: pull_integration_next_gen.groovy (lines 418-425)
    • Issue: The debug println and sh blocks are commented out but still present.
    • Suggestion: Remove or move debug code to debug branches to keep the main branch clean.

  1. Best Practices:
  • Add documentation comments for new functions

    • File: component.groovy (line 1)
    • Issue: The newly added computeArtifactNextGenOciTagFromPR lacks a comment explaining parameters, return value, or usage context.
    • Suggestion: Add GroovyDoc or at least a brief comment, e.g.:
      /**
       * Computes OCI artifact tag for next-gen components from PR metadata.
       * Adds '-nextgen' suffix if not already present.
       * @param component Component name
       * @param prTargetBranch Target branch of the PR
       * @param prTitle PR title
       * @param trunkBranch Default trunk branch, default is 'master'
       * @return OCI artifact tag string
       */
  • Add tests for computeArtifactNextGenOciTagFromPR

    • File: component.groovy (new function)
    • Issue: No tests for the new helper function are included or mentioned, risking regressions or incorrect tag generation.
    • Suggestion: Add unit tests covering cases with and without "nextgen" in base tag, different PR titles, and branches.
  • Consistent credential and config handling

    • File: pull_integration_next_gen.groovy (lines 430-440)
    • Issue: The docker config file is copied to ~/.docker/config.json and then removed at the end. This could be fragile if other processes run concurrently or if the home directory changes.
    • Suggestion: Use the DOCKER_CONFIG environment variable to point to a temporary directory to avoid side effects, e.g.:
      export DOCKER_CONFIG=$(mktemp -d)
      cp ${DOCKER_CONFIG_JSON} ${DOCKER_CONFIG}/config.json
      # docker commands
      rm -rf ${DOCKER_CONFIG}
  • Avoid hardcoding branch names

    • File: component.groovy and pull_integration_next_gen.groovy
    • Issue: The trunk branch default is hardcoded as "master" whereas many projects use "main", or this could change.
    • Suggestion: Consider passing trunk branch as a parameter through pipeline params or config files to increase flexibility.

Summary:
The PR achieves its goal of migrating to OCI artifact tags in a clean and logical manner. Addressing error handling for docker pulls, improving code clarity and maintainability, and adding documentation and tests for new code will significantly strengthen the robustness and readability of this change.

Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary:
This PR migrates the TiFlash next-gen integration test pipeline from branch-based dependency resolution to using OCI artifact tags, improving reproducibility and consistency. It adds a new helper function to compute next-gen specific OCI tags and updates the image pull logic to use the new hub-zot registry. Overall, the changes are well-scoped and focused on the CI pipeline, with good integration of the new tagging scheme and credential handling.


Critical Issues

  • Hardcoded Docker config path cleanup:
    • File: pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy lines ~430-445
    • Issue: The script copies the docker config file to ~/.docker/config.json and deletes the ~/.docker folder afterward. If ~/.docker is not empty or used by other processes, this could cause problems.
    • Suggestion: Instead of deleting the entire ~/.docker directory, remove only the config.json file or isolate the docker config folder per build context to avoid accidental deletions. For example:
      mkdir -p ~/.docker
      cp ${DOCKER_CONFIG_JSON} ~/.docker/config.json
      # run docker commands
      rm ~/.docker/config.json

Code Improvements

  • Improved tag computation logic:

    • File: libraries/tipipeline/vars/component.groovy lines 1-5
    • Issue: The helper computeArtifactNextGenOciTagFromPR appends -nextgen if the tag does not contain "nextgen". This logic relies on the string check alone and may produce awkward tags if the original tag already includes "nextgen" in a different form or casing.
    • Suggestion: Make the check case-insensitive or more robust, e.g.:
      def computeArtifactNextGenOciTagFromPR(String component, String prTargetBranch, String prTitle, String trunkBranch="master") {
          def ret = computeArtifactOciTagFromPR(component, prTargetBranch, prTitle, trunkBranch)
          return (ret.toLowerCase().contains("nextgen")) ? ret : "${ret}-nextgen"
      }
    • This avoids duplication or inconsistent tags in edge cases.
  • Refactor repeated environment variables:

    • File: pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy lines ~420-440
    • Issue: The three OCI tags are computed at the top, but the mapping to image names is hardcoded inside the withEnv block.
    • Suggestion: Use a map or constants for component-to-image-path mappings for easier maintenance:
      final imageMap = [
          pd   : "${OCI_ARTIFACT_HOST}/tikv/pd/image:${OCI_TAG_PD}",
          tikv : "${OCI_ARTIFACT_HOST}/tikv/tikv/image:${OCI_TAG_TIKV}",
          tidb : "${OCI_ARTIFACT_HOST}/pingcap/tidb/images/tidb-server:${OCI_TAG_TIDB}"
      ]
      ...
      withEnv([
          "PD_IMAGE=${imageMap.pd}",
          "TIKV_IMAGE=${imageMap.tikv}",
          "TIDB_IMAGE=${imageMap.tidb}",
      ]) {
          ...
      }
    • This improves readability and reduces risk of copy-paste errors.
  • Use consistent naming for volume names in pod spec:

    • File: pipelines/pingcap/tiflash/latest/pod-pull_integration_test.yaml lines ~16-43
    • Issue: The volume name volume-tmp replaces three different volumes merged into one, which is good. However, the volume name is generic and may conflict with other pods if reused.
    • Suggestion: Use a more descriptive name like tiflash-integration-tmp to avoid confusion and improve clarity.

Best Practices

  • Add comments on tag computation rationale:

    • File: libraries/tipipeline/vars/component.groovy lines 1-8
    • Issue: The purpose and detailed logic of computeArtifactNextGenOciTagFromPR is not documented.
    • Suggestion: Add comments explaining why -nextgen is appended and how the tag relates to branch/PR metadata to help future maintainers.
  • Add error handling for docker pull failures:

    • File: pipelines/pingcap/tiflash/latest/pull_integration_next_gen.groovy lines ~435-445
    • Issue: The docker pull commands may fail silently or cause the pipeline to fail without clear messaging.
    • Suggestion: Add || exit 1 or wrap in try-catch with meaningful error messages to improve debugging.
  • Test coverage for new tag helper function:

    • File: libraries/tipipeline/vars/component.groovy
    • Issue: No indication of tests for computeArtifactNextGenOciTagFromPR.
    • Suggestion: Add unit tests covering different PR titles and branches to verify correct tag generation, including edge cases with/without "nextgen".

Overall, the PR introduces an important improvement in artifact handling for the TiFlash next-gen pipeline with a clear approach. Addressing the above points will improve robustness, maintainability, and clarity.

Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary:
This PR migrates the TiFlash next-gen integration test pipeline from branch-based Docker image resolution to OCI artifact tags hosted on a new private registry. It introduces a helper function for generating next-gen OCI tags, updates the image pull logic with new registry URLs, and simplifies volume mounts in pod specs. The overall approach is clear, aligning with best practices for immutable artifact references, and the implementation is straightforward. The code changes mostly improve maintainability and reliability of the CI pipeline.


Critical Issues:

  • No critical functional bugs detected.
    The PR updates image references and tag computation correctly, and the pod spec volume cleanup seems consistent.

Code Improvements:

  • Potential inconsistency in OCI_TAG_TIKV default tag (pull_integration_next_gen.groovy, lines ~10-13):

    final OCI_TAG_TIKV = (REFS.base_ref ==~ /release-nextgen-.*/ ? REFS.base_ref : "cloud-engine-nextgen")

    The default for OCI_TAG_TIKV is "cloud-engine-nextgen", while for PD and TiDB it's "master-nextgen". Ensure this intentional distinction is documented or justified, as it may cause confusion or incorrect image resolution if tags don't align across components.

  • Hardcoded registry paths and image names (pull_integration_next_gen.groovy, line ~35):

    OCI_ARTIFACT_HOST = 'us-docker.pkg.dev/pingcap-testing-account/tidbx'

    and image references like:

    ${OCI_ARTIFACT_HOST}/tikv/pd/image:${OCI_TAG_PD}
    ${OCI_ARTIFACT_HOST}/tikv/tikv/image:${OCI_TAG_TIKV}
    ${OCI_ARTIFACT_HOST}/pingcap/tidb/images/tidb-server:${OCI_TAG_TIDB}
    

    Consider externalizing or parameterizing the registry hostname and image paths for better maintainability and reuse across environments.

  • Credential usage scope (pull_integration_next_gen.groovy, line ~427):
    The withCredentials block copies the Docker config file for image pulls but does not explicitly clean up after. While the script removes ~/.docker, consider adding explicit error handling or "finally" style cleanup to avoid leftover sensitive files if the script fails.

  • Volume cleanup and naming consistency (pod-pull_integration_test.yaml):
    The volume names were simplified from multiple volumes to a single volume-tmp. Confirm that this volume consolidation does not cause unintended side effects, especially if other jobs expect the old named volumes.


Best Practices:

  • Documentation for new helper function (component.groovy, lines 1-5):

    def computeArtifactNextGenOciTagFromPR(String component, String prTargetBranch, String prTitle, String trunkBranch="master") {
        ...
    }

    Add a brief GroovyDoc comment explaining the function’s purpose, parameters, and return value for maintainability.

  • Test coverage for OCI tag generation:
    Verify if there are automated tests covering the new computeArtifactNextGenOciTagFromPR function and image pull logic changes. If not, consider adding unit tests for tag computation and pipeline integration tests to validate the new registry usage.

  • Echo or log pulled image tags for debugging (pull_integration_next_gen.groovy):
    Currently, the script pulls images silently:

    docker pull $PD_IMAGE
    docker pull $TIKV_IMAGE
    docker pull $TIDB_IMAGE

    Adding echo statements before each pull or enabling debug logs can help troubleshoot image pull issues in CI.

  • Consistent naming of variables:
    The variable OCI_ARTIFACT_HOST points to a registry + repository prefix, which might be confusing since it's named "HOST". Consider renaming it to OCI_ARTIFACT_REGISTRY or OCI_ARTIFACT_REPO_PREFIX for clarity.


Summary of action items:

- Add GroovyDoc comments to `computeArtifactNextGenOciTagFromPR` function.
- Review and document the rationale for differing default OCI tags, especially for TIKV.
- Parameterize registry URLs and image paths for easier future updates.
- Add error handling or ensure cleanup for Docker config files after image pulls.
- Confirm volume mount changes in pod spec do not break existing workflows.
- Add or verify tests covering new tag computation and OCI image pulling logic.
- Add debug logging around docker pull commands for better observability.
- Rename `OCI_ARTIFACT_HOST` to a clearer variable name representing registry + repo prefix.

Addressing these will improve maintainability, clarity, and robustness of the pipeline migration.

Copy link
Copy Markdown
Contributor

@dillon-zheng dillon-zheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added the lgtm label Apr 8, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dillon-zheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Apr 8, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-08 08:05:24.245610599 +0000 UTC m=+943529.450970656: ☑️ agreed by dillon-zheng.

@ti-chi-bot ti-chi-bot bot added the approved label Apr 8, 2026
@ti-chi-bot ti-chi-bot bot merged commit effb4d1 into main Apr 8, 2026
6 checks passed
@ti-chi-bot ti-chi-bot bot deleted the fix/tiflash-nextgen-job branch April 8, 2026 08:11
@JaySon-Huang
Copy link
Copy Markdown
Contributor

ref pingcap/tiflash#10791

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

3 participants