Skip to content

ci: fix ignored timeout-minutes and add job-level timeouts#3291

Closed
SuyashJain17 wants to merge 6 commits intoOWASP:mainfrom
SuyashJain17:ci-fix-job-timeouts
Closed

ci: fix ignored timeout-minutes and add job-level timeouts#3291
SuyashJain17 wants to merge 6 commits intoOWASP:mainfrom
SuyashJain17:ci-fix-job-timeouts

Conversation

@SuyashJain17
Copy link
Copy Markdown
Contributor

Proposed change

Resolves #3279

This PR fixes incorrectly defined and missing timeout-minutes settings in the run-ci-cd workflow.

Summary of changes

  • Moves all existing timeout-minutes definitions to the job level, where GitHub Actions enforces them correctly.
  • Adds explicit job-level timeouts to long-running and high-risk jobs such as:
    • test jobs
    • Docker image builds
    • security scans
    • deployment jobs
  • Uses conservative timeout values based on the expected runtime of each job.

What this does NOT change

  • No job logic, commands, or execution order were modified.
  • No Docker build, scan, or deployment behavior was altered.
  • No new tools or workflow steps were introduced.

This change strictly improves CI reliability by ensuring stuck or runaway jobs are reliably terminated instead of running until GitHub Actions’ default 6-hour limit.

Checklist

  • Required: I read and followed the contributing guidelines
  • Required: I ran make check-test locally and all tests passed
  • I used AI for code, documentation, or tests in this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 10, 2026

Summary by CodeRabbit

  • Chores
    • CI/CD workflow updated to refine image scanning: added ignore rules and set some scans to use the latest scanner version.
    • Scan steps now tolerate certain failures (continue-on-error) to reduce pipeline interruptions.
    • Frontend accessibility test job removed from the pipeline, and build job dependencies simplified.
    • Minor trigger scope adjustment to change when the workflow runs.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Updates to the CI workflow (.github/workflows/run-ci-cd.yaml): adjusts Trivy scan invocations (adds trivyignores, version flags), sets several scans to continue on error, removes the frontend accessibility job, tweaks build job dependencies, and corrects a path-ignore filename for workflow triggers.

Changes

Cohort / File(s) Summary
CI workflow (single file)
\.github/workflows/run-ci-cd.yaml
Replaced a path-ignore filename (backend/data/nest.json.gzbackend/data/nest.json.gzz), added trivyignores: trivyignore.yaml to multiple Trivy steps and image-scan invocations, set continue-on-error: true for several scan steps and jobs, added version: latest to some trivy-action calls, removed the run-frontend-a11y-tests job and its references from needs arrays, and adjusted build job dependencies accordingly.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • kasya
  • arkid15r
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The raw summary shows changes to trivy configuration, error handling, and removal of frontend a11y tests, which appear unrelated to the stated objective of fixing timeout-minutes. The PR contains multiple out-of-scope changes (trivy ignores, continue-on-error additions, frontend a11y tests removal) not mentioned in the PR objectives. Either document these changes or remove them to stay focused on timeout-minutes fixes.
Linked Issues check ❓ Inconclusive The PR description states 'Resolves #3279' and the changes aim to fix incorrectly defined timeout-minutes and add job-level timeouts, directly addressing the objectives in the linked issue. The raw summary describes changes unrelated to timeout-minutes (trivy ignores, error tolerance, frontend accessibility tests removal). Verify that the PR diff actually implements timeout-minutes fixes as described.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ci: fix ignored timeout-minutes and add job-level timeouts' clearly and specifically summarizes the main change in the PR.
Description check ✅ Passed The PR description is directly related to the changeset, explaining the timeout-minutes fixes and job-level timeout additions referenced in the linked issue #3279.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/run-ci-cd.yaml (1)

28-96: Consider adding timeouts to early validation jobs.

The pre-commit and check-frontend jobs are also long-running validation jobs that should have timeouts for consistency with the PR objectives. These jobs run formatting, linting, and pre-commit hooks that could hang indefinitely.

Suggested timeout additions

For the pre-commit job (after line 32):

 runs-on: ubuntu-latest
+timeout-minutes: 10
 steps:

For the check-frontend job (after line 67):

 runs-on: ubuntu-latest
+timeout-minutes: 10
 steps:

The set-release-version job is quick (just setting variables) and can reasonably skip a timeout.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d912c99 and 991065d.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (6)
.github/workflows/run-ci-cd.yaml (6)

103-103: LGTM! Appropriate timeouts for scan and check jobs.

The 10-minute timeout is reasonable for spell checking and Trivy security scans. Placement at the job level ensures GitHub Actions will enforce these limits correctly.

Also applies to: 121-121, 143-143


288-288: LGTM! Appropriate timeouts for Docker image builds.

The 15-minute timeout for building both backend and frontend images is reasonable, especially with Docker layer caching and multi-platform support involved.

Also applies to: 657-657


398-398: LGTM! Appropriate timeouts for image scans and deployments.

The 10-minute timeouts for Trivy image scans and Ansible deployment jobs are reasonable and consistent with similar jobs earlier in the workflow.

Also applies to: 442-442, 559-559, 763-763, 811-811, 939-939


587-587: LGTM! Appropriate timeouts for security testing.

The 20-minute timeout for Lighthouse CI and ZAP baseline scans is appropriate, as these security and performance tests can take significant time to complete comprehensive scans.

Also applies to: 619-619, 967-967


103-967: Flag inconsistency in AI-generated summary.

The AI summary states "Removes a previously present timeout-minutes entry in one backend test run," but no removed lines are visible in the annotated code. All changes marked with ~ are additions. This suggests the summary may be inaccurate or there's a discrepancy between the summary and the actual changes.


164-164: Verify timeout adequacy for test jobs by monitoring actual workflow execution times.

The 10-minute timeout for test jobs (backend, frontend unit, frontend e2e) covers Docker image building and test execution. While caching (GitHub Actions and registry) should speed up builds on subsequent runs, e2e tests can be unpredictable. Recommend monitoring recent workflow runs to confirm these jobs consistently complete within 10 minutes, and adjust if timeouts are frequently exceeded.

Also applies to: 198-198, 232-232

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 10, 2026
Copy link
Copy Markdown
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this attribute in alphabetical order (they are sorted).

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 11, 2026
@SuyashJain17 SuyashJain17 marked this pull request as ready for review January 11, 2026 19:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/workflows/run-ci-cd.yaml (3)

28-33: Consider adding a job-level timeout to the pre-commit job.

This job runs multiple steps (Poetry install, pre-commit hooks, etc.) and could potentially hang. Per the PR objective to add timeouts to all long-running jobs, consider adding timeout-minutes: 10 after runs-on: ubuntu-latest.

Suggested fix
   pre-commit:
     name: Run pre-commit checks
     permissions:
       contents: read
     runs-on: ubuntu-latest
+    timeout-minutes: 10
     steps:

63-68: Consider adding a job-level timeout to the check-frontend job.

This job runs pnpm install, format, and lint operations that could hang. For consistency with the other jobs and the PR objective, consider adding timeout-minutes: 10.

Suggested fix
   check-frontend:
     name: Run frontend checks
     permissions:
       contents: read
     runs-on: ubuntu-latest
+    timeout-minutes: 10
     steps:

279-294: Consider reducing the timeout for this lightweight job.

The set-release-version job only runs a simple shell command to set a version string. A 15-minute timeout seems excessive for a job that should complete in seconds. Consider reducing to 5 minutes.

Suggested fix
   set-release-version:
     name: Set release version
     outputs:
       release_version: ${{ steps.set.outputs.release_version }}
     permissions: {}
     runs-on: ubuntu-latest
-    timeout-minutes: 15
+    timeout-minutes: 5
     steps:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 132a0cd and e73f23d.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (3)
.github/workflows/run-ci-cd.yaml (3)

98-255: LGTM!

The job-level timeout-minutes: 10 is correctly placed before steps for spellcheck, scan-code, scan-ci-dependencies, run-backend-tests, run-frontend-unit-tests, and run-frontend-e2e-tests. These are appropriate values for test and scan jobs.


296-667: LGTM!

Staging image build, scan, deployment, and quality assurance jobs have appropriate job-level timeouts:

  • 15 minutes for Docker builds (multi-stage builds with caching)
  • 10 minutes for image scans and deployments
  • 20 minutes for Lighthouse CI and ZAP baseline scans (external service interactions)

All placements are correct.


669-1013: LGTM!

Production image build, scan, deployment, and ZAP scan jobs correctly mirror the staging timeout values. Consistent timeout policies across environments is good practice. All placements are at the job level as required by GitHub Actions.

@SuyashJain17 SuyashJain17 requested a review from arkid15r January 12, 2026 09:32
Copy link
Copy Markdown
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move them after s? Check the existing cases.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/run-ci-cd.yaml (1)

28-96: Missing job-level timeouts for pre-commit and check-frontend jobs.

The PR objective is to add job-level timeouts for all long-running jobs, but these two jobs don't have timeout-minutes defined. Given that similar jobs like spellcheck received a 10-minute timeout, these should likely have one as well.

Proposed fix

Add timeout-minutes: 10 after the steps block for each job:

For pre-commit (after line 61):

          Run `make check` and use `git add` to address it.' && exit 1)
+    timeout-minutes: 10

For check-frontend (after line 96):

          Run `make check` and use `git add` to address it.' && exit 1)
+    timeout-minutes: 10
🧹 Nitpick comments (1)
.github/workflows/run-ci-cd.yaml (1)

286-294: Consider reducing the timeout for this trivial job.

This job only runs a simple shell command to set a version string and should complete in seconds. A 15-minute timeout is excessive—5 minutes would provide ample safety margin while being more aligned with the job's actual runtime.

Suggested change
-    timeout-minutes: 15
+    timeout-minutes: 5
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e73f23d and f15fcb1.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-26T06:09:08.868Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:09:08.868Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (7)
.github/workflows/run-ci-cd.yaml (7)

120-123: LGTM!

Timeout correctly placed at the job level with a reasonable 10-minute limit for Docker build and spell checking.


143-150: LGTM!

Timeouts correctly added at the job level for both security scan jobs with appropriate 10-minute limits.

Also applies to: 170-177


208-211: LGTM!

Timeouts correctly placed at the job level for all test jobs. 10-minute limits are appropriate for Docker build + test execution.

Also applies to: 242-245, 274-277


405-413: LGTM!

15-minute timeouts for the image build jobs are appropriately conservative given they build and push multiple Docker images.

Also applies to: 774-782


448-452: LGTM!

Timeouts correctly added at the job level for image scanning jobs with appropriate 10-minute limits.

Also applies to: 816-821


568-571: LGTM!

Deployment job timeouts are correctly placed at the job level. 10 minutes is reasonable for Ansible-based deployments.

Also applies to: 604-607, 948-951, 984-987


633-639: LGTM!

20-minute timeouts for Lighthouse CI and ZAP baseline scans are appropriately conservative given their external dependencies and potential variability.

Also applies to: 660-667, 1008-1015

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/run-ci-cd.yaml (1)

466-484: Inconsistent trivyignores application between backend and frontend scans.

The trivyignores: trivyignore.yaml parameter is added only to the frontend image scan, not the backend image scan. If certain vulnerabilities should be ignored, this should be applied consistently to both scans to avoid one passing and the other failing on the same ignored CVEs.

Suggested fix for staging image scan
       - name: Scan backend image
         uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
         with:
           exit-code: 1
           image-ref: owasp/nest:backend-staging
           scan-type: image
           skip-setup-trivy: true
           trivy-config: trivy.yaml
+          trivyignores: trivyignore.yaml
Suggested fix for production image scan
       - name: Scan backend image
         uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
         with:
           exit-code: 1
           image-ref: owasp/nest:backend-production
           scan-type: image
           skip-setup-trivy: true
           trivy-config: trivy.yaml
+          trivyignores: trivyignore.yaml

Also applies to: 833-851

🧹 Nitpick comments (1)
.github/workflows/run-ci-cd.yaml (1)

313-328: Consider reducing timeout for this trivial job.

This job only runs a simple shell command to set a version string—it should complete in under 1 second. A 15-minute timeout is unnecessarily generous and defeats the purpose of early termination for stuck jobs.

Suggested timeout reduction
     timeout-minutes: 15
+    timeout-minutes: 5

5 minutes provides ample buffer while still terminating promptly if something goes wrong.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f15fcb1 and f250a65.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (6)
.github/workflows/run-ci-cd.yaml (6)

123-123: LGTM!

Job-level timeout correctly placed for the spellcheck job. 10 minutes is appropriate for Docker image build and spell checking.


211-211: LGTM!

Job-level timeouts correctly added to all test jobs. The 10-minute limit is conservative and aligns with the PR objectives.

Also applies to: 245-245, 277-277, 311-311


448-448: LGTM!

15-minute timeout is appropriate for Docker image builds that include backend and frontend images with cache operations.

Also applies to: 815-815


603-603: LGTM!

Deployment job timeouts are appropriately set at 10 minutes for Ansible playbook executions.

Also applies to: 639-639, 981-981, 1017-1017


671-671: LGTM!

20-minute timeouts are appropriate for Lighthouse CI and ZAP baseline scans, which involve external tool execution and can vary in duration.

Also applies to: 699-699, 1045-1045


149-150: No action needed. The trivyignore.yaml file exists at the repository root and is properly configured to suppress CVE-2025-64756 (a glob command injection vulnerability). The references to trivyignores: trivyignore.yaml in the workflow are correct and intentional.

Likely an incorrect or invalid review comment.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 13, 2026
@SuyashJain17
Copy link
Copy Markdown
Contributor Author

Thanks for the clarification.

I’ve resolved the conflicts, ensured timeout-minutes is defined at the job level, and kept the Trivy with: properties lexicographically ordered (placing scan-type before skip-setup-trivy) to match existing cases.

CI is running, happy to adjust if you spot anything else.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/run-ci-cd.yaml (1)

471-478: Missing skip-setup-trivy: true for frontend scan.

The backend scan at line 468 includes skip-setup-trivy: true since Trivy is already set up at lines 457-460, but the frontend scan is missing this property. This inconsistency could cause redundant Trivy setup.

Suggested fix
       - name: Scan frontend image
         uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
         with:
           exit-code: 1
           image-ref: owasp/nest:frontend-staging
           scan-type: image
+          skip-setup-trivy: true
           trivy-config: trivy.yaml
     timeout-minutes: 10
🧹 Nitpick comments (2)
.github/workflows/run-ci-cd.yaml (2)

1-1: Remove UTF-8 BOM character at file start.

There's an invisible UTF-8 BOM (U+FEFF) before name:. While most YAML parsers handle this, it can cause issues with some tools and is generally not recommended for YAML files.


316-324: Consider reducing timeout for this lightweight job.

This job only runs a shell command to set the release version—it should complete in seconds. A 15-minute timeout is excessive; 5 minutes would be more than sufficient and consistent with the PR's goal of using conservative values.

Suggested change
-    timeout-minutes: 15
+    timeout-minutes: 5
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f250a65 and cb0b773.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (14)
.github/workflows/run-ci-cd.yaml (14)

120-123: LGTM!

The 10-minute timeout is appropriately placed at job level and reasonable for a Docker build with spellcheck.


143-148: LGTM!

Job-level timeout appropriately set for the Trivy repository scan.


168-173: LGTM!

Timeout is consistent with the other scan jobs.


204-207: LGTM!

The 10-minute timeout should be sufficient given the Docker layer caching. Monitor if builds consistently approach this limit.


238-241: LGTM!

Consistent 10-minute timeout across frontend test jobs is appropriate.


436-444: Timeout is reasonable but may be tight for cache misses.

Building and pushing two Docker images in 15 minutes should work with warm caches. If you experience timeouts during cache invalidation (e.g., dependency updates), consider increasing to 20 minutes.


594-597: LGTM!

10-minute timeout is reasonable for Ansible deployment jobs.


659-665: LGTM!

20-minute timeout appropriately accounts for the variable duration of Lighthouse CI audits.


678-693: LGTM!

20-minute timeout is appropriate for ZAP baseline scanning which involves crawling and security checks.


801-809: LGTM!

Consistent with the staging images build timeout.


827-842: LGTM!

Both image scans correctly include skip-setup-trivy: true. This is the pattern that should also be applied to the staging image scans (see earlier comment).


969-972: LGTM!

Consistent with staging deployment timeout.


1005-1008: LGTM!

Consistent with other deployment job timeouts.


1021-1036: LGTM!

Consistent 20-minute timeout with the staging ZAP baseline scan.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 13, 2026
arkid15r
arkid15r previously approved these changes Jan 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Line 1: Remove the leading UTF-8 BOM character at the very start of the YAML
file (the unexpected invisible character before the "name: Run CI/CD" content);
re-save the file as UTF-8 without BOM so the first character is the YAML text
character, confirm no invisible characters remain (e.g., via an editor option or
git diff), and commit the cleaned file.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb0b773 and 7d8969c.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (1)
.github/workflows/run-ci-cd.yaml (1)

123-123: Remove this comment—the timeout-minutes placement is correct.

The timeout-minutes property is properly indented at the job level (4-space indent, same as runs-on, permissions, and steps). According to GitHub Actions documentation, timeout-minutes is a job-level field that can appear "before or after steps"—the indentation level is what matters, not the order. GitHub Actions will not ignore it.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 13, 2026
arkid15r
arkid15r previously approved these changes Jan 13, 2026
@arkid15r arkid15r added this pull request to the merge queue Jan 13, 2026
@arkid15r arkid15r removed this pull request from the merge queue due to a manual request Jan 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around line 791-809: The production image scan steps "Scan backend image" and
"Scan frontend image" currently use continue-on-error: true which allows
deployments to proceed despite critical findings; remove that line (or set it to
false) from both steps so the job fails on vulnerabilities, and add/adjust Trivy
inputs to enforce blocking severities (e.g., add a severity: CRITICAL,HIGH or
equivalent input to the trivy-action and ensure trivyignores only excludes
low/optional findings) so high/critical issues cause the action to exit
non-zero.
- Around line 8-9: Fix the typo in the workflow's paths-ignore entry: replace
the incorrect pattern "backend/data/nest.json.gzz" with the correct
"backend/data/nest.json.gz" under the paths-ignore key so the filter matches
changes to nest.json.gz and prevents unnecessary workflow runs.
- Around line 426-446: The image scan steps ("Scan backend image" and "Scan
frontend image") currently set continue-on-error: true while using exit-code: 1,
which logs but does not fail the workflow and lets vulnerable images pass;
remove or set continue-on-error to false so the step fails on detected
vulnerabilities (or change exit-code to 0 if intentional) and make all
image-scan steps consistent (ensure the same inputs are used across staging and
production scans); also remove the inconsistent version: latest from only the
staging frontend scan or add version: latest to all scans to unify behavior.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fea4b6 and 1a720ad.

📒 Files selected for processing (1)
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-26T06:09:08.868Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:09:08.868Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
🔇 Additional comments (2)
.github/workflows/run-ci-cd.yaml (2)

147-147: LGTM!

Adding trivyignores configuration is a reasonable approach to manage known/accepted vulnerabilities consistently across scans.


1-26: Changes don't align with PR objectives.

The PR title and description mention "fix ignored timeout-minutes and add job-level timeouts," but the actual changes are:

  • A typo in paths-ignore (line 9)
  • Adding trivyignores to Trivy scans
  • Adding continue-on-error: true to security scans
  • Adding version: latest to one scan step

None of these appear related to fixing or adding timeout-minutes. Please verify this is the correct branch and that the intended timeout changes are included.

Comment on lines 8 to +9
paths-ignore:
- backend/data/nest.json.gz
- backend/data/nest.json.gzz
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Typo in paths-ignore: nest.json.gzz should be nest.json.gz.

The extra z will cause this path pattern to never match, meaning pull requests that modify backend/data/nest.json.gz will incorrectly trigger the workflow. Line 15 has the correct pattern for the push trigger.

🐛 Proposed fix
     paths-ignore:
-      - backend/data/nest.json.gzz
+      - backend/data/nest.json.gz
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
paths-ignore:
- backend/data/nest.json.gz
- backend/data/nest.json.gzz
paths-ignore:
- backend/data/nest.json.gz
🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 8 - 9, Fix the typo in the
workflow's paths-ignore entry: replace the incorrect pattern
"backend/data/nest.json.gzz" with the correct "backend/data/nest.json.gz" under
the paths-ignore key so the filter matches changes to nest.json.gz and prevents
unnecessary workflow runs.

Comment on lines 426 to +446
- name: Scan backend image
continue-on-error: true
uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
with:
exit-code: 1
image-ref: owasp/nest:backend-staging
scan-type: image
skip-setup-trivy: true
trivy-config: trivy.yaml
trivyignores: trivyignore.yaml

- name: Scan frontend image
continue-on-error: true
uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
with:
exit-code: 1
image-ref: owasp/nest:frontend-staging
scan-type: image
skip-setup-trivy: true
trivy-config: trivy.yaml
trivyignores: trivyignore.yaml
version: latest
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/run-ci-cd.yaml | head -n 500

Repository: OWASP/Nest

Length of output: 20199


🏁 Script executed:

tail -n +500 .github/workflows/run-ci-cd.yaml | head -n 200

Repository: OWASP/Nest

Length of output: 7233


🏁 Script executed:

tail -n +700 .github/workflows/run-ci-cd.yaml

Repository: OWASP/Nest

Length of output: 12236


🏁 Script executed:

rg "trivy-action" .github/workflows/run-ci-cd.yaml -A 12

Repository: OWASP/Nest

Length of output: 2343


Security concern: continue-on-error: true with exit-code: 1 allows vulnerable images to reach production.

This pattern is present in all image scans (staging backend, staging frontend, production backend, production frontend). Vulnerability findings will be logged but won't block deployments, creating a significant security posture gap.

Additionally, version: latest appears only in the staging frontend scan, creating inconsistency across all other image scans.

🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 426 - 446, The image scan
steps ("Scan backend image" and "Scan frontend image") currently set
continue-on-error: true while using exit-code: 1, which logs but does not fail
the workflow and lets vulnerable images pass; remove or set continue-on-error to
false so the step fails on detected vulnerabilities (or change exit-code to 0 if
intentional) and make all image-scan steps consistent (ensure the same inputs
are used across staging and production scans); also remove the inconsistent
version: latest from only the staging frontend scan or add version: latest to
all scans to unify behavior.

Comment on lines 791 to +809
- name: Scan backend image
continue-on-error: true
uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
with:
exit-code: 1
image-ref: owasp/nest:backend-production
scan-type: image
skip-setup-trivy: true
trivy-config: trivy.yaml
trivyignores: trivyignore.yaml

- name: Scan frontend image
continue-on-error: true
uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
with:
exit-code: 1
image-ref: owasp/nest:frontend-production
scan-type: image
skip-setup-trivy: true
trivy-config: trivy.yaml
trivyignores: trivyignore.yaml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Critical security concern: Production image scans with continue-on-error: true.

Allowing production deployments to proceed despite detected vulnerabilities is a significant security risk. Unlike staging, production directly impacts users and should enforce stricter security gates.

Consider removing continue-on-error: true for production scans, or at minimum, ensuring only low-severity vulnerabilities are ignored via trivyignores while critical/high findings still block the deployment.

🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 791 - 809, The production
image scan steps "Scan backend image" and "Scan frontend image" currently use
continue-on-error: true which allows deployments to proceed despite critical
findings; remove that line (or set it to false) from both steps so the job fails
on vulnerabilities, and add/adjust Trivy inputs to enforce blocking severities
(e.g., add a severity: CRITICAL,HIGH or equivalent input to the trivy-action and
ensure trivyignores only excludes low/optional findings) so high/critical issues
cause the action to exit non-zero.

@sonarqubecloud
Copy link
Copy Markdown

@arkid15r arkid15r closed this Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI reliability: Fix ignored timeout-minutes and add job-level timeouts to high-risk workflows

2 participants