Skip to content

Implement Query Analyzer for NestBot#4303

Open
rudransh-shrivastava wants to merge 9 commits intoOWASP:feature/nestbot-ai-assistantfrom
rudransh-shrivastava:feature/nestbot-ai-assistant-work
Open

Implement Query Analyzer for NestBot#4303
rudransh-shrivastava wants to merge 9 commits intoOWASP:feature/nestbot-ai-assistantfrom
rudransh-shrivastava:feature/nestbot-ai-assistant-work

Conversation

@rudransh-shrivastava
Copy link
Collaborator

@rudransh-shrivastava rudransh-shrivastava commented Mar 18, 2026

Proposed change

Resolves #4302
Note: semgrep scan fails due to outdated branch dependencies.
Implement query analyzer for nestbot.

image

Checklist

  • Required: I followed the contributing workflow
  • Required: I verified that my code works as intended and resolves the issue as described
  • Required: I ran make check-test locally: all warnings addressed, tests passed
  • I used AI for code, documentation, tests, or communication related to this PR

@rudransh-shrivastava rudransh-shrivastava linked an issue Mar 18, 2026 that may be closed by this pull request
2 tasks
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new AI-driven Query Analyzer module plus Jinja templates and integrates a single analysis call into the assistant flow; the assistant now logs the analysis (is_simple and sub_queries) but routing/control flow remain unchanged in this diff.

Changes

Cohort / File(s) Summary
Query Analyzer Module
backend/apps/ai/query_analyzer.py
New module defining AGENT_DESCRIPTIONS, create_query_analyzer_agent(), and analyze_query(query) which runs a CrewAI agent, parses model output into is_simple (bool) and sub_queries (list of {query, agent}), and falls back to a single rag sub-query if parsing yields none.
Assistant Integration
backend/apps/ai/flows/assistant.py
Imports and invokes analyze_query(query) after route(query) and logs the analysis via logger.info using structured extra fields containing is_simple, sub_queries, and the original query. No downstream routing or agent-selection logic is modified.
Prompt Templates
backend/apps/ai/templates/query_analyzer/backstory.jinja, backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja
New backstory and task Jinja templates: backstory frames the analyzer role and lists available agent names; task template instructs SIMPLE vs COMPLEX classification and prescribes output lines (IsSimple: and `SubQuery: ...
Dictionary Updates
cspell/custom-dict.txt
Adds two new dictionary entries: rudransh, shrivastava.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Implement Query Analyzer for NestBot' directly and clearly describes the main change: adding a query analyzer component, which aligns with the primary objective in linked issue #4302.
Linked Issues check ✅ Passed The PR implements the primary objective from #4302: adding a Query Analyzer component for NestBot with functionality to analyze queries, detect sub-queries, and determine required agents.
Out of Scope Changes check ✅ Passed Changes are limited to implementing the Query Analyzer (new module, templates, integration into assistant.py) and custom dictionary updates for author names; all are within scope of #4302's objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description clearly relates to the changeset, explaining the implementation of a query analyzer for NestBot and resolving issue #4302.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 5 files

Confidence score: 3/5

  • There is concrete medium risk here: backend/apps/ai/flows/assistant.py now makes an unconditional LLM analysis call for every query, which is likely to increase latency and cost without current functional upside.
  • The most severe correctness concern is in backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja, where raw query text is not clearly delimited as data, so prompt-injection content can steer analyzer output.
  • backend/apps/ai/query_analyzer.py appears fragile when parsing LLM output and can silently default to is_simple=True, which may mask format drift and cause misclassification once this output is relied on.
  • Pay close attention to backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja, backend/apps/ai/flows/assistant.py, and backend/apps/ai/query_analyzer.py - prompt safety, unconditional runtime cost, and brittle parsing/default behavior are the main merge risks.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja">

<violation number="1" location="backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja:1">
P1: User input is injected into the prompt without delimiting it as data, allowing prompt-injection instructions inside `query` to alter analyzer output.</violation>
</file>

<file name="backend/apps/ai/flows/assistant.py">

<violation number="1" location="backend/apps/ai/flows/assistant.py:109">
P1: This adds an unconditional LLM-based query analysis call, but its output is not used in control flow yet. It increases request latency/cost for every query with no functional benefit.</violation>
</file>

<file name="backend/apps/ai/query_analyzer.py">

<violation number="1" location="backend/apps/ai/query_analyzer.py:72">
P1: The analyzer output parsing is brittle and silently falls back to `is_simple=True` when the LLM format changes. Parse structured output (JSON) with validation before using defaults.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
backend/apps/ai/flows/assistant.py (1)

110-110: Use logger.info instead of logger.warning for successful analysis.

logger.warning is typically reserved for unexpected conditions or potential problems. A successful query analysis is a normal operational event and should use info or debug level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/flows/assistant.py` at line 110, Change the log level from
warning to info for successful query analysis by replacing the logger.warning
call with logger.info where the successful analysis is logged (the
logger.warning(...) invocation in assistant.py), so normal operational events
use info instead of warning; keep the same message text and context so only the
severity is changed.
backend/apps/ai/query_analyzer.py (2)

86-89: Consider validating required_agents against known agent names.

The parsed required_agents values are not validated against AGENT_DESCRIPTIONS.keys(). If the LLM returns an invalid or misspelled agent name, downstream code may fail or behave unexpectedly when these results are used for orchestration.

♻️ Proposed validation
         elif line_lower.startswith("requiredagents:"):
             value = line.split(":", 1)[1].strip().lower()
             if value and value != "none":
-                required_agents = [a.strip() for a in value.split(",") if a.strip()]
+                parsed_agents = [a.strip() for a in value.split(",") if a.strip()]
+                valid_agents = set(AGENT_DESCRIPTIONS.keys())
+                required_agents = [a for a in parsed_agents if a in valid_agents]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/query_analyzer.py` around lines 86 - 89, Validate the parsed
required_agents after the split by checking each normalized entry against the
known agent names in AGENT_DESCRIPTIONS.keys(); replace the current assignment
to required_agents with a filtered list that only includes valid agent names
(normalizing case or mapping as needed), and log or warn about any
invalid/misspelled entries (or raise a clear error) so downstream orchestration
code uses only recognized agents; reference the required_agents variable and
AGENT_DESCRIPTIONS.keys() in your change and perform the validation immediately
after the current parsing block in query_analyzer.py.

41-95: Add logging for observability of the analysis workflow.

The function performs an LLM call but doesn't log the raw result or parsing outcomes. This makes debugging difficult when the LLM returns unexpected output formats.

📝 Proposed logging additions
+import logging
+
+logger = logging.getLogger(__name__)
+
 def analyze_query(query: str) -> dict:
     ...
     result = crew.kickoff()
 
     result_str = str(result)
+    logger.debug("Raw analysis result: %s", result_str)
     is_simple = True
     ...
+    logger.debug(
+        "Parsed analysis: is_simple=%s, sub_queries=%s, required_agents=%s",
+        is_simple,
+        sub_queries,
+        required_agents,
+    )
     return {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/query_analyzer.py` around lines 41 - 95, In analyze_query,
add observability by logging the LLM/crew outputs and parsing outcomes: after
result = crew.kickoff() log the raw result (result and/or result_str) for
debugging, then after the parsing loop log the derived values is_simple,
sub_queries, and required_agents and any parse anomalies (e.g., unexpected
formats or missing fields); use the module logger (or existing logger instance)
to emit at INFO/DEBUG for normal values and WARN/ERROR for parse failures so you
can trace both the raw LLM response and the final parsed state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/apps/ai/flows/assistant.py`:
- Around line 108-118: The call to analyze_query(query) can raise and currently
sits unprotected, so wrap the analyze_query invocation and the subsequent
logger.warning usage in its own try/except block to prevent analysis failures
from aborting the main flow; on exception, catch and log the error (e.g., via
logger.exception or logger.warning with the exception) and set a safe default
for query_analysis (e.g., {"is_simple": False, "sub_queries": [],
"required_agents": []}) so the rest of assistant flow can continue using
query_analysis; update references to query_analysis in this scope (the
analyze_query call and the logger.warning block) accordingly.

In `@backend/apps/ai/query_analyzer.py`:
- Line 70: Wrap the call to crew.kickoff() in a try/except so failures (network,
rate limits, LLM timeouts, parse errors) don't propagate; on exception log the
error (including exception text) and return a safe default result (e.g., None,
empty list, or a Result object indicating failure) from the surrounding
function, and only re-raise if a caller needs to handle fatal cases; update the
line where result = crew.kickoff() to perform the guarded call and ensure
callers handle the chosen default.

---

Nitpick comments:
In `@backend/apps/ai/flows/assistant.py`:
- Line 110: Change the log level from warning to info for successful query
analysis by replacing the logger.warning call with logger.info where the
successful analysis is logged (the logger.warning(...) invocation in
assistant.py), so normal operational events use info instead of warning; keep
the same message text and context so only the severity is changed.

In `@backend/apps/ai/query_analyzer.py`:
- Around line 86-89: Validate the parsed required_agents after the split by
checking each normalized entry against the known agent names in
AGENT_DESCRIPTIONS.keys(); replace the current assignment to required_agents
with a filtered list that only includes valid agent names (normalizing case or
mapping as needed), and log or warn about any invalid/misspelled entries (or
raise a clear error) so downstream orchestration code uses only recognized
agents; reference the required_agents variable and AGENT_DESCRIPTIONS.keys() in
your change and perform the validation immediately after the current parsing
block in query_analyzer.py.
- Around line 41-95: In analyze_query, add observability by logging the LLM/crew
outputs and parsing outcomes: after result = crew.kickoff() log the raw result
(result and/or result_str) for debugging, then after the parsing loop log the
derived values is_simple, sub_queries, and required_agents and any parse
anomalies (e.g., unexpected formats or missing fields); use the module logger
(or existing logger instance) to emit at INFO/DEBUG for normal values and
WARN/ERROR for parse failures so you can trace both the raw LLM response and the
final parsed state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 910919b4-cbe1-4069-8a49-454abd87cb94

📥 Commits

Reviewing files that changed from the base of the PR and between 5a629da and 6df4888.

📒 Files selected for processing (5)
  • backend/apps/ai/flows/assistant.py
  • backend/apps/ai/query_analyzer.py
  • backend/apps/ai/templates/query_analyzer/backstory.jinja
  • backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja
  • cspell/custom-dict.txt

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
backend/apps/ai/query_analyzer.py (1)

79-80: ⚠️ Potential issue | 🟠 Major

Broaden crew.kickoff() failure handling beyond two exception types.

At Line 79, catching only RuntimeError and ValueError is narrow for LLM/network SDK failures; unhandled exceptions can still bubble and break request handling.

🛡️ Proposed fix
-    except (RuntimeError, ValueError):
+    except Exception:
         logger.exception("Query analysis failed for: %s", query)
         return {
             "is_simple": True,
             "sub_queries": [query],
             "required_agents": [],
         }
For CrewAI (current stable), what exception classes can Crew.kickoff() raise during model/network/tool failures, and is catching only RuntimeError/ValueError considered sufficient?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/query_analyzer.py` around lines 79 - 80, The except block for
crew.kickoff() is too narrow—replace the current except (RuntimeError,
ValueError): that wraps crew.kickoff() in query_analyzer.py with a broader
except Exception: to catch all runtime/SDK/network failures (but avoid catching
BaseException like KeyboardInterrupt/SystemExit); keep using
logger.exception("Query analysis failed for: %s", query) so the full traceback
and error details are logged, and ensure any necessary cleanup or controlled
re-raise happens after the log if callers need to see the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/apps/ai/query_analyzer.py`:
- Line 80: The analyzer is logging raw user queries (logger.exception("Query
analysis failed for: %s", query) and similar at lines 109-110); compute a
non-reversible fingerprint (e.g., SHA-256) of the query and the query length and
log those instead of the raw text. Update places that reference the query
variable in logging (e.g., logger.exception and any logger.error calls in this
module) to log something like "query_fingerprint=%s, query_length=%d" using the
hashed value and len(query); add an import for hashlib and a small helper (e.g.,
fingerprint_query(query)) in query_analyzer.py to produce the hex digest so you
don’t repeat hashing logic.
- Around line 98-102: The SubQueries parsing can yield an empty list for
delimiter-only or malformed values; inside the block that handles elif
line_lower.startswith("subqueries:") (where variables sub_queries and query are
used), after computing sub_queries = [q.strip() ...], add a fallback so that if
sub_queries is empty it is set to [query] to preserve the original single-query
execution plan instead of returning an empty list.

---

Duplicate comments:
In `@backend/apps/ai/query_analyzer.py`:
- Around line 79-80: The except block for crew.kickoff() is too narrow—replace
the current except (RuntimeError, ValueError): that wraps crew.kickoff() in
query_analyzer.py with a broader except Exception: to catch all
runtime/SDK/network failures (but avoid catching BaseException like
KeyboardInterrupt/SystemExit); keep using logger.exception("Query analysis
failed for: %s", query) so the full traceback and error details are logged, and
ensure any necessary cleanup or controlled re-raise happens after the log if
callers need to see the error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6bc725ff-79f4-4b30-9aec-119f72b1cd30

📥 Commits

Reviewing files that changed from the base of the PR and between 6df4888 and ff8919f.

📒 Files selected for processing (1)
  • backend/apps/ai/query_analyzer.py

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/apps/ai/query_analyzer.py">

<violation number="1" location="backend/apps/ai/query_analyzer.py:97">
P2: If the LLM returns malformed subquery output (e.g., `SubQueries: |` or `SubQueries: | |`), the list comprehension produces an empty list, overwriting the `[query]` default. This would result in an empty execution plan. Assign the parsed result to a temporary variable and only overwrite `sub_queries` if it's non-empty.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 18, 2026
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/apps/ai/query_analyzer.py">

<violation number="1" location="backend/apps/ai/query_analyzer.py:93">
P2: Add a test for the new fallback branch so `sub_queries` defaults to `[query]` when `SubQueries:` parses to an empty list.

(Based on your team's feedback about adding tests for new logic and bug fixes.) [FEEDBACK_USED]</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@rudransh-shrivastava rudransh-shrivastava marked this pull request as ready for review March 18, 2026 13:00
@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 47.61905% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.45%. Comparing base (d6b2e85) to head (634b81e).

Files with missing lines Patch % Lines
backend/apps/ai/query_analyzer.py 46.15% 21 Missing ⚠️
backend/apps/ai/flows/assistant.py 66.66% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                       Coverage Diff                        @@
##           feature/nestbot-ai-assistant    #4303      +/-   ##
================================================================
- Coverage                         99.38%   96.45%   -2.94%     
================================================================
  Files                               520      551      +31     
  Lines                             16384    16987     +603     
  Branches                           2238     2371     +133     
================================================================
+ Hits                              16284    16384     +100     
- Misses                               62      560     +498     
- Partials                             38       43       +5     
Flag Coverage Δ
backend 95.86% <47.61%> (-3.97%) ⬇️
frontend 98.15% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
backend/apps/ai/flows/assistant.py 29.10% <66.66%> (ø)
backend/apps/ai/query_analyzer.py 46.15% <46.15%> (ø)

... and 51 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d6b2e85...634b81e. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
backend/apps/ai/query_analyzer.py (1)

101-103: ⚠️ Potential issue | 🟠 Major

Avoid logging raw query text (privacy leak).

Line 102 logs full user input. Please log a non-reversible fingerprint + length instead.

🔐 Proposed fix
+import hashlib
 import contextlib
 import logging
@@
 logger = logging.getLogger(__name__)
+
+
+def _query_fingerprint(query: str) -> tuple[str, int]:
+    return hashlib.sha256(query.encode("utf-8")).hexdigest()[:12], len(query)
@@
     if not required_agents:
-        logger.warning("Query analysis returned no valid agents for: %s", query)
+        query_hash, query_length = _query_fingerprint(query)
+        logger.warning(
+            "Query analysis returned no valid agents (query_hash=%s, query_length=%d)",
+            query_hash,
+            query_length,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/query_analyzer.py` around lines 101 - 103, The warning
currently logs the raw user query when no agents are returned; replace that to
avoid leaking sensitive input by computing a non-reversible fingerprint (e.g.,
SHA-256 hex digest of the query bytes) and using len(query) for length, then
call logger.warning with a message that includes only the fingerprint and length
(not the raw query). Update the code around the logger.warning in
query_analyzer.py where required_agents is checked (the block referencing
required_agents and logger.warning) to compute the hash/fingerprint and log
fingerprint + length instead of the query string.
🧹 Nitpick comments (1)
backend/apps/ai/query_analyzer.py (1)

84-99: Harden parsing before this drives orchestration.

Lines 84-99 depend on free-form text patterns. This is fragile for LLM format drift and can silently return defaults. Prefer strict structured output (e.g., JSON schema) and validate fields before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/query_analyzer.py` around lines 84 - 99, The current
free-form parsing loop in query_analyzer.py that extracts is_simple,
sub_queries, and required_agents from result_str is fragile; update
Analyze/parse logic to require and validate a strict structured output (e.g.,
JSON with keys "isSimple","subQueries","requiredAgents"), attempt to parse
result_str as JSON first, validate types and values (booleans, list of strings),
map/whitelist requiredAgents against AGENT_DESCRIPTIONS, and only fall back to
the existing line-based parsing if JSON parse fails while logging a warning;
ensure functions/variables referenced (is_simple, sub_queries, required_agents,
AGENT_DESCRIPTIONS, and the loop that inspects result_str) perform explicit
sanity checks and return clear errors or defaults instead of silently accepting
malformed input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/apps/ai/query_analyzer.py`:
- Around line 101-103: The warning currently logs the raw user query when no
agents are returned; replace that to avoid leaking sensitive input by computing
a non-reversible fingerprint (e.g., SHA-256 hex digest of the query bytes) and
using len(query) for length, then call logger.warning with a message that
includes only the fingerprint and length (not the raw query). Update the code
around the logger.warning in query_analyzer.py where required_agents is checked
(the block referencing required_agents and logger.warning) to compute the
hash/fingerprint and log fingerprint + length instead of the query string.

---

Nitpick comments:
In `@backend/apps/ai/query_analyzer.py`:
- Around line 84-99: The current free-form parsing loop in query_analyzer.py
that extracts is_simple, sub_queries, and required_agents from result_str is
fragile; update Analyze/parse logic to require and validate a strict structured
output (e.g., JSON with keys "isSimple","subQueries","requiredAgents"), attempt
to parse result_str as JSON first, validate types and values (booleans, list of
strings), map/whitelist requiredAgents against AGENT_DESCRIPTIONS, and only fall
back to the existing line-based parsing if JSON parse fails while logging a
warning; ensure functions/variables referenced (is_simple, sub_queries,
required_agents, AGENT_DESCRIPTIONS, and the loop that inspects result_str)
perform explicit sanity checks and return clear errors or defaults instead of
silently accepting malformed input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b86e6e46-e119-4516-bd41-3de88abe5210

📥 Commits

Reviewing files that changed from the base of the PR and between 916ac4b and 0192183.

📒 Files selected for processing (2)
  • backend/apps/ai/query_analyzer.py
  • cspell/custom-dict.txt
✅ Files skipped from review due to trivial changes (1)
  • cspell/custom-dict.txt

@rudransh-shrivastava rudransh-shrivastava marked this pull request as draft March 19, 2026 12:43
@rudransh-shrivastava
Copy link
Collaborator Author

Updating the query analyzer output to be cleaner.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja">

<violation number="1" location="backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja:20">
P2: Clarify the `SubQuery` format for simple queries and exact agent names. As written, the model can omit the line for simple queries or return labels the parser rejects, which makes the analyzer fall back to `rag`.</violation>
</file>

<file name="backend/apps/ai/query_analyzer.py">

<violation number="1" location="backend/apps/ai/query_analyzer.py:99">
P2: Don't force simple queries into a fabricated `rag` sub-query when the analyzer returns no `SubQuery:` lines.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/apps/ai/query_analyzer.py (1)

98-100: ⚠️ Potential issue | 🟠 Major

Avoid logging raw user query text in warning logs.

Line 99 logs raw query, which is a privacy/compliance leak risk. Log a fingerprint + length instead.

🔐 Proposed fix
+import hashlib
@@
 def analyze_query(query: str) -> dict:
@@
+    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
@@
     if not sub_queries:
-        logger.warning("Query analysis returned no valid sub-queries for: %s", query)
+        logger.warning(
+            "Query analysis returned no valid sub-queries (query_hash=%s, query_length=%d)",
+            query_hash,
+            len(query),
+        )
         sub_queries = [{"query": query, "agent": "rag"}]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/apps/ai/query_analyzer.py` around lines 98 - 100, The warning in
query_analyzer.py currently logs the raw user `query` when `sub_queries` is
empty; instead compute a safe fingerprint and length and log those; replace the
logger.warning call that references `query` (the branch that sets `sub_queries =
[{"query": query, "agent": "rag"}]`) with code that computes a deterministic
hash (e.g., SHA-256 hex digest of query.encode('utf-8')) and the character count
(len(query or "")) and log something like "Query analysis returned no valid
sub-queries for fingerprint=%s length=%d" with the fingerprint and length,
handling None/empty queries gracefully.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/apps/ai/query_analyzer.py`:
- Around line 83-97: The current parser in query_analyzer.py (looping over
result_str, using line_lower.startswith and only accepting inline "| agent:") is
brittle and misses outputs like separate "SubQueries:" / "RequiredAgents:" lines
or variants of casing/spacing; update the parsing logic to accept both
"subquery(s):" and "requiredagent(s):" (case/whitespace-insensitive), handle
both inline forms ("| agent:") and multi-line forms (capture a subquery line
into a temp variable and attach the next agent line if it matches
AGENT_DESCRIPTIONS), and ensure existing variables (is_simple, sub_queries,
AGENT_DESCRIPTIONS, result_str) are used; keep backwards-compatible behavior so
inline parsing still works and ignore malformed entries with contextlib.suppress
as before.

---

Duplicate comments:
In `@backend/apps/ai/query_analyzer.py`:
- Around line 98-100: The warning in query_analyzer.py currently logs the raw
user `query` when `sub_queries` is empty; instead compute a safe fingerprint and
length and log those; replace the logger.warning call that references `query`
(the branch that sets `sub_queries = [{"query": query, "agent": "rag"}]`) with
code that computes a deterministic hash (e.g., SHA-256 hex digest of
query.encode('utf-8')) and the character count (len(query or "")) and log
something like "Query analysis returned no valid sub-queries for fingerprint=%s
length=%d" with the fingerprint and length, handling None/empty queries
gracefully.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 011f1569-5b99-4469-8192-e10e0e2e1086

📥 Commits

Reviewing files that changed from the base of the PR and between 0192183 and 8c1734c.

📒 Files selected for processing (3)
  • backend/apps/ai/flows/assistant.py
  • backend/apps/ai/query_analyzer.py
  • backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja
✅ Files skipped from review due to trivial changes (1)
  • backend/apps/ai/templates/query_analyzer/tasks/analyze.jinja
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/apps/ai/flows/assistant.py

cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Mar 19, 2026
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 19, 2026
@sonarqubecloud
Copy link

@rudransh-shrivastava rudransh-shrivastava marked this pull request as ready for review March 19, 2026 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Query Analyzer for NestBot

1 participant