
Sync www-repositories #2164

Merged
arkid15r merged 11 commits into OWASP:feature/nestbot-assistant from Dishant1804:www-repository
Sep 18, 2025

Conversation

@Dishant1804
Contributor

Proposed change

Resolves #2144

  • create context for www- repositories
  • create chunks for the same
  • test coverage for it

Checklist

  • I've read and followed the contributing guidelines.
  • I've run make check-test locally; all checks and tests passed.

@coderabbitai
Contributor

coderabbitai bot commented Sep 1, 2025

Summary by CodeRabbit

  • New Features

    • Added commands to refresh repository context and content used by AI features.
  • Improvements

    • Enhanced repository content extraction with richer metadata and Markdown (README and tab files).
    • Increased text chunk size and overlap for improved retrieval quality.
    • Smarter handling of JSON vs. prose and skipping empty content during indexing.
    • Added gentle pacing for GitHub requests to reduce rate-limit issues.
  • Tests

    • Comprehensive tests for repository extraction, chunking behavior, and command workflows.
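The larger chunk size and overlap called out above can be pictured with a minimal sliding-window sketch. This is an illustration only: the real `Chunk.split_text` delegates to a library text splitter, and only the 500/80 parameter values come from the PR.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 80) -> list[str]:
    # Illustrative sliding window; the PR's Chunk.split_text uses a proper
    # text splitter configured with these same size/overlap values.
    step = chunk_size - chunk_overlap
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]
```

Each chunk shares its trailing 80 characters with the start of the next, which is the continuity that improves retrieval quality.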

Walkthrough

Adds repository content extraction and two Django management commands (chunks & context) with Makefile targets; increases chunk size/overlap; makes chunking JSON-aware and skips empty content; adds GitHub request pacing and JSON validation util; includes tests and a placeholder test module.

Changes

Cohort / File(s) Summary
Makefile targets
backend/apps/ai/Makefile
Adds ai-update-repository-chunks and ai-update-repository-context targets that run corresponding Django management commands via exec-backend-command.
Repository extractor
backend/apps/ai/common/extractors/repository.py
New extract_repository_content(repository) -> tuple[str, str] that builds repository metadata, fetches Markdown files (README, tab_*.md, etc.) from raw GitHub URLs with request pacing and error handling, validates content, and returns JSON content plus metadata string.
Management commands (repository)
backend/apps/ai/management/commands/ai_update_repository_chunks.py, backend/apps/ai/management/commands/ai_update_repository_context.py
New commands extending BaseChunkCommand / BaseContextCommand for Repository with key_field_name="key", filter OWASP site repos (is_owasp_site_repository=True, is_archived=False, is_empty=False), delegate to the repository extractor, and set source_name="owasp_repository".
Chunk splitting config
backend/apps/ai/models/chunk.py
Increases Chunk.split_text parameters: chunk_size 300 → 500 and chunk_overlap 40 → 80.
Chunk command flow
backend/apps/ai/common/base/chunk_command.py
Imports is_valid_json; treats valid JSON as full content (bypass metadata concatenation); concatenates metadata and content otherwise; skips entities when full_content is empty (adds continue).
Constants & utils
backend/apps/ai/common/constants.py, backend/apps/common/utils.py
Adds GITHUB_REQUEST_INTERVAL_SECONDS = 0.5 and is_valid_json(content: str) -> bool (imports json).
Tests & placeholder
backend/test_commands.py, backend/tests/apps/ai/common/extractors/repository_test.py, backend/tests/apps/ai/common/utils_test.py, backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py, backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py, backend/tests/apps/ai/models/chunk_test.py
Adds comprehensive tests for repository extraction and new commands, updates utils and chunk tests to cover new error/error-path behavior, and adds a placeholder backend/test_commands.py.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested labels

nestbot

Suggested reviewers

  • kasya
  • arkid15r

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Out of Scope Changes Check ⚠️ Warning Most changes are in-scope (extractor, commands, Makefile targets, tests, and the GitHub request interval constant) but the modification of Chunk.split_text parameters (chunk_size 300→500 and chunk_overlap 40→80) is a global behavioral change not required by the linked issue and therefore appears out-of-scope for this PR. Revert the global chunking-parameter changes from this PR and submit them in a separate, focused PR that documents and justifies the chunking-strategy change (with tests demonstrating impact), or restrict the new chunking parameters to the repository-chunking path and clearly explain the rationale in the PR description.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Sync www-repopsitories" is directly related to the main change (syncing/processing OWASP "www-" repositories) so it conveys the primary intent, but it contains a spelling error ("repopsitories") and is terse; a clearer, correctly spelled title would help reviewers quickly understand scope.
Linked Issues Check ✅ Passed The PR adds extract_repository_content to fetch Markdown, introduces ai_update_repository_chunks and ai_update_repository_context management commands, adds Makefile targets to run them, and includes tests — collectively addressing the linked issue's objectives to fetch .md content from OWASP www- repositories and index it for NestBot; the implementation filters repositories via is_owasp_site_repository and related flags rather than an explicit "www-" name-prefix, so that mapping should be confirmed.
Description Check ✅ Passed The PR description references issue #2144, summarizes the proposed change (create contexts and chunks for www- repositories), notes verification steps and checklist completion, and therefore is directly related to the changeset and passes this lenient description check.
Docstring Coverage ✅ Passed Docstring coverage is 87.88% which is sufficient. The required threshold is 80.00%.
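The flag-based filter the Linked Issues check refers to can be illustrated with a hypothetical queryset stand-in (`FakeQuerySet` is invented for this sketch; the real commands filter a Django QuerySet):

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet (illustration only)."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, **criteria):
        # Keep rows whose fields match every keyword criterion.
        return FakeQuerySet(
            [row for row in self.rows if all(row.get(k) == v for k, v in criteria.items())]
        )


def owasp_site_repositories(queryset):
    # Same criteria the new ai_update_repository_* commands apply
    # instead of matching on a "www-" name prefix.
    return queryset.filter(
        is_owasp_site_repository=True,
        is_archived=False,
        is_empty=False,
    )
```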

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db219b3 and 23ee4ca.

📒 Files selected for processing (2)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
docker-compose/local.yaml (1)

25-25: Bug: backend service sets DJANGO_REDIS_PASSWORD from DJANGO_REDIS_HOST.

This will set the password to the host value when DJANGO_REDIS_HOST is defined.

-      DJANGO_REDIS_PASSWORD: ${DJANGO_REDIS_HOST:-nest-cache-password}
+      DJANGO_REDIS_PASSWORD: ${DJANGO_REDIS_PASSWORD:-nest-cache-password}
backend/apps/ai/models/chunk.py (1)

85-93: Handle race with unique constraint in update_data().

Concurrent writers can still insert the same (context, text) between the exists() check and save(), raising IntegrityError. Catch and return None to match the method contract.

-        if save:
-            chunk.save()
+        if save:
+            from django.db import IntegrityError
+            try:
+                chunk.save()
+            except IntegrityError:
+                return None
backend/apps/slack/events/__init__.py (1)

1-20: Make event configuration idempotent and skip heavy imports when Slack app is disabled

Prevents duplicate registrations and avoids importing event modules if tokens are missing.

+_EVENTS_CONFIGURED = False
+
 def configure_slack_events():
     """Configure Slack events after Django apps are ready."""
-    from apps.slack.apps import SlackConfig
-    from apps.slack.events import (
-        app_home_opened,
-        message_posted,
-        team_join,
-        url_verification,
-    )
-    from apps.slack.events.event import EventBase
-    from apps.slack.events.member_joined_channel import (
-        catch_all,
-        contribute,
-        gsoc,
-        project_nest,
-    )
-
-    if SlackConfig.app:
-        EventBase.configure_events()
+    from apps.slack.apps import SlackConfig
+    global _EVENTS_CONFIGURED
+    if not SlackConfig.app or _EVENTS_CONFIGURED:
+        return
+    from apps.slack.events import (
+        app_home_opened,
+        message_posted,
+        team_join,
+        url_verification,
+    )
+    from apps.slack.events.event import EventBase
+    from apps.slack.events.member_joined_channel import (
+        catch_all,
+        contribute,
+        gsoc,
+        project_nest,
+    )
+    EventBase.configure_events()
+    _EVENTS_CONFIGURED = True
backend/tests/apps/ai/common/extractors/repository_test.py (1)

434-455: Log message lacks repository context expected by tests

Tests assert the debug log contains test-repo. Update extractor logging to include repo key/owner/path as shown above.

🧹 Nitpick comments (48)
backend/test_commands.py (1)

1-1: Avoid stray test_ module outside tests/*

Pytest will collect this empty module, adding noise. Either remove it or rename/move under a non-collected path (e.g., backend/commands_testutils.py) or into tests/ with actual tests.

backend/settings/base.py (1)

145-152: Harden RQ config: prefer URL form and make timeout configurable.

Using a single Redis URL avoids partial env mismatches and mirrors your cache config; exposing timeout via env helps ops tuning.

Apply within this block:

-        "default": {
-            "HOST": REDIS_HOST,
-            "PORT": 6379,
-            "PASSWORD": REDIS_PASSWORD,
-            "DEFAULT_TIMEOUT": 360,
-        }
+        "default": {
+            # Use URL form for parity with CACHES and simpler overrides.
+            "URL": f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/0",
+            # Make tunable via env; 360s default preserved.
+            "DEFAULT_TIMEOUT": values.IntegerValue(
+                environ_name="RQ_DEFAULT_TIMEOUT",
+                default=360,
+            ),
+        }

Optionally consider adding:

  • RQ exception handlers that report to Sentry.
  • Separate queues (e.g., "ai", "slack") with different timeouts.
backend/tests/apps/ai/common/utils_test.py (1)

108-116: Add a minimal assertion to lock invocation count.

Verifies we don’t loop on failures.

         result = create_chunks_and_embeddings(
             chunk_texts=["some text"],
             context=None,
             openai_client=mock_openai_client,
         )
 
         mock_logger.exception.assert_called_once_with("Failed to create chunks and embeddings")
+        mock_update_data.assert_called_once()
         assert result == []
backend/apps/slack/constants.py (1)

25-80: Use frozenset for OWASP_KEYWORDS and prune overly generic entries
Convert OWASP_KEYWORDS to a frozenset to prevent accidental mutation; remove broad terms (e.g. “security”, “project”, “event”) to cut down on false positives—matching is already case-insensitive via text.lower(). Evaluate the false-positive rate on a representative Slack log before enabling auto-replies.
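The frozenset suggestion could look like the following sketch. The keyword subset here is illustrative only; the real `OWASP_KEYWORDS` in backend/apps/slack/constants.py is longer and may differ.

```python
import re

# Illustrative subset; the actual list lives in apps/slack/constants.py.
OWASP_KEYWORDS = frozenset({"owasp", "zap", "nest"})


def contains_owasp_keywords(text: str) -> bool:
    # Token-based, case-insensitive matching; the frozenset guards the
    # shared keyword collection against accidental mutation.
    words = set(re.findall(r"[a-z0-9-]+", text.lower()))
    return bool(OWASP_KEYWORDS & words)
```

Token matching (rather than substring search) also avoids the false positives pruning aims to reduce.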

backend/apps/ai/models/chunk.py (1)

5-5: Also import IntegrityError.

If you adopt the changes above, add IntegrityError to imports.

-from django.db import DatabaseError, models
+from django.db import DatabaseError, IntegrityError, models
backend/tests/apps/ai/models/chunk_test.py (4)

31-42: Assert list-clearing contract after bulk_save.

If bulk_save is expected to clear the input list (as BulkSaveModel does), assert it to prevent regressions.

             with patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk_save:
                 Chunk.bulk_save(mock_chunks)
                 mock_bulk_save.assert_called_once_with(Chunk, mock_chunks, fields=None)
+                assert mock_chunks == []

43-55: Mirror the clearing assertion for the fields path.

Keep expectations consistent across both code paths.

             with patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk_save:
                 Chunk.bulk_save(mock_chunks, fields=fields)
                 mock_bulk_save.assert_called_once_with(Chunk, mock_chunks, fields=fields)
+                assert mock_chunks == []

65-86: Remove redundant @patch on Chunk.save.

You replace Chunk with a Mock later; the decorator patch becomes moot and can confuse readers.

-@patch("apps.ai.models.chunk.Chunk.save")
-def test_update_data_save_with_context(self, mock_save):
+def test_update_data_save_with_context(self):

31-55: Add a test ensuring updates aren’t filtered out.

Guard against regressions where chunks with ids (updates) get dropped by existence filtering.

Proposed new test:

def test_bulk_save_includes_updates():
    c1 = Mock(spec=Chunk); c1.id = 42; c1.context = Mock(); c1.text = "t1"
    c2 = Mock(spec=Chunk); c2.id = None; c2.context = Mock(); c2.text = "t2"
    with patch("apps.ai.models.chunk.Chunk.objects.filter") as mock_filter, \
         patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk:
        mock_filter.return_value.exists.return_value = False
        chunks = [c1, c2]
        Chunk.bulk_save(chunks)
        # c1 (update) must still be passed through
        mock_bulk.assert_called_once()
        passed = mock_bulk.call_args.args[1]
        assert c1 in passed and c2 in passed

Want me to push this into the suite?

backend/apps/ai/common/extractors/repository.py (5)

130-146: Broaden markdown coverage (common variants).

Consider additional common filenames/paths to increase hit rate on OWASP www repos (case and format variants).

     markdown_files = [
         "README.md",
+        "README.rst",
+        "README.adoc",
+        "readme.md",
+        "README",
         "CONTRIBUTING.md",
         "CODE_OF_CONDUCT.md",
         "SECURITY.md",
         "CHANGELOG.md",
-        "LICENSE.md",
+        "LICENSE.md",
+        "LICENSE",
+        "docs/README.md",
         "docs/README.md",
         "docs/index.md",
         "docs/overview.md",
         "docs/getting-started.md",
         "docs/installation.md",
         "docs/usage.md",
         "docs/api.md",
         "docs/development.md",
         "docs/deployment.md",
+        "index.md",
+        "content/index.md",
+        "content/README.md",
+        "site/README.md",
     ]

156-160: Branch fallback for site repos.

If default_branch is missing, fall back to gh-pages (when the repository has Pages) and then master to better handle www repos.

-                raw_url = (
-                    f"https://raw.githubusercontent.com/{owner}/{repository.key}/"
-                    f"{repository.default_branch or 'main'}/{file_path}"
-                )
+                branch = (
+                    repository.default_branch
+                    or ("gh-pages" if getattr(repository, "has_pages", False) else None)
+                    or "master"
+                )
+                raw_url = f"https://raw.githubusercontent.com/{owner}/{repository.key}/{branch}/{file_path}"

167-169: Improve logging with context.

Log which file/owner/repo failed and keep traceback; current message loses diagnostics.

-        except (ValueError, TypeError, OSError, ConnectionError):
-            logger.debug("Failed to fetch")
+        except (ValueError, TypeError, OSError, ConnectionError):
+            logger.debug(
+                "Failed to fetch markdown file",
+                extra={"file_path": file_path, "owner": owner, "repo": getattr(repository, "key", None)},
+                exc_info=True,
+            )
             continue

148-166: Surface fetched files in metadata (and drop unused variable otherwise).

Either remove fetched_files or expose it so downstream context can reference what was pulled.

-    fetched_files = []
+    fetched_files = []
@@
-                if content and content.strip():
-                    fetched_files.append(file_path)
+                if content and content.strip():
+                    fetched_files.append(file_path)
                     prose_parts.append(f"## {file_path}\n\n{content}")
@@
     return (
-        DELIMITER.join(filter(None, prose_parts)),
-        DELIMITER.join(filter(None, metadata_parts)),
+        DELIMITER.join(filter(None, prose_parts)),
+        DELIMITER.join(
+            filter(
+                None,
+                metadata_parts
+                + (
+                    [f"Fetched Markdown Files: {', '.join(fetched_files)}"]
+                    if fetched_files
+                    else []
+                ),
+            )
+        ),
     )

Also applies to: 171-174


148-166: Consider rate limits and 404 noise from raw URLs.

Fetching many files per repo can hit GitHub rate limits and raw 404s may return HTML. Prefer GitHub API for contents (with If-None-Match) or add status checks in get_repository_file_content to return "" on non-200.

I can provide an API-backed extractor variant and harden get_repository_file_content to ignore non-200 responses. Want a follow-up PR?

backend/apps/slack/admin/conversation.py (1)

30-31: Expose the new flag in filters for quicker triage

Adding the boolean to list_filter eases admin workflows without altering data model.

Apply outside this hunk:

@@
     list_filter = (
         "sync_messages",
         "is_archived",
         "is_channel",
         "is_general",
         "is_im",
         "is_private",
+        "is_nest_bot_assistant_enabled",
     )
backend/apps/slack/commands/ai.py (2)

21-23: Avoid KeyError when Slack omits text; use a safe default.
Use .get("text") or "" to guard against missing/None and prevent an AttributeError on .strip().

-        return get_blocks(
-            query=command["text"].strip(),
-        )
+        return get_blocks(
+            query=(command.get("text") or "").strip(),
+        )

19-20: Document the local import intent.
Add a short comment so future readers know this is deliberate for lazy-loading/cycle avoidance.

-        from apps.slack.common.handlers.ai import get_blocks
+        # Local import to avoid circular import and reduce import-time overhead.
+        from apps.slack.common.handlers.ai import get_blocks
backend/apps/ai/Makefile (1)

37-44: Declare PHONY targets for the new commands.
Prevents make from treating similarly named files as up-to-date targets.

 ai-update-repository-context:
 	@echo "Updating repository context"
 	@CMD="python manage.py ai_update_repository_context" $(MAKE) exec-backend-command
 
+.PHONY: ai-update-repository-chunks ai-update-repository-context
backend/apps/ai/management/commands/ai_update_repository_context.py (1)

18-28: Eager-load FKs used by the extractor.
extract_repository_markdown_content accesses organization.login and owner.login; add select_related to avoid N+1 queries during batch runs.

         return (
             super()
             .get_base_queryset()
             .filter(
                 is_owasp_site_repository=True,
                 is_archived=False,
                 is_empty=False,
             )
-        )
+        ).select_related("organization", "owner")
backend/apps/slack/services/message_auto_reply.py (1)

33-35: Race window note (optional).
Small race exists between the replies check and posting; acceptable, but consider an idempotency key or re-check on 409s if this becomes noisy.

backend/apps/slack/common/handlers/ai.py (4)

16-19: Fix docstring: remove undocumented parameter.
The presentation argument isn’t in the signature.

-        presentation (EntityPresentation | None): Configuration for entity presentation.

26-28: Guard against oversized Slack block text.
Slack mrkdwn text is limited (~3000 chars per section). Chunk long responses into multiple blocks.

-    if ai_response:
-        return [markdown(ai_response)]
+    if ai_response:
+        max_len = 2900
+        if len(ai_response) <= max_len:
+            return [markdown(ai_response)]
+        return [markdown(ai_response[i : i + max_len]) for i in range(0, len(ai_response), max_len)]

31-46: Make models configurable via settings.
Avoid hard-coding model IDs; read from Django settings with sane defaults.

+    from django.conf import settings
     rag_tool = RagTool(
-        chat_model="gpt-4o",
-        embedding_model="text-embedding-3-small",
+        chat_model=getattr(settings, "AI_CHAT_MODEL", "gpt-4o"),
+        embedding_model=getattr(settings, "AI_EMBEDDING_MODEL", "text-embedding-3-small"),
     )

10-10: Unused logger.
Either use logger for diagnostics or remove it.

backend/apps/slack/MANIFEST.yaml (2)

111-124: Tighten bot scopes (least privilege).

Line 123 adds channels:history, which is needed for message.channels. Line 122 (channels:manage) appears unnecessary for this feature set—drop it if unused.

       - users:read
-      - groups:write
-      - channels:manage
+      - groups:write
+      # Removed channels:manage (not required for read/AI reply workflows)
       - channels:history

131-134: Validate event coverage vs. intended surfaces.

message.channels covers public channels only. If replies are desired in private channels/DMs, add message.groups and message.im and corresponding groups:history/im:history scopes; otherwise consider app_mention to reduce noise.

backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1)

11-17: Field OK; consider index and default rollout.

  • If this flag is frequently filtered, add db_index=True or a model Index.
  • Default False will disable the assistant for all existing conversations. Confirm that’s intended or add a data migration to enable for selected workspaces/channels.
-            field=models.BooleanField(default=False, verbose_name="Is Nest Bot Assistant Enabled"),
+            field=models.BooleanField(
+                default=False,
+                verbose_name="Is Nest Bot Assistant Enabled",
+                db_index=True,
+            ),
backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)

18-28: Avoid N+1 on owner/org during extraction.

The extractor touches repository.organization/owner; select_related will prevent per-row FK hits.

     def get_base_queryset(self) -> QuerySet:
         """Return the base queryset with filtering for OWASP site repositories."""
         return (
             super()
             .get_base_queryset()
+            .select_related("organization", "owner")
             .filter(
                 is_owasp_site_repository=True,
                 is_archived=False,
                 is_empty=False,
             )
         )
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (3)

45-47: Fix pluralization to “repositories”.

Current expectation hardcodes a misspelling (“repositorys”) and bakes it into UX.

-        assert command.entity_name_plural == "repositorys"
+        assert command.entity_name_plural == "repositories"

Outside this file, consider deriving from model meta to avoid manual pluralization:

# In BaseAICommand (or similar)
@property
def entity_name_plural(self) -> str:
    return getattr(self.model_class._meta, "verbose_name_plural", f"{self.entity_name}s")

67-82: Patch target is brittle.

Patching `command.__class__.__bases__[0].get_base_queryset` relies on MRO order. Patch the concrete base symbol instead.

-        with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
+        from apps.ai.common.base.chunk_command import BaseChunkCommand
+        with patch.object(BaseChunkCommand, "get_base_queryset") as mock_super:

104-119: Remove duplicated test logic.

This repeats test_get_base_queryset. Drop to reduce noise.

-    def test_queryset_filtering_logic(self, command):
-        """Test that the queryset filtering logic works correctly."""
-        with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
-            mock_queryset = Mock()
-            mock_super.return_value = mock_queryset
-            mock_queryset.filter.return_value = "filtered_queryset"
-
-            result = command.get_base_queryset()
-
-            mock_queryset.filter.assert_called_once_with(
-                is_owasp_site_repository=True,
-                is_archived=False,
-                is_empty=False,
-            )
-            assert result == "filtered_queryset"
+    # Removed: duplicate of test_get_base_queryset
backend/tests/apps/slack/common/handlers/ai_test.py (2)

75-83: Avoid hard-coding model IDs; read from settings.

Make chat/embedding model names configurable; update test to assert against settings-derived values.

-        mock_rag_tool.assert_called_once_with(
-            chat_model="gpt-4o",
-            embedding_model="text-embedding-3-small",
-        )
+        from django.conf import settings
+        mock_rag_tool.assert_called_once_with(
+            chat_model=getattr(settings, "SLACK_AI_CHAT_MODEL", "gpt-4o"),
+            embedding_model=getattr(settings, "SLACK_AI_EMBEDDING_MODEL", "text-embedding-3-small"),
+        )

Outside this file (handler):

from django.conf import settings
rag_tool = RagTool(
    chat_model=getattr(settings, "SLACK_AI_CHAT_MODEL", "gpt-4o"),
    embedding_model=getattr(settings, "SLACK_AI_EMBEDDING_MODEL", "text-embedding-3-small"),
)

132-143: Add a guard for blank-only queries.

Avoid calling the RAG pipeline when the stripped query is empty; return error blocks directly and add a test.

+    def test_get_blocks_with_blank_query(self):
+        with patch("apps.slack.common.handlers.ai.get_error_blocks") as mock_err:
+            mock_err.return_value = [{"type": "section"}]
+            with patch("apps.slack.common.handlers.ai.process_ai_query") as mock_proc:
+                result = get_blocks("   ")
+                mock_proc.assert_not_called()
+                assert result == [{"type": "section"}]
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (2)

67-83: Replace brittle base-class patching with import-path patching.

Using bases[0] is fragile. Patch the known symbol instead.

-        with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
+        with patch("apps.ai.common.base.context_command.BaseContextCommand.get_base_queryset") as mock_super:
             mock_queryset = Mock()
             mock_super.return_value = mock_queryset
             mock_queryset.filter.return_value = "filtered_queryset"

Repeat the same replacement in the other tests that patch the base queryset.

Also applies to: 104-119, 127-141, 142-156, 157-171


127-171: DRY up the three queryset-filter tests.

They assert the same filter arguments. Parametrize into one test to reduce noise and speed up the suite.

If helpful, I can provide a parametrized version.

backend/tests/apps/slack/services/message_auto_reply_test.py (3)

48-96: Also assert the retrieval call contract.

Validate we call Message.objects.get with pk for added confidence.

         generate_ai_reply_if_unanswered(mock_message.id)
 
+        mock_message_get.assert_called_once_with(pk=mock_message.id)
         mock_client.conversations_replies.assert_called_once_with(
             channel=mock_message.conversation.slack_channel_id,
             ts=mock_message.slack_message_id,
             limit=1,
         )

97-106: Add assertion for not-found path.

Confirms correct lookup argument.

         generate_ai_reply_if_unanswered(99999)
 
+        mock_message_get.assert_called_once_with(pk=99999)

172-211: Merge duplicate “no/empty AI response” tests via parametrization.

Reduces repetition while preserving coverage.

-    @patch("apps.slack.services.message_auto_reply.Message.objects.get")
-    @patch("apps.slack.services.message_auto_reply.WebClient")
-    @patch("apps.slack.services.message_auto_reply.process_ai_query")
-    def test_generate_ai_reply_no_ai_response(
-        self,
-        mock_process_ai_query,
-        mock_webclient,
-        mock_message_get,
-        mock_message,
-    ):
-        """Test when AI doesn't return a response."""
-        mock_message_get.return_value = mock_message
-        mock_client = Mock()
-        mock_webclient.return_value = mock_client
-        mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
-        mock_process_ai_query.return_value = None
-
-        generate_ai_reply_if_unanswered(mock_message.id)
-
-        mock_client.chat_postMessage.assert_not_called()
-
-    @patch("apps.slack.services.message_auto_reply.Message.objects.get")
-    @patch("apps.slack.services.message_auto_reply.WebClient")
-    @patch("apps.slack.services.message_auto_reply.process_ai_query")
-    def test_generate_ai_reply_empty_response(
-        self,
-        mock_process_ai_query,
-        mock_webclient,
-        mock_message_get,
-        mock_message,
-    ):
-        """Test when AI returns empty response."""
-        mock_message_get.return_value = mock_message
-        mock_client = Mock()
-        mock_webclient.return_value = mock_client
-        mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
-        mock_process_ai_query.return_value = ""
-
-        generate_ai_reply_if_unanswered(mock_message.id)
-
-        mock_client.chat_postMessage.assert_not_called()
+    @pytest.mark.parametrize("ai_response", [None, ""])
+    @patch("apps.slack.services.message_auto_reply.Message.objects.get")
+    @patch("apps.slack.services.message_auto_reply.WebClient")
+    @patch("apps.slack.services.message_auto_reply.process_ai_query")
+    def test_generate_ai_reply_no_or_empty_ai_response(
+        self, mock_process_ai_query, mock_webclient, mock_message_get, mock_message, ai_response
+    ):
+        """Test when AI returns no/empty response."""
+        mock_message_get.return_value = mock_message
+        mock_client = Mock()
+        mock_webclient.return_value = mock_client
+        mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
+        mock_process_ai_query.return_value = ai_response
+
+        generate_ai_reply_if_unanswered(mock_message.id)
+
+        mock_client.chat_postMessage.assert_not_called()
backend/tests/apps/slack/common/question_detector_test.py (3)

18-23: Avoid brittle assertions on pattern counts

Asserting exact counts for question_patterns/compiled_patterns couples tests to implementation details.

-        assert len(detector.question_patterns) == 4
-        assert len(detector.compiled_patterns) == 4
+        assert len(detector.question_patterns) >= 4
+        assert len(detector.compiled_patterns) >= 4

49-64: Confirm intent: treating “advice/recommend” statements as questions

Tests expect declaratives like “I recommend this approach” to be flagged as questions. If that’s not intended, constrain these patterns (e.g., require a question mark or leading question terms).
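One way to constrain them, sketched as an assumption about the desired behavior rather than the project's current regexes: only flag advice-seeking phrasing when it ends with a question mark or opens with a question word.

```python
import re

# Hypothetical tightened pattern (not the detector's actual regexes):
# advice/recommend wording counts as a question only when the message
# starts with a question word or ends with "?".
ADVICE_PATTERN = re.compile(
    r"^(?:can|could|would|should|do)\b.*\b(?:recommend|suggest|advice|opinion)\b"
    r"|\b(?:recommend|suggest|advice|opinion)\b.*\?$",
    re.IGNORECASE,
)
```

Under this rule, "I recommend this approach" would no longer be treated as a question, while "Could you recommend a tool" still would.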


144-156: Reduce reliance on pattern order

Index-based checks (compiled_patterns[0/1]) are fragile. Prefer matching by pattern content or behavior.

-        question_mark_pattern = detector.compiled_patterns[0]
-        assert question_mark_pattern.search("Is this right?") is not None
+        assert any(p.pattern == r"\?" and p.search("Is this right?") for p in detector.compiled_patterns)

-        question_word_pattern = detector.compiled_patterns[1]
-        assert question_word_pattern.search("What is this") is not None
-        assert question_word_pattern.search("How does it work") is not None
+        expected = r"^(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)"
+        assert any(p.pattern == expected and p.search("What is this") for p in detector.compiled_patterns)
+        assert any(p.pattern == expected and p.search("How does it work") for p in detector.compiled_patterns)
backend/apps/slack/common/question_detector.py (3)

16-19: Normalize keywords to lowercase once

Ensure case-insensitive matching regardless of how OWASP_KEYWORDS is defined.

-        self.owasp_keywords = OWASP_KEYWORDS
+        self.owasp_keywords = {k.lower() for k in OWASP_KEYWORDS}

20-25: Tighten regex to reduce false positives

Add word-boundaries and allow optional leading whitespace for question starters; keeps behavior but trims accidental matches like “whatsoever”.

-            r"^(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)",
-            r"(help|explain|tell me|show me|guide|tutorial|example)",
-            r"(recommend|suggest|advice|opinion)",
+            r"^\s*\b(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)\b",
+            r"\b(help|explain|tell me|show me|guide|tutorial|example)\b",
+            r"\b(recommend|suggest|advice|opinion)\b",
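A quick check with the standard `re` module (hypothetical input strings) shows the false positive the added `\b` removes:

```python
import re

# The original pattern matches any prefix, so "whatsoever" fires on "what".
loose = re.compile(r"^(what|how|why)", re.IGNORECASE)
# The tightened pattern requires the question word to end at a word boundary.
tight = re.compile(r"^\s*\b(what|how|why)\b", re.IGNORECASE)

assert loose.search("whatsoever happens, stay calm") is not None
assert tight.search("whatsoever happens, stay calm") is None
assert tight.search("  What is OWASP?") is not None
```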

48-57: Boundary-aware phrase matching for multi-word keywords

Substring matching can overmatch. Consider boundary-aware search that ignores punctuation/hyphens.

# Example enhancement:
def contains_owasp_keywords(self, text: str) -> bool:
    words = re.findall(r"\b\w+\b", text)
    text_words = set(words)
    if self.owasp_keywords.intersection(text_words):
        return True

    # Multi-word phrases: collapse whitespace/punctuation and match by tokens
    normalized = re.sub(r"[\s\-_]+", " ", text)
    for keyword in self.owasp_keywords:
        if " " in keyword and re.search(rf"\b{re.escape(keyword)}\b", normalized):
            return True
    return False
backend/tests/apps/slack/commands/ai_test.py (1)

13-17: Remove unused autouse fixture or reuse it

You recreate Ai() in tests; either reuse self.ai_command or drop the autouse fixture.

-    @pytest.fixture(autouse=True)
-    def setup_method(self):
-        """Set up test data before each test method."""
-        self.ai_command = Ai()
+    @pytest.fixture
+    def ai_command(self):
+        """Ai instance per test."""
+        return Ai()

Then use ai_command in tests.

backend/tests/apps/ai/common/extractors/repository_test.py (1)

397-405: Use call.args for robust URL assertions

Stringifying call is brittle. Inspect the first positional arg.

-        assert any(
-            "https://raw.githubusercontent.com/test-org/test-repo/develop/" in str(call)
-            for call in mock_get_content.call_args_list
-        )
+        assert any(
+            "https://raw.githubusercontent.com/test-org/test-repo/develop/" in c.args[0]
+            for c in mock_get_content.call_args_list
+        )
backend/tests/apps/slack/events/message_posted_test.py (2)

95-123: Add a case for mid-message mentions (optional)

If mentions anywhere should trigger AI reply, add a test where the mention is not at the start (e.g., “Hi <@u987654> what is OWASP?”) and confirm the stripped query is correct. If only leading mentions should trigger, assert non-trigger instead.
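The stripping such a test would assert on can be sketched with a small helper (hypothetical — the real logic lives in the event handler; only the `<@USERID>` token format is Slack's):

```python
import re

# Slack encodes user mentions as <@USERID> tokens inside message text.
MENTION_RE = re.compile(r"<@[A-Za-z0-9]+>\s*")

def strip_mentions(text: str) -> str:
    """Remove Slack mention tokens wherever they appear and trim the result."""
    return MENTION_RE.sub("", text).strip()

# A leading mention (the covered case) and a mid-message mention both yield a clean query.
assert strip_mentions("<@u987654> what is OWASP?") == "what is OWASP?"
assert strip_mentions("Hi <@u987654> what is OWASP?") == "Hi what is OWASP?"
```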


95-123: Consider guarding against None blocks from get_blocks

If get_blocks can return None, chat_postMessage may fail. Either assert in tests that we skip posting when None, or handle it in the event handler.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7cd6d44 and a1f1b70.

⛔ Files ignored due to path filters (1)
  • backend/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/common/extractors/repository.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
  • backend/apps/ai/models/chunk.py (3 hunks)
  • backend/apps/slack/MANIFEST.yaml (3 hunks)
  • backend/apps/slack/admin/conversation.py (1 hunks)
  • backend/apps/slack/apps.py (1 hunks)
  • backend/apps/slack/commands/__init__.py (1 hunks)
  • backend/apps/slack/commands/ai.py (1 hunks)
  • backend/apps/slack/common/handlers/ai.py (1 hunks)
  • backend/apps/slack/common/question_detector.py (1 hunks)
  • backend/apps/slack/constants.py (1 hunks)
  • backend/apps/slack/events/__init__.py (1 hunks)
  • backend/apps/slack/events/message_posted.py (1 hunks)
  • backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1 hunks)
  • backend/apps/slack/models/conversation.py (1 hunks)
  • backend/apps/slack/services/__init__.py (1 hunks)
  • backend/apps/slack/services/message_auto_reply.py (1 hunks)
  • backend/apps/slack/templates/commands/ai.jinja (1 hunks)
  • backend/pyproject.toml (1 hunks)
  • backend/settings/base.py (2 hunks)
  • backend/settings/urls.py (1 hunks)
  • backend/test_commands.py (1 hunks)
  • backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
  • backend/tests/apps/ai/common/utils_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
  • backend/tests/apps/ai/models/chunk_test.py (4 hunks)
  • backend/tests/apps/slack/commands/ai_test.py (1 hunks)
  • backend/tests/apps/slack/common/handlers/ai_test.py (1 hunks)
  • backend/tests/apps/slack/common/question_detector_test.py (1 hunks)
  • backend/tests/apps/slack/events/message_posted_test.py (1 hunks)
  • backend/tests/apps/slack/services/message_auto_reply_test.py (1 hunks)
  • cspell/custom-dict.txt (2 hunks)
  • docker-compose/local.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (20)
backend/apps/slack/apps.py (2)
backend/apps/mentorship/apps.py (1)
  • ready (12-14)
backend/apps/slack/events/__init__.py (1)
  • configure_slack_events (1-19)
backend/apps/slack/common/handlers/ai.py (2)
backend/apps/ai/agent/tools/rag/rag_tool.py (2)
  • RagTool (13-63)
  • query (34-63)
backend/apps/slack/blocks.py (1)
  • markdown (21-34)
backend/tests/apps/slack/common/handlers/ai_test.py (2)
backend/apps/slack/common/handlers/ai.py (2)
  • get_error_blocks (49-61)
  • process_ai_query (31-46)
backend/apps/ai/agent/tools/rag/rag_tool.py (1)
  • query (34-63)
backend/apps/ai/management/commands/ai_update_repository_context.py (4)
backend/apps/ai/common/base/context_command.py (1)
  • BaseContextCommand (9-54)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_markdown_content (114-174)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (5)
  • Command (10-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
  • source_name (34-36)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)
backend/apps/ai/management/commands/ai_update_repository_context.py (5)
  • Command (10-36)
  • source_name (34-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
backend/apps/ai/common/base/context_command.py (1)
  • BaseContextCommand (9-54)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/slack/commands/ai.py (2)
backend/apps/slack/commands/command.py (1)
  • CommandBase (20-157)
backend/apps/slack/common/handlers/ai.py (1)
  • get_blocks (13-28)
backend/apps/slack/events/message_posted.py (7)
backend/apps/slack/common/handlers/ai.py (1)
  • get_blocks (13-28)
backend/apps/slack/common/question_detector.py (2)
  • QuestionDetector (13-57)
  • is_owasp_question (31-42)
backend/apps/slack/events/event.py (1)
  • EventBase (23-265)
backend/apps/slack/models/conversation.py (2)
  • Conversation (15-107)
  • update_data (85-107)
backend/apps/slack/models/member.py (1)
  • Member (10-78)
backend/apps/slack/models/message.py (2)
  • Message (15-160)
  • text (83-85)
backend/apps/slack/services/message_auto_reply.py (1)
  • generate_ai_reply_if_unanswered (16-48)
backend/apps/slack/common/question_detector.py (1)
backend/apps/slack/models/message.py (1)
  • text (83-85)
backend/tests/apps/slack/services/message_auto_reply_test.py (4)
backend/apps/slack/models/conversation.py (1)
  • Conversation (15-107)
backend/apps/slack/models/message.py (3)
  • Message (15-160)
  • text (83-85)
  • ts (88-90)
backend/apps/slack/models/workspace.py (2)
  • Workspace (11-44)
  • bot_token (37-44)
backend/apps/slack/services/message_auto_reply.py (1)
  • generate_ai_reply_if_unanswered (16-48)
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (3)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (5)
  • Command (10-36)
  • source_name (34-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/slack/services/message_auto_reply.py (4)
backend/apps/slack/common/handlers/ai.py (1)
  • process_ai_query (31-46)
backend/apps/slack/models/message.py (3)
  • Message (15-160)
  • ts (88-90)
  • text (83-85)
backend/apps/slack/models/workspace.py (1)
  • bot_token (37-44)
backend/apps/ai/agent/tools/rag/rag_tool.py (1)
  • query (34-63)
backend/apps/ai/common/extractors/repository.py (1)
backend/apps/github/utils.py (1)
  • get_repository_file_content (60-79)
backend/apps/slack/events/__init__.py (2)
backend/apps/slack/apps.py (1)
  • SlackConfig (13-33)
backend/apps/slack/events/event.py (2)
  • EventBase (23-265)
  • configure_events (30-37)
backend/tests/apps/ai/models/chunk_test.py (3)
backend/tests/apps/ai/common/base/chunk_command_test.py (2)
  • mock_chunks (72-81)
  • mock_context (51-60)
backend/apps/ai/models/chunk.py (2)
  • Chunk (16-93)
  • bulk_save (34-53)
backend/tests/apps/ai/common/base/context_command_test.py (1)
  • mock_context (48-55)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_markdown_content (114-174)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/ai/management/commands/ai_update_repository_context.py (5)
  • Command (10-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
  • source_name (34-36)
backend/tests/apps/slack/common/question_detector_test.py (2)
backend/apps/slack/common/question_detector.py (4)
  • QuestionDetector (13-57)
  • is_question (44-46)
  • contains_owasp_keywords (48-57)
  • is_owasp_question (31-42)
backend/tests/apps/slack/events/message_posted_test.py (1)
  • test_init (44-48)
backend/tests/apps/slack/events/message_posted_test.py (3)
backend/apps/slack/events/message_posted.py (1)
  • MessagePosted (17-77)
backend/apps/slack/models/conversation.py (1)
  • Conversation (15-107)
backend/apps/slack/models/member.py (1)
  • Member (10-78)
backend/apps/ai/models/chunk.py (3)
backend/apps/ai/models/context.py (1)
  • Context (15-61)
backend/apps/common/models.py (2)
  • BulkSaveModel (10-34)
  • TimestampedModel (37-46)
backend/apps/common/utils.py (1)
  • truncate (164-176)
backend/tests/apps/slack/commands/ai_test.py (2)
backend/apps/slack/commands/ai.py (1)
  • Ai (6-23)
backend/apps/slack/commands/command.py (1)
  • CommandBase (20-157)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (2)
  • extract_repository_content (11-111)
  • extract_repository_markdown_content (114-174)
🔇 Additional comments (26)
backend/pyproject.toml (1)

22-22: Manually verify django-rq compatibility with Python 3.13 and Django 5.1
Poetry wasn’t available to run the compatibility checks; please confirm at runtime that django-rq 3.1 (and its rq/redis transitive dependencies) works under Python 3.13 and Django 5.1, and pin rq/redis in pyproject.toml if needed.

cspell/custom-dict.txt (1)

109-109: Dictionary additions OK

“rqworker” and “webgoat” entries make sense for this repo’s domain.

Also applies to: 125-125

backend/apps/slack/services/__init__.py (1)

1-1: Package initializer LGTM

Clear, minimal package docstring. No issues.

backend/settings/urls.py (1)

31-31: Django-RQ dashboard is staff-only by default
Unconditional path("django-rq/", include("django_rq.urls")) is safe since django-rq v3.1+ enforces staff-only access on its dashboard views.

backend/settings/base.py (1)

52-53: Django RQ app registration looks correct.

Adding "django_rq" to THIRD_PARTY_APPS is appropriate and consistent with how INSTALLED_APPS is composed.

backend/tests/apps/ai/common/utils_test.py (1)

97-107: Decorator order and side-effect wiring are correct.

Patching order matches argument order; the AttributeError simulates the None-context path as intended.

backend/apps/ai/models/chunk.py (2)

34-53: No downstream code depends on chunks being cleared. Chunk.bulk_save is only called in chunk_command.py (where the list isn’t used after the call) and in tests (which don’t assert on the list), so the updated clear-only-when-empty logic doesn’t break any callers.


19-23: Replace deprecated unique_together with a UniqueConstraint and plan a safe migration

 class Meta:
     db_table = "ai_chunks"
     verbose_name = "Chunk"
-    unique_together = ("context", "text")
+    constraints = [
+        models.UniqueConstraint(
+            fields=("context", "text"),
+            name="uq_ai_chunks_context_text",
+        )
+    ]

Add a data migration to dedupe existing rows before enforcing the constraint and use CREATE INDEX CONCURRENTLY on Postgres to avoid downtime.
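The dedupe step can be expressed as a pure function (illustrative only; a real migration would run this inside `RunPython` over `Chunk.objects.values_list("id", "context_id", "text")` and delete the returned ids):

```python
def duplicate_ids(rows):
    """Given (id, context_id, text) rows, return the ids to delete,
    keeping the lowest id for each (context_id, text) key."""
    seen = set()
    doomed = []
    for row_id, context_id, text in sorted(rows):
        key = (context_id, text)
        if key in seen:
            doomed.append(row_id)
        else:
            seen.add(key)
    return doomed

rows = [(3, 1, "a"), (1, 1, "a"), (2, 2, "a"), (5, 1, "b"), (4, 1, "b")]
assert duplicate_ids(rows) == [3, 5]  # 1 and 4 survive as the lowest ids
```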

backend/tests/apps/ai/models/chunk_test.py (1)

101-104: LGTM on direct context-based existence checks.

The assertions correctly verify the switch to context=mock_context, text=text.

Also applies to: 125-128, 148-150

backend/apps/ai/common/extractors/repository.py (1)

108-111: LGTM on structured repository metadata extraction.

The prose/metadata split and DELIMITER usage align with the RAG pipeline.

backend/apps/slack/commands/__init__.py (1)

5-5: LGTM: /ai command is now exported/loaded

Importing ai ensures command discovery via CommandBase.configure_commands().

backend/apps/slack/models/conversation.py (1)

30-32: LGTM: opt-in feature flag on Conversation

Good that from_slack() does not overwrite this local flag.

backend/apps/ai/management/commands/ai_update_repository_context.py (1)

30-32: Upstream get_base_queryset has no is_active filter. BaseAICommand.get_base_queryset returns .objects.all() with no implicit filters, so overriding get_default_queryset correctly avoids filtering on the nonexistent is_active field.

backend/apps/slack/MANIFEST.yaml (1)

98-103: /ai slash command looks correct.

URL, usage, and escaping align with the existing pattern. Ensure the handler responds within 3s to Slack’s slash command ACK.
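Slack closes the HTTP request after 3 seconds, so the usual pattern is to return the ack immediately and finish the slow work (here, the AI/RAG query) out of band. A framework-agnostic sketch, with all names hypothetical:

```python
import threading

def handle_slash_command(payload, post_response):
    """Return an immediate ack within Slack's 3 s window; deliver the real
    answer later (in production, via the command's response_url)."""
    worker = threading.Thread(
        target=lambda: post_response(f"Answer for: {payload['text']}"),
        daemon=True,
    )
    worker.start()
    return {"response_type": "ephemeral", "text": "Thinking..."}, worker

replies = []
ack, worker = handle_slash_command({"text": "what is OWASP?"}, replies.append)
worker.join(timeout=2)
assert ack["text"] == "Thinking..."
assert replies == ["Answer for: what is OWASP?"]
```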

backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1)

1-1: Django version compatibility confirmed
The project’s pyproject.toml declares django = "^5.1" (i.e. ≥5.1,<6.0), which includes Django 5.2.5, so the migration header is consistent with the dependency range.

backend/apps/ai/management/commands/ai_update_repository_chunks.py (2)

14-17: Good: delegates extraction cleanly.


34-36: Consistent source_name.

Matches context command; good for downstream filters.

backend/tests/apps/slack/services/message_auto_reply_test.py (1)

17-43: Fixtures are clean and focused. LGTM.

backend/tests/apps/slack/common/question_detector_test.py (2)

170-186: Good real-world coverage

Nice parameterized cases that exercise diverse OWASP queries.


188-196: Initialization patching looks solid

Mocking OWASP_KEYWORDS verifies configurability without leaking global state.

backend/tests/apps/slack/commands/ai_test.py (2)

18-42: Patch target and whitespace trimming behavior look correct

Verifies the local import usage and exact query passed to get_blocks.


127-140: Exception propagation test is appropriate

Confirms render_blocks does not swallow errors, leaving caller to handle.

backend/tests/apps/ai/common/extractors/repository_test.py (1)

241-256: Baseline behavior verified

Description is included; metadata intentionally empty for markdown extraction. Looks good.

backend/tests/apps/slack/events/message_posted_test.py (3)

95-123: Bot-mention flow is well covered

Auth caching, mention stripping, and block rendering checks are solid.


210-292: OWASP question happy-path coverage looks good

Validates DB updates and delayed enqueue to the default queue.


348-375: Whitespace-trim behavior verified

Ensures query cleanup before rendering blocks. Nice.

Contributor

@coderabbitai coderabbitai bot Sep 1, 2025

💡 Verification agent

🧩 Analysis chain

Don’t whitelist the misspelling “repositorys”

This risks masking user-facing typos. Prefer fixing sources to “repositories”. If you must keep it temporarily for tests, add a TODO and plan removal.

Apply if ready to correct:

-repositorys
+repositories

To find any user-visible occurrences:


🏁 Script executed:

#!/bin/bash
rg -nP -C2 '\brepositorys\b' --glob '!*cspell*'

Length of output: 2363


Implement correct pluralization and remove whitelist entry
Update the entity_name_plural logic to return “repositories” (instead of naïvely appending “s”), update the two test files under backend/tests/apps/ai/management/commands/ where they assert "repositorys" to expect "repositories", then remove the “repositorys” entry from cspell/custom-dict.txt (or add a TODO if you need a temporary fallback).

🤖 Prompt for AI Agents
In cspell/custom-dict.txt around line 107 the entry "repositorys" exists but is
the incorrect plural; update code to make entity_name_plural return
"repositories" (not by naive append "s"), update the two tests in
backend/tests/apps/ai/management/commands/ to expect "repositories" instead of
"repositorys", and then remove the "repositorys" entry from
cspell/custom-dict.txt (or replace it with a TODO comment if you need a
temporary fallback).
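One way to implement the suggested pluralization without pulling in an inflection library (a sketch only; `entity_name_plural` itself lives on the command base class):

```python
def pluralize(noun: str) -> str:
    """Naive English pluralization covering the y -> ies case."""
    if noun.endswith("y") and noun[-2:-1].lower() not in "aeiou":
        return noun[:-1] + "ies"
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"

assert pluralize("repository") == "repositories"
assert pluralize("chunk") == "chunks"
assert pluralize("branch") == "branches"
assert pluralize("day") == "days"
```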

Collaborator

@Dishant1804 this is a legit suggestion; it's grammatically incorrect and needs to be fixed.

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@Dishant1804 Dishant1804 force-pushed the www-repository branch 2 times, most recently from 6439df7 to b198ecf (September 6, 2025 17:55)
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/apps/ai/models/chunk.py (1)

85-93: update_data has a race on duplicates; rely on the DB constraint

The pre-check can race; catch IntegrityError instead and return None.

-        if Chunk.objects.filter(context=context, text=text).exists():
-            return None
-
-        chunk = Chunk(text=text, embedding=embedding, context=context)
-
-        if save:
-            chunk.save()
-
-        return chunk
+        chunk = Chunk(text=text, embedding=embedding, context=context)
+        if save:
+            try:
+                chunk.save()
+            except Exception as exc:  # narrowed below via local import
+                from django.db import IntegrityError
+                if isinstance(exc, IntegrityError):
+                    # Duplicate per unique (context, text); treat as no-op
+                    logger.debug("Chunk already exists for context_id=%s; skipping", getattr(context, "id", None))
+                    return None
+                raise
+        return chunk

If you prefer top-level imports, add:

-from django.db import DatabaseError, models
+from django.db import DatabaseError, IntegrityError, models
♻️ Duplicate comments (2)
backend/apps/ai/models/chunk.py (1)

35-53: bulk_save drops updates, does N+1 queries, and breaks the clear-list contract

  • Updates are filtered out (ids present) if an identical (context, text) exists.
  • Per-object .exists() causes O(N) queries.
  • Passing a new list to BulkSaveModel means the caller’s list isn’t cleared unless no uniques remain.
  • Duplicate races aren’t handled; rely on the DB unique constraint and catch IntegrityError.

Replace the body with a single-probe, race-safe implementation that preserves the contract.

     @staticmethod
     def bulk_save(chunks, fields=None):
-        """Bulk save chunks with duplicate handling."""
-        unique_chunks = []
-        unique_chunks = [
-            chunk
-            for chunk in chunks
-            if not Chunk.objects.filter(context=chunk.context, text=chunk.text).exists()
-        ]
-
-        if unique_chunks:
-            try:
-                BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
-            except (ValueError, TypeError, DatabaseError):
-                for chunk in unique_chunks:
-                    try:
-                        chunk.save()
-                    except (ValueError, TypeError, DatabaseError) as save_error:
-                        logger.exception("Failed to save chunk", exc_info=save_error)
-        else:
-            chunks.clear()
+        """Bulk save chunks with duplicate handling and minimal queries."""
+        if not chunks:
+            return
+
+        # Partition: updates vs. creates
+        to_update = [c for c in chunks if getattr(c, "id", None)]
+        to_create = [c for c in chunks if not getattr(c, "id", None)]
+
+        # In-memory dedupe among creates by (context_id, text)
+        seen: set[tuple[int | None, str]] = set()
+        deduped_create = []
+        for c in to_create:
+            ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+            key = (ctx_id, c.text)
+            if key in seen:
+                continue
+            seen.add(key)
+            deduped_create.append(c)
+
+        # One DB probe for existing keys among incoming creates
+        ctx_ids = {getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None) for c in deduped_create}
+        texts = {c.text for c in deduped_create}
+        existing_keys: set[tuple[int, str]] = set()
+        if ctx_ids and texts:
+            existing_keys = set(
+                Chunk.objects.filter(context_id__in=ctx_ids, text__in=texts)
+                .values_list("context_id", "text")
+            )
+
+        final_create = []
+        for c in deduped_create:
+            ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+            if (ctx_id, c.text) not in existing_keys:
+                final_create.append(c)
+
+        unique_chunks = to_update + final_create
+        if not unique_chunks:
+            chunks.clear()
+            return
+
+        try:
+            # Maintain contract: operate on caller list so BulkSaveModel clears it
+            chunks[:] = unique_chunks
+            BulkSaveModel.bulk_save(Chunk, chunks, fields=fields)
+        except (ValueError, TypeError, DatabaseError) as exc:
+            # Fallback: per-object saves with race-safe duplicate handling
+            from django.db import IntegrityError  # local import to avoid global if unused
+            for obj in unique_chunks:
+                try:
+                    if getattr(obj, "id", None) and fields:
+                        obj.save(update_fields=fields)
+                    else:
+                        obj.save()
+                except IntegrityError:
+                    # Duplicate create due to race/uniqueness; ignore
+                    continue
+                except (ValueError, TypeError, DatabaseError):
+                    logger.exception("Failed to save chunk id=%s", getattr(obj, "id", None))
backend/tests/apps/ai/common/extractors/repository_test.py (1)

340-354: Add a test for generic exceptions from the fetcher

Past feedback asked to catch generic Exception in the extractor; add a case that raises Exception to ensure we don’t break on unexpected errors.

     @patch("apps.ai.common.extractors.repository.get_repository_file_content")
     def test_extract_repository_markdown_content_file_fetch_exception(self, mock_get_content):
         """Test extraction when file fetching raises an exception."""
         repository = MagicMock()
         repository.description = "Test repository"
         repository.organization = MagicMock(login="test-org")
         repository.key = "test-repo"
         repository.default_branch = "main"

-        mock_get_content.side_effect = ConnectionError("Network error")
+        mock_get_content.side_effect = Exception("Network error")

         prose, metadata = extract_repository_markdown_content(repository)

         assert "Repository Description: Test repository" in prose
         assert metadata == ""
🧹 Nitpick comments (5)
backend/apps/ai/models/chunk.py (2)

50-51: Use logger.exception correctly

logger.exception already includes traceback; passing exc_info object is unnecessary and atypical.

-                        logger.exception("Failed to save chunk", exc_info=save_error)
+                        logger.exception("Failed to save chunk id=%s", getattr(chunk, "id", None))

19-23: Prefer UniqueConstraint over unique_together

Django now favors explicit UniqueConstraint with a name; easier to reference in migrations and logs.

-        unique_together = ("context", "text")
+        constraints = [
+            models.UniqueConstraint(
+                fields=["context", "text"],
+                name="uniq_ai_chunk_context_text",
+            )
+        ]
backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)

18-28: Optimize queryset for extractor access

Select related FKs to avoid extra queries when building raw URLs (owner/org), and narrow fields if desired.

         return (
             super()
             .get_base_queryset()
             .filter(
                 is_owasp_site_repository=True,
                 is_archived=False,
                 is_empty=False,
             )
-        )
+        ).select_related("organization", "owner")
backend/tests/apps/ai/common/extractors/repository_test.py (2)

407-420: Assert no network calls when owner/org missing

When both are absent, extractor should not attempt fetches. Patch and assert not called.

-    def test_extract_repository_markdown_content_no_owner_or_org(self):
+    @patch("apps.ai.common.extractors.repository.get_repository_file_content")
+    def test_extract_repository_markdown_content_no_owner_or_org(self, mock_get_content):
         """Test extraction when repository has neither organization nor owner."""
         repository = MagicMock()
         repository.description = "Test repository"
         repository.organization = None
         repository.owner = None
         repository.key = "test-repo"
         repository.default_branch = "main"

         prose, metadata = extract_repository_markdown_content(repository)

         assert "Repository Description: Test repository" in prose
         assert metadata == ""
+        mock_get_content.assert_not_called()

421-433: Also assert no network calls when key is missing

Key is required to form the raw URL; ensure no fetch attempted.

-    def test_extract_repository_markdown_content_no_key(self):
+    @patch("apps.ai.common.extractors.repository.get_repository_file_content")
+    def test_extract_repository_markdown_content_no_key(self, mock_get_content):
         """Test extraction when repository has no key."""
         repository = MagicMock()
         repository.description = "Test repository"
         repository.organization = MagicMock(login="test-org")
         repository.key = None
         repository.default_branch = "main"

         prose, metadata = extract_repository_markdown_content(repository)

         assert "Repository Description: Test repository" in prose
         assert metadata == ""
+        mock_get_content.assert_not_called()
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1f1b70 and 103fb24.

📒 Files selected for processing (12)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/common/extractors/repository.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
  • backend/apps/ai/models/chunk.py (3 hunks)
  • backend/test_commands.py (1 hunks)
  • backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
  • backend/tests/apps/ai/common/utils_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
  • backend/tests/apps/ai/models/chunk_test.py (4 hunks)
  • cspell/custom-dict.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • backend/test_commands.py
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
  • backend/tests/apps/ai/models/chunk_test.py
  • backend/apps/ai/management/commands/ai_update_repository_context.py
  • backend/apps/ai/common/extractors/repository.py
  • cspell/custom-dict.txt
  • backend/apps/ai/Makefile
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
  • backend/tests/apps/ai/common/utils_test.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (2)
  • extract_repository_content (11-111)
  • extract_repository_markdown_content (114-174)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_markdown_content (114-174)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/ai/management/commands/ai_update_repository_context.py (5)
  • Command (10-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
  • source_name (34-36)
backend/apps/ai/models/chunk.py (2)
backend/apps/ai/models/context.py (1)
  • Context (15-61)
backend/apps/common/models.py (2)
  • BulkSaveModel (10-34)
  • TimestampedModel (37-46)
🔇 Additional comments (5)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)

10-12: LGTM: command wiring is correct

key_field_name and model_class look right.


14-16: LGTM: delegates to the markdown extractor

Extraction path aligns with the PR goals.


30-36: LGTM: default queryset override avoids is_active

Repository lacks is_active; good override.


18-28: Verified OWASP site repository filter uses www- prefixes
check_owasp_site_repository keys off key.startswith(("www-chapter-", "www-committee-", "www-event", "www-project-")), so the queryset filter aligns with the intended www- heuristic.

backend/tests/apps/ai/common/extractors/repository_test.py (1)

3-3: CI/python requirements support datetime.UTC
Project’s pyproject.toml specifies python = "^3.13" (>= 3.13), which includes the datetime.UTC alias introduced in Python 3.11—no action needed.

@Dishant1804 Dishant1804 marked this pull request as ready for review September 6, 2025 19:27
Collaborator

@arkid15r arkid15r left a comment

@Dishant1804 you've submitted 1000+ lines of code for review.
I have a couple of questions:

  • could you explain what OWASP Nest problem this PR solves?
  • how did you verify that problem is solved with this PR?

@Dishant1804
Contributor Author

@Dishant1804 you've submitted 1000+ lines of code for review. I have a couple of questions:

  • could you explain what OWASP Nest problem this PR solves?
  • how did you verify that problem is solved with this PR?

this PR gets the .md files that we have in www- repositories and saves them as chunks and context so that we can have more knowledge
I checked the admin panel for chunks and context, and also ran it on some example queries

@arkid15r
Collaborator

this PR gets the .md files that we have in www- repositories and saves them as chunks and context so that we can have more knowledge I checked the admin panel for chunks and context, and also ran it on some example queries

Alright, so it sounds like you checked it with at least one repository and some relevant queries. Can you provide some examples?

@Dishant1804
Contributor Author

this PR gets the .md files that we have in www- repositories and saves them as chunks and context so that we can have more knowledge I checked the admin panel for chunks and context, and also ran it on some example queries

Alright, so it sounds like you checked it with at least one repository and some relevant queries. Can you provide some examples?

this has a lot of chunks that are not up to the mark; maybe we should change the chunking delimiter or chunking strategy for this case, because it has a lot of delimiters

@arkid15r
Collaborator

this has a lot of chunks that are not up to the mark; maybe we should change the chunking delimiter or chunking strategy for this case, because it has a lot of delimiters

I meant your general approach for testing, e.g. did OWASP Nest repository content help to improve NestBot answers/context for any specific questions? Did it get new info based on this?

My opinion -- it didn't. I haven't tested it yet, but based on this code I can reasonably assume that no new [useful] content was added by these jobs and the PR in general. I also reasonably assume that this is AI-generated code that has nothing to do with a real-life scenario. I call it cosplay code.

The bottom line is that the code has 0 value in its current form, but you requested mentors to review it w/o testing. I don't appreciate this kind of disrespect for mentors' time. This is not acceptable.

@Dishant1804
Contributor Author

this has a lot of chunks that are not up to the mark; maybe we should change the chunking delimiter or chunking strategy for this case, because it has a lot of delimiters

I meant your general approach for testing, e.g. did OWASP Nest repository content help to improve NestBot answers/context for any specific questions? Did it get new info based on this?

My opinion -- it didn't. I haven't tested it yet, but based on this code I can reasonably assume that no new [useful] content was added by these jobs and the PR in general. I also reasonably assume that this is AI-generated code that has nothing to do with a real-life scenario. I call it cosplay code.

The bottom line is that the code has 0 value in its current form, but you requested mentors to review it w/o testing. I don't appreciate this kind of disrespect for mentors' time. This is not acceptable.

I'll update the chunking strategy for this and then request a review. We are getting the information from README and CONTRIBUTING .md files; the chunking needs to be proper.

@Dishant1804 Dishant1804 marked this pull request as draft September 11, 2025 12:45
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
backend/apps/ai/models/chunk.py (1)

35-53: bulk_save drops updates, does N+1 queries, and breaks caller-list clearing contract

  • Updates are filtered out (self-existence check catches the same row).
  • Per-object .exists() → O(N) queries.
  • Caller’s chunks list isn’t cleared on success (only when no uniques).
  • Fallback logging misuses logger.exception args.

Refactor as below to partition update/create, dedupe creates in-memory, probe DB once, maintain contract, and handle IntegrityError.

     @staticmethod
     def bulk_save(chunks, fields=None):
-        """Bulk save chunks with duplicate handling."""
-        unique_chunks = []
-        unique_chunks = [
-            chunk
-            for chunk in chunks
-            if not Chunk.objects.filter(context=chunk.context, text=chunk.text).exists()
-        ]
-
-        if unique_chunks:
-            try:
-                BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
-            except (ValueError, TypeError, DatabaseError):
-                for chunk in unique_chunks:
-                    try:
-                        chunk.save()
-                    except (ValueError, TypeError, DatabaseError) as save_error:
-                        logger.exception("Failed to save chunk", exc_info=save_error)
-        else:
-            chunks.clear()
+        """Bulk save chunks with duplicate handling and minimal queries."""
+        if not chunks:
+            return
+
+        # Partition: updates (have id) vs creates (no id)
+        to_update = [c for c in chunks if getattr(c, "id", None)]
+        to_create = [c for c in chunks if not getattr(c, "id", None)]
+
+        # In-memory dedupe among creates by (context_id, text)
+        seen: set[tuple[int | None, str]] = set()
+        deduped_create = []
+        for c in to_create:
+            ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+            key = (ctx_id, c.text)
+            if key in seen:
+                continue
+            seen.add(key)
+            deduped_create.append(c)
+
+        # One DB round-trip to find existing keys
+        ctx_ids = {getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None) for c in deduped_create}
+        texts = {c.text for c in deduped_create}
+        existing_keys: set[tuple[int, str]] = set()
+        if ctx_ids and texts:
+            existing_keys = set(
+                Chunk.objects.filter(context_id__in=ctx_ids, text__in=texts)
+                .values_list("context_id", "text")
+            )
+
+        final_create = [
+            c for c in deduped_create
+            if ((getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)), c.text) not in existing_keys
+        ]
+
+        unique_chunks = to_update + final_create
+        if not unique_chunks:
+            chunks.clear()
+            return
+
+        try:
+            # Maintain contract: operate on caller's list and let BulkSaveModel clear it
+            chunks[:] = unique_chunks
+            BulkSaveModel.bulk_save(Chunk, chunks, fields=fields)
+        except (ValueError, TypeError, DatabaseError):
+            # Fallback to per-object saves; ignore duplicate races
+            from django.db import IntegrityError  # local import
+            for obj in unique_chunks:
+                try:
+                    if getattr(obj, "id", None) and fields:
+                        obj.save(update_fields=fields)
+                    else:
+                        obj.save()
+                except IntegrityError:
+                    continue
+                except (ValueError, TypeError, DatabaseError):
+                    logger.exception("Failed to save chunk id=%s", getattr(obj, "id", None))
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)

45-47: Fix plural: "repositorys" → "repositories".

This is user-facing and appears in multiple assertions.

Apply:

-        assert command.entity_name_plural == "repositorys"
+        assert command.entity_name_plural == "repositories"

120-126: Fix plural in initialization assertions as well.

Keep tests consistent with correct English plural.

-        assert command.entity_name_plural == "repositorys"
+        assert command.entity_name_plural == "repositories"

41-47: Define entity_name and entity_name_plural on the command class.

Prevents accidental defaults and ensures help/log strings are correct.

Change in apps/ai/management/commands/ai_update_repository_context.py:

 class Command(BaseContextCommand):
+    entity_name = "repository"
+    entity_name_plural = "repositories"
     key_field_name = "key"
     model_class = Repository
🧹 Nitpick comments (14)
backend/apps/ai/common/utils.py (1)

74-89: Avoid full JSON parse for detection; add fast guards

json.loads on large strings is costly. Short-circuit on type and first non-space char before parsing; then parse trimmed string.

 def is_json_content(content: str) -> bool:
@@
-    try:
-        json.loads(content)
+    if not isinstance(content, str):
+        return False
+    s = content.lstrip()
+    if not s or s[0] not in ("{", "["):
+        return False
+    try:
+        json.loads(s)
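Assembled as a standalone function, the fast-guarded detector might look like the sketch below (the production helper lives in `apps.ai.common.utils`; this version only illustrates the guard order):

```python
import json


def is_json_content(content) -> bool:
    """Cheaply reject non-JSON before paying for a full parse."""
    if not isinstance(content, str):
        return False
    s = content.lstrip()
    # JSON objects/arrays must start with '{' or '['; bail out early otherwise.
    if not s or s[0] not in ("{", "["):
        return False
    try:
        json.loads(s)
    except ValueError:
        return False
    return True
```

For typical README prose the first-character guard short-circuits, so `json.loads` never runs on large Markdown strings.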
backend/apps/ai/models/chunk.py (3)

59-63: Make splitter markdown-aware for better chunk boundaries

Include heading-based separators to reduce mid-heading splits and improve retrieval quality.

-            separators=["\n\n", "\n", " ", ""],
+            separators=["\n\n", "\n### ", "\n## ", "\n# ", "\n", " ", ""],
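The intent of heading-aware separators can be shown with a toy splitter (plain Python, not the LangChain class the model actually uses): re-attaching the separator keeps each heading at the start of its chunk instead of stranded at the end of the previous one.

```python
def split_once(text: str, separators: list[str]) -> list[str]:
    """Split on the first separator found, re-attaching it so headings lead their sections."""
    for sep in separators:
        if sep in text:
            head, *rest = text.split(sep)
            return [head] + [sep.lstrip("\n") + part for part in rest]
    return [text]


doc = "Intro text\n\n## Setup\nsteps here\n## Usage\nmore here"
print(split_once(doc, ["\n## ", "\n\n"]))
# ['Intro text\n', '## Setup\nsteps here', '## Usage\nmore here']
```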

13-13: Type-hint logger for consistency

Minor consistency nit with utils.py.

-logger = logging.getLogger(__name__)
+logger: logging.Logger = logging.getLogger(__name__)

85-93: Race-safe create in update_data

Potential duplicate race against the unique constraint; catch IntegrityError and return None to keep the contract.

         chunk = Chunk(text=text, embedding=embedding, context=context)
 
         if save:
-            chunk.save()
+            from django.db import IntegrityError  # local import
+            try:
+                chunk.save()
+            except IntegrityError:
+                return None
 
         return chunk
backend/apps/ai/common/base/chunk_command.py (1)

46-53: Embedding raw JSON hurts recall; include metadata prefix even for JSON

Keep human-readable context to improve retrieval without discarding structured data.

-                if is_json_content(content):
-                    full_content = content
+                if is_json_content(content):
+                    # Keep metadata for NL recall; JSON retained for structure.
+                    full_content = f"{metadata_content}\n\n{content}" if metadata_content else content
                 else:
backend/apps/ai/management/commands/ai_update_repository_context.py (1)

18-28: Select related FKs to avoid N+1 during extraction

extract_repository_content accesses organization.login and owner.login. Prefetch them to reduce queries.

         return (
             super()
             .get_base_queryset()
+            .select_related("organization", "owner")
             .filter(
                 is_owasp_site_repository=True,
                 is_archived=False,
                 is_empty=False,
             )
         )
backend/apps/ai/management/commands/ai_update_repository_chunks.py (2)

18-28: Select related FKs to avoid N+1 during extraction

Same FK access pattern as the context command; apply here too.

         return (
             super()
             .get_base_queryset()
+            .select_related("organization", "owner")
             .filter(
                 is_owasp_site_repository=True,
                 is_archived=False,
                 is_empty=False,
             )
         )

34-36: Docstring nit: this is chunk creation, not context creation

Tiny wording fix for clarity.

-        """Return the source name for context creation."""
+        """Return the source name for chunk creation."""
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1)

67-76: Use a stable patch target for the base queryset.

Patching via bases[0] is brittle. Patch the concrete base class method instead.

-        with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
+        from apps.ai.common.base.context_command import BaseContextCommand
+        with patch.object(BaseContextCommand, "get_base_queryset") as mock_super:

Replicate in similar tests below.

backend/apps/ai/common/extractors/repository.py (2)

116-121: Broaden markdown coverage to common docs used by OWASP repos.

Capturing CONTRIBUTING/CODE_OF_CONDUCT/SECURITY increases useful context for NestBot.

     markdown_files = [
         "README.md",
         "index.md",
         "info.md",
         "leaders.md",
+        "CONTRIBUTING.md",
+        "CODE_OF_CONDUCT.md",
+        "SECURITY.md",
     ]

13-22: Add parameter type hint (non-blocking).

Helps IDEs/static analyzers without importing Django models at runtime.

-from apps.ai.common.constants import DELIMITER
+from apps.ai.common.constants import DELIMITER
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from apps.github.models.repository import Repository
@@
-def extract_repository_content(repository) -> tuple[str, str]:
+def extract_repository_content(repository: "Repository") -> tuple[str, str]:
backend/tests/apps/ai/common/extractors/repository_test.py (3)

1-11: Patch out sleep to keep tests fast.

If the extractor still sleeps, make tests autouse-patch it.

+import pytest
@@
+@pytest.fixture(autouse=True)
+def _no_sleep(monkeypatch):
+    monkeypatch.setattr("apps.ai.common.extractors.repository.time.sleep", lambda *_: None)

498-501: Assert on called URL args, not stringified Call objects.

More precise and less brittle.

-        mock_get_content.assert_called()
-        assert any(
-            "https://raw.githubusercontent.com/test-org/test-repo/develop/" in str(call)
-            for call in mock_get_content.call_args_list
-        )
+        urls = [args[0] for args, _ in (c for c in mock_get_content.call_args_list)]
+        assert any(
+            url.startswith("https://raw.githubusercontent.com/test-org/test-repo/develop/")
+            for url in urls
+        )

358-399: If you add more markdown files in the extractor, extend this test accordingly.

Add assertions for CONTRIBUTING.md/CODE_OF_CONDUCT.md/SECURITY.md when you broaden the list.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 103fb24 and f0fc71a.

📒 Files selected for processing (9)
  • backend/apps/ai/common/base/chunk_command.py (2 hunks)
  • backend/apps/ai/common/extractors/repository.py (1 hunks)
  • backend/apps/ai/common/utils.py (2 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
  • backend/apps/ai/models/chunk.py (3 hunks)
  • backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
🧰 Additional context used
🧬 Code graph analysis (7)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_content (13-164)
backend/apps/ai/common/extractors/repository.py (1)
backend/apps/github/utils.py (1)
  • get_repository_file_content (60-79)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)
backend/apps/ai/management/commands/ai_update_repository_context.py (5)
  • Command (10-36)
  • source_name (34-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
backend/apps/ai/common/base/context_command.py (1)
  • BaseContextCommand (9-54)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-95)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_content (13-164)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
backend/apps/ai/management/commands/ai_update_repository_context.py (5)
  • Command (10-36)
  • extract_content (14-16)
  • get_base_queryset (18-28)
  • get_default_queryset (30-32)
  • source_name (34-36)
backend/apps/ai/models/chunk.py (3)
backend/apps/ai/models/context.py (1)
  • Context (15-61)
backend/apps/common/models.py (2)
  • BulkSaveModel (10-34)
  • TimestampedModel (37-46)
backend/apps/common/utils.py (1)
  • truncate (164-176)
backend/apps/ai/common/base/chunk_command.py (3)
backend/apps/ai/common/utils.py (2)
  • create_chunks_and_embeddings (20-71)
  • is_json_content (74-88)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (1)
  • extract_content (32-34)
backend/tests/apps/ai/common/base/chunk_command_test.py (1)
  • extract_content (23-25)
backend/apps/ai/management/commands/ai_update_repository_context.py (3)
backend/apps/ai/common/base/context_command.py (1)
  • BaseContextCommand (9-54)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_content (13-164)
backend/apps/github/models/repository.py (1)
  • Repository (25-349)
🔇 Additional comments (5)
backend/apps/ai/common/utils.py (1)

3-3: Import looks good

backend/apps/ai/common/base/chunk_command.py (1)

7-7: Import looks good

backend/apps/ai/management/commands/ai_update_repository_context.py (1)

10-17: Command structure and delegation look solid

backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)

10-17: Command wiring LGTM

backend/apps/ai/models/chunk.py (1)

19-23: DB uniqueness added — migration exists; confirm it's applied

backend/apps/ai/migrations/0005_context_alter_chunk_unique_together_chunk_context_and_more.py contains AlterUniqueTogether setting unique_together={('context','text')}; initial migration 0001 used unique_together={('message','chunk_text')}. Ensure the 0005 migration is included in your release and applied to the target database.

@Dishant1804 Dishant1804 marked this pull request as ready for review September 13, 2025 19:01
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/apps/ai/common/extractors/repository.py (1)

123-144: Remove per‑file sleeps; pass a short timeout and log with context.

The hard delay slows batch runs/tests and doesn’t guarantee politeness. Rely on request timeouts and contextual logging; manage rate limiting at the caller or via a shared limiter.

@@
-    for file_path in markdown_files:
-        try:
+    for file_path in markdown_files:
+        try:
             if repository.organization:
                 owner = repository.organization.login
             else:
                 owner = repository.owner.login if repository.owner else ""
@@
-                content = get_repository_file_content(raw_url)
+                content = get_repository_file_content(raw_url, timeout=10)
 
                 if content and content.strip():
                     markdown_content[file_path] = content
-                time.sleep(GITHUB_REQUEST_INTERVAL_SECONDS)
 
-        except (ValueError, TypeError, OSError):
-            logger.debug("Failed to fetch markdown file")
+        except Exception as exc:
+            logger.debug(
+                "Failed to fetch %s for %s/%s (branch=%s): %s",
+                file_path,
+                owner or "(unknown)",
+                getattr(repository, "key", None) or "(no-key)",
+                getattr(repository, "default_branch", None) or "main",
+                exc,
+            )
             continue

If throttling is still desired, move it to the management command (single place), or apply only between successful network attempts with jitter.
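One shape for such a shared limiter (a sketch; the class name and its placement in the management command are assumptions, not existing project code):

```python
import random
import time


class MinIntervalLimiter:
    """Enforce a minimum delay between calls, with optional jitter, in one place."""

    def __init__(self, interval: float, jitter: float = 0.0):
        self.interval = interval
        self.jitter = jitter
        self._last = 0.0

    def wait(self) -> None:
        # Sleep only for the remainder of the interval since the last call.
        now = time.monotonic()
        delay = self._last + self.interval - now
        if delay > 0:
            time.sleep(delay + random.uniform(0, self.jitter))
        self._last = time.monotonic()


limiter = MinIntervalLimiter(interval=0.5, jitter=0.1)
# In the management command loop (hypothetical usage):
# limiter.wait()
# content = get_repository_file_content(raw_url, timeout=10)
```

Because the limiter accounts for time already spent in the request itself, it throttles only when calls come back faster than the interval, unlike an unconditional `time.sleep` per file.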

🧹 Nitpick comments (2)
backend/apps/ai/common/extractors/repository.py (1)

116-121: Expand Markdown targets or discover dynamically.

Limiting to four filenames will miss common docs (CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, GOVERNANCE.md, README.* variants). Consider a discovery pass (list root .md files) or extend this list.

Example extension (quick win):

     markdown_files = [
         "README.md",
         "index.md",
         "info.md",
         "leaders.md",
+        "CONTRIBUTING.md",
+        "CODE_OF_CONDUCT.md",
+        "SECURITY.md",
+        "GOVERNANCE.md",
+        "README.MD",
+        "README.markdown",
     ]
backend/apps/ai/common/constants.py (1)

8-8: Clarify GitHub throttle vs global MIN; avoid conflicting semantics.

MIN_REQUEST_INTERVAL_SECONDS = 1.2 is used in backend/apps/ai/common/utils.py (OpenAI embeddings throttling). GITHUB_REQUEST_INTERVAL_SECONDS = 0.5 is used in backend/apps/ai/common/extractors/repository.py (time.sleep after get_repository_file_content). Either document why GitHub raw fetches use a lower delay, or rename to GITHUB_RAW_FETCH_DELAY_SECONDS and add a comment, and/or align the values if 0.5 is unintended.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0fc71a and 9766965.

📒 Files selected for processing (3)
  • backend/apps/ai/common/constants.py (1 hunks)
  • backend/apps/ai/common/extractors/repository.py (1 hunks)
  • backend/apps/ai/common/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/apps/ai/common/utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/apps/ai/common/extractors/repository.py (2)
backend/apps/github/utils.py (1)
  • get_repository_file_content (60-79)
backend/apps/github/api/internal/queries/repository.py (1)
  • repository (14-35)
🔇 Additional comments (2)
backend/apps/ai/common/extractors/repository.py (2)

33-34: No change required — repository.license is already a string.

backend/apps/github/models/repository.py declares license = models.CharField(...) (line 44) and sets self.license = gh_repository.license.spdx_id on import (line 272); tests and other code treat it as a string. The extractor assignment is safe.

Likely an incorrect or invalid review comment.


97-101: No timezone fix required — repository datetimes are UTC-aware; keep .strftime('%Y-%m-%d').

Models declare created_at/updated_at/pushed_at as DateTimeField and tests/migrations supply tz-aware datetimes (tzinfo=UTC / datetime.UTC), so the current formatting is safe. See backend/apps/github/models/repository.py and backend/tests/apps/ai/common/extractors/repository_test.py.

Collaborator

@arkid15r arkid15r left a comment

It's still too far from what can be accepted for merge even into a feature branch:

if ownership:
repository_data["ownership"] = ownership

if repository.track_issues:
Collaborator

This is an internal field. Why include this?

Contributor Author

I'll remove it

if repository.track_issues:
repository_data["track_issues"] = repository.track_issues

markdown_files = [
Collaborator

I mentioned we also need tab_ prefixed ones for all entities, see

return chunks


def is_json_content(content: str) -> bool:
Collaborator

This is a wider-level common utils function, not just for the ai app. Also rename it to something like is_valid_json; not sure if there already exists ready-for-reuse code.

Contributor Author

it was not there -- for future usage I have added it to common/utils.py

Comment on lines +36 to +37
unique_chunks = []
unique_chunks = [
Collaborator

?


if unique_chunks:
try:
BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
Collaborator

In previous review rounds we had a long discussion on how we use bulk_save and update_data methods within the code base. And now you're just changing it based on your own vision again?

This is not a bulk save when you save each item separately. Why is there all this exception handling here -- I suggest starting by addressing this one.

Contributor Author

yes we had a discussion and based on that I reverted the changes -- will keep this in mind

pypoetry
pyyaml
repositorycontributor
repositorys
Collaborator

@Dishant1804 this is a legit suggestion; it's grammatically incorrect and needs to be fixed.

@arkid15r arkid15r marked this pull request as draft September 14, 2025 17:02
@Dishant1804
Contributor Author

for the plural part I am thinking of adding a method that handles words ending with 'y' by changing it to 'ies',
and if the name ends with 's', 'x', 'z', 'ch', or 'sh', adding 'es'

@arkid15r I think this is a better approach, let me know if we want to do it -- I checked the utils and we don't have a method that pluralizes words
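A minimal sketch of such a helper (the name `pluralize` is hypothetical; a library such as `inflect` already handles irregular forms if a dependency is acceptable):

```python
def pluralize(name: str) -> str:
    """Naive English pluralization covering the cases discussed above."""
    # consonant + 'y' -> 'ies' (repository -> repositories), but day -> days
    if name.endswith("y") and name[-2:-1] not in "aeiou":
        return name[:-1] + "ies"
    # sibilant endings take 'es' (branch -> branches)
    if name.endswith(("s", "x", "z", "ch", "sh")):
        return name + "es"
    return name + "s"


print(pluralize("repository"))  # repositories
```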

@Dishant1804
Contributor Author

Quality Gate failed

Failed conditions 5.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

we can ignore SonarQube in this case because we are following the same practice as the other command files

@Dishant1804 Dishant1804 marked this pull request as ready for review September 16, 2025 13:20
Collaborator

@arkid15r arkid15r left a comment

Please address @github-advanced-security's suggestions

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
backend/apps/ai/common/extractors/repository.py (3)

3-7: Remove hard sleep; rely on request timeouts and logging.

Avoid slowing batch runs/tests with per-file sleeps. You already have a timeout-enabled fetch.

-import time

145-152: Use per-request timeout and drop sleep.

Pass an explicit short timeout; remove the sleep.

-                content = get_repository_file_content(raw_url)
+                content = get_repository_file_content(raw_url, timeout=10)
@@
-                if content and content.strip():
+                if content and content.strip():
                     markdown_content[file_path] = content
-                time.sleep(GITHUB_REQUEST_INTERVAL_SECONDS)

154-156: Broaden exception handling and log rich context.

Network/HTTP libraries can raise diverse exceptions. Log with repo/owner/branch/file for triage.

-        except (ValueError, TypeError, OSError):
-            logger.debug("Failed to fetch markdown file")
+        except Exception as exc:
+            logger.debug(
+                "Failed to fetch %s for %s/%s (branch %s): %s",
+                file_path,
+                owner or "(unknown owner)",
+                repository.key or "(no key)",
+                branch,
+                exc,
+            )
             continue
🧹 Nitpick comments (6)
backend/apps/common/utils.py (1)

106-121: Clarify JSON validity semantics (primitives also pass).

As written, "42", true, or "foo" will return True. If that’s intended, please document it; if not, gate on objects/arrays.

Apply one of the following:

Option A — clarify docstring:

 def is_valid_json(content: str) -> bool:
-    """Check if content is JSON format.
+    """Check if content is JSON format.
+
+    Note: primitive JSON values (numbers, strings, booleans, null) also return True.
     ...

Option B — restrict to objects/arrays:

 def is_valid_json(content: str) -> bool:
@@
-    try:
-        json.loads(content)
+    try:
+        obj = json.loads(content)
     except (TypeError, ValueError):
         return False
-    return True
+    return isinstance(obj, (dict, list))
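Option B written out in full (a sketch; whether bare primitives should count as "valid JSON" is the project's call):

```python
import json


def is_valid_json(content: str) -> bool:
    """Return True only for JSON objects or arrays, not bare primitives."""
    try:
        obj = json.loads(content)
    except (TypeError, ValueError):
        return False
    return isinstance(obj, (dict, list))
```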
backend/apps/ai/common/extractors/repository.py (2)

76-93: Consider preserving zero-valued stats.

Current truthiness checks drop zeros (e.g., forks=0). If consumers compare repos, missing keys vs zero may be ambiguous. Prefer “is not None”.

Example:

-    if repository.forks_count:
+    if repository.forks_count is not None:
         stats["forks"] = repository.forks_count

Apply similarly to other counts.
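The difference is easy to demonstrate: with a truthiness check, a zero-fork repository produces no key at all, so downstream consumers cannot distinguish "zero" from "unknown".

```python
def build_stats(forks_count):
    """Contrast truthiness vs explicit None checks for numeric stats."""
    stats = {}
    if forks_count:              # drops 0 along with None
        stats["forks_truthy"] = forks_count
    if forks_count is not None:  # keeps 0, drops only None
        stats["forks_explicit"] = forks_count
    return stats


print(build_stats(0))     # {'forks_explicit': 0}
print(build_stats(None))  # {}
```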


34-37: License/topic types may be non-serializable — coerce or sanitize.

If license/topics aren’t plain strings/lists (e.g., objects), json.dumps will fail. Either coerce to strings or filter serializable types.

backend/tests/apps/ai/common/extractors/repository_test.py (3)

420-425: Fix CodeQL “incomplete URL substring sanitization” warning in tests.

Use a strict prefix check to avoid substring-based URL logic.

-        def side_effect(url):
-            if "raw.githubusercontent.com" in url:
+        def side_effect(url):
+            if str(url).startswith("https://raw.githubusercontent.com/"):
                 msg = "Network error"
                 raise ValueError(msg)
             return "[]"
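The substring check is what CodeQL flags: it also matches attacker-controlled hosts that merely embed the expected domain in their path, while a strict prefix check does not.

```python
def is_raw_github(url: str) -> bool:
    """Accept only URLs that actually start with the raw.githubusercontent.com origin."""
    return str(url).startswith("https://raw.githubusercontent.com/")


legit = "https://raw.githubusercontent.com/test-org/test-repo/main/README.md"
spoofed = "https://evil.example/raw.githubusercontent.com/x"

assert "raw.githubusercontent.com" in spoofed  # substring check is fooled
assert is_raw_github(legit)
assert not is_raw_github(spoofed)
```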

573-579: Fix second instance of substring URL check.

Same rationale as above.

-        def side_effect(url):
-            if "raw.githubusercontent.com" in url:
+        def side_effect(url):
+            if str(url).startswith("https://raw.githubusercontent.com/"):
                 msg = "Test exception"
                 raise ValueError(msg)
             return "[]"

261-314: Add coverage for tab_*.md discovery.

No test verifies tab file enumeration from the contents API. Add one where the contents API returns a JSON list including `tab_example.md`, and assert that file is fetched.

I can draft the test if you confirm the expected behavior for nested vs root-level tab files.
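To pin down the expected enumeration, here is a standalone illustration of picking root-level `tab_*.md` entries out of a contents-API listing; `find_tab_files` is hypothetical, and the real extractor may structure this differently:

```python
import json


def find_tab_files(contents_json: str) -> list[str]:
    """From a GitHub contents-API listing (a JSON array), select root-level tab_*.md files."""
    entries = json.loads(contents_json)
    return [
        entry["name"]
        for entry in entries
        if entry.get("type") == "file"
        and entry["name"].startswith("tab_")
        and entry["name"].endswith(".md")
    ]


listing = json.dumps(
    [
        {"name": "tab_example.md", "type": "file"},
        {"name": "README.md", "type": "file"},
        {"name": "tab_nested", "type": "dir"},  # directories are skipped
    ]
)
```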

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a5fceb2 and 5499d75.

📒 Files selected for processing (14)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/common/base/chunk_command.py (2 hunks)
  • backend/apps/ai/common/constants.py (1 hunks)
  • backend/apps/ai/common/extractors/repository.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
  • backend/apps/ai/models/chunk.py (1 hunks)
  • backend/apps/common/utils.py (2 hunks)
  • backend/test_commands.py (1 hunks)
  • backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
  • backend/tests/apps/ai/common/utils_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
  • backend/tests/apps/ai/models/chunk_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
  • backend/apps/ai/models/chunk.py
  • backend/apps/ai/common/constants.py
  • backend/apps/ai/management/commands/ai_update_repository_chunks.py
  • backend/apps/ai/management/commands/ai_update_repository_context.py
  • backend/tests/apps/ai/models/chunk_test.py
  • backend/apps/ai/Makefile
  • backend/tests/apps/ai/common/utils_test.py
  • backend/test_commands.py
  • backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/apps/ai/common/extractors/repository.py (2)
backend/apps/common/utils.py (1)
  • is_valid_json (106-120)
backend/apps/github/utils.py (1)
  • get_repository_file_content (60-79)
backend/apps/ai/common/base/chunk_command.py (3)
backend/apps/common/utils.py (1)
  • is_valid_json (106-120)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (1)
  • extract_content (32-34)
backend/tests/apps/ai/common/base/chunk_command_test.py (1)
  • extract_content (23-25)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (1)
  • extract_repository_content (14-176)
🪛 GitHub Check: CodeQL
backend/tests/apps/ai/common/extractors/repository_test.py

[failure] 421-421: Incomplete URL substring sanitization
The string raw.githubusercontent.com may be at an arbitrary position in the sanitized URL.


[failure] 574-574: Incomplete URL substring sanitization
The string raw.githubusercontent.com may be at an arbitrary position in the sanitized URL.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run CI Denendencies Scan
  • GitHub Check: Run Code Scan
  • GitHub Check: CodeQL (python)
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (1)
backend/apps/ai/common/base/chunk_command.py (1)

47-55: JSON path drops metadata_content — confirm intent.

When content is JSON, metadata_content is ignored. If downstream retrieval relies on that context, consider storing it in chunk metadata (not the chunk text) or prefixing the first chunk out-of-band. If this is intentional, no change needed.

Collaborator

@arkid15r arkid15r left a comment


A couple of questions regarding tests:

@sonarqubecloud

Quality Gate failed

Failed conditions
5.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@arkid15r arkid15r enabled auto-merge (squash) September 18, 2025 19:29
@arkid15r arkid15r merged commit ef97206 into OWASP:feature/nestbot-assistant Sep 18, 2025
24 of 25 checks passed
arkid15r added a commit that referenced this pull request Sep 19, 2025
* spelling fixes and tests

* sonar and code rabbit suggestions implemented

* json chunking and suggestions implemented

* code rabbit and sonar qube suggestions

* code rabbit suggestions

* suggestions implemented

* github advance security addressed

* tests fixed

* fixed tests

* Clean up backend/test_commands.py

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
@coderabbitai coderabbitai bot mentioned this pull request Sep 20, 2025
5 tasks
arkid15r added a commit that referenced this pull request Sep 23, 2025
arkid15r added a commit that referenced this pull request Sep 28, 2025
* Sync www-repopsitories (#2164)

* spelling fixes and tests

* sonar and code rabbit suggestions implemented

* json chunking and suggestions implemented

* code rabbit and sonar qube suggestions

* code rabbit suggestions

* suggestions implemented

* github advance security addressed

* tests fixed

* fixed tests

* Clean up backend/test_commands.py

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* Consolidate code commits

* Update cspell/custom-dict.txt

* Update docker-compose/local.yaml

* local yaml worder volume fix

* instance check

* poetry file updated

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
arkid15r added a commit that referenced this pull request Sep 28, 2025
arkid15r added a commit that referenced this pull request Sep 28, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 17, 2025
3 tasks
arkid15r added a commit that referenced this pull request Nov 16, 2025
arkid15r added a commit that referenced this pull request Nov 16, 2025
mirnumaan pushed a commit to mirnumaan/Nest that referenced this pull request Nov 16, 2025
mirnumaan pushed a commit to mirnumaan/Nest that referenced this pull request Nov 16, 2025
github-merge-queue bot pushed a commit that referenced this pull request Nov 19, 2025
* add the aria-label

* fixed the code style, fixed by the formatter

* Handled the undefined repositoryName in aria-label.

* fix the white spaces

* Add aria-labels to interactive elements for WCAG 2.1 compliance

* Sync www-repopsitories (#2164)

* spelling fixes and tests

* sonar and code rabbit suggestions implemented

* json chunking and suggestions implemented

* code rabbit and sonar qube suggestions

* code rabbit suggestions

* suggestions implemented

* github advance security addressed

* tests fixed

* fixed tests

* Clean up backend/test_commands.py

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* Update docker-compose/local.yaml

* Nestbot MVP (#2113)

* Sync www-repopsitories (#2164)

* spelling fixes and tests

* sonar and code rabbit suggestions implemented

* json chunking and suggestions implemented

* code rabbit and sonar qube suggestions

* code rabbit suggestions

* suggestions implemented

* github advance security addressed

* tests fixed

* fixed tests

* Clean up backend/test_commands.py

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* Consolidate code commits

* Update cspell/custom-dict.txt

* Update docker-compose/local.yaml

* local yaml worder volume fix

* instance check

* poetry file updated

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* fixed the duplication error for chunks and context (#2343)

* Fix slack and duplication errors (#2352)

* fix slack and duplication errors

* code rabbit suggestions

* integrity error solved

* using set

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* Response improvements and refactoring (#2407)

* improvements and refactoring

* added prompt checks and tests for it

* question detection refining (#2443)

* question detection refining

* sonar qube fixes

* fix tests

* Agentic rag (#2432)

* agentic rag

* spelling fixes

* code rabbit and sonar qube suggestions

* code rabbit suggestions

* refining

* fix test

* refining

* added question detectoor to nestbot mentions (#2473)

* Merge NestBot AI Assistant feature branch

* Update docker-compose/local.yaml

* Update backend/Makefile

* fix the white spaces

* Add aria-labels to interactive elements for WCAG 2.1 compliance

* Fix equality checks with floating point values (#2658)

* Edit in test cases for the new aira-label update

* More edits to make the test more unbreakable and few regex edits .

* Revert one change regards two seperate Contributors

* Regex check fix

* Global 10-second timeout and increased other mobile timeout to fix the  Mobile Safari flaky tests

---------

Co-authored-by: Dishant Miyani <dishantmiyani1804@gmail.com>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Kate Golovanova <kate@kgthreads.com>

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fetch .md content from OWASP www- repositories to improve NestBot contexts

2 participants