Sync www-repositories#2164
Summary by CodeRabbit
Walkthrough

Adds repository content extraction and two Django management commands (chunks & context) with Makefile targets; increases chunk size/overlap; makes chunking JSON-aware and skips empty content; adds GitHub request pacing and a JSON validation util; includes tests and a placeholder test module.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
docker-compose/local.yaml (1)
25-25: Bug: backend service sets DJANGO_REDIS_PASSWORD from DJANGO_REDIS_HOST. This will set the password to the host value when DJANGO_REDIS_HOST is defined.

```diff
-  DJANGO_REDIS_PASSWORD: ${DJANGO_REDIS_HOST:-nest-cache-password}
+  DJANGO_REDIS_PASSWORD: ${DJANGO_REDIS_PASSWORD:-nest-cache-password}
```

backend/apps/ai/models/chunk.py (1)
85-93: Handle race with unique constraint in update_data(). Concurrent writers can still insert the same (context, text) between the exists() check and save(), raising IntegrityError. Catch it and return None to match the method contract.

```diff
-        if save:
-            chunk.save()
+        if save:
+            from django.db import IntegrityError
+
+            try:
+                chunk.save()
+            except IntegrityError:
+                return None
```

backend/apps/slack/events/__init__.py (1)
1-20: Make event configuration idempotent and skip heavy imports when the Slack app is disabled. Prevents duplicate registrations and avoids importing event modules if tokens are missing.

```diff
+_EVENTS_CONFIGURED = False
+
 def configure_slack_events():
     """Configure Slack events after Django apps are ready."""
-    from apps.slack.apps import SlackConfig
-    from apps.slack.events import (
-        app_home_opened,
-        message_posted,
-        team_join,
-        url_verification,
-    )
-    from apps.slack.events.event import EventBase
-    from apps.slack.events.member_joined_channel import (
-        catch_all,
-        contribute,
-        gsoc,
-        project_nest,
-    )
-
-    if SlackConfig.app:
-        EventBase.configure_events()
+    from apps.slack.apps import SlackConfig
+
+    global _EVENTS_CONFIGURED
+    if not SlackConfig.app or _EVENTS_CONFIGURED:
+        return
+    from apps.slack.events import (
+        app_home_opened,
+        message_posted,
+        team_join,
+        url_verification,
+    )
+    from apps.slack.events.event import EventBase
+    from apps.slack.events.member_joined_channel import (
+        catch_all,
+        contribute,
+        gsoc,
+        project_nest,
+    )
+    EventBase.configure_events()
+    _EVENTS_CONFIGURED = True
```

backend/tests/apps/ai/common/extractors/repository_test.py (1)
434-455: Log message lacks repository context expected by tests. Tests assert the debug log contains `test-repo`; update the extractor logging to include repo key/owner/path as shown above.
🧹 Nitpick comments (48)
backend/test_commands.py (1)
1-1: Avoid stray test_ module outside tests/*. Pytest will collect this empty module, adding noise. Either remove it or rename/move it under a non-collected path (e.g., `backend/commands_testutils.py`) or into `tests/` with actual tests.

backend/settings/base.py (1)
145-152: Harden RQ config: prefer URL form and make timeout configurable. Using a single Redis URL avoids partial env mismatches and mirrors your cache config; exposing the timeout via env helps ops tuning.

Apply within this block:

```diff
-    "default": {
-        "HOST": REDIS_HOST,
-        "PORT": 6379,
-        "PASSWORD": REDIS_PASSWORD,
-        "DEFAULT_TIMEOUT": 360,
-    }
+    "default": {
+        # Use URL form for parity with CACHES and simpler overrides.
+        "URL": f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/0",
+        # Make tunable via env; 360s default preserved.
+        "DEFAULT_TIMEOUT": values.IntegerValue(
+            environ_name="RQ_DEFAULT_TIMEOUT",
+            default=360,
+        ),
+    }
```

Optionally consider adding:
- RQ exception handlers that report to Sentry.
- Separate queues (e.g., "ai", "slack") with different timeouts.
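One possible shape for the separate-queues idea is sketched below. This is a hedged illustration, not the project's actual settings: the queue names (`ai`, `slack`) and env var names (`RQ_AI_TIMEOUT`, `RQ_SLACK_TIMEOUT`) are assumptions, chosen only to show long-running AI jobs and fail-fast Slack replies getting different timeouts.

```python
# Illustrative only: per-purpose RQ queues with distinct, env-tunable timeouts.
# Env var and queue names are hypothetical, not the repo's actual config.
import os

REDIS_HOST = os.environ.get("DJANGO_REDIS_HOST", "localhost")
REDIS_PASSWORD = os.environ.get("DJANGO_REDIS_PASSWORD", "")
_REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/0"

RQ_QUEUES = {
    "default": {
        "URL": _REDIS_URL,
        "DEFAULT_TIMEOUT": int(os.environ.get("RQ_DEFAULT_TIMEOUT", "360")),
    },
    # AI jobs (embeddings, RAG) may run long; Slack replies should fail fast.
    "ai": {
        "URL": _REDIS_URL,
        "DEFAULT_TIMEOUT": int(os.environ.get("RQ_AI_TIMEOUT", "900")),
    },
    "slack": {
        "URL": _REDIS_URL,
        "DEFAULT_TIMEOUT": int(os.environ.get("RQ_SLACK_TIMEOUT", "120")),
    },
}
```

Workers can then be pointed at specific queues so a stuck embedding job never starves quick Slack replies.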
backend/tests/apps/ai/common/utils_test.py (1)
108-116: Add a minimal assertion to lock the invocation count. Verifies we don't loop on failures.

```diff
 result = create_chunks_and_embeddings(
     chunk_texts=["some text"],
     context=None,
     openai_client=mock_openai_client,
 )
 mock_logger.exception.assert_called_once_with("Failed to create chunks and embeddings")
+mock_update_data.assert_called_once()
 assert result == []
```

backend/apps/slack/constants.py (1)
25-80: Use frozenset for OWASP_KEYWORDS and prune overly generic entries
Convert OWASP_KEYWORDS to a frozenset to prevent accidental mutation; remove broad terms (e.g. “security”, “project”, “event”) to cut down on false positives—matching is already case-insensitive via `text.lower()`. Evaluate the false-positive rate on a representative Slack log before enabling auto-replies.

backend/apps/ai/models/chunk.py (1)
5-5: Also import IntegrityError. If you adopt the changes above, add IntegrityError to the imports.

```diff
-from django.db import DatabaseError, models
+from django.db import DatabaseError, IntegrityError, models
```

backend/tests/apps/ai/models/chunk_test.py (4)
31-42: Assert list-clearing contract after bulk_save. If bulk_save is expected to clear the input list (as BulkSaveModel does), assert it to prevent regressions.

```diff
 with patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk_save:
     Chunk.bulk_save(mock_chunks)
     mock_bulk_save.assert_called_once_with(Chunk, mock_chunks, fields=None)
+    assert mock_chunks == []
```
43-55: Mirror the clearing assertion for the fields path. Keep expectations consistent across both code paths.

```diff
 with patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk_save:
     Chunk.bulk_save(mock_chunks, fields=fields)
     mock_bulk_save.assert_called_once_with(Chunk, mock_chunks, fields=fields)
+    assert mock_chunks == []
```
65-86: Remove redundant @patch on Chunk.save. You replace Chunk with a Mock later; the decorator patch becomes moot and can confuse readers.

```diff
-@patch("apps.ai.models.chunk.Chunk.save")
-def test_update_data_save_with_context(self, mock_save):
+def test_update_data_save_with_context(self):
```
31-55: Add a test ensuring updates aren't filtered out. Guard against regressions where chunks with ids (updates) get dropped by existence filtering.

Proposed new test:

```python
def test_bulk_save_includes_updates():
    c1 = Mock(spec=Chunk); c1.id = 42; c1.context = Mock(); c1.text = "t1"
    c2 = Mock(spec=Chunk); c2.id = None; c2.context = Mock(); c2.text = "t2"
    with patch("apps.ai.models.chunk.Chunk.objects.filter") as mock_filter, \
         patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk:
        mock_filter.return_value.exists.return_value = False
        chunks = [c1, c2]
        Chunk.bulk_save(chunks)
        # c1 (update) must still be passed through
        mock_bulk.assert_called_once()
        passed = mock_bulk.call_args.args[1]
        assert c1 in passed and c2 in passed
```

Want me to push this into the suite?
backend/apps/ai/common/extractors/repository.py (5)
130-146: Broaden markdown coverage (common variants). Consider additional common filenames/paths to increase hit rate on OWASP www repos (case and format variants).

```diff
 markdown_files = [
     "README.md",
+    "README.rst",
+    "README.adoc",
+    "readme.md",
+    "README",
     "CONTRIBUTING.md",
     "CODE_OF_CONDUCT.md",
     "SECURITY.md",
     "CHANGELOG.md",
-    "LICENSE.md",
+    "LICENSE.md",
+    "LICENSE",
     "docs/README.md",
     "docs/index.md",
     "docs/overview.md",
     "docs/getting-started.md",
     "docs/installation.md",
     "docs/usage.md",
     "docs/api.md",
     "docs/development.md",
     "docs/deployment.md",
+    "index.md",
+    "content/index.md",
+    "content/README.md",
+    "site/README.md",
 ]
```
156-160: Branch fallback for site repos. If default_branch is missing, try gh-pages then master before main to better handle www repos.

```diff
-raw_url = (
-    f"https://raw.githubusercontent.com/{owner}/{repository.key}/"
-    f"{repository.default_branch or 'main'}/{file_path}"
-)
+branch = (
+    repository.default_branch
+    or ("gh-pages" if getattr(repository, "has_pages", False) else None)
+    or "master"
+    or "main"
+)
+raw_url = f"https://raw.githubusercontent.com/{owner}/{repository.key}/{branch}/{file_path}"
```
167-169: Improve logging with context. Log which file/owner/repo failed and keep the traceback; the current message loses diagnostics.

```diff
 except (ValueError, TypeError, OSError, ConnectionError):
-    logger.debug("Failed to fetch")
+    logger.debug(
+        "Failed to fetch markdown file",
+        extra={"file_path": file_path, "owner": owner, "repo": getattr(repository, "key", None)},
+        exc_info=True,
+    )
     continue
```
148-166: Surface fetched files in metadata (and drop the unused variable otherwise). Either remove fetched_files or expose it so downstream context can reference what was pulled.

```diff
     fetched_files = []
@@
     if content and content.strip():
+        fetched_files.append(file_path)
         prose_parts.append(f"## {file_path}\n\n{content}")
@@
     return (
         DELIMITER.join(filter(None, prose_parts)),
-        DELIMITER.join(filter(None, metadata_parts)),
+        DELIMITER.join(
+            filter(
+                None,
+                metadata_parts
+                + (
+                    [f"Fetched Markdown Files: {', '.join(fetched_files)}"]
+                    if fetched_files
+                    else []
+                ),
+            )
+        ),
     )
```

Also applies to: 171-174
148-166: Consider rate limits and 404 noise from raw URLs. Fetching many files per repo can hit GitHub rate limits, and raw 404s may return HTML. Prefer the GitHub API for contents (with If-None-Match) or add status checks in get_repository_file_content to return "" on non-200.
I can provide an API-backed extractor variant and harden get_repository_file_content to ignore non-200 responses. Want a follow-up PR?
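The status-check idea can be sketched as a small, pure validation step that runs on the response before the body is treated as markdown. This is a hedged illustration: the helper name `extract_raw_content` is hypothetical, and the real `get_repository_file_content` (in backend/apps/github/utils.py) would call something like it on whatever HTTP response it receives.

```python
# Illustrative sketch, not the repo's actual helper: accept a body only for
# successful, non-HTML responses from raw.githubusercontent.com.
def extract_raw_content(status_code: int, content_type: str, body: str) -> str:
    """Return the body only for successful, non-HTML responses."""
    if status_code != 200:
        return ""  # covers raw 404s and rate-limit error statuses
    if "text/html" in content_type:
        return ""  # error pages are served as HTML, never as markdown
    return body
```

Keeping the check separate from the network call makes it trivially unit-testable without mocking HTTP.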
backend/apps/slack/admin/conversation.py (1)
30-31: Expose the new flag in filters for quicker triage. Adding the boolean to list_filter eases admin workflows without altering the data model.
Apply outside this hunk:

```diff
@@
 list_filter = (
     "sync_messages",
     "is_archived",
     "is_channel",
     "is_general",
     "is_im",
     "is_private",
+    "is_nest_bot_assistant_enabled",
 )
```

backend/apps/slack/commands/ai.py (2)
21-23: Avoid KeyError when Slack omits text; use a safe default.
Use `.get("text") or ""` to guard against missing/None and prevent an AttributeError on `.strip()`.

```diff
-return get_blocks(
-    query=command["text"].strip(),
-)
+return get_blocks(
+    query=(command.get("text") or "").strip(),
+)
```
19-20: Document the local import intent.
Add a short comment so future readers know this is deliberate for lazy-loading/cycle avoidance.

```diff
-from apps.slack.common.handlers.ai import get_blocks
+# Local import to avoid circular import and reduce import-time overhead.
+from apps.slack.common.handlers.ai import get_blocks
```

backend/apps/ai/Makefile (1)
37-44: Declare PHONY targets for the new commands.
Prevents make from treating similarly named files as up-to-date targets.

```diff
 ai-update-repository-context:
 	@echo "Updating repository context"
 	@CMD="python manage.py ai_update_repository_context" $(MAKE) exec-backend-command
+
+.PHONY: ai-update-repository-chunks ai-update-repository-context
```

backend/apps/ai/management/commands/ai_update_repository_context.py (1)
18-28: Eager-load FKs used by the extractor.
`extract_repository_markdown_content` accesses `organization.login` and `owner.login`; add `select_related` to avoid N+1 queries during batch runs.

```diff
 return (
     super()
     .get_base_queryset()
     .filter(
         is_owasp_site_repository=True,
         is_archived=False,
         is_empty=False,
     )
-)
+).select_related("organization", "owner")
```

backend/apps/slack/services/message_auto_reply.py (1)
33-35: Race window note (optional).
A small race exists between the replies check and posting; acceptable, but consider an idempotency key or a re-check on 409s if this becomes noisy.

backend/apps/slack/common/handlers/ai.py (4)
16-19: Fix docstring: remove undocumented parameter.
The `presentation` argument isn't in the signature.

```diff
-    presentation (EntityPresentation | None): Configuration for entity presentation.
```
26-28: Guard against oversized Slack block text.
Slack mrkdwn text is limited (~3000 chars per section). Chunk long responses into multiple blocks.

```diff
-if ai_response:
-    return [markdown(ai_response)]
+if ai_response:
+    max_len = 2900
+    if len(ai_response) <= max_len:
+        return [markdown(ai_response)]
+    return [markdown(ai_response[i : i + max_len]) for i in range(0, len(ai_response), max_len)]
```
31-46: Make models configurable via settings.
Avoid hard-coding model IDs; read from Django settings with sane defaults.

```diff
+from django.conf import settings
 rag_tool = RagTool(
-    chat_model="gpt-4o",
-    embedding_model="text-embedding-3-small",
+    chat_model=getattr(settings, "AI_CHAT_MODEL", "gpt-4o"),
+    embedding_model=getattr(settings, "AI_EMBEDDING_MODEL", "text-embedding-3-small"),
 )
```
10-10: Unused logger.
Either use `logger` for diagnostics or remove it.

backend/apps/slack/MANIFEST.yaml (2)
111-124: Tighten bot scopes (least privilege). Line 123 adds channels:history, which is needed for message.channels. Line 122 (channels:manage) appears unnecessary for this feature set—drop it if unused.
```diff
 - users:read
-- groups:write
-- channels:manage
+- groups:write
+# Removed channels:manage (not required for read/AI reply workflows)
 - channels:history
```
131-134: Validate event coverage vs. intended surfaces. message.channels covers public channels only. If replies are desired in private channels/DMs, add message.groups and message.im and the corresponding groups:history/im:history scopes; otherwise consider app_mention to reduce noise.
backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1)
11-17: Field OK; consider index and default rollout.
- If this flag is frequently filtered, add db_index=True or a model Index.
- Default False will disable the assistant for all existing conversations. Confirm that’s intended or add a data migration to enable for selected workspaces/channels.
```diff
-field=models.BooleanField(default=False, verbose_name="Is Nest Bot Assistant Enabled"),
+field=models.BooleanField(
+    default=False,
+    verbose_name="Is Nest Bot Assistant Enabled",
+    db_index=True,
+),
```

backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)
18-28: Avoid N+1 on owner/org during extraction. The extractor touches repository.organization/owner; select_related will prevent per-row FK hits.

```diff
 def get_base_queryset(self) -> QuerySet:
     """Return the base queryset with filtering for OWASP site repositories."""
     return (
         super()
         .get_base_queryset()
+        .select_related("organization", "owner")
         .filter(
             is_owasp_site_repository=True,
             is_archived=False,
             is_empty=False,
         )
     )
```

backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (3)
45-47: Fix pluralization to “repositories”. The current expectation hardcodes a misspelling (“repositorys”) and bakes it into the UX.

```diff
-assert command.entity_name_plural == "repositorys"
+assert command.entity_name_plural == "repositories"
```

Outside this file, consider deriving from model meta to avoid manual pluralization:

```python
# In BaseAICommand (or similar)
@property
def entity_name_plural(self) -> str:
    return getattr(self.model_class._meta, "verbose_name_plural", f"{self.entity_name}s")
```
67-82: Patch target is brittle. Patching `command.__class__.__bases__[0].get_base_queryset` relies on MRO order. Patch the concrete base symbol instead.

```diff
-with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
+from apps.ai.common.base.chunk_command import BaseChunkCommand
+with patch.object(BaseChunkCommand, "get_base_queryset") as mock_super:
```
104-119: Remove duplicated test logic. This repeats test_get_base_queryset. Drop it to reduce noise.

```diff
-    def test_queryset_filtering_logic(self, command):
-        """Test that the queryset filtering logic works correctly."""
-        with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
-            mock_queryset = Mock()
-            mock_super.return_value = mock_queryset
-            mock_queryset.filter.return_value = "filtered_queryset"
-
-            result = command.get_base_queryset()
-
-            mock_queryset.filter.assert_called_once_with(
-                is_owasp_site_repository=True,
-                is_archived=False,
-                is_empty=False,
-            )
-            assert result == "filtered_queryset"
+    # Removed: duplicate of test_get_base_queryset
```

backend/tests/apps/slack/common/handlers/ai_test.py (2)
75-83: Avoid hard-coding model IDs; read from settings. Make chat/embedding model names configurable; update the test to assert against settings-derived values.

```diff
-mock_rag_tool.assert_called_once_with(
-    chat_model="gpt-4o",
-    embedding_model="text-embedding-3-small",
-)
+from django.conf import settings
+mock_rag_tool.assert_called_once_with(
+    chat_model=getattr(settings, "SLACK_AI_CHAT_MODEL", "gpt-4o"),
+    embedding_model=getattr(settings, "SLACK_AI_EMBEDDING_MODEL", "text-embedding-3-small"),
+)
```

Outside this file (handler):

```python
from django.conf import settings

rag_tool = RagTool(
    chat_model=getattr(settings, "SLACK_AI_CHAT_MODEL", "gpt-4o"),
    embedding_model=getattr(settings, "SLACK_AI_EMBEDDING_MODEL", "text-embedding-3-small"),
)
```
132-143: Add a guard for blank-only queries. Avoid calling the RAG pipeline when the stripped query is empty; return error blocks directly and add a test.

```diff
+def test_get_blocks_with_blank_query(self):
+    with patch("apps.slack.common.handlers.ai.get_error_blocks") as mock_err:
+        mock_err.return_value = [{"type": "section"}]
+        with patch("apps.slack.common.handlers.ai.process_ai_query") as mock_proc:
+            result = get_blocks("   ")
+            mock_proc.assert_not_called()
+            assert result == [{"type": "section"}]
```

backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (2)
67-83: Replace brittle base-class patching with import-path patching. Using `__bases__[0]` is fragile. Patch the known symbol instead.

```diff
-with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
+with patch("apps.ai.common.base.context_command.BaseContextCommand.get_base_queryset") as mock_super:
     mock_queryset = Mock()
     mock_super.return_value = mock_queryset
     mock_queryset.filter.return_value = "filtered_queryset"
```

Repeat the same replacement in the other tests that patch the base queryset.
Also applies to: 104-119, 127-141, 142-156, 157-171
127-171: DRY up the three queryset-filter tests. They assert the same filter arguments. Parametrize into one test to reduce noise and speed up the suite.
If helpful, I can provide a parametrized version.
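One possible shape for that parametrization is sketched below. It is hedged: the `Command` class here is a local stand-in mirroring the repeated filter logic, and the `_base` helper is hypothetical — in the real suite you would import the management command and patch its concrete base class (e.g. `BaseContextCommand.get_base_queryset`) instead.

```python
# Illustrative parametrized test; Command and _base are stand-ins, not the
# repo's actual classes.
from unittest.mock import Mock, patch

import pytest


class Command:
    """Minimal stand-in mirroring the repeated filter logic under test."""

    def get_base_queryset(self):
        return self._base().filter(
            is_owasp_site_repository=True, is_archived=False, is_empty=False
        )

    def get_default_queryset(self):
        return self.get_base_queryset()

    def _base(self):  # patched in tests
        raise NotImplementedError


@pytest.mark.parametrize("method_name", ["get_base_queryset", "get_default_queryset"])
def test_queryset_filtering(method_name):
    command = Command()
    with patch.object(Command, "_base") as mock_base:
        mock_qs = Mock()
        mock_base.return_value = mock_qs
        mock_qs.filter.return_value = "filtered_queryset"

        assert getattr(command, method_name)() == "filtered_queryset"
        mock_qs.filter.assert_called_once_with(
            is_owasp_site_repository=True, is_archived=False, is_empty=False
        )
```

One parametrized test then replaces three near-identical ones while keeping the same assertions.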
backend/tests/apps/slack/services/message_auto_reply_test.py (3)
48-96: Also assert the retrieval call contract. Validate we call Message.objects.get with pk for added confidence.

```diff
 generate_ai_reply_if_unanswered(mock_message.id)
+mock_message_get.assert_called_once_with(pk=mock_message.id)
 mock_client.conversations_replies.assert_called_once_with(
     channel=mock_message.conversation.slack_channel_id,
     ts=mock_message.slack_message_id,
     limit=1,
 )
```
97-106: Add an assertion for the not-found path. Confirms the correct lookup argument.

```diff
 generate_ai_reply_if_unanswered(99999)
+mock_message_get.assert_called_once_with(pk=99999)
```
172-211: Merge duplicate “no/empty AI response” tests via parametrization. Reduces repetition while preserving coverage.

```diff
-@patch("apps.slack.services.message_auto_reply.Message.objects.get")
-@patch("apps.slack.services.message_auto_reply.WebClient")
-@patch("apps.slack.services.message_auto_reply.process_ai_query")
-def test_generate_ai_reply_no_ai_response(
-    self,
-    mock_process_ai_query,
-    mock_webclient,
-    mock_message_get,
-    mock_message,
-):
-    """Test when AI doesn't return a response."""
-    mock_message_get.return_value = mock_message
-    mock_client = Mock()
-    mock_webclient.return_value = mock_client
-    mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
-    mock_process_ai_query.return_value = None
-
-    generate_ai_reply_if_unanswered(mock_message.id)
-
-    mock_client.chat_postMessage.assert_not_called()
-
-@patch("apps.slack.services.message_auto_reply.Message.objects.get")
-@patch("apps.slack.services.message_auto_reply.WebClient")
-@patch("apps.slack.services.message_auto_reply.process_ai_query")
-def test_generate_ai_reply_empty_response(
-    self,
-    mock_process_ai_query,
-    mock_webclient,
-    mock_message_get,
-    mock_message,
-):
-    """Test when AI returns empty response."""
-    mock_message_get.return_value = mock_message
-    mock_client = Mock()
-    mock_webclient.return_value = mock_client
-    mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
-    mock_process_ai_query.return_value = ""
-
-    generate_ai_reply_if_unanswered(mock_message.id)
-
-    mock_client.chat_postMessage.assert_not_called()
+@pytest.mark.parametrize("ai_response", [None, ""])
+@patch("apps.slack.services.message_auto_reply.Message.objects.get")
+@patch("apps.slack.services.message_auto_reply.WebClient")
+@patch("apps.slack.services.message_auto_reply.process_ai_query")
+def test_generate_ai_reply_no_or_empty_ai_response(
+    self, mock_process_ai_query, mock_webclient, mock_message_get, mock_message, ai_response
+):
+    """Test when AI returns no/empty response."""
+    mock_message_get.return_value = mock_message
+    mock_client = Mock()
+    mock_webclient.return_value = mock_client
+    mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
+    mock_process_ai_query.return_value = ai_response
+
+    generate_ai_reply_if_unanswered(mock_message.id)
+
+    mock_client.chat_postMessage.assert_not_called()
```

backend/tests/apps/slack/common/question_detector_test.py (3)
18-23: Avoid brittle assertions on pattern counts. Asserting exact counts for `question_patterns`/`compiled_patterns` couples tests to implementation details.

```diff
-assert len(detector.question_patterns) == 4
-assert len(detector.compiled_patterns) == 4
+assert len(detector.question_patterns) >= 4
+assert len(detector.compiled_patterns) >= 4
```
49-64: Confirm intent: treating “advice/recommend” statements as questions. Tests expect declaratives like “I recommend this approach” to be flagged as questions. If that's not intended, constrain these patterns (e.g., require a question mark or leading question terms).
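If the stricter behavior is wanted, the constraint mentioned above can be sketched as follows. This is illustrative only — the pattern names and the helper `is_advice_question` are hypothetical, not the detector's actual patterns: an advice/recommend phrase counts as a question only when the message also carries a question signal (leading question word or a question mark).

```python
# Illustrative sketch: require a question signal before an advice/recommend
# phrase is flagged. Names here are hypothetical, not the repo's patterns.
import re

ADVICE_TERMS = re.compile(r"(?i)\b(recommend|suggest|advice|opinion)\b")
ADVICE_QUESTION = re.compile(
    r"(?i)^(?:(?:what|how|which|should|would|could|can|do|does)\b.*"  # leading question word
    r"|.*\?)"  # ...or an explicit question mark
)


def is_advice_question(text: str) -> bool:
    """Flag advice requests only when they are phrased as questions."""
    return bool(ADVICE_TERMS.search(text) and ADVICE_QUESTION.match(text))
```

Under this sketch, “I recommend this approach” would no longer be flagged, while “Would you recommend this approach?” still is.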
144-156: Reduce reliance on pattern order. Index-based checks (`compiled_patterns[0/1]`) are fragile. Prefer matching by pattern content or behavior.

```diff
-question_mark_pattern = detector.compiled_patterns[0]
-assert question_mark_pattern.search("Is this right?") is not None
+assert any(p.pattern == r"\?" and p.search("Is this right?") for p in detector.compiled_patterns)
-question_word_pattern = detector.compiled_patterns[1]
-assert question_word_pattern.search("What is this") is not None
-assert question_word_pattern.search("How does it work") is not None
+expected = r"^(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)"
+assert any(p.pattern == expected and p.search("What is this") for p in detector.compiled_patterns)
+assert any(p.pattern == expected and p.search("How does it work") for p in detector.compiled_patterns)
```

backend/apps/slack/common/question_detector.py (3)
16-19: Normalize keywords to lowercase once. Ensure case-insensitive matching regardless of how OWASP_KEYWORDS is defined.

```diff
-self.owasp_keywords = OWASP_KEYWORDS
+self.owasp_keywords = {k.lower() for k in OWASP_KEYWORDS}
```
20-25: Tighten regex to reduce false positives. Add word boundaries and allow optional leading whitespace for question starters; keeps behavior but trims accidental matches like “whatsoever”.

```diff
-r"^(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)",
-r"(help|explain|tell me|show me|guide|tutorial|example)",
-r"(recommend|suggest|advice|opinion)",
+r"^\s*\b(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)\b",
+r"\b(help|explain|tell me|show me|guide|tutorial|example)\b",
+r"\b(recommend|suggest|advice|opinion)\b",
```
48-57: Boundary-aware phrase matching for multi-word keywords. Substring matching can overmatch. Consider a boundary-aware search that ignores punctuation/hyphens.

```python
# Example enhancement:
def contains_owasp_keywords(self, text: str) -> bool:
    words = re.findall(r"\b\w+\b", text)
    text_words = set(words)
    if self.owasp_keywords.intersection(text_words):
        return True
    # Multi-word phrases: collapse whitespace/punctuation and match by tokens
    normalized = re.sub(r"[\s\-_]+", " ", text)
    for keyword in self.owasp_keywords:
        if " " in keyword and re.search(rf"\b{re.escape(keyword)}\b", normalized):
            return True
    return False
```

backend/tests/apps/slack/commands/ai_test.py (1)
13-17: Remove the unused autouse fixture or reuse it. You recreate `Ai()` in tests; either reuse `self.ai_command` or drop the autouse fixture.

```diff
-@pytest.fixture(autouse=True)
-def setup_method(self):
-    """Set up test data before each test method."""
-    self.ai_command = Ai()
+@pytest.fixture
+def ai_command(self):
+    """Ai instance per test."""
+    return Ai()
```

Then use `ai_command` in tests.

backend/tests/apps/ai/common/extractors/repository_test.py (1)
397-405: Use call.args for robust URL assertions. Stringifying `call` is brittle. Inspect the first positional arg.

```diff
-assert any(
-    "https://raw.githubusercontent.com/test-org/test-repo/develop/" in str(call)
-    for call in mock_get_content.call_args_list
-)
+assert any(
+    "https://raw.githubusercontent.com/test-org/test-repo/develop/" in c.args[0]
+    for c in mock_get_content.call_args_list
+)
```

backend/tests/apps/slack/events/message_posted_test.py (2)
95-123: Add a case for mid-message mentions (optional). If mentions anywhere should trigger an AI reply, add a test where the mention is not at the start (e.g., “Hi <@u987654> what is OWASP?”) and confirm the stripped query is correct. If only leading mentions should trigger, assert non-trigger instead.
95-123: Consider guarding against None blocks from get_blocks. If `get_blocks` can return None, `chat_postMessage` may fail. Either assert in tests that we skip posting when None, or handle it in the event handler.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`backend/poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (36)
- `backend/apps/ai/Makefile` (1 hunks)
- `backend/apps/ai/common/extractors/repository.py` (1 hunks)
- `backend/apps/ai/management/commands/ai_update_repository_chunks.py` (1 hunks)
- `backend/apps/ai/management/commands/ai_update_repository_context.py` (1 hunks)
- `backend/apps/ai/models/chunk.py` (3 hunks)
- `backend/apps/slack/MANIFEST.yaml` (3 hunks)
- `backend/apps/slack/admin/conversation.py` (1 hunks)
- `backend/apps/slack/apps.py` (1 hunks)
- `backend/apps/slack/commands/__init__.py` (1 hunks)
- `backend/apps/slack/commands/ai.py` (1 hunks)
- `backend/apps/slack/common/handlers/ai.py` (1 hunks)
- `backend/apps/slack/common/question_detector.py` (1 hunks)
- `backend/apps/slack/constants.py` (1 hunks)
- `backend/apps/slack/events/__init__.py` (1 hunks)
- `backend/apps/slack/events/message_posted.py` (1 hunks)
- `backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py` (1 hunks)
- `backend/apps/slack/models/conversation.py` (1 hunks)
- `backend/apps/slack/services/__init__.py` (1 hunks)
- `backend/apps/slack/services/message_auto_reply.py` (1 hunks)
- `backend/apps/slack/templates/commands/ai.jinja` (1 hunks)
- `backend/pyproject.toml` (1 hunks)
- `backend/settings/base.py` (2 hunks)
- `backend/settings/urls.py` (1 hunks)
- `backend/test_commands.py` (1 hunks)
- `backend/tests/apps/ai/common/extractors/repository_test.py` (1 hunks)
- `backend/tests/apps/ai/common/utils_test.py` (1 hunks)
- `backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py` (1 hunks)
- `backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py` (1 hunks)
- `backend/tests/apps/ai/models/chunk_test.py` (4 hunks)
- `backend/tests/apps/slack/commands/ai_test.py` (1 hunks)
- `backend/tests/apps/slack/common/handlers/ai_test.py` (1 hunks)
- `backend/tests/apps/slack/common/question_detector_test.py` (1 hunks)
- `backend/tests/apps/slack/events/message_posted_test.py` (1 hunks)
- `backend/tests/apps/slack/services/message_auto_reply_test.py` (1 hunks)
- `cspell/custom-dict.txt` (2 hunks)
- `docker-compose/local.yaml` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (20)
backend/apps/slack/apps.py (2)
- backend/apps/mentorship/apps.py (1): `ready` (12-14)
- backend/apps/slack/events/__init__.py (1): `configure_slack_events` (1-19)
backend/apps/slack/common/handlers/ai.py (2)
- backend/apps/ai/agent/tools/rag/rag_tool.py (2): `RagTool` (13-63), `query` (34-63)
- backend/apps/slack/blocks.py (1): `markdown` (21-34)
backend/tests/apps/slack/common/handlers/ai_test.py (2)
- backend/apps/slack/common/handlers/ai.py (2): `get_error_blocks` (49-61), `process_ai_query` (31-46)
- backend/apps/ai/agent/tools/rag/rag_tool.py (1): `query` (34-63)
backend/apps/ai/management/commands/ai_update_repository_context.py (4)
- backend/apps/ai/common/base/context_command.py (1): `BaseContextCommand` (9-54)
- backend/apps/ai/common/extractors/repository.py (1): `extract_repository_markdown_content` (114-174)
- backend/apps/github/models/repository.py (1): `Repository` (25-349)
- backend/apps/ai/management/commands/ai_update_repository_chunks.py (5): `Command` (10-36), `extract_content` (14-16), `get_base_queryset` (18-28), `get_default_queryset` (30-32), `source_name` (34-36)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)
- backend/apps/ai/management/commands/ai_update_repository_context.py (5): `Command` (10-36), `source_name` (34-36), `extract_content` (14-16), `get_base_queryset` (18-28), `get_default_queryset` (30-32)
- backend/apps/ai/common/base/context_command.py (1): `BaseContextCommand` (9-54)
- backend/apps/github/models/repository.py (1): `Repository` (25-349)
backend/apps/slack/commands/ai.py (2)
- backend/apps/slack/commands/command.py (1): `CommandBase` (20-157)
- backend/apps/slack/common/handlers/ai.py (1): `get_blocks` (13-28)
backend/apps/slack/events/message_posted.py (7)
- backend/apps/slack/common/handlers/ai.py (1): `get_blocks` (13-28)
- backend/apps/slack/common/question_detector.py (2): `QuestionDetector` (13-57), `is_owasp_question` (31-42)
- backend/apps/slack/events/event.py (1): `EventBase` (23-265)
- backend/apps/slack/models/conversation.py (2): `Conversation` (15-107), `update_data` (85-107)
- backend/apps/slack/models/member.py (1): `Member` (10-78)
- backend/apps/slack/models/message.py (2): `Message` (15-160), `text` (83-85)
- backend/apps/slack/services/message_auto_reply.py (1): `generate_ai_reply_if_unanswered` (16-48)
backend/apps/slack/common/question_detector.py (1)
- backend/apps/slack/models/message.py (1): `text` (83-85)
backend/tests/apps/slack/services/message_auto_reply_test.py (4)
- backend/apps/slack/models/conversation.py (1): `Conversation` (15-107)
- backend/apps/slack/models/message.py (3): `Message` (15-160), `text` (83-85), `ts` (88-90)
- backend/apps/slack/models/workspace.py (2): `Workspace` (11-44), `bot_token` (37-44)
- backend/apps/slack/services/message_auto_reply.py (1): `generate_ai_reply_if_unanswered` (16-48)
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (3)
- backend/apps/ai/management/commands/ai_update_repository_chunks.py (5): `Command` (10-36), `source_name` (34-36), `extract_content` (14-16), `get_base_queryset` (18-28), `get_default_queryset` (30-32)
- backend/apps/ai/common/base/chunk_command.py (1): `BaseChunkCommand` (12-91)
- backend/apps/github/models/repository.py (1): `Repository` (25-349)
backend/apps/slack/services/message_auto_reply.py (4)
- backend/apps/slack/common/handlers/ai.py (1): `process_ai_query` (31-46)
- backend/apps/slack/models/message.py (3): `Message` (15-160), `ts` (88-90), `text` (83-85)
- backend/apps/slack/models/workspace.py (1): `bot_token` (37-44)
- backend/apps/ai/agent/tools/rag/rag_tool.py (1): `query` (34-63)
backend/apps/ai/common/extractors/repository.py (1)
- backend/apps/github/utils.py (1): `get_repository_file_content` (60-79)
backend/apps/slack/events/__init__.py (2)
- backend/apps/slack/apps.py (1): `SlackConfig` (13-33)
- backend/apps/slack/events/event.py (2): `EventBase` (23-265), `configure_events` (30-37)
backend/tests/apps/ai/models/chunk_test.py (3)
backend/tests/apps/ai/common/base/chunk_command_test.py (2)
mock_chunks(72-81)mock_context(51-60)backend/apps/ai/models/chunk.py (2)
Chunk(16-93)bulk_save(34-53)backend/tests/apps/ai/common/base/context_command_test.py (1)
mock_context(48-55)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
BaseChunkCommand(12-91)backend/apps/ai/common/extractors/repository.py (1)
extract_repository_markdown_content(114-174)backend/apps/github/models/repository.py (1)
Repository(25-349)backend/apps/ai/management/commands/ai_update_repository_context.py (5)
Command(10-36)extract_content(14-16)get_base_queryset(18-28)get_default_queryset(30-32)source_name(34-36)
backend/tests/apps/slack/common/question_detector_test.py (2)
backend/apps/slack/common/question_detector.py (4)
QuestionDetector(13-57)is_question(44-46)contains_owasp_keywords(48-57)is_owasp_question(31-42)backend/tests/apps/slack/events/message_posted_test.py (1)
test_init(44-48)
backend/tests/apps/slack/events/message_posted_test.py (3)
backend/apps/slack/events/message_posted.py (1)
MessagePosted(17-77)backend/apps/slack/models/conversation.py (1)
Conversation(15-107)backend/apps/slack/models/member.py (1)
Member(10-78)
backend/apps/ai/models/chunk.py (3)
backend/apps/ai/models/context.py (1)
Context(15-61)backend/apps/common/models.py (2)
BulkSaveModel(10-34)TimestampedModel(37-46)backend/apps/common/utils.py (1)
truncate(164-176)
backend/tests/apps/slack/commands/ai_test.py (2)
backend/apps/slack/commands/ai.py (1)
Ai(6-23)backend/apps/slack/commands/command.py (1)
CommandBase(20-157)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (2)
extract_repository_content(11-111)extract_repository_markdown_content(114-174)
🔇 Additional comments (26)
backend/pyproject.toml (1)
22-22: Manually verify django-rq compatibility with Python 3.13 and Django 5.1
Poetry wasn’t available to run the compatibility checks; please confirm at runtime that django-rq 3.1 (and its rq/redis transitive dependencies) works under Python 3.13 and Django 5.1, and pin rq/redis in pyproject.toml if needed.
cspell/custom-dict.txt (1)
109-109: Dictionary additions OK
“rqworker” and “webgoat” entries make sense for this repo’s domain.
Also applies to: 125-125
backend/apps/slack/services/__init__.py (1)
1-1: Package initializer LGTM
Clear, minimal package docstring. No issues.
backend/settings/urls.py (1)
31-31: Django-RQ dashboard is staff-only by default
Unconditional path("django-rq/", include("django_rq.urls")) is safe since django-rq v3.1+ enforces staff-only access on its dashboard views.
backend/settings/base.py (1)
52-53: Django RQ app registration looks correct.
Adding "django_rq" to THIRD_PARTY_APPS is appropriate and consistent with how INSTALLED_APPS is composed.
backend/tests/apps/ai/common/utils_test.py (1)
97-107: Decorator order and side-effect wiring are correct.
Patching order matches argument order; the AttributeError simulates the None-context path as intended.
backend/apps/ai/models/chunk.py (2)
34-53: No downstream code depends on chunks being cleared.
Chunk.bulk_save is only called in chunk_command.py (where the list isn’t used after the call) and in tests (which don’t assert on the list), so the updated clear-only-when-empty logic doesn’t break any callers.
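The clear-the-caller's-list contract discussed in this thread can be illustrated with a toy example; `bulk_save` below is a simplified stand-in for `BulkSaveModel.bulk_save`, not the project's actual implementation:

```python
# Toy illustration of the caller-list-clearing contract: the helper must
# empty the *caller's* list in place on success. This is a simplified
# stand-in, not the project's BulkSaveModel.bulk_save.


def bulk_save(objects: list) -> list:
    """Persist the objects (simulated), then clear the caller's list in place."""
    saved = list(objects)  # stand-in for the real DB write
    objects.clear()  # contract: the caller's list is emptied on success
    return saved


pending = ["chunk-a", "chunk-b"]
saved = bulk_save(pending)
```

Note that rebinding the parameter (`objects = filtered_list`) instead of mutating it would silently break this contract, which is why filtering into a fresh list before delegating to the helper is risky.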
19-23: Replace deprecated unique_together with a UniqueConstraint and plan a safe migration

```diff
     class Meta:
         db_table = "ai_chunks"
         verbose_name = "Chunk"
-        unique_together = ("context", "text")
+        constraints = [
+            models.UniqueConstraint(
+                fields=("context", "text"),
+                name="uq_ai_chunks_context_text",
+            )
+        ]
```

Add a data migration to dedupe existing rows before enforcing the constraint and use CREATE INDEX CONCURRENTLY on Postgres to avoid downtime.

backend/tests/apps/ai/models/chunk_test.py (1)
101-104: LGTM on direct context-based existence checks.
The assertions correctly verify the switch to context=mock_context, text=text.
Also applies to: 125-128, 148-150
backend/apps/ai/common/extractors/repository.py (1)
108-111: LGTM on structured repository metadata extraction.
The prose/metadata split and DELIMITER usage align with the RAG pipeline.
backend/apps/slack/commands/__init__.py (1)
5-5: LGTM: /ai command is now exported/loaded
Importing ai ensures command discovery via CommandBase.configure_commands().
backend/apps/slack/models/conversation.py (1)
30-32: LGTM: opt-in feature flag on Conversation
Good that from_slack() does not overwrite this local flag.
backend/apps/ai/management/commands/ai_update_repository_context.py (1)
30-32: Upstream get_base_queryset has no is_active filter
BaseAICommand.get_base_queryset returns .objects.all() with no implicit filters, so overriding get_default_queryset correctly avoids filtering on the nonexistent is_active field.
backend/apps/slack/MANIFEST.yaml (1)
98-103: /ai slash command looks correct.
URL, usage, and escaping align with the existing pattern. Ensure the handler responds within 3s to Slack’s slash command ACK.
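As a hedged sketch of that ack-within-3-seconds requirement: acknowledge first, then do the slow AI work. The `(ack, command, respond)` trio below mimics slack_bolt's handler style, but everything here is a plain callable, not the real SDK.

```python
# Hedged sketch of the ack-then-respond flow required by Slack slash
# commands: acknowledge within 3 seconds, then do the slow AI work.
# The (ack, command, respond) names mimic slack_bolt's handler style,
# but these are plain callables, not the real SDK.


def handle_ai_command(ack, command, respond, run_query):
    """Acknowledge first, then answer via the response callback."""
    ack()  # must fire before any slow work to beat the 3s deadline
    query = command.get("text", "").strip()
    if not query:
        respond("Usage: /ai <question>")
        return
    respond(run_query(query))
```

Tracing the calls shows the ordering guarantee: the acknowledgement is always recorded before the response, whether or not the query is empty.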
backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1)
1-1: Django version compatibility confirmed
The project’s pyproject.toml declares django = "^5.1" (i.e. ≥5.1, <6.0), which includes Django 5.2.5, so the migration header is consistent with the dependency range.
backend/apps/ai/management/commands/ai_update_repository_chunks.py (2)
14-17: Good: delegates extraction cleanly.
34-36: Consistent source_name.
Matches context command; good for downstream filters.
backend/tests/apps/slack/services/message_auto_reply_test.py (1)
17-43: Fixtures are clean and focused. LGTM.
backend/tests/apps/slack/common/question_detector_test.py (2)
170-186: Good real-world coverage
Nice parameterized cases that exercise diverse OWASP queries.
188-196: Initialization patching looks solid
Mocking OWASP_KEYWORDS verifies configurability without leaking global state.
18-42: Patch target and whitespace trimming behavior look correct
Verifies the local import usage and exact query passed to get_blocks.
127-140: Exception propagation test is appropriate
Confirms render_blocks does not swallow errors, leaving the caller to handle them.
241-256: Baseline behavior verified
Description is included; metadata intentionally empty for markdown extraction. Looks good.
backend/tests/apps/slack/events/message_posted_test.py (3)
95-123: Bot-mention flow is well covered
Auth caching, mention stripping, and block rendering checks are solid.
210-292: OWASP question happy-path coverage looks good
Validates DB updates and delayed enqueue to the default queue.
348-375: Whitespace-trim behavior verified
Ensures query cleanup before rendering blocks. Nice.
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
cspell/custom-dict.txt
Outdated
| pypoetry | ||
| pyyaml | ||
| repositorycontributor | ||
| repositorys |
💡 Verification agent
🧩 Analysis chain
Don’t whitelist the misspelling “repositorys”
This risks masking user-facing typos. Prefer fixing sources to “repositories”. If you must keep it temporarily for tests, add a TODO and plan removal.
Apply if ready to correct:

```diff
-repositorys
+repositories
```

To find any user-visible occurrences:
🏁 Script executed:

```shell
#!/bin/bash
rg -nP -C2 '\brepositorys\b' --glob '!*cspell*'
```

Length of output: 2363
Implement correct pluralization and remove whitelist entry
Update the entity_name_plural logic to return “repositories” (instead of naïvely appending “s”), update the two test files under backend/tests/apps/ai/management/commands/ where they assert "repositorys" to expect "repositories", then remove the “repositorys” entry from cspell/custom-dict.txt (or add a TODO if you need a temporary fallback).
🤖 Prompt for AI Agents
In cspell/custom-dict.txt around line 107 the entry "repositorys" exists but is
the incorrect plural; update code to make entity_name_plural return
"repositories" (not by naive append "s"), update the two tests in
backend/tests/apps/ai/management/commands/ to expect "repositories" instead of
"repositorys", and then remove the "repositorys" entry from
cspell/custom-dict.txt (or replace it with a TODO comment if you need a
temporary fallback).
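A minimal sketch of the non-naive pluralization being requested here; `BaseAICommand` and `entity_name` are simplified stand-ins for the command classes under discussion, not the project's actual code:

```python
# Minimal sketch of non-naive pluralization for entity_name_plural.
# BaseAICommand / entity_name are simplified stand-ins for the command
# classes under discussion, not the project's implementation.


def pluralize(noun: str) -> str:
    """English plural that handles the consonant-y -> -ies case."""
    if noun.endswith("y") and not noun.endswith(("ay", "ey", "oy", "uy")):
        return noun[:-1] + "ies"
    return noun + "s"


class BaseAICommand:
    entity_name = "entity"

    @property
    def entity_name_plural(self) -> str:
        # Subclasses may still override this with an explicit value.
        return pluralize(self.entity_name)


class RepositoryCommand(BaseAICommand):
    entity_name = "repository"
```

With this, "repository" yields "repositories" while regular nouns like "chunk" still get a plain "s".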
@Dishant1804 this is a legit suggestion; it's grammatically incorrect and needs to be fixed.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
6439df7 to b198ecf (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/apps/ai/models/chunk.py (1)
85-93: update_data has a race on duplicates; rely on the DB constraint

The pre-check can race; catch IntegrityError instead and return None.

```diff
-        if Chunk.objects.filter(context=context, text=text).exists():
-            return None
-
-        chunk = Chunk(text=text, embedding=embedding, context=context)
-
-        if save:
-            chunk.save()
-
-        return chunk
+        chunk = Chunk(text=text, embedding=embedding, context=context)
+        if save:
+            try:
+                chunk.save()
+            except Exception as exc:  # narrowed below via local import
+                from django.db import IntegrityError
+                if isinstance(exc, IntegrityError):
+                    # Duplicate per unique (context, text); treat as no-op
+                    logger.debug(
+                        "Chunk already exists for context_id=%s; skipping",
+                        getattr(context, "id", None),
+                    )
+                    return None
+                raise
+        return chunk
```

If you prefer top-level imports, add:

```diff
-from django.db import DatabaseError, models
+from django.db import DatabaseError, IntegrityError, models
```
♻️ Duplicate comments (2)
backend/apps/ai/models/chunk.py (1)
35-53: bulk_save drops updates, does N+1 queries, and breaks the clear-list contract
- Updates are filtered out (ids present) if an identical (context, text) exists.
- Per-object .exists() causes O(N) queries.
- Passing a new list to BulkSaveModel means the caller’s list isn’t cleared unless no uniques remain.
- Duplicate races aren’t handled; rely on the DB unique constraint and catch IntegrityError.
Replace the body with a single-probe, race-safe implementation that preserves the contract.
```diff
     @staticmethod
     def bulk_save(chunks, fields=None):
-        """Bulk save chunks with duplicate handling."""
-        unique_chunks = []
-        unique_chunks = [
-            chunk
-            for chunk in chunks
-            if not Chunk.objects.filter(context=chunk.context, text=chunk.text).exists()
-        ]
-
-        if unique_chunks:
-            try:
-                BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
-            except (ValueError, TypeError, DatabaseError):
-                for chunk in unique_chunks:
-                    try:
-                        chunk.save()
-                    except (ValueError, TypeError, DatabaseError) as save_error:
-                        logger.exception("Failed to save chunk", exc_info=save_error)
-            else:
-                chunks.clear()
+        """Bulk save chunks with duplicate handling and minimal queries."""
+        if not chunks:
+            return
+
+        # Partition: updates vs. creates
+        to_update = [c for c in chunks if getattr(c, "id", None)]
+        to_create = [c for c in chunks if not getattr(c, "id", None)]
+
+        # In-memory dedupe among creates by (context_id, text)
+        seen: set[tuple[int | None, str]] = set()
+        deduped_create = []
+        for c in to_create:
+            ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+            key = (ctx_id, c.text)
+            if key in seen:
+                continue
+            seen.add(key)
+            deduped_create.append(c)
+
+        # One DB probe for existing keys among incoming creates
+        ctx_ids = {getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None) for c in deduped_create}
+        texts = {c.text for c in deduped_create}
+        existing_keys: set[tuple[int, str]] = set()
+        if ctx_ids and texts:
+            existing_keys = set(
+                Chunk.objects.filter(context_id__in=ctx_ids, text__in=texts)
+                .values_list("context_id", "text")
+            )
+
+        final_create = []
+        for c in deduped_create:
+            ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+            if (ctx_id, c.text) not in existing_keys:
+                final_create.append(c)
+
+        unique_chunks = to_update + final_create
+        if not unique_chunks:
+            chunks.clear()
+            return
+
+        try:
+            # Maintain contract: operate on caller list so BulkSaveModel clears it
+            chunks[:] = unique_chunks
+            BulkSaveModel.bulk_save(Chunk, chunks, fields=fields)
+        except (ValueError, TypeError, DatabaseError) as exc:
+            # Fallback: per-object saves with race-safe duplicate handling
+            from django.db import IntegrityError  # local import to avoid global if unused
+            for obj in unique_chunks:
+                try:
+                    if getattr(obj, "id", None) and fields:
+                        obj.save(update_fields=fields)
+                    else:
+                        obj.save()
+                except IntegrityError:
+                    # Duplicate create due to race/uniqueness; ignore
+                    continue
+                except (ValueError, TypeError, DatabaseError):
+                    logger.exception("Failed to save chunk id=%s", getattr(obj, "id", None))
```

backend/tests/apps/ai/common/extractors/repository_test.py (1)
340-354: Add a test for generic exceptions from the fetcher
Past feedback asked to catch generic Exception in the extractor; add a case that raises Exception to ensure we don’t break on unexpected errors.
@patch("apps.ai.common.extractors.repository.get_repository_file_content") def test_extract_repository_markdown_content_file_fetch_exception(self, mock_get_content): """Test extraction when file fetching raises an exception.""" repository = MagicMock() repository.description = "Test repository" repository.organization = MagicMock(login="test-org") repository.key = "test-repo" repository.default_branch = "main" - mock_get_content.side_effect = ConnectionError("Network error") + mock_get_content.side_effect = Exception("Network error") prose, metadata = extract_repository_markdown_content(repository) assert "Repository Description: Test repository" in prose assert metadata == ""
🧹 Nitpick comments (5)
backend/apps/ai/models/chunk.py (2)
50-51: Use logger.exception correctly

logger.exception already includes the traceback; passing an exc_info object is unnecessary and atypical.

```diff
-                        logger.exception("Failed to save chunk", exc_info=save_error)
+                        logger.exception("Failed to save chunk id=%s", getattr(chunk, "id", None))
```
19-23: Prefer UniqueConstraint over unique_togetherDjango now favors explicit UniqueConstraint with a name; easier to reference in migrations and logs.
- unique_together = ("context", "text") + constraints = [ + models.UniqueConstraint( + fields=["context", "text"], + name="uniq_ai_chunk_context_text", + ) + ]backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)
18-28: Optimize queryset for extractor accessSelect related FKs to avoid extra queries when building raw URLs (owner/org), and narrow fields if desired.
return ( super() .get_base_queryset() .filter( is_owasp_site_repository=True, is_archived=False, is_empty=False, ) - ) + ).select_related("organization", "owner")backend/tests/apps/ai/common/extractors/repository_test.py (2)
407-420: Assert no network calls when owner/org missingWhen both are absent, extractor should not attempt fetches. Patch and assert not called.
- def test_extract_repository_markdown_content_no_owner_or_org(self): + @patch("apps.ai.common.extractors.repository.get_repository_file_content") + def test_extract_repository_markdown_content_no_owner_or_org(self, mock_get_content): """Test extraction when repository has neither organization nor owner.""" repository = MagicMock() repository.description = "Test repository" repository.organization = None repository.owner = None repository.key = "test-repo" repository.default_branch = "main" prose, metadata = extract_repository_markdown_content(repository) assert "Repository Description: Test repository" in prose assert metadata == "" + mock_get_content.assert_not_called()
421-433: Also assert no network calls when key is missingKey is required to form the raw URL; ensure no fetch attempted.
- def test_extract_repository_markdown_content_no_key(self): + @patch("apps.ai.common.extractors.repository.get_repository_file_content") + def test_extract_repository_markdown_content_no_key(self, mock_get_content): """Test extraction when repository has no key.""" repository = MagicMock() repository.description = "Test repository" repository.organization = MagicMock(login="test-org") repository.key = None repository.default_branch = "main" prose, metadata = extract_repository_markdown_content(repository) assert "Repository Description: Test repository" in prose assert metadata == "" + mock_get_content.assert_not_called()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
backend/apps/ai/Makefile (1 hunks)
backend/apps/ai/common/extractors/repository.py (1 hunks)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
backend/apps/ai/models/chunk.py (3 hunks)
backend/test_commands.py (1 hunks)
backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
backend/tests/apps/ai/common/utils_test.py (1 hunks)
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
backend/tests/apps/ai/models/chunk_test.py (4 hunks)
cspell/custom-dict.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- backend/test_commands.py
- backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
- backend/tests/apps/ai/models/chunk_test.py
- backend/apps/ai/management/commands/ai_update_repository_context.py
- backend/apps/ai/common/extractors/repository.py
- cspell/custom-dict.txt
- backend/apps/ai/Makefile
- backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
- backend/tests/apps/ai/common/utils_test.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (2)
extract_repository_content(11-111)extract_repository_markdown_content(114-174)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
BaseChunkCommand(12-91)backend/apps/ai/common/extractors/repository.py (1)
extract_repository_markdown_content(114-174)backend/apps/github/models/repository.py (1)
Repository(25-349)backend/apps/ai/management/commands/ai_update_repository_context.py (5)
Command(10-36)extract_content(14-16)get_base_queryset(18-28)get_default_queryset(30-32)source_name(34-36)
backend/apps/ai/models/chunk.py (2)
backend/apps/ai/models/context.py (1)
Context(15-61)backend/apps/common/models.py (2)
BulkSaveModel(10-34)TimestampedModel(37-46)
🔇 Additional comments (5)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
10-12: LGTM: command wiring is correct
key_field_name and model_class look right.
14-16: LGTM: delegates to the markdown extractor
Extraction path aligns with the PR goals.
30-36: LGTM: default queryset override avoids is_active
Repository lacks is_active; good override.
18-28: Verified OWASP site repository filter uses www- prefixes
check_owasp_site_repository keys off key.startswith(("www-chapter-", "www-committee-", "www-event", "www-project-")), so the queryset filter aligns with the intended www- heuristic.
backend/tests/apps/ai/common/extractors/repository_test.py (1)
3-3: CI/python requirements support datetime.UTC
Project’s pyproject.toml specifies python = "^3.13" (>= 3.13), which includes the datetime.UTC alias introduced in Python 3.11; no action needed.
arkid15r
left a comment
@Dishant1804 you've submitted 1000+ lines of code for review.
I have a couple of questions:
- could you explain what OWASP Nest problem this PR solves?
- how did you verify that problem is solved with this PR?
This PR gets the .md files that we have in www- repositories and saves them as chunks and context so that we can have more knowledge.
Alright, so it sounds like you checked it with at least one repository and some relevant queries. Can you provide some examples?
This has a lot of chunks that are not up to the mark; maybe we should change the chunking delimiter or chunking strategy for this case, because it has a lot of delimiters.
I meant your general approach for testing, e.g. did OWASP Nest repository content help to improve NestBot answers/context for any specific questions? Did it get new info based on this? My opinion -- it didn't. I haven't tested it yet, but based on this code I can reasonably assume that no new [useful] content was added by these jobs and the PR in general. I also reasonably assume that this is AI-generated code that has nothing to do with a real-life scenario. I call it cosplay code. The bottom line is that this code has 0 value in its current form, but you requested mentors to review it w/o testing. I don't appreciate this kind of disrespect to mentors' time. This is not acceptable.
I'll update the chunking strategy for this and then request review. We are getting the information from the README and CONTRIBUTING .md files; the chunking needs to be proper.
Actionable comments posted: 1
♻️ Duplicate comments (4)
backend/apps/ai/models/chunk.py (1)
35-53: bulk_save drops updates, does N+1 queries, and breaks caller-list clearing contract
- Updates are filtered out (self-existence check catches the same row).
- Per-object .exists() → O(N) queries.
- Caller’s chunks list isn’t cleared on success (only when no uniques).
- Fallback logging misuses logger.exception args.

Refactor as below to partition update/create, dedupe creates in-memory, probe DB once, maintain contract, and handle IntegrityError.
```diff
     @staticmethod
     def bulk_save(chunks, fields=None):
-        """Bulk save chunks with duplicate handling."""
-        unique_chunks = []
-        unique_chunks = [
-            chunk
-            for chunk in chunks
-            if not Chunk.objects.filter(context=chunk.context, text=chunk.text).exists()
-        ]
-
-        if unique_chunks:
-            try:
-                BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
-            except (ValueError, TypeError, DatabaseError):
-                for chunk in unique_chunks:
-                    try:
-                        chunk.save()
-                    except (ValueError, TypeError, DatabaseError) as save_error:
-                        logger.exception("Failed to save chunk", exc_info=save_error)
-            else:
-                chunks.clear()
+        """Bulk save chunks with duplicate handling and minimal queries."""
+        if not chunks:
+            return
+
+        # Partition: updates (have id) vs creates (no id)
+        to_update = [c for c in chunks if getattr(c, "id", None)]
+        to_create = [c for c in chunks if not getattr(c, "id", None)]
+
+        # In-memory dedupe among creates by (context_id, text)
+        seen: set[tuple[int | None, str]] = set()
+        deduped_create = []
+        for c in to_create:
+            ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+            key = (ctx_id, c.text)
+            if key in seen:
+                continue
+            seen.add(key)
+            deduped_create.append(c)
+
+        # One DB round-trip to find existing keys
+        ctx_ids = {getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None) for c in deduped_create}
+        texts = {c.text for c in deduped_create}
+        existing_keys: set[tuple[int, str]] = set()
+        if ctx_ids and texts:
+            existing_keys = set(
+                Chunk.objects.filter(context_id__in=ctx_ids, text__in=texts)
+                .values_list("context_id", "text")
+            )
+
+        final_create = [
+            c for c in deduped_create
+            if ((getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)), c.text) not in existing_keys
+        ]
+
+        unique_chunks = to_update + final_create
+        if not unique_chunks:
+            chunks.clear()
+            return
+
+        try:
+            # Maintain contract: operate on caller's list and let BulkSaveModel clear it
+            chunks[:] = unique_chunks
+            BulkSaveModel.bulk_save(Chunk, chunks, fields=fields)
+        except (ValueError, TypeError, DatabaseError):
+            # Fallback to per-object saves; ignore duplicate races
+            from django.db import IntegrityError  # local import
+            for obj in unique_chunks:
+                try:
+                    if getattr(obj, "id", None) and fields:
+                        obj.save(update_fields=fields)
+                    else:
+                        obj.save()
+                except IntegrityError:
+                    continue
+                except (ValueError, TypeError, DatabaseError):
+                    logger.exception("Failed to save chunk id=%s", getattr(obj, "id", None))
```

backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)
45-47: Fix plural: "repositorys" → "repositories"

This is user-facing and appears in multiple assertions.

Apply:

```diff
-        assert command.entity_name_plural == "repositorys"
+        assert command.entity_name_plural == "repositories"
```
120-126: Fix plural in initialization assertions as well

Keep tests consistent with correct English plural.

```diff
-        assert command.entity_name_plural == "repositorys"
+        assert command.entity_name_plural == "repositories"
```
41-47: Define entity_name and entity_name_plural on the command class

Prevents accidental defaults and ensures help/log strings are correct.

Change in apps/ai/management/commands/ai_update_repository_context.py:

```diff
 class Command(BaseContextCommand):
+    entity_name = "repository"
+    entity_name_plural = "repositories"
     key_field_name = "key"
     model_class = Repository
```
🧹 Nitpick comments (14)
backend/apps/ai/common/utils.py (1)
74-89: Avoid full JSON parse for detection; add fast guards

json.loads on large strings is costly. Short-circuit on type and first non-space char before parsing; then parse the trimmed string.

```diff
 def is_json_content(content: str) -> bool:
 @@
-    try:
-        json.loads(content)
+    if not isinstance(content, str):
+        return False
+    s = content.lstrip()
+    if not s or s[0] not in ("{", "["):
+        return False
+    try:
+        json.loads(s)
```

backend/apps/ai/models/chunk.py (3)
59-63: Make splitter markdown-aware for better chunk boundaries

Include heading-based separators to reduce mid-heading splits and improve retrieval quality.

```diff
-            separators=["\n\n", "\n", " ", ""],
+            separators=["\n\n", "\n### ", "\n## ", "\n# ", "\n", " ", ""],
```
13-13: Type-hint logger for consistencyMinor consistency nit with utils.py.
-logger = logging.getLogger(__name__) +logger: logging.Logger = logging.getLogger(__name__)
85-93: Race-safe create in update_dataPotential duplicate race against the unique constraint; catch IntegrityError and return None to keep the contract.
chunk = Chunk(text=text, embedding=embedding, context=context) if save: - chunk.save() + from django.db import IntegrityError # local import + try: + chunk.save() + except IntegrityError: + return None return chunkbackend/apps/ai/common/base/chunk_command.py (1)
46-53: Embedding raw JSON hurts recall; include metadata prefix even for JSONKeep human-readable context to improve retrieval without discarding structured data.
- if is_json_content(content): - full_content = content + if is_json_content(content): + # Keep metadata for NL recall; JSON retained for structure. + full_content = f"{metadata_content}\n\n{content}" if metadata_content else content else:backend/apps/ai/management/commands/ai_update_repository_context.py (1)
18-28: Select related FKs to avoid N+1 during extraction
extract_repository_contentaccessesorganization.loginandowner.login. Prefetch them to reduce queries.return ( super() .get_base_queryset() + .select_related("organization", "owner") .filter( is_owasp_site_repository=True, is_archived=False, is_empty=False, ) )backend/apps/ai/management/commands/ai_update_repository_chunks.py (2)
18-28: Select related FKs to avoid N+1 during extractionSame FK access pattern as the context command; apply here too.
return ( super() .get_base_queryset() + .select_related("organization", "owner") .filter( is_owasp_site_repository=True, is_archived=False, is_empty=False, ) )
34-36: Docstring nit: this is chunk creation, not context creationTiny wording fix for clarity.
- """Return the source name for context creation.""" + """Return the source name for chunk creation."""backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1)
67-76: Use a stable patch target for the base queryset.Patching via bases[0] is brittle. Patch the concrete base class method instead.
- with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super: + from apps.ai.common.base.context_command import BaseContextCommand + with patch.object(BaseContextCommand, "get_base_queryset") as mock_super:Replicate in similar tests below.
backend/apps/ai/common/extractors/repository.py (2)
116-121: Broaden markdown coverage to common docs used by OWASP repos

Capturing CONTRIBUTING/CODE_OF_CONDUCT/SECURITY increases useful context for NestBot.

```diff
     markdown_files = [
         "README.md",
         "index.md",
         "info.md",
         "leaders.md",
+        "CONTRIBUTING.md",
+        "CODE_OF_CONDUCT.md",
+        "SECURITY.md",
     ]
```
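For context, a sketch of how such a candidate-file list can be turned into raw-content URLs. The helper name and signature below are illustrative, though the URL shape matches the expectations in this review's tests (e.g. `https://raw.githubusercontent.com/test-org/test-repo/develop/...`):

```python
# Illustrative helper: compose one raw.githubusercontent.com URL per
# candidate markdown file. The function name and signature are
# assumptions for this sketch, not the extractor's actual API.


def raw_markdown_urls(login: str, repo_key: str, branch: str, filenames: list[str]) -> list[str]:
    """Build one raw-content URL per candidate markdown file."""
    base = f"https://raw.githubusercontent.com/{login}/{repo_key}/{branch}"
    return [f"{base}/{name}" for name in filenames]
```

The extractor can then fetch each URL (with request pacing) and skip files that come back empty.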
13-22: Add parameter type hint (non-blocking).Helps IDEs/static analyzers without importing Django models at runtime.
-from apps.ai.common.constants import DELIMITER +from apps.ai.common.constants import DELIMITER +from typing import TYPE_CHECKING +if TYPE_CHECKING: + from apps.github.models.repository import Repository @@ -def extract_repository_content(repository) -> tuple[str, str]: +def extract_repository_content(repository: "Repository") -> tuple[str, str]:backend/tests/apps/ai/common/extractors/repository_test.py (3)
1-11: Patch out sleep to keep tests fast.If the extractor still sleeps, make tests autouse-patch it.
+import pytest @@ +@pytest.fixture(autouse=True) +def _no_sleep(monkeypatch): + monkeypatch.setattr("apps.ai.common.extractors.repository.time.sleep", lambda *_: None)
498-501: Assert on called URL args, not stringified Call objects.More precise and less brittle.
- mock_get_content.assert_called() - assert any( - "https://raw.githubusercontent.com/test-org/test-repo/develop/" in str(call) - for call in mock_get_content.call_args_list - ) + urls = [args[0] for args, _ in (c for c in mock_get_content.call_args_list)] + assert any( + url.startswith("https://raw.githubusercontent.com/test-org/test-repo/develop/") + for url in urls + )
358-399: If you add more markdown files in the extractor, extend this test accordingly.Add assertions for CONTRIBUTING.md/CODE_OF_CONDUCT.md/SECURITY.md when you broaden the list.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
backend/apps/ai/common/base/chunk_command.py (2 hunks)
backend/apps/ai/common/extractors/repository.py (1 hunks)
backend/apps/ai/common/utils.py (2 hunks)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
backend/apps/ai/models/chunk.py (3 hunks)
backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
🧰 Additional context used
🧬 Code graph analysis (7)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
- backend/apps/ai/common/extractors/repository.py (1): extract_repository_content (13-164)

backend/apps/ai/common/extractors/repository.py (1)
- backend/apps/github/utils.py (1): get_repository_file_content (60-79)

backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)
- backend/apps/ai/management/commands/ai_update_repository_context.py (5): Command (10-36), source_name (34-36), extract_content (14-16), get_base_queryset (18-28), get_default_queryset (30-32)
- backend/apps/ai/common/base/context_command.py (1): BaseContextCommand (9-54)
- backend/apps/github/models/repository.py (1): Repository (25-349)

backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
- backend/apps/ai/common/base/chunk_command.py (1): BaseChunkCommand (12-95)
- backend/apps/ai/common/extractors/repository.py (1): extract_repository_content (13-164)
- backend/apps/github/models/repository.py (1): Repository (25-349)
- backend/apps/ai/management/commands/ai_update_repository_context.py (5): Command (10-36), extract_content (14-16), get_base_queryset (18-28), get_default_queryset (30-32), source_name (34-36)

backend/apps/ai/models/chunk.py (3)
- backend/apps/ai/models/context.py (1): Context (15-61)
- backend/apps/common/models.py (2): BulkSaveModel (10-34), TimestampedModel (37-46)
- backend/apps/common/utils.py (1): truncate (164-176)

backend/apps/ai/common/base/chunk_command.py (3)
- backend/apps/ai/common/utils.py (2): create_chunks_and_embeddings (20-71), is_json_content (74-88)
- backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (1): extract_content (32-34)
- backend/tests/apps/ai/common/base/chunk_command_test.py (1): extract_content (23-25)

backend/apps/ai/management/commands/ai_update_repository_context.py (3)
- backend/apps/ai/common/base/context_command.py (1): BaseContextCommand (9-54)
- backend/apps/ai/common/extractors/repository.py (1): extract_repository_content (13-164)
- backend/apps/github/models/repository.py (1): Repository (25-349)
🔇 Additional comments (5)
backend/apps/ai/common/utils.py (1)
- 3-3: Import looks good

backend/apps/ai/common/base/chunk_command.py (1)
- 7-7: Import looks good

backend/apps/ai/management/commands/ai_update_repository_context.py (1)
- 10-17: Command structure and delegation look solid

backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)
- 10-17: Command wiring LGTM

backend/apps/ai/models/chunk.py (1)
- 19-23: DB uniqueness added — migration exists; confirm it's applied. backend/apps/ai/migrations/0005_context_alter_chunk_unique_together_chunk_context_and_more.py contains AlterUniqueTogether setting unique_together={('context','text')}; initial migration 0001 used unique_together={('message','chunk_text')}. Ensure the 0005 migration is included in your release and applied to the target database.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/apps/ai/common/extractors/repository.py (1)
123-144: Remove per‑file sleeps; pass a short timeout and log with context. The hard delay slows batch runs/tests and doesn't guarantee politeness. Rely on request timeouts and contextual logging; manage rate limiting at the caller or via a shared limiter.
```diff
@@
     for file_path in markdown_files:
         try:
             if repository.organization:
                 owner = repository.organization.login
             else:
                 owner = repository.owner.login if repository.owner else ""
@@
-            content = get_repository_file_content(raw_url)
+            content = get_repository_file_content(raw_url, timeout=10)
             if content and content.strip():
                 markdown_content[file_path] = content
-            time.sleep(GITHUB_REQUEST_INTERVAL_SECONDS)
-        except (ValueError, TypeError, OSError):
-            logger.debug("Failed to fetch markdown file")
+        except Exception as exc:
+            logger.debug(
+                "Failed to fetch %s for %s/%s (branch=%s): %s",
+                file_path,
+                owner or "(unknown)",
+                getattr(repository, "key", None) or "(no-key)",
+                getattr(repository, "default_branch", None) or "main",
+                exc,
+            )
             continue
```

If throttling is still desired, move it to the management command (single place), or apply only between successful network attempts with jitter.
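If the caller-side, jittered throttle is the direction taken, a minimal sketch could look like the following; `fetch_with_jitter` and its parameters are illustrative names, not part of the project's API, and `fetch` stands in for the raw-file fetcher:

```python
import random
import time


def fetch_with_jitter(urls, fetch, base_delay=0.5, jitter=0.5):
    """Fetch each URL, sleeping only between successful attempts.

    `fetch` is any callable returning the file body or raising on error;
    the delay is base_delay plus up to `jitter` seconds of random noise,
    so batch runs don't hit the host in lockstep.
    """
    results = {}
    for i, url in enumerate(urls):
        try:
            results[url] = fetch(url)
        except OSError:
            continue  # skip unreachable files; no sleep needed on failure
        if i < len(urls) - 1:  # no trailing sleep after the last fetch
            time.sleep(base_delay + random.uniform(0, jitter))
    return results
```

This keeps the pacing policy in one place (the command) instead of baked into the extractor.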
🧹 Nitpick comments (2)
backend/apps/ai/common/extractors/repository.py (1)
116-121: Expand Markdown targets or discover dynamically. Limiting to four filenames will miss common docs (CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, GOVERNANCE.md, README.* variants). Consider a discovery pass (list root .md files) or extend this list.
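The discovery pass could be as simple as filtering the contents-API listing for root-level `.md` entries. A sketch, assuming the listing has already been fetched and parsed (`name`/`type` are real GitHub contents-API fields; the function name is illustrative):

```python
def discover_markdown_files(contents_listing):
    """Pick root-level markdown files out of a GitHub contents API listing.

    `contents_listing` is the parsed JSON array the API returns for the
    repository root: one dict per entry with at least `name` and `type`.
    Matching any .md file picks up tab_*.md and CONTRIBUTING.md for free.
    """
    return sorted(
        entry["name"]
        for entry in contents_listing
        if entry.get("type") == "file"
        and entry.get("name", "").lower().endswith(".md")
    )
```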
Example extension (quick win):
```diff
 markdown_files = [
     "README.md",
     "index.md",
     "info.md",
     "leaders.md",
+    "CONTRIBUTING.md",
+    "CODE_OF_CONDUCT.md",
+    "SECURITY.md",
+    "GOVERNANCE.md",
+    "README.MD",
+    "README.markdown",
 ]
```

backend/apps/ai/common/constants.py (1)
8-8: Clarify GitHub throttle vs global MIN; avoid conflicting semantics. MIN_REQUEST_INTERVAL_SECONDS = 1.2 is used in backend/apps/ai/common/utils.py (OpenAI embeddings throttling). GITHUB_REQUEST_INTERVAL_SECONDS = 0.5 is used in backend/apps/ai/common/extractors/repository.py (time.sleep after get_repository_file_content). Either document why GitHub raw fetches use a lower delay, or rename to GITHUB_RAW_FETCH_DELAY_SECONDS and add a comment, and/or align the values if 0.5 is unintended.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- backend/apps/ai/common/constants.py (1 hunks)
- backend/apps/ai/common/extractors/repository.py (1 hunks)
- backend/apps/ai/common/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/apps/ai/common/utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/apps/ai/common/extractors/repository.py (2)
- backend/apps/github/utils.py (1): get_repository_file_content (60-79)
- backend/apps/github/api/internal/queries/repository.py (1): repository (14-35)
🔇 Additional comments (2)
backend/apps/ai/common/extractors/repository.py (2)
33-34: No change required — `repository.license` is already a string. backend/apps/github/models/repository.py declares `license = models.CharField(...)` (line 44) and sets `self.license = gh_repository.license.spdx_id` on import (line 272); tests and other code treat it as a string. The extractor assignment is safe.

Likely an incorrect or invalid review comment.

97-101: No timezone fix required — repository datetimes are UTC-aware; keep `.strftime('%Y-%m-%d')`. Models declare created_at/updated_at/pushed_at as DateTimeField, and tests/migrations supply tz-aware datetimes (tzinfo=UTC / datetime.UTC), so the current formatting is safe. See backend/apps/github/models/repository.py and backend/tests/apps/ai/common/extractors/repository_test.py.
arkid15r
left a comment
It's still too far from what can be accepted for merge even into a feature branch:
```python
if ownership:
    repository_data["ownership"] = ownership

if repository.track_issues:
```

This is an internal field. Why including this?
```python
if repository.track_issues:
    repository_data["track_issues"] = repository.track_issues

markdown_files = [
```

I mentioned we also need tab_ prefixed ones for all entities, see
backend/apps/ai/common/utils.py
Outdated
```python
return chunks


def is_json_content(content: str) -> bool:
```
This is a wider level common utils function, not just ai app. Also rename it to something like is_valid_json, not sure if there already exists ready for reuse code.
it was not there -- for future usage I have added it to the common/utils.py
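For reference, both the permissive and the strict reading of "valid JSON" discussed above can live in one helper. A sketch only; the `containers_only` flag is illustrative and not part of the project's API:

```python
import json


def is_valid_json(content: str, containers_only: bool = False) -> bool:
    """Validate JSON text, optionally accepting only objects/arrays.

    With containers_only=False this mirrors plain json.loads semantics,
    so primitives like "42" or "true" also count as valid JSON.
    """
    try:
        obj = json.loads(content)
    except (TypeError, ValueError):
        return False
    return isinstance(obj, (dict, list)) if containers_only else True
```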
backend/apps/ai/models/chunk.py
Outdated
```diff
-unique_chunks = []
+unique_chunks = [
```
backend/apps/ai/models/chunk.py
Outdated
```python
if unique_chunks:
    try:
        BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
```
In previous review rounds we had a long discussion on how we use bulk_save and update_data methods within the code base. And now you're just changing it based on your own vision again?
This is not a bulk save when you save each item separately. Why all this exception handling here -- I suggest starting by addressing this one.
yes we had a discussion and based on that I reverted the changes -- will keep this in mind
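One way to keep the save a true bulk operation is to deduplicate the batch (and pre-fetch existing keys in a single query) before one bulk insert; Django's `bulk_create(..., ignore_conflicts=True)` then covers any remaining race against the unique constraint. A minimal sketch of the dedup step, assuming chunks can be keyed by `(context_id, text)`; the names are illustrative, not the project's API:

```python
def dedupe_chunks(chunks, existing_keys):
    """Filter a batch down to chunks safe for one bulk insert.

    `chunks` are (context_id, text) pairs in insertion order;
    `existing_keys` is a set of pairs already in the database,
    fetched up front in one query rather than per chunk.
    Returns only new, in-batch-unique chunks.
    """
    seen = set(existing_keys)
    unique = []
    for key in chunks:
        if key in seen:
            continue
        seen.add(key)
        unique.append(key)
    return unique
```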
cspell/custom-dict.txt
Outdated
```
pypoetry
pyyaml
repositorycontributor
repositorys
```
@Dishant1804 this is legit suggestion, it's grammatically incorrect and needs to be fixed.
For the plural part I am thinking of adding a method that handles words ending in 'y' ('y' to 'ies'). @arkid15r I think this is a better approach, let me know if we want to do it -- I checked the utils and we don't have a method that pluralizes the word.
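A naive pluralizer covering the 'y' to 'ies' case might look like this; a sketch only, and `pluralize` is an illustrative name, not an existing util:

```python
def pluralize(word: str) -> str:
    """Naive English pluralizer covering the 'y' -> 'ies' case.

    Handles consonant+y ("repository" -> "repositories") and the
    vowel+y exception ("key" -> "keys"); everything else just gets "s".
    Not a full inflection engine -- enough for dictionary entries.
    """
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"
```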
We can ignore SonarQube in this case because we are following the same practice as other command files.
arkid15r
left a comment
Please address @github-advanced-security's suggestions
Actionable comments posted: 2
♻️ Duplicate comments (3)
backend/apps/ai/common/extractors/repository.py (3)
3-7: Remove hard sleep; rely on request timeouts and logging. Avoid slowing batch runs/tests with per-file sleeps. You already have a timeout-enabled fetch.

```diff
-import time
```
145-152: Use per-request timeout and drop sleep. Pass an explicit short timeout; remove the sleep.

```diff
-            content = get_repository_file_content(raw_url)
+            content = get_repository_file_content(raw_url, timeout=10)
             if content and content.strip():
                 markdown_content[file_path] = content
-            time.sleep(GITHUB_REQUEST_INTERVAL_SECONDS)
```
154-156: Broaden exception handling and log rich context. Network/HTTP libraries can raise diverse exceptions. Log with repo/owner/branch/file for triage.

```diff
-        except (ValueError, TypeError, OSError):
-            logger.debug("Failed to fetch markdown file")
+        except Exception as exc:
+            logger.debug(
+                "Failed to fetch %s for %s/%s (branch %s): %s",
+                file_path,
+                owner or "(unknown owner)",
+                repository.key or "(no key)",
+                branch,
+                exc,
+            )
             continue
```
🧹 Nitpick comments (6)
backend/apps/common/utils.py (1)
106-121: Clarify JSON validity semantics (primitives also pass). As written, `"42"`, `true`, or `"foo"` will return True. If that's intended, please document it; if not, gate on objects/arrays. Apply one of the following.

Option A — clarify docstring:
```diff
 def is_valid_json(content: str) -> bool:
-    """Check if content is JSON format.
+    """Check if content is JSON format.
+
+    Note: primitive JSON values (numbers, strings, booleans, null) also return True.
     ...
```

Option B — restrict to objects/arrays:
```diff
 def is_valid_json(content: str) -> bool:
@@
-    try:
-        json.loads(content)
+    try:
+        obj = json.loads(content)
     except (TypeError, ValueError):
         return False
-    return True
+    return isinstance(obj, (dict, list))
```

backend/apps/ai/common/extractors/repository.py (2)
76-93: Consider preserving zero-valued stats. Current truthiness checks drop zeros (e.g., forks=0). If consumers compare repos, missing keys vs zero may be ambiguous. Prefer "is not None".
Example:
```diff
-    if repository.forks_count:
+    if repository.forks_count is not None:
         stats["forks"] = repository.forks_count
```

Apply similarly to other counts.
34-37: License/topic types may be non-serializable — coerce or sanitize. If `license`/`topics` aren't plain strings/lists (e.g., objects), `json.dumps` will fail. Either coerce to strings or filter serializable types.

backend/tests/apps/ai/common/extractors/repository_test.py (3)
420-425: Fix CodeQL "incomplete URL substring sanitization" warning in tests. Use a strict prefix check to avoid substring-based URL logic.

```diff
-    def side_effect(url):
-        if "raw.githubusercontent.com" in url:
+    def side_effect(url):
+        if str(url).startswith("https://raw.githubusercontent.com/"):
             msg = "Network error"
             raise ValueError(msg)
         return "[]"
```
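An even stricter alternative to the prefix check is to parse the URL and compare the hostname exactly, which is what CodeQL's rule is really guarding against (a sketch; `is_github_raw_url` is an illustrative name):

```python
from urllib.parse import urlparse


def is_github_raw_url(url: str) -> bool:
    """Check the host component exactly, not a substring of the URL.

    Substring checks pass for evil.com/raw.githubusercontent.com/x;
    parsing the URL and comparing the hostname does not.
    """
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname == "raw.githubusercontent.com"
```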
573-579: Fix second instance of substring URL check. Same rationale as above.

```diff
-    def side_effect(url):
-        if "raw.githubusercontent.com" in url:
+    def side_effect(url):
+        if str(url).startswith("https://raw.githubusercontent.com/"):
             msg = "Test exception"
             raise ValueError(msg)
         return "[]"
```
261-314: Add coverage for tab_*.md discovery. No test verifies tab file enumeration from the contents API. Add one where the contents API returns a JSON list including `tab_example.md` and assert it's fetched.

I can draft the test if you confirm the expected behavior for nested vs root-level tab files.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- backend/apps/ai/Makefile (1 hunks)
- backend/apps/ai/common/base/chunk_command.py (2 hunks)
- backend/apps/ai/common/constants.py (1 hunks)
- backend/apps/ai/common/extractors/repository.py (1 hunks)
- backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
- backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
- backend/apps/ai/models/chunk.py (1 hunks)
- backend/apps/common/utils.py (2 hunks)
- backend/test_commands.py (1 hunks)
- backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
- backend/tests/apps/ai/common/utils_test.py (1 hunks)
- backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
- backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
- backend/tests/apps/ai/models/chunk_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
- backend/apps/ai/models/chunk.py
- backend/apps/ai/common/constants.py
- backend/apps/ai/management/commands/ai_update_repository_chunks.py
- backend/apps/ai/management/commands/ai_update_repository_context.py
- backend/tests/apps/ai/models/chunk_test.py
- backend/apps/ai/Makefile
- backend/tests/apps/ai/common/utils_test.py
- backend/test_commands.py
- backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/apps/ai/common/extractors/repository.py (2)
- backend/apps/common/utils.py (1): is_valid_json (106-120)
- backend/apps/github/utils.py (1): get_repository_file_content (60-79)

backend/apps/ai/common/base/chunk_command.py (3)
- backend/apps/common/utils.py (1): is_valid_json (106-120)
- backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (1): extract_content (32-34)
- backend/tests/apps/ai/common/base/chunk_command_test.py (1): extract_content (23-25)

backend/tests/apps/ai/common/extractors/repository_test.py (1)
- backend/apps/ai/common/extractors/repository.py (1): extract_repository_content (14-176)
🪛 GitHub Check: CodeQL
backend/tests/apps/ai/common/extractors/repository_test.py
[failure] 421-421: Incomplete URL substring sanitization
The string raw.githubusercontent.com may be at an arbitrary position in the sanitized URL.
[failure] 574-574: Incomplete URL substring sanitization
The string raw.githubusercontent.com may be at an arbitrary position in the sanitized URL.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run CI Denendencies Scan
- GitHub Check: Run Code Scan
- GitHub Check: CodeQL (python)
- GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (1)
backend/apps/ai/common/base/chunk_command.py (1)
47-55: JSON path drops metadata_content — confirm intent. When content is JSON, `metadata_content` is ignored. If downstream retrieval relies on that context, consider storing it in chunk metadata (not the chunk text) or prefixing the first chunk out-of-band. If this is intentional, no change needed.
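If the metadata route is taken, the idea is to carry `metadata_content` alongside each chunk rather than inside it; a minimal sketch where the record shape is illustrative, not the project's schema:

```python
def build_chunk_records(chunks, metadata_content):
    """Attach shared metadata to each chunk without polluting chunk text.

    Returns (text, metadata) pairs so embeddings are computed on the raw
    chunk while retrieval can still surface the surrounding context.
    """
    meta = metadata_content.strip()
    return [(chunk, {"context": meta}) for chunk in chunks]
```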
arkid15r
left a comment
A couple of questions regarding tests:
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
Outdated
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
Outdated
Merged ef97206 into OWASP:feature/nestbot-assistant
* spelling fixes and tests * sonar and code rabbit suggestions implemented * json chunking and suggestions implemented * code rabbit and sonar qube suggestions * code rabbit suggestions * suggestions implemented * github advance security addressed * tests fixed * fixed tests * Clean up backend/test_commands.py --------- Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* Sync www-repopsitories (#2164) * spelling fixes and tests * sonar and code rabbit suggestions implemented * json chunking and suggestions implemented * code rabbit and sonar qube suggestions * code rabbit suggestions * suggestions implemented * github advance security addressed * tests fixed * fixed tests * Clean up backend/test_commands.py --------- Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> * Consolidate code commits * Update cspell/custom-dict.txt * Update docker-compose/local.yaml * local yaml worder volume fix * instance check * poetry file updated --------- Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* add the aria-label * fixed the code style, fixed by the formatter * Handled the undefined repositoryName in aria-label. * fix the white spaces * Add aria-labels to interactive elements for WCAG 2.1 compliance * Sync www-repopsitories (#2164) * spelling fixes and tests * sonar and code rabbit suggestions implemented * json chunking and suggestions implemented * code rabbit and sonar qube suggestions * code rabbit suggestions * suggestions implemented * github advance security addressed * tests fixed * fixed tests * Clean up backend/test_commands.py --------- Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> * Update docker-compose/local.yaml * Nestbot MVP (#2113) * Sync www-repopsitories (#2164) * spelling fixes and tests * sonar and code rabbit suggestions implemented * json chunking and suggestions implemented * code rabbit and sonar qube suggestions * code rabbit suggestions * suggestions implemented * github advance security addressed * tests fixed * fixed tests * Clean up backend/test_commands.py --------- Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> * Consolidate code commits * Update cspell/custom-dict.txt * Update docker-compose/local.yaml * local yaml worder volume fix * instance check * poetry file updated --------- Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> * fixed the duplication error for chunks and context (#2343) * Fix slack and duplication errors (#2352) * fix slack and duplication errors * code rabbit suggestions * integrity error solved * using set * Update code --------- Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> * Response improvements and refactoring (#2407) * improvements and refactoring * added prompt checks and tests for it * question detection 
refining (#2443) * question detection refining * sonar qube fixes * fix tests * Agentic rag (#2432) * agentic rag * spelling fixes * code rabbit and sonar qube suggestions * code rabbit suggestions * refining * fix test * refining * added question detectoor to nestbot mentions (#2473) * Merge NestBot AI Assistant feature branch * Update docker-compose/local.yaml * Update backend/Makefile * fix the white spaces * Add aria-labels to interactive elements for WCAG 2.1 compliance * Fix equality checks with floating point values (#2658) * Edit in test cases for the new aira-label update * More edits to make the test more unbreakable and few regex edits . * Revert one change regards two seperate Contributors * Regex check fix * Global 10-second timeout and increased other mobile timeout to fix the Mobile Safari flaky tests --------- Co-authored-by: Dishant Miyani <dishantmiyani1804@gmail.com> Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com> Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> Co-authored-by: Kate Golovanova <kate@kgthreads.com>


Proposed change
Resolves #2144
www-repositories

Checklist
I ran `make check-test` locally; all checks and tests passed.