Add sync microservice for watching file system events#486
Add sync microservice for watching file system events#486rahulharpal1603 merged 2 commits intoAOSSIE-Org:mainfrom
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds a Sync microservice (FastAPI) with health, watcher, and folder-tagging-status endpoints and a filesystem watcher; supplies configs, schemas, README, and requirements. Updates the primary backend to batch-process images across folder trees, add DB helpers for folders/images, adjust add/sync flows, and initialize tables earlier. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor OS as File System
participant SM as Sync Microservice
participant WL as Watcher Thread
participant DB as SQLite (folders/images)
participant BE as Primary Backend
rect rgb(235, 245, 255)
note over SM: Startup (lifespan)
SM->>DB: db_check_database_connection()
DB-->>SM: ok/err
SM->>SM: start watcher if DB ok
SM->>WL: watcher_util_start_folder_watcher()
WL->>DB: db_get_all_folders_with_ids()
DB-->>WL: [(folder_id, path)...]
end
rect rgb(245, 255, 245)
note over WL: Runtime monitoring
OS-->>WL: change events (created/modified/deleted)
WL->>WL: map changes to watched folder(s)
alt non-delete changes
WL->>BE: POST /folders/sync-folder {folder_id, folder_path}
BE-->>WL: 200/err
else deletions
WL->>BE: DELETE /folders/delete-folders {folder_ids}
BE-->>WL: 200/err
WL->>WL: restart watcher
end
end
rect rgb(255, 245, 235)
note over SM: Health and status APIs
SM->>DB: db_check_database_connection()
SM->>WL: watcher_util_is_watcher_running()
SM-->>Client: Health/Watcher status JSON
end
sequenceDiagram
autonumber
participant API as Primary Backend /folders
participant IF as Image Utils
participant Dbf as DB (folders/images)
participant FS as File System
note over API: Add/Sync Folder
API->>Dbf: Insert/lookup folder(s)
API->>IF: image_util_process_folder_images([(path,id,recursive)...])
loop per folder batch
IF->>FS: scan images (recursive?)
FS-->>IF: file list
IF->>IF: prepare records, generate thumbnails
end
IF->>Dbf: bulk insert/update images
IF->>Dbf: query images by folder_ids
IF->>Dbf: delete obsolete images
IF-->>API: success/failure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Assessment against linked issues
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (7)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/utils/folders.py (1)
13-33: Bug: root_path normalization missing (breaks parent linkage and return value).dirpath is normalized to absolute, but root_path is not. This causes:
- The root dir comparison to fail, preventing parent_folder_id from being applied to the root node.
- A KeyError on return if root_path was provided as a relative path (folder_map stores absolute paths).
Fix by normalizing root_path upfront and using it consistently.
Apply:
-def folder_util_add_folder_tree(root_path, parent_folder_id=None, AI_Tagging=False, taggingCompleted=None): +def folder_util_add_folder_tree(root_path, parent_folder_id=None, AI_Tagging=False, taggingCompleted=None): """ Recursively collect folder data and insert all folders in a single database transaction. All folders are initially inserted with NULL parent_id, which is updated after insertion. Returns the root folder's UUID and the folder map (containing folder_id and parent_id). """ + # Normalize to absolute so keys and comparisons match os.walk outputs + root_path = os.path.abspath(root_path) folders_data = [] folder_map = {} # Maps path to (folder_id, parent_id) for dirpath, dirnames, _ in os.walk(root_path, topdown=True): dirpath = os.path.abspath(dirpath) # Generate a UUID for this folder this_folder_id = str(uuid.uuid4()) # Determine parent ID for the map (not for initial insert) if dirpath == root_path: parent_id = parent_folder_id else: parent_path = os.path.dirname(dirpath) - parent_id = folder_map[parent_path][0] if parent_path in folder_map else None + parent_id = folder_map[parent_path][0] if parent_path in folder_map else None @@ - return folder_map[root_path][0], folder_map + return folder_map[root_path][0], folder_mapbackend/app/database/images.py (1)
235-268: Enable foreign key enforcement to ensure ON DELETE CASCADE effects.SQLite requires PRAGMA foreign_keys = ON per-connection. Without it, deleting images won’t cascade to image_classes, leaving orphan rows. Enable it before the DELETE.
def db_delete_images_by_ids(image_ids: List[ImageId]) -> bool: @@ - conn = sqlite3.connect(DATABASE_PATH) - cursor = conn.cursor() + conn = sqlite3.connect(DATABASE_PATH) + cursor = conn.cursor() + # Ensure cascading deletes for image_classes + cursor.execute("PRAGMA foreign_keys = ON;") @@ - placeholders = ",".join("?" for _ in image_ids) + placeholders = ",".join(["?"] * len(image_ids))Longer-term, consider a small helper to open connections with foreign_keys enabled to avoid repeating this in every write-path.
backend/app/utils/images.py (1)
102-110: Guard against None from classifier to avoid TypeError.ObjectClassifier.get_classes can return None. len(classes) will raise if so.
Apply this minimal fix:
- # Step 2: Insert class-image pairs if classes were detected - if len(classes) > 0: + # Step 2: Insert class-image pairs if classes were detected + if classes: # Create image-class pairs image_class_pairs = [(image_id, class_id) for class_id in classes] print(image_class_pairs) # Insert the pairs into the database db_insert_image_classes_batch(image_class_pairs)
🧹 Nitpick comments (32)
backend/app/utils/folders.py (1)
122-163: API return enrichment is useful; minor naming clarity nit.Returning all (folder_id, folder_path) pairs across added trees is a nice improvement for downstream batch image processing. Consider avoiding shadowing folder_path in the inner loop for readability.
- for folder_path_in_map, (folder_id_in_map, _) in folder_map.items(): - added_folders.append((folder_id_in_map, folder_path_in_map)) + for path_in_map, (id_in_map, _) in folder_map.items(): + added_folders.append((id_in_map, path_in_map))Also consider replacing print with a structured logger for uniform diagnostics across the service.
backend/app/database/folders.py (1)
333-369: Batch path lookup reads well; minor placeholder readability.The IN-clause placeholder construction works but is slightly opaque. Consider the more idiomatic list-multiply or generator form for clarity.
- placeholders = ",".join("?" * len(abs_paths)) + placeholders = ",".join(["?"] * len(abs_paths))Otherwise, normalization to absolute paths and mapping output are spot on.
backend/app/utils/images.py (3)
115-116: Don’t mark images as tagged when classification/detection fails.If classes is None (failed load) or db insert/detection errors occur, you still set isTagged = True. That hides failures and prevents retries.
Suggested adjustment:
- Update status only when at least one of: classes inserted successfully or face detection completes, or explicitly track a failed state to allow retry.
193-208: Ensure thumbnail directory exists within the generator (safer for reuse).image_util_generate_thumbnail assumes THUMBNAIL_IMAGES_PATH exists (created by the caller). Make it robust if called elsewhere.
with Image.open(image_path) as img: img.thumbnail(size) @@ - img.save(thumbnail_path, "JPEG") # Always save thumbnails as JPEG + # Ensure destination directory exists + os.makedirs(os.path.dirname(thumbnail_path), exist_ok=True) + img.save(thumbnail_path, "JPEG") # Always save thumbnails as JPEG
55-57: Replace print statements with structured logging.Use a shared logger for visibility and to integrate with the service’s logging pipeline. This also allows toggling verbosity in production vs. dev.
Also applies to: 186-188, 206-207
sync-microservice/app/schemas/__init__.py (1)
1-1: Consider re-exporting commonly used schemas from the package init for ergonomic imports.This allows
from app.schemas import HealthCheckResponse, ...and centralizes the public API surface.Apply this diff to re-export the main schemas:
-"""Schemas package for sync-microservice API responses and requests.""" +"""Schemas package for sync-microservice API responses and requests.""" + +# Re-export commonly used schemas for convenient imports: +from .health import HealthCheckResponse +from .folders import ( + FolderTaggingInfo, + FolderTaggingStatusSuccessResponse, + FolderTaggingStatusErrorResponse, +) +from .watcher import ( + WatchedFolder, + WatcherStatusResponse, + WatcherControlResponse, + WatcherErrorResponse, +) + +__all__ = [ + "HealthCheckResponse", + "FolderTaggingInfo", + "FolderTaggingStatusSuccessResponse", + "FolderTaggingStatusErrorResponse", + "WatchedFolder", + "WatcherStatusResponse", + "WatcherControlResponse", + "WatcherErrorResponse", +]sync-microservice/requirements.txt (3)
40-40: Add a trailing newline at EOF.Minor formatting nit that some tooling enforces.
- websockets==15.0.1 + websockets==15.0.1 +
10-13: Option: consolidate uvicorn extras to simplify maintenance.Using uvicorn’s standard extras avoids separately pinning httptools/uvloop/websockets and reduces drift.
-httptools==0.6.4 -uvicorn==0.35.0 -uvloop==0.21.0 -websockets==15.0.1 +uvicorn[standard]==0.35.0Note:
uvicorn[standard]already depends on httptools, uvloop, websockets, and watchfiles; ensure version compatibility with your current pins if you adopt this.Also applies to: 37-39
1-40: Split runtime vs. dev dependencies & consolidate Uvicorn extrasThe import scan under sync-microservice shows these packages aren’t directly imported and can be moved into a dev-only requirements file (e.g.,
dev-requirements.txt):• fastapi-cli
• fastapi-cloud-cli
• Jinja2
• markdown-it-py
• mdurl
• Pygments
• rich
• rich-toolkit
• rignore
• typer
• python-dotenv
• python-multipart
• PyYAML
• email_validator
• sentry-sdkRecommendations:
In
requirements.txt, keep only the direct runtime dependencies (e.g., fastapi, httpx, watchfiles).Consolidate Uvicorn extras by replacing
httptools,uvloop,websocketswith:uvicorn[standard]==<version>Pin transitive dependencies (anyio, h11, httpcore, idna, urllib3, etc.) via a constraints or lock file, rather than listing them in the main reqs.
Maintain a separate dev requirements file for linters, docs, CLI tools, and test utilities.
sync-microservice/.gitignore (2)
1-1: Top-level ignore for all .db files is broad—confirm this won’t hide checked-in fixtures.
**/*.dbwill also ignore any test fixtures or sample DBs you might want to commit inside this subtree. If that’s intentional, all good; otherwise, scope it (e.g., to a runtime dir).Would you like a targeted pattern set (e.g., ignore only the runtime SQLite path configured in settings)?
137-142: Verify ignoring images/ is desired for this package.Ignoring
images/at the microservice root could hide docs/assets placed here later. If this directory is only for runtime cache/temp content, consider renaming it (e.g.,.runtime-images/) and keeping the ignore specific.sync-microservice/app/schemas/health.py (1)
5-10: Consider Enums for repeated status strings to avoid typos across modules.Using small
Enumclasses (e.g., ServiceStatus, DatabaseStatus, WatcherStatus) centralizes allowed values and makes reuse in other schemas/routes safer.If you want, I can propose a small
enums.pywith these definitions and wire this schema to use them.sync-microservice/app/schemas/folders.py (2)
21-36: Schemas look good; consider minor refinements.
- You can use
NonNegativeIntfortotal_foldersfor readability (currentge=0is fine).- If you plan JSON examples, add
model_config = ConfigDict(json_schema_extra={"examples": [...]})to improve OpenAPI docs.I can add examples if you share sample payloads from the
/folders/statusroute.
5-16: Rename PydanticFolderTaggingInfoto avoid naming collision
- Definitions found:
sync-microservice/app/database/folders.py:11–class FolderTaggingInfo(NamedTuple)sync-microservice/app/schemas/folders.py:5–class FolderTaggingInfo(BaseModel)- No direct imports of
FolderTaggingInfofrom either module were detected, but sharing the same class name across layers can still lead to subtle bugs if the wrong one is imported.Two safe options:
- Rename the schema class to
FolderTaggingInfoModel(or similar) and re-export it asFolderTaggingInfofromapp.schemasfor a consistent public API.- Keep the existing name but enforce fully-qualified imports (
db.FolderTaggingInfovs.schemas.FolderTaggingInfo) and add a lint rule or static type check to catch ambiguous references.sync-microservice/README.md (2)
97-104: Document a robust database path configuration using an environment variable.Running from a different working directory will break relative paths. Recommend documenting an env override and path resolution relative to the repo root.
Apply this diff:
-```python -DATABASE_PATH = "../backend/app/database/PictoPy.db" -``` +```python +import os +from pathlib import Path + +# Resolve to monorepo root and default DB path +REPO_ROOT = Path(__file__).resolve().parents[3] +DATABASE_PATH = os.getenv( + "PICTOPY_DB_PATH", + str(REPO_ROOT / "backend/app/database/PictoPy.db"), +) +```
7-12: Minor copy polish (articles) for better readability.Examples:
- “Watches all folders registered in the PictoPy database”
- “Connects to the main PictoPy database to get folder information”
- “Provides health check endpoints for monitoring”
sync-microservice/app/config/settings.py (2)
1-5: Allow URL overrides via environment variables for deployment flexibility.Make PRIMARY_BACKEND_URL and SYNC_MICROSERVICE_URL configurable to support non-localhost deployments.
Apply this diff:
-# Model Exports Path -MODEL_EXPORTS_PATH = "app/models/ONNX_Exports" -PRIMARY_BACKEND_URL = "http://localhost:8000" -SYNC_MICROSERVICE_URL = "http://localhost:8001" +# Model Exports Path +MODEL_EXPORTS_PATH = "app/models/ONNX_Exports" +import os +PRIMARY_BACKEND_URL = os.getenv("PICTOPY_PRIMARY_URL", "http://localhost:8000") +SYNC_MICROSERVICE_URL = os.getenv("PICTOPY_SYNC_URL", "http://localhost:8001")
23-25: Consider normalizing the image paths relative to a base directory.THUMBNAIL_IMAGES_PATH and IMAGES_PATH are CWD-relative and may drift. Consider deriving them from a BASE_DATA_DIR or env var similar to DATABASE_PATH.
sync-microservice/main.py (1)
13-16: Optionally add a root (/) info endpoint or update README to remove it.To match the README’s “GET /” section, expose a simple service info route; otherwise, remove that doc entry.
Apply this diff:
app.include_router(health.router, prefix="/api/v1") app.include_router(watcher.router, prefix="/api/v1") app.include_router(folders.router, prefix="/api/v1") + +# Optional: root info endpoint +@app.get("/") +def root(): + return { + "name": "PictoPy Sync Microservice", + "version": app.version, + "docs_url": "/docs", + "openapi_url": "/openapi.json", + "api_base": "/api/v1", + }sync-microservice/app/routes/health.py (1)
9-17: Return HTTP 503 when unhealthy (without changing the response shape).Set the response status code to 503 when the DB check fails, keeping the HealthCheckResponse payload intact.
Apply this diff:
-from fastapi import APIRouter +from fastapi import APIRouter, Response @router.get("/health", response_model=HealthCheckResponse) -async def health_check(): +async def health_check(response: Response): """Health check endpoint.""" db_status = db_check_database_connection() - return HealthCheckResponse( + if not db_status: + response.status_code = 503 + return HealthCheckResponse( status="healthy" if db_status else "unhealthy", database="connected" if db_status else "disconnected", watcher="running" if watcher_util_is_watcher_running() else "stopped", )sync-microservice/app/routes/folders.py (3)
13-20: Use 500 status code on errors while preserving the error envelope.Currently both success and error return 200. Set the HTTP status code to 500 on exception to aid monitoring/alerting without changing the response model.
Apply this diff:
-from fastapi import APIRouter +from fastapi import APIRouter, Response @@ -@router.get( +@router.get( @@ -def get_folders_tagging_status(): +def get_folders_tagging_status(response: Response):And in the except block:
- except Exception as e: - return FolderTaggingStatusErrorResponse( + except Exception as e: + response.status_code = 500 + return FolderTaggingStatusErrorResponse( status="error", message=f"Failed to retrieve tagging progress: {str(e)}", )
23-28: Docstring: clarify the response envelope shape.The endpoint returns a success envelope with data and total_folders (not a bare list).
Apply this diff:
- Returns: - List of folders with their tagging progress information including: - - folder_id: Unique identifier for the folder - - folder_path: Path to the folder - - tagging_percentage: Percentage of images that have been tagged (0-100) + Returns: + On success: + - status: "success" + - total_folders: count of folders + - data: List of folders with their tagging progress: + - folder_id: Unique identifier for the folder + - folder_path: Path to the folder + - tagging_percentage: Percentage of images tagged (0-100) + + On error: + - status: "error" + - message: Error details + - total_folders: 0 + - data: []
32-39: Optional: return type hint and result ordering for predictable output.
- Consider adding a return type hint: Union[FolderTaggingStatusSuccessResponse, FolderTaggingStatusErrorResponse].
- Optionally sort by folder_path or folder_id for stable responses.
sync-microservice/app/core/lifespan.py (2)
11-13: Stop the watcher unconditionally on shutdown; remove the fragile watcher_started global.If the watcher is started later via the /watcher/start API, watcher_started will remain False and the shutdown block won't stop the watcher. Stopping is idempotent and already guarded in watcher_util_stop_folder_watcher, so it's safe to call it unconditionally and drop the global flag.
Apply this diff:
-# Global variable to track watcher status -watcher_started = False - @@ - global watcher_started + # no module-level state needed @@ - watcher_started = watcher_util_start_folder_watcher() + watcher_util_start_folder_watcher() @@ - if watcher_started: - watcher_util_stop_folder_watcher() - print("Folder watcher stopped") + watcher_util_stop_folder_watcher() + print("Folder watcher stopped")Also applies to: 20-21, 36-37, 51-54
24-47: Prefer structured logging over print for observability.Replacing print with logging.getLogger(name).info()/error() provides levels, timestamps, and integration with FastAPI/Uvicorn logs. This will help trace issues in production.
sync-microservice/app/routes/watcher.py (1)
64-69: Consider returning HTTP 409 when the watcher is already running.Semantic improvement: returning a 409 Conflict for “already running” communicates intent better than a 200 with success=False. Not mandatory, just a UX/API polish.
sync-microservice/app/database/folders.py (1)
6-9: Type alias nit: use FolderPath for readability.Minor consistency fix to use your FolderPath alias in FolderIdPath.
-FolderIdPath = Tuple[FolderId, str] +FolderIdPath = Tuple[FolderId, FolderPath]sync-microservice/app/utils/watcher.py (5)
47-58: Deduplicate deleted folder IDs and avoid repeated API calls.Use a set to aggregate deleted IDs and cast to list when calling the API.
Apply this diff:
- deleted_folder_ids = [] + deleted_folder_ids = set() @@ - deleted_folder_ids.append(deleted_folder_id) + deleted_folder_ids.add(deleted_folder_id) @@ - if deleted_folder_ids: - print(f"Calling delete API for {len(deleted_folder_ids)} deleted folders") - watcher_util_call_delete_folders_api(deleted_folder_ids) + if deleted_folder_ids: + print(f"Calling delete API for {len(deleted_folder_ids)} deleted folders") + watcher_util_call_delete_folders_api(list(deleted_folder_ids)) watcher_util_restart_folder_watcher()Also applies to: 73-77
110-111: Remove debug print.This noisy debug line will spam logs on every change.
- print("best match: ", best_match)
98-106: Make parent-folder detection OS-robust.Using startswith can be brittle across case-insensitive filesystems and symlinks. Consider os.path.commonpath or pathlib.Path.resolve() + is_relative_to (Python 3.9+: emulate with try/except).
Example refactor:
from pathlib import Path file_path = Path(file_path).resolve() for folder_id, folder_path in watched_folders: folder_resolved = Path(folder_path).resolve() try: file_path.relative_to(folder_resolved) # folder_resolved is a parent ... except ValueError: continue
179-183: Confirm non-recursive watching is intentional.recursive=False means changes in subdirectories won’t be observed. Given the goal to sync folder and image data, you might want recursive=True, otherwise nested albums won’t trigger syncs.
Would you like me to open a follow-up patch to toggle this (and document the behavior)?
223-271: Minor: folder_id_map is populated but unused.You build folder_id_map but don’t use it anywhere. Either remove it for clarity or leverage it in watcher_util_get_folder_id_if_watched for O(1) lookups.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (21)
backend/app/database/folders.py(6 hunks)backend/app/database/images.py(1 hunks)backend/app/routes/folders.py(7 hunks)backend/app/utils/folders.py(7 hunks)backend/app/utils/images.py(3 hunks)backend/main.py(4 hunks)sync-microservice/.gitignore(1 hunks)sync-microservice/README.md(1 hunks)sync-microservice/app/config/settings.py(1 hunks)sync-microservice/app/core/lifespan.py(1 hunks)sync-microservice/app/database/folders.py(1 hunks)sync-microservice/app/routes/folders.py(1 hunks)sync-microservice/app/routes/health.py(1 hunks)sync-microservice/app/routes/watcher.py(1 hunks)sync-microservice/app/schemas/__init__.py(1 hunks)sync-microservice/app/schemas/folders.py(1 hunks)sync-microservice/app/schemas/health.py(1 hunks)sync-microservice/app/schemas/watcher.py(1 hunks)sync-microservice/app/utils/watcher.py(1 hunks)sync-microservice/main.py(1 hunks)sync-microservice/requirements.txt(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (11)
sync-microservice/app/routes/health.py (3)
sync-microservice/app/database/folders.py (1)
db_check_database_connection(44-68)sync-microservice/app/utils/watcher.py (1)
watcher_util_is_watcher_running(211-213)sync-microservice/app/schemas/health.py (1)
HealthCheckResponse(5-10)
sync-microservice/main.py (2)
sync-microservice/app/core/lifespan.py (1)
lifespan(16-54)backend/main.py (1)
lifespan(32-49)
sync-microservice/app/schemas/folders.py (1)
sync-microservice/app/database/folders.py (1)
FolderTaggingInfo(11-16)
backend/main.py (2)
backend/app/database/folders.py (1)
db_create_folders_table(15-32)backend/app/database/images.py (1)
db_create_images_table(31-64)
sync-microservice/app/core/lifespan.py (3)
sync-microservice/app/database/folders.py (1)
db_check_database_connection(44-68)sync-microservice/app/utils/watcher.py (2)
watcher_util_start_folder_watcher(216-271)watcher_util_stop_folder_watcher(274-302)backend/main.py (1)
lifespan(32-49)
sync-microservice/app/routes/watcher.py (2)
sync-microservice/app/utils/watcher.py (5)
watcher_util_start_folder_watcher(216-271)watcher_util_stop_folder_watcher(274-302)watcher_util_restart_folder_watcher(305-314)watcher_util_is_watcher_running(211-213)watcher_util_get_watcher_info(317-328)sync-microservice/app/schemas/watcher.py (2)
WatcherStatusResponse(12-19)WatcherControlResponse(22-27)
sync-microservice/app/utils/watcher.py (1)
sync-microservice/app/database/folders.py (1)
db_get_all_folders_with_ids(19-41)
sync-microservice/app/routes/folders.py (2)
sync-microservice/app/schemas/folders.py (3)
FolderTaggingStatusSuccessResponse(18-23)FolderTaggingStatusErrorResponse(26-36)FolderTaggingInfo(5-15)sync-microservice/app/database/folders.py (1)
FolderTaggingInfo(11-16)
sync-microservice/app/database/folders.py (1)
sync-microservice/app/schemas/folders.py (1)
FolderTaggingInfo(5-15)
backend/app/routes/folders.py (1)
backend/app/utils/images.py (2)
image_util_process_folder_images(20-70)image_util_process_untagged_images(73-87)
backend/app/utils/images.py (3)
backend/app/database/images.py (5)
db_get_images_by_folder_ids(200-232)db_delete_images_by_ids(235-267)db_bulk_insert_images(67-90)db_get_untagged_images(93-134)db_update_image_tagged_status(137-163)backend/app/models/FaceDetector.py (3)
FaceDetector(11-66)detect_faces(22-54)close(56-66)backend/app/models/ObjectClassifier.py (2)
ObjectClassifier(6-30)close(24-30)
🪛 LanguageTool
sync-microservice/README.md
[grammar] ~7-~7: There might be a mistake here.
Context: ...ring**: Watches all folders registered in PictoPy database - 🗄️ **Database Integr...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...olders registered in PictoPy database - 🗄️ Database Integration: Connects to ...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...?️ Database Integration: Connects to main PictoPy database to get folder infor...
(QB_NEW_EN)
[grammar] ~9-~9: There might be a mistake here.
Context: ... database to get folder information - 📊 Health Monitoring: Provides health c...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ...tarts watching folders on startup ## Quick Start ### Prerequisites - Python 3.8...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ... Start ### Prerequisites - Python 3.8+ - PictoPy backend database should be set u...
(QB_NEW_EN)
[grammar] ~29-~29: There might be a mistake here.
Context: ...nc-microservice 2. **Create and activate virtual environment:** bash p...
(QB_NEW_EN)
[grammar] ~111-~111: There might be a mistake here.
Context: ...lly: 1. On Startup: - Connects to PictoPy database - Retrieves all fol...
(QB_NEW_EN)
[grammar] ~118-~118: There might be a mistake here.
Context: ...itors file changes (add, modify, delete) - Logs all detected changes - Maps chan...
(QB_NEW_EN)
[grammar] ~119-~119: There might be a mistake here.
Context: ..., delete) - Logs all detected changes - Maps changes to specific folder IDs -...
(QB_NEW_EN)
[grammar] ~120-~120: There might be a mistake here.
Context: ... - Maps changes to specific folder IDs - Prepares for future database sync operat...
(QB_NEW_EN)
[grammar] ~123-~123: There might be a mistake here.
Context: ...abase sync operations 3. On Shutdown: - Gracefully stops file watcher - Clean...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...base Operations**: Add new functions to app/database/folders.py 2. File Processing: Extend `app/utils/fil...
(QB_NEW_EN)
[grammar] ~132-~132: There might be a mistake here.
Context: ...ders.py2. **File Processing**: Extendapp/utils/file_watcher.py3. **API Endpoints**: Add routes tomain.py`...
(QB_NEW_EN)
[grammar] ~139-~139: There might be a mistake here.
Context: ...mage file detection and database updates - Thumbnail generation for new images - AI...
(QB_NEW_EN)
[grammar] ~140-~140: There might be a mistake here.
Context: ...es - Thumbnail generation for new images - AI processing trigger for new images - D...
(QB_NEW_EN)
[grammar] ~141-~141: There might be a mistake here.
Context: ...s - AI processing trigger for new images - Database cleanup for deleted files - Int...
(QB_NEW_EN)
[grammar] ~142-~142: There might be a mistake here.
Context: ...ges - Database cleanup for deleted files - Integration with main PictoPy backend AP...
(QB_NEW_EN)
[grammar] ~165-~165: There might be a mistake here.
Context: ...No Folders to Watch** - Add folders to PictoPy backend first - Check databa...
(QB_NEW_EN)
[grammar] ~169-~169: There might be a mistake here.
Context: ...in filesystem 3. Watcher Not Starting - Check folder permissions - Ensure fol...
(QB_NEW_EN)
[grammar] ~171-~171: There might be a mistake here.
Context: ...r permissions - Ensure folders exist on filesystem - Review logs for specifi...
(QB_NEW_EN)
[grammar] ~178-~178: There might be a mistake here.
Context: ...d console output for: - Startup process - Database connections - Folder discovery ...
(QB_NEW_EN)
[grammar] ~179-~179: There might be a mistake here.
Context: ...- Startup process - Database connections - Folder discovery - File change events - ...
(QB_NEW_EN)
[grammar] ~180-~180: There might be a mistake here.
Context: ... Database connections - Folder discovery - File change events - Error conditions #...
(QB_NEW_EN)
[grammar] ~181-~181: There might be a mistake here.
Context: ... - Folder discovery - File change events - Error conditions ## Integration This m...
(QB_NEW_EN)
[grammar] ~188-~188: There might be a mistake here.
Context: ...: - Port 8000: Main PictoPy backend - Port 8001: Sync microservice (this ser...
(QB_NEW_EN)
[grammar] ~189-~189: There might be a mistake here.
Context: ...8001**: Sync microservice (this service) - Shared Database: Both services use the...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
sync-microservice/README.md
82-82: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Tauri Tests
- GitHub Check: Backend Tests
🔇 Additional comments (19)
backend/main.py (3)
124-126: Router inclusion LGTM.New routers are mounted with clear prefixes and tags.
95-101: Docs path verified—no action needed.The directory
docs/backend/backend_python/and the fileopenapi.jsonalready exist in the repo, so writing to that location at runtime will succeed without breaking CI or documentation publishing.
31-45: Table creation order in backend/main.py verified safeVerified that in the
lifespanfunction the calls occur in this sequence:
db_create_folders_table()db_create_images_table()db_create_YOLO_classes_table()- …followed by all other table setups.
Because
db_create_YOLO_classes_tableruns immediately afterdb_create_images_tableand before any writes to theimage_classesmapping, there are no foreign-key or startup-order issues. No changes needed.backend/app/utils/folders.py (1)
98-120: Deletion helper reads cleanly; edge cases covered.Early return on empty input and ID derivation from db_child_folders look correct. Good use of batch delete.
backend/app/database/folders.py (2)
183-194: Foreign keys pragma on delete is correct.Enabling PRAGMA foreign_keys before deleting folders ensures cascading to images and downstream tables. Good.
237-241: Tiny readability improvement acknowledged.Inlining the SELECT for existence check is fine; behavior unchanged.
backend/app/routes/folders.py (3)
168-176: Background submit is correct; signature alignment with type hints above.Submitting post processing to the ProcessPool keeps the request fast. Ensure the function signature matches any type hint changes you apply.
325-348: Sync flow is solid; confirms add/delete, then batches post-sync processing.
- Comparing filesystem vs DB using sets is clean and efficient.
- Using batch helpers for add/delete aligns with the new multi-folder image pipeline.
- Scheduling post_sync_folder_sequence is appropriate.
No functional issues spotted here.
121-123: Validation error path normalization is correct.Raising a clear validation error early is good. Since later logic assumes absolute paths, you already normalize request.folder_path below (Line 139).
backend/app/utils/images.py (3)
171-190: LGTM: folder traversal and filtering are clear.Recursive and non-recursive modes are implemented cleanly with a validity filter. Error handling on os.listdir is appropriate.
220-238: LGTM: Obsolete image cleanup is correct and removes stale thumbnails too.The function scopes deletions to provided folder IDs, and handles thumbnail cleanup with appropriate error handling.
141-156: Review Image ID Generation & Schema ConstraintsI couldn’t locate the
imagestable definition (raw SQL or ORM model) in the repo, so please confirm your DB schema to avoid duplicate rows or type mismatches:• Verify
images.idcolumn type (UUID/TEXT vs. INTEGER) matches downstream expectations (e.g.,FaceDetector.detect_faces).
• Ensure there’s a UNIQUE constraint or index onimages.pathsoINSERT OR IGNOREactually dedupes.If
pathisn’t unique yet, choose one of:
- Add a UNIQUE constraint on
images.pathand continue using random UUIDs.- Switch to a deterministic UUID (e.g.
uuid5on the absolute file path) so IDs remain stable across runs.Please confirm via your migration files or ORM models that these constraints and types are correctly defined.
sync-microservice/app/schemas/__init__.py (1)
1-1: Package docstring is fine.Clear and helpful description for the schemas package.
sync-microservice/app/schemas/health.py (1)
5-10: LGTM: clear, minimal health schema with Literals.The field set matches the described health endpoint and should generate a tight OpenAPI schema.
sync-microservice/app/schemas/folders.py (1)
8-9: No action needed:folder_idtype matches the DB layerThe DB alias
FolderIdis defined asstrinsync-microservice/app/database/folders.py:6, so the Pydantic schema’sfolder_id: stris correct.sync-microservice/main.py (1)
5-16: Router wiring with lifespan looks good.sync-microservice/app/routes/health.py (1)
1-17: Health endpoint implementation is clean and aligned with the schema.sync-microservice/app/routes/folders.py (1)
33-37: folder_id is consistently a string in DB and schemaI confirmed that:
- backend/app/database/folders.py declares folder_id as TEXT (primary key)
- sync-microservice/app/database/folders.py’s FolderTaggingInfo NamedTuple uses folder_id: str
- sync-microservice/app/schemas/folders.py’s Pydantic model declares folder_id: str
All layers in this service treat folder_id as a string. Please verify that the main backend’s external API also exposes folder IDs as strings—if it returns integers, consider aligning both services to use the same type.
sync-microservice/app/schemas/watcher.py (1)
5-34: Schemas look solid and consistent.The models align with the utilities’ payloads and should serialize/validate cleanly.
backend/app/database/images.py
Outdated
| def db_get_images_by_folder_ids(folder_ids: List[int]) -> List[Tuple[ImageId, ImagePath, str]]: | ||
| """ | ||
| Get all images that belong to the specified folder IDs. | ||
|
|
||
| Args: | ||
| folder_ids: List of folder IDs to search for images | ||
|
|
||
| Returns: | ||
| List of tuples containing (image_id, image_path, thumbnail_path) | ||
| """ | ||
| if not folder_ids: | ||
| return [] | ||
|
|
||
| conn = sqlite3.connect(DATABASE_PATH) | ||
| cursor = conn.cursor() | ||
|
|
||
| try: | ||
| # Create placeholders for the IN clause | ||
| placeholders = ",".join("?" for _ in folder_ids) | ||
| cursor.execute( | ||
| f""" | ||
| SELECT id, path, thumbnailPath | ||
| FROM images | ||
| WHERE folder_id IN ({placeholders}) | ||
| """, | ||
| folder_ids, | ||
| ) | ||
| return cursor.fetchall() | ||
| except Exception as e: | ||
| print(f"Error getting images by folder IDs: {e}") | ||
| return [] | ||
| finally: | ||
| conn.close() | ||
|
|
There was a problem hiding this comment.
Type mismatch: folder_ids should be List[FolderId] (str), not List[int].
folders.folder_id is TEXT (UUID). Using List[int] misleads type checkers and readers, and increases risk of incorrect inputs.
-def db_get_images_by_folder_ids(folder_ids: List[int]) -> List[Tuple[ImageId, ImagePath, str]]:
+def db_get_images_by_folder_ids(folder_ids: List[FolderId]) -> List[Tuple[ImageId, ImagePath, str]]:Optionally align placeholder creation style with the rest of the codebase for consistency:
- placeholders = ",".join("?" for _ in folder_ids)
+ placeholders = ",".join(["?"] * len(folder_ids))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def db_get_images_by_folder_ids(folder_ids: List[int]) -> List[Tuple[ImageId, ImagePath, str]]: | |
| """ | |
| Get all images that belong to the specified folder IDs. | |
| Args: | |
| folder_ids: List of folder IDs to search for images | |
| Returns: | |
| List of tuples containing (image_id, image_path, thumbnail_path) | |
| """ | |
| if not folder_ids: | |
| return [] | |
| conn = sqlite3.connect(DATABASE_PATH) | |
| cursor = conn.cursor() | |
| try: | |
| # Create placeholders for the IN clause | |
| placeholders = ",".join("?" for _ in folder_ids) | |
| cursor.execute( | |
| f""" | |
| SELECT id, path, thumbnailPath | |
| FROM images | |
| WHERE folder_id IN ({placeholders}) | |
| """, | |
| folder_ids, | |
| ) | |
| return cursor.fetchall() | |
| except Exception as e: | |
| print(f"Error getting images by folder IDs: {e}") | |
| return [] | |
| finally: | |
| conn.close() | |
| def db_get_images_by_folder_ids(folder_ids: List[FolderId]) -> List[Tuple[ImageId, ImagePath, str]]: | |
| """ | |
| Get all images that belong to the specified folder IDs. | |
| Args: | |
| folder_ids: List of folder IDs to search for images | |
| Returns: | |
| List of tuples containing (image_id, image_path, thumbnail_path) | |
| """ | |
| if not folder_ids: | |
| return [] | |
| conn = sqlite3.connect(DATABASE_PATH) | |
| cursor = conn.cursor() | |
| try: | |
| # Create placeholders for the IN clause | |
| placeholders = ",".join(["?"] * len(folder_ids)) | |
| cursor.execute( | |
| f""" | |
| SELECT id, path, thumbnailPath | |
| FROM images | |
| WHERE folder_id IN ({placeholders}) | |
| """, | |
| folder_ids, | |
| ) | |
| return cursor.fetchall() | |
| except Exception as e: | |
| print(f"Error getting images by folder IDs: {e}") | |
| return [] | |
| finally: | |
| conn.close() |
🤖 Prompt for AI Agents
In backend/app/database/images.py around lines 200 to 233, the function
signature and typing use List[int] for folder_ids while folder_id is stored as
TEXT/UUID; change the type annotation to List[FolderId] (alias of str) and
update any local typing imports accordingly so type checkers reflect string
UUIDs; ensure callers pass strings (or convert ints to str) and keep the SQL
logic the same (placeholders can remain generated by ",".join("?" for _ in
folder_ids) or be aligned to your project's placeholder helper for consistency).
| def post_folder_add_sequence(folder_path: str, folder_id: int): | ||
| """ | ||
| Post-addition sequence for a folder. | ||
| This function is called after a folder is successfully added. | ||
| It processes images in the folder and updates the database. | ||
| """ | ||
| try: | ||
| # Process images in the folder | ||
| image_util_process_folder_images(folder_path) | ||
| # Get all folder IDs and paths that match the root path prefix | ||
| folder_data = [] | ||
| folder_ids_and_paths = db_get_folder_ids_by_path_prefix(folder_path) | ||
|
|
||
| # Set all folders to non-recursive (False) | ||
| for folder_id_from_db, folder_path_from_db in folder_ids_and_paths: | ||
| folder_data.append((folder_path_from_db, folder_id_from_db, False)) | ||
|
|
||
| print("Add folder: ", folder_data) | ||
| # Process images in all folders | ||
| image_util_process_folder_images(folder_data) | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Type hint fix: folder_id should be str (UUID), not int. Also note: unused param.
DB uses TEXT UUIDs for folder_id. Update the annotation. The folder_id param isn’t used in this function; either remove it or use it for clarity.
-def post_folder_add_sequence(folder_path: str, folder_id: int):
+def post_folder_add_sequence(folder_path: str, folder_id: str):
@@
- # Get all folder IDs and paths that match the root path prefix
+ # Get all folder IDs and paths that match the root path prefixIf you decide to drop the unused param, update the submit call in add_folder accordingly.
🏁 Script executed:
#!/bin/bash
# Find all references to post_folder_add_sequence to update call sites if signature changes
rg -nP '\bpost_folder_add_sequence\s*\(' -C2Length of output: 326
🏁 Script executed:
#!/bin/bash
# Locate the add_folder route to see if post_folder_add_sequence is ever invoked there
rg -P -n -C5 'def\s+add_folder' backend/app/routes/folders.pyLength of output: 519
Update post_folder_add_sequence signature: drop unused folder_id or correct its type
-
The
folder_idparameter is never used inpost_folder_add_sequenceand is mis-typed asint(our UUIDs are stored asstr). -
In backend/app/routes/folders.py (around line 42), update the signature:
-def post_folder_add_sequence(folder_path: str, folder_id: int): +def post_folder_add_sequence(folder_path: str):
If you actually need the folder’s UUID inside this function, keep the parameter but change its annotation:
-def post_folder_add_sequence(folder_path: str, folder_id: int): +def post_folder_add_sequence(folder_path: str, folder_id: str):
-
Remove or update any downstream calls to match the new signature. (A global search shows no existing invocations in the codebase.)
🤖 Prompt for AI Agents
In backend/app/routes/folders.py around lines 42 to 60, the function
post_folder_add_sequence currently declares a folder_id: int parameter that is
unused and incorrectly typed (UUIDs are strings); either remove the folder_id
parameter from the signature and any callers, or if the UUID is needed keep the
parameter but change its type annotation to str and use it appropriately; then
run a global search to update/remove any downstream calls to match the new
signature and adjust typing/imports as needed.
backend/app/routes/folders.py
Outdated
| def post_sync_folder_sequence(folder_path: str, folder_id: int, added_folders: List[Tuple[str, str]]): | ||
| """ | ||
| Post-sync sequence for a folder. | ||
| This function is called after a folder is synced. | ||
| It processes images in the folder and updates the database. | ||
| """ | ||
| try: | ||
| # Create folder data array | ||
| folder_data = [] | ||
|
|
||
| folder_data.append((folder_path, folder_id, False)) | ||
|
|
||
| for added_folder_id, added_folder_path in added_folders: | ||
| folder_data.append((added_folder_path, added_folder_id, False)) | ||
|
|
||
| print("Sync folder: ", folder_data) | ||
| # Process images in all folders | ||
| image_util_process_folder_images(folder_data) | ||
| image_util_process_untagged_images() | ||
| cluster_util_face_clusters_sync() | ||
| except Exception as e: | ||
| print(f"Error in post processing after folder {folder_path} was synced: {e}") | ||
| return False | ||
| return True | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Type hints: use str for folder_id and ensure added_folders typing matches.
Consistently use str for folder IDs. The added_folders typing already looks correct as List[Tuple[str, str]].
-def post_sync_folder_sequence(folder_path: str, folder_id: int, added_folders: List[Tuple[str, str]]):
+def post_sync_folder_sequence(folder_path: str, folder_id: str, added_folders: List[Tuple[str, str]]):Also consider replacing prints with the app logger for background task observability.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def post_sync_folder_sequence(folder_path: str, folder_id: int, added_folders: List[Tuple[str, str]]): | |
| """ | |
| Post-sync sequence for a folder. | |
| This function is called after a folder is synced. | |
| It processes images in the folder and updates the database. | |
| """ | |
| try: | |
| # Create folder data array | |
| folder_data = [] | |
| folder_data.append((folder_path, folder_id, False)) | |
| for added_folder_id, added_folder_path in added_folders: | |
| folder_data.append((added_folder_path, added_folder_id, False)) | |
| print("Sync folder: ", folder_data) | |
| # Process images in all folders | |
| image_util_process_folder_images(folder_data) | |
| image_util_process_untagged_images() | |
| cluster_util_face_clusters_sync() | |
| except Exception as e: | |
| print(f"Error in post processing after folder {folder_path} was synced: {e}") | |
| return False | |
| return True | |
| def post_sync_folder_sequence(folder_path: str, folder_id: str, added_folders: List[Tuple[str, str]]): | |
| """ | |
| Post-sync sequence for a folder. | |
| This function is called after a folder is synced. | |
| It processes images in the folder and updates the database. | |
| """ | |
| try: | |
| # Create folder data array | |
| folder_data = [] | |
| folder_data.append((folder_path, folder_id, False)) | |
| for added_folder_id, added_folder_path in added_folders: | |
| folder_data.append((added_folder_path, added_folder_id, False)) | |
| print("Sync folder: ", folder_data) | |
| # Process images in all folders | |
| image_util_process_folder_images(folder_data) | |
| image_util_process_untagged_images() | |
| cluster_util_face_clusters_sync() | |
| except Exception as e: | |
| print(f"Error in post processing after folder {folder_path} was synced: {e}") | |
| return False | |
| return True |
🤖 Prompt for AI Agents
In backend/app/routes/folders.py around lines 82 to 106, the function signature
and runtime usage should use str for folder IDs (change folder_id: int to
folder_id: str) and ensure the added_folders entries are treated as
List[Tuple[str, str]] (they already are in typing but use those types when
constructing folder_data); replace plain print calls with the application logger
(e.g., app.logger.debug/info) and replace the generic print in the except block
with logger.exception or logger.error including the exception so stack traces
are captured; keep the rest of the logic the same and return booleans as before.
| all_image_records = [] | ||
| all_folder_ids = [] | ||
|
|
||
| def image_util_generate_thumbnail( | ||
| image_path: str, thumbnail_path: str, size: Tuple[int, int] = (200, 200) | ||
| ) -> bool: | ||
| """Generate thumbnail for a single image.""" | ||
| try: | ||
| with Image.open(image_path) as img: | ||
| img.thumbnail(size) | ||
| # Process each folder in the provided data | ||
| for folder_path, folder_id, recursive in folder_data: | ||
| try: | ||
| # Add folder ID to list for obsolete image cleanup | ||
| all_folder_ids.append(folder_id) | ||
|
|
||
| # Convert to RGB if the image has an alpha channel or is not RGB | ||
| if img.mode in ("RGBA", "P"): | ||
| img = img.convert("RGB") | ||
| # Step 1: Get all image files from current folder | ||
| image_files = image_util_get_images_from_folder(folder_path, recursive) | ||
|
|
||
| img.save(thumbnail_path, "JPEG") # Always save thumbnails as JPEG | ||
| return True | ||
| except Exception as e: | ||
| print(f"Error generating thumbnail for {image_path}: {e}") | ||
| return False | ||
| if not image_files: | ||
| continue # No images in this folder, continue to next | ||
|
|
||
| # Step 2: Create folder path mapping for this folder | ||
| folder_path_to_id = {os.path.abspath(folder_path): folder_id} | ||
|
|
||
| def image_util_get_images_from_folder(folder_path: str) -> List[str]: | ||
| """Get all image files from a folder.""" | ||
| image_files = [] | ||
| for root, _, files in os.walk(folder_path): | ||
| for file in files: | ||
| file_path = os.path.join(root, file) | ||
| if image_util_is_valid_image(file_path): | ||
| image_files.append(file_path) | ||
| return image_files | ||
| # Step 3: Prepare image records for this folder | ||
| folder_image_records = image_util_prepare_image_records(image_files, folder_path_to_id) | ||
| all_image_records.extend(folder_image_records) | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Prevent duplicate scanning/records across overlapping folders; build a global mapping and dedupe processed files.
If folder_data contains overlapping roots (e.g., a parent recursive and its child), images can be collected multiple times. Since folder_path_to_id is recreated per-iteration, the “most specific folder” mapping is also not honored across all provided folders.
Refactor to:
- Build one global folder_path_to_id across folder_data.
- Keep a processed_paths set to dedupe image files.
- Use the global mapping when preparing image records.
Suggested patch:
- all_image_records = []
- all_folder_ids = []
+ all_image_records = []
+ all_folder_ids = []
+ # Build a global mapping once to allow "most specific" folder-ID resolution
+ folder_path_to_id = {os.path.abspath(p): fid for (p, fid, _) in folder_data}
+ processed_paths: set[str] = set()
@@
- for folder_path, folder_id, recursive in folder_data:
+ for folder_path, folder_id, recursive in folder_data:
try:
# Add folder ID to list for obsolete image cleanup
all_folder_ids.append(folder_id)
@@
- # Step 2: Create folder path mapping for this folder
- folder_path_to_id = {os.path.abspath(folder_path): folder_id}
-
- # Step 3: Prepare image records for this folder
- folder_image_records = image_util_prepare_image_records(image_files, folder_path_to_id)
+ # Step 2: Deduplicate overlapped scans
+ new_files = [p for p in image_files if p not in processed_paths]
+ if not new_files:
+ continue
+ processed_paths.update(new_files)
+
+ # Step 3: Prepare image records using the global mapping
+ folder_image_records = image_util_prepare_image_records(new_files, folder_path_to_id)
all_image_records.extend(folder_image_records)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| all_image_records = [] | |
| all_folder_ids = [] | |
| def image_util_generate_thumbnail( | |
| image_path: str, thumbnail_path: str, size: Tuple[int, int] = (200, 200) | |
| ) -> bool: | |
| """Generate thumbnail for a single image.""" | |
| try: | |
| with Image.open(image_path) as img: | |
| img.thumbnail(size) | |
| # Process each folder in the provided data | |
| for folder_path, folder_id, recursive in folder_data: | |
| try: | |
| # Add folder ID to list for obsolete image cleanup | |
| all_folder_ids.append(folder_id) | |
| # Convert to RGB if the image has an alpha channel or is not RGB | |
| if img.mode in ("RGBA", "P"): | |
| img = img.convert("RGB") | |
| # Step 1: Get all image files from current folder | |
| image_files = image_util_get_images_from_folder(folder_path, recursive) | |
| img.save(thumbnail_path, "JPEG") # Always save thumbnails as JPEG | |
| return True | |
| except Exception as e: | |
| print(f"Error generating thumbnail for {image_path}: {e}") | |
| return False | |
| if not image_files: | |
| continue # No images in this folder, continue to next | |
| # Step 2: Create folder path mapping for this folder | |
| folder_path_to_id = {os.path.abspath(folder_path): folder_id} | |
| def image_util_get_images_from_folder(folder_path: str) -> List[str]: | |
| """Get all image files from a folder.""" | |
| image_files = [] | |
| for root, _, files in os.walk(folder_path): | |
| for file in files: | |
| file_path = os.path.join(root, file) | |
| if image_util_is_valid_image(file_path): | |
| image_files.append(file_path) | |
| return image_files | |
| # Step 3: Prepare image records for this folder | |
| folder_image_records = image_util_prepare_image_records(image_files, folder_path_to_id) | |
| all_image_records.extend(folder_image_records) | |
| all_image_records = [] | |
| all_folder_ids = [] | |
| # Build a global mapping once to allow "most specific" folder-ID resolution | |
| folder_path_to_id = {os.path.abspath(p): fid for (p, fid, _) in folder_data} | |
| processed_paths: set[str] = set() | |
| # Process each folder in the provided data | |
| for folder_path, folder_id, recursive in folder_data: | |
| try: | |
| # Add folder ID to list for obsolete image cleanup | |
| all_folder_ids.append(folder_id) | |
| # Step 1: Get all image files from current folder | |
| image_files = image_util_get_images_from_folder(folder_path, recursive) | |
| if not image_files: | |
| continue # No images in this folder, continue to next | |
| # Step 2: Deduplicate overlapped scans | |
| new_files = [p for p in image_files if p not in processed_paths] | |
| if not new_files: | |
| continue | |
| processed_paths.update(new_files) | |
| # Step 3: Prepare image records using the global mapping | |
| folder_image_records = image_util_prepare_image_records(new_files, folder_path_to_id) | |
| all_image_records.extend(folder_image_records) |
🤖 Prompt for AI Agents
In backend/app/utils/images.py around lines 33 to 54, the code creates a
per-iteration folder_path_to_id and can double-scan images when folder_data
contains overlapping roots; refactor to first build a single global mapping of
absolute folder paths to folder IDs (resolve overlaps by preferring the most
specific/longest path for conflicts), initialize a processed_paths set before
the loop, then for each folder call image_util_get_images_from_folder, filter
out any file paths already in processed_paths (add new ones to the set), and
finally call image_util_prepare_image_records using the global mapping (not a
per-iteration mapping) so files are deduplicated and the most specific folder_id
is used for each image.
backend/app/utils/images.py
Outdated
| def image_util_find_folder_id_for_image(image_path: str, folder_path_to_id: Dict[str, int]) -> int: | ||
| """ | ||
| Find the most specific folder ID for a given image path. | ||
|
|
||
| Args: | ||
| image_path: Path to the image file | ||
| folder_path_to_id: Dictionary mapping folder paths to IDs | ||
|
|
||
| Returns: | ||
| Folder ID if found, None otherwise | ||
| """ | ||
| parent_folder = os.path.dirname(image_path) | ||
|
|
||
| current_path = parent_folder | ||
| while current_path: | ||
| if current_path in folder_path_to_id: | ||
| return folder_path_to_id[current_path] | ||
| current_path = os.path.dirname(current_path) | ||
|
|
||
| return None | ||
|
|
There was a problem hiding this comment.
Fix root-traversal bug and incorrect return typing in image_util_find_folder_id_for_image.
- while current_path: with os.path.dirname("/") never terminates on Unix, causing an infinite loop for images under filesystem root when there’s no mapping match.
- The function returns None but is typed as -> int.
Apply this diff to make traversal safe and update typing/docstring accordingly:
-def image_util_find_folder_id_for_image(image_path: str, folder_path_to_id: Dict[str, int]) -> int:
+def image_util_find_folder_id_for_image(image_path: str, folder_path_to_id: Dict[str, int]) -> Optional[int]:
@@
- Returns:
- Folder ID if found, None otherwise
+ Returns:
+ Optional[int]: Folder ID if found, None otherwise
@@
- parent_folder = os.path.dirname(image_path)
-
- current_path = parent_folder
- while current_path:
- if current_path in folder_path_to_id:
- return folder_path_to_id[current_path]
- current_path = os.path.dirname(current_path)
-
- return None
+ parent_folder = os.path.dirname(os.path.abspath(image_path))
+
+ current_path = parent_folder
+ while True:
+ if current_path in folder_path_to_id:
+ return folder_path_to_id[current_path]
+ parent = os.path.dirname(current_path)
+ if parent == current_path:
+ # Reached filesystem root
+ return None
+ current_path = parentAdditionally, add Optional to imports at the top of the file:
from typing import List, Tuple, Dict, Optional🤖 Prompt for AI Agents
In backend/app/utils/images.py around lines 258 to 278, the function
image_util_find_folder_id_for_image can hang when traversing up from root
(os.path.dirname("/") == "/") and its signature/docstring incorrectly claim it
returns int while it can return None; update the function signature to ->
Optional[int] and adjust the docstring to state it returns Optional[int], add
Optional to the typing imports, and fix the traversal to break when current_path
is empty or when os.path.dirname(current_path) == current_path (i.e., reached
filesystem root) so the loop terminates safely; ensure the function still
returns the mapped folder_id when found and None otherwise.
| ## API Endpoints | ||
|
|
||
| ### Core Endpoints | ||
|
|
||
| - **`GET /`** - Service information | ||
| - **`GET /health`** - Health check with database and watcher status | ||
| - **`GET /folders`** - List all folders being watched | ||
| - **`GET /watcher/status`** - Current watcher status | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify endpoints and file references across the repo to ensure README accuracy.
Check that no stray GET /folders or file_watcher.py exist, and confirm the /api/v1 prefixed routes.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Searching for a GET /folders endpoint..."
rg -nP -C2 --type=py '@router\.(get|post|put|delete)\s*\(\s*"/folders"\s*' || echo "No direct /folders endpoint found."
echo "Searching for file_watcher.py..."
fd -H -a 'file_watcher.py' || echo "No file_watcher.py found."
echo "List defined routers and their prefixes in sync-microservice..."
rg -nP --type=py -C2 'APIRouter\(' sync-microservice
echo "List health/folder/watcher endpoints..."
rg -nP --type=py -C2 '@router\.(get|post|put|delete)\s*\(' sync-microservice/app/routesLength of output: 3672
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Locating FastAPI app instantiation and router includes..."
rg -nP -C2 "FastAPI\(" -n sync-microservice/app
rg -nP -C2 "include_router" -n sync-microservice/appLength of output: 194
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Searching for FastAPI app instantiation..."
rg -nP --type=py -C2 "FastAPI\(" .
echo "Searching for router inclusion (include_router)..."
rg -nP --type=py -C2 "include_router" .Length of output: 4377
Update README API endpoints to align with actual routing
Please update sync-microservice/README.md so that the “Core Endpoints” exactly match the routes registered in main.py (and include the /api/v1 prefix):
• Prefix all entries with /api/v1
• Replace
GET /folders– List all folders being watched
with
GET /api/v1/folders/status(the only folders route defined)
• ReplaceGET /watcher/status
with
GET /api/v1/watcher/status
• Add the watcher control endpoints you’ve exposed:POST /api/v1/watcher/startPOST /api/v1/watcher/stopPOST /api/v1/watcher/restart
• Confirm whether a root/(service info) endpoint actually exists—if not, remove it from the docs.
Mandatory corrections are needed in sync-microservice/README.md to keep docs and code in sync.
🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 47 to 55, the "Core Endpoints" list
does not match the routes in main.py; update every entry to include the /api/v1
prefix, replace "GET /folders" with "GET /api/v1/folders/status", change "GET
/watcher/status" to "GET /api/v1/watcher/status", add the watcher control
endpoints "POST /api/v1/watcher/start", "POST /api/v1/watcher/stop", and "POST
/api/v1/watcher/restart", and verify whether a root "GET /" service info
endpoint exists in main.py—if it does not, remove that entry from the README.
| ### Core Endpoints | ||
|
|
||
| - **`GET /`** - Service information | ||
| - **`GET /health`** - Health check with database and watcher status | ||
| - **`GET /folders`** - List all folders being watched | ||
| - **`GET /watcher/status`** - Current watcher status | ||
|
|
There was a problem hiding this comment.
Fix endpoint paths and names to match the code (prefix /api/v1; no GET /; folders endpoint is /folders/status).
The README lists:
- GET / (not implemented)
- GET /health (actual: /api/v1/health)
- GET /folders (not implemented)
- GET /watcher/status (actual: /api/v1/watcher/status)
Align docs with implemented routes registered in main.py.
Apply this diff:
-### Core Endpoints
-
-- **`GET /`** - Service information
-- **`GET /health`** - Health check with database and watcher status
-- **`GET /folders`** - List all folders being watched
-- **`GET /watcher/status`** - Current watcher status
+### Core Endpoints (all under `/api/v1`)
+
+- **`GET /api/v1/health`** - Health check with database and watcher status
+- **`GET /api/v1/folders/status`** - Folder-wise tagging progress
+- **`GET /api/v1/watcher/status`** - Current watcher status📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ### Core Endpoints | |
| - **`GET /`** - Service information | |
| - **`GET /health`** - Health check with database and watcher status | |
| - **`GET /folders`** - List all folders being watched | |
| - **`GET /watcher/status`** - Current watcher status | |
| ### Core Endpoints (all under `/api/v1`) | |
| - **`GET /api/v1/health`** - Health check with database and watcher status | |
| - **`GET /api/v1/folders/status`** - Folder-wise tagging progress | |
| - **`GET /api/v1/watcher/status`** - Current watcher status |
🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 49 to 55, update the Core Endpoints
to match the routes registered in main.py: remove the non-implemented GET /
entry, change GET /health to GET /api/v1/health, change GET /watcher/status to
GET /api/v1/watcher/status, and replace GET /folders with GET /folders/status
(or, if implemented under the API prefix, GET /api/v1/folders/status); ensure
all endpoint paths are consistently prefixed with /api/v1 to match the code.
| **Folders List:** | ||
|
|
||
| ```json | ||
| { | ||
| "total_folders": 3, | ||
| "folders": [ | ||
| { "id": "uuid-1", "path": "/path/to/folder1" }, | ||
| { "id": "uuid-2", "path": "/path/to/folder2" } | ||
| ] | ||
| } | ||
| ``` |
There was a problem hiding this comment.
Update example response to actual API shape returned by /folders/status.
The example shows a folders list not implemented. The implemented endpoint returns a success envelope with data items containing tagging_percentage.
Apply this diff:
-**Folders List:**
+**Folder Tagging Status:**
```json
-{
- "total_folders": 3,
- "folders": [
- { "id": "uuid-1", "path": "/path/to/folder1" },
- { "id": "uuid-2", "path": "/path/to/folder2" }
- ]
-}
+{
+ "status": "success",
+ "total_folders": 3,
+ "data": [
+ { "folder_id": "uuid-1", "folder_path": "/path/to/folder1", "tagging_percentage": 42.0 },
+ { "folder_id": "uuid-2", "folder_path": "/path/to/folder2", "tagging_percentage": 87.5 }
+ ]
+}🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 68 to 78, the example JSON for the
/folders/status response uses an outdated shape (total_folders and folders array
with id/path); update it to the actual API envelope by wrapping with
"status":"success", keep "total_folders": 3, rename the folder entries to
"folder_id" and "folder_path", replace the "folders" array with "data", and add
the "tagging_percentage" numeric field for each item as shown in the suggested
diff.
| ``` | ||
| sync-microservice/ | ||
| ├── app/ | ||
| │ ├── config/ | ||
| │ │ └── settings.py # Configuration settings | ||
| │ ├── database/ | ||
| │ │ ├── __init__.py | ||
| │ │ └── folders.py # Database operations for folders | ||
| │ └── utils/ | ||
| │ ├── __init__.py | ||
| │ └── file_watcher.py # File watching implementation | ||
| ├── main.py # FastAPI application | ||
| └── requirements.txt # Dependencies | ||
| ``` |
There was a problem hiding this comment.
Fix architecture section (file name and markdownlint MD040).
- The file is app/utils/watcher.py (not file_watcher.py).
- Add a fenced code language to satisfy MD040.
Apply this diff:
-```
+```text
sync-microservice/
├── app/
│ ├── config/
│ │ └── settings.py # Configuration settings
│ ├── database/
│ │ ├── __init__.py
│ │ └── folders.py # Database operations for folders
-│ └── utils/
-│ ├── __init__.py
-│ └── file_watcher.py # File watching implementation
+│ └── utils/
+│ ├── __init__.py
+│ └── watcher.py # File watching implementation
├── main.py # FastAPI application
└── requirements.txt # Dependencies🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
82-82: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 82 to 95, the architecture section
lists the wrong utility filename and lacks a fenced code language; update the
utils entry from file_watcher.py to watcher.py and wrap the tree in a fenced
code block with a language tag (use "text") so the block starts with ```text and
ends with ```, preserving the rest of the content exactly as in the provided
diff.
| ### Adding New Features | ||
|
|
||
| 1. **Database Operations**: Add new functions to `app/database/folders.py` | ||
| 2. **File Processing**: Extend `app/utils/file_watcher.py` | ||
| 3. **API Endpoints**: Add routes to `main.py` | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Correct development guidance to reflect the actual module layout.
Recommend extending watcher.py and adding new endpoints via routers in app/routes rather than directly in main.py.
Apply this diff:
-1. **Database Operations**: Add new functions to `app/database/folders.py`
-2. **File Processing**: Extend `app/utils/file_watcher.py`
-3. **API Endpoints**: Add routes to `main.py`
+1. **Database Operations**: Add new functions to `app/database/folders.py`
+2. **File Processing**: Extend `app/utils/watcher.py`
+3. **API Endpoints**: Create routers under `app/routes/` and include them in `main.py`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ### Adding New Features | |
| 1. **Database Operations**: Add new functions to `app/database/folders.py` | |
| 2. **File Processing**: Extend `app/utils/file_watcher.py` | |
| 3. **API Endpoints**: Add routes to `main.py` | |
| ### Adding New Features | |
| 1. **Database Operations**: Add new functions to `app/database/folders.py` | |
| 2. **File Processing**: Extend `app/utils/watcher.py` | |
| 3. **API Endpoints**: Create routers under `app/routes/` and include them in `main.py` |
🧰 Tools
🪛 LanguageTool
[grammar] ~131-~131: There might be a mistake here.
Context: ...base Operations**: Add new functions to app/database/folders.py 2. File Processing: Extend `app/utils/fil...
(QB_NEW_EN)
[grammar] ~132-~132: There might be a mistake here.
Context: ...ders.py2. **File Processing**: Extendapp/utils/file_watcher.py3. **API Endpoints**: Add routes tomain.py`...
(QB_NEW_EN)
🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 129 to 134, the guidance is
incorrect: instead of "Extend app/utils/file_watcher.py" and "Add routes to
main.py", update the instructions to say extend the watcher module at
app/utils/watcher.py (or the actual watcher file used by the project) and add
new API endpoints by creating routers under app/routes (registering them in
main.py only via include_router), so change the README text to reflect those
exact module paths and the preferred router-based approach.
Closes #475
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores