Skip to content

Add sync microservice for watching file system events#486

Merged
rahulharpal1603 merged 2 commits intoAOSSIE-Org:mainfrom
rahulharpal1603:backend-revamp
Aug 20, 2025
Merged

Add sync microservice for watching file system events#486
rahulharpal1603 merged 2 commits intoAOSSIE-Org:mainfrom
rahulharpal1603:backend-revamp

Conversation

@rahulharpal1603
Copy link
Contributor

@rahulharpal1603 rahulharpal1603 commented Aug 20, 2025

Closes #475

Summary by CodeRabbit

  • New Features

    • Added a Sync Microservice: health, watcher (status/start/stop/restart), and folder tagging status endpoints; real-time folder monitoring with graceful startup/shutdown.
    • Backend: subtree-aware sync, background post-sync processing, and batch image import with thumbnail generation and stale-image cleanup.
  • Refactor

    • Image handling redesigned for multi-folder, batched processing to improve performance and consistency.
  • Documentation

    • New Sync Microservice README with setup and API usage.
  • Chores

    • Added dependency manifest, .gitignore, and startup initialization streamlining.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 20, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a Sync microservice (FastAPI) with health, watcher, and folder-tagging-status endpoints and a filesystem watcher; supplies configs, schemas, README, and requirements. Updates the primary backend to batch-process images across folder trees, add DB helpers for folders/images, adjust add/sync flows, and initialize tables earlier.

Changes

Cohort / File(s) Summary
Backend DB helpers
backend/app/database/folders.py, backend/app/database/images.py
Added bulk lookup helpers: db_get_folder_ids_by_paths, db_get_images_by_folder_ids, and db_delete_images_by_ids. Minor docstring/formatting tweaks.
Backend folder routes and flows
backend/app/routes/folders.py
post-add/sync sequences updated to accept folder IDs and aggregate subtree folders via db_get_folder_ids_by_path_prefix; added post_sync_folder_sequence; background tasks now send multi-folder data for processing.
Backend utils: images & folders
backend/app/utils/images.py, backend/app/utils/folders.py
Replaced single-folder image processing with multi-folder, bulk operations: image_util_process_folder_images, thumbnail generation, preparers, obsolete-image cleanup; folder_util_add_multiple_folder_trees now returns (folder_id, folder_path) tuples.
Backend startup
backend/main.py
Moved folders/images table creation earlier in startup and removed duplicate init calls.
Sync microservice core & watcher
sync-microservice/app/core/lifespan.py, sync-microservice/app/utils/watcher.py
New lifespan manager to check DB and start/stop watcher; watcher thread reads folders from DB, maps changes to folder IDs, calls primary backend sync/delete APIs, and exposes control/query helpers.
Sync microservice API & schemas
sync-microservice/main.py, sync-microservice/app/routes/*.py, sync-microservice/app/schemas/*.py, sync-microservice/app/schemas/__init__.py
New FastAPI app with routers for health, watcher, and folder tagging status; added Pydantic schemas for health, watcher, and folder tagging responses.
Sync microservice config, docs, deps
sync-microservice/app/config/settings.py, sync-microservice/README.md, sync-microservice/requirements.txt, sync-microservice/.gitignore, sync-microservice/pyproject.toml
Added static settings/constants (model paths, DB path, URLs, image paths), README, pinned requirements, gitignore, and ruff config.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor OS as File System
  participant SM as Sync Microservice
  participant WL as Watcher Thread
  participant DB as SQLite (folders/images)
  participant BE as Primary Backend

  rect rgb(235, 245, 255)
  note over SM: Startup (lifespan)
  SM->>DB: db_check_database_connection()
  DB-->>SM: ok/err
  SM->>SM: start watcher if DB ok
  SM->>WL: watcher_util_start_folder_watcher()
  WL->>DB: db_get_all_folders_with_ids()
  DB-->>WL: [(folder_id, path)...]
  end

  rect rgb(245, 255, 245)
  note over WL: Runtime monitoring
  OS-->>WL: change events (created/modified/deleted)
  WL->>WL: map changes to watched folder(s)
  alt non-delete changes
    WL->>BE: POST /folders/sync-folder {folder_id, folder_path}
    BE-->>WL: 200/err
  else deletions
    WL->>BE: DELETE /folders/delete-folders {folder_ids}
    BE-->>WL: 200/err
    WL->>WL: restart watcher
  end
  end

  rect rgb(255, 245, 235)
  note over SM: Health and status APIs
  SM->>DB: db_check_database_connection()
  SM->>WL: watcher_util_is_watcher_running()
  SM-->>Client: Health/Watcher status JSON
  end
Loading
sequenceDiagram
  autonumber
  participant API as Primary Backend /folders
  participant IF as Image Utils
  participant Dbf as DB (folders/images)
  participant FS as File System

  note over API: Add/Sync Folder
  API->>Dbf: Insert/lookup folder(s)
  API->>IF: image_util_process_folder_images([(path,id,recursive)...])
  loop per folder batch
    IF->>FS: scan images (recursive?)
    FS-->>IF: file list
    IF->>IF: prepare records, generate thumbnails
  end
  IF->>Dbf: bulk insert/update images
  IF->>Dbf: query images by folder_ids
  IF->>Dbf: delete obsolete images
  IF-->>API: success/failure
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Assessment against linked issues

Objective Addressed Explanation
Add secondary backend (independent microservice) to sync folders/images with OS (#475)
Provide endpoint for folder-wise AI tagging status in the microservice (#475)

Poem

In softly humming threads I peek,
A watchful bun with twitchy cheek.
I sniff the paths, I map the IDs,
Thumbnails sprout like leafy trees.
“All synced!” I thump—ears in delight— 🥕✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4f245f1 and ba8cf74.

📒 Files selected for processing (7)
  • backend/app/database/folders.py (1 hunks)
  • backend/app/database/images.py (1 hunks)
  • backend/app/routes/folders.py (6 hunks)
  • backend/app/utils/folders.py (3 hunks)
  • backend/app/utils/images.py (3 hunks)
  • backend/main.py (1 hunks)
  • sync-microservice/pyproject.toml (1 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@rahulharpal1603 rahulharpal1603 merged commit fc2021c into AOSSIE-Org:main Aug 20, 2025
4 checks passed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/utils/folders.py (1)

13-33: Bug: root_path normalization missing (breaks parent linkage and return value).

dirpath is normalized to absolute, but root_path is not. This causes:

  • The root dir comparison to fail, preventing parent_folder_id from being applied to the root node.
  • A KeyError on return if root_path was provided as a relative path (folder_map stores absolute paths).

Fix by normalizing root_path upfront and using it consistently.

Apply:

-def folder_util_add_folder_tree(root_path, parent_folder_id=None, AI_Tagging=False, taggingCompleted=None):
+def folder_util_add_folder_tree(root_path, parent_folder_id=None, AI_Tagging=False, taggingCompleted=None):
     """
     Recursively collect folder data and insert all folders in a single database transaction.
     All folders are initially inserted with NULL parent_id, which is updated after insertion.
     Returns the root folder's UUID and the folder map (containing folder_id and parent_id).
     """
+    # Normalize to absolute so keys and comparisons match os.walk outputs
+    root_path = os.path.abspath(root_path)
     folders_data = []
     folder_map = {}  # Maps path to (folder_id, parent_id)
 
     for dirpath, dirnames, _ in os.walk(root_path, topdown=True):
         dirpath = os.path.abspath(dirpath)
         # Generate a UUID for this folder
         this_folder_id = str(uuid.uuid4())
 
         # Determine parent ID for the map (not for initial insert)
         if dirpath == root_path:
             parent_id = parent_folder_id
         else:
             parent_path = os.path.dirname(dirpath)
-            parent_id = folder_map[parent_path][0] if parent_path in folder_map else None
+            parent_id = folder_map[parent_path][0] if parent_path in folder_map else None
@@
-    return folder_map[root_path][0], folder_map
+    return folder_map[root_path][0], folder_map
backend/app/database/images.py (1)

235-268: Enable foreign key enforcement to ensure ON DELETE CASCADE effects.

SQLite requires PRAGMA foreign_keys = ON per-connection. Without it, deleting images won’t cascade to image_classes, leaving orphan rows. Enable it before the DELETE.

 def db_delete_images_by_ids(image_ids: List[ImageId]) -> bool:
@@
-    conn = sqlite3.connect(DATABASE_PATH)
-    cursor = conn.cursor()
+    conn = sqlite3.connect(DATABASE_PATH)
+    cursor = conn.cursor()
+    # Ensure cascading deletes for image_classes
+    cursor.execute("PRAGMA foreign_keys = ON;")
@@
-        placeholders = ",".join("?" for _ in image_ids)
+        placeholders = ",".join(["?"] * len(image_ids))

Longer-term, consider a small helper to open connections with foreign_keys enabled to avoid repeating this in every write-path.

backend/app/utils/images.py (1)

102-110: Guard against None from classifier to avoid TypeError.

ObjectClassifier.get_classes can return None. len(classes) will raise if so.

Apply this minimal fix:

-            # Step 2: Insert class-image pairs if classes were detected
-            if len(classes) > 0:
+            # Step 2: Insert class-image pairs if classes were detected
+            if classes:
                 # Create image-class pairs
                 image_class_pairs = [(image_id, class_id) for class_id in classes]
                 print(image_class_pairs)
 
                 # Insert the pairs into the database
                 db_insert_image_classes_batch(image_class_pairs)
🧹 Nitpick comments (32)
backend/app/utils/folders.py (1)

122-163: API return enrichment is useful; minor naming clarity nit.

Returning all (folder_id, folder_path) pairs across added trees is a nice improvement for downstream batch image processing. Consider avoiding shadowing folder_path in the inner loop for readability.

-            for folder_path_in_map, (folder_id_in_map, _) in folder_map.items():
-                added_folders.append((folder_id_in_map, folder_path_in_map))
+            for path_in_map, (id_in_map, _) in folder_map.items():
+                added_folders.append((id_in_map, path_in_map))

Also consider replacing print with a structured logger for uniform diagnostics across the service.

backend/app/database/folders.py (1)

333-369: Batch path lookup reads well; minor placeholder readability.

The IN-clause placeholder construction works but is slightly opaque. Consider the more idiomatic list-multiply or generator form for clarity.

-        placeholders = ",".join("?" * len(abs_paths))
+        placeholders = ",".join(["?"] * len(abs_paths))

Otherwise, normalization to absolute paths and mapping output are spot on.

backend/app/utils/images.py (3)

115-116: Don’t mark images as tagged when classification/detection fails.

If classes is None (failed load) or db insert/detection errors occur, you still set isTagged = True. That hides failures and prevents retries.

Suggested adjustment:

  • Update status only when at least one of: classes inserted successfully or face detection completes, or explicitly track a failed state to allow retry.

193-208: Ensure thumbnail directory exists within the generator (safer for reuse).

image_util_generate_thumbnail assumes THUMBNAIL_IMAGES_PATH exists (created by the caller). Make it robust if called elsewhere.

         with Image.open(image_path) as img:
             img.thumbnail(size)
@@
-            img.save(thumbnail_path, "JPEG")  # Always save thumbnails as JPEG
+            # Ensure destination directory exists
+            os.makedirs(os.path.dirname(thumbnail_path), exist_ok=True)
+            img.save(thumbnail_path, "JPEG")  # Always save thumbnails as JPEG

55-57: Replace print statements with structured logging.

Use a shared logger for visibility and to integrate with the service’s logging pipeline. This also allows toggling verbosity in production vs. dev.

Also applies to: 186-188, 206-207

sync-microservice/app/schemas/__init__.py (1)

1-1: Consider re-exporting commonly used schemas from the package init for ergonomic imports.

This allows from app.schemas import HealthCheckResponse, ... and centralizes the public API surface.

Apply this diff to re-export the main schemas:

-"""Schemas package for sync-microservice API responses and requests."""
+"""Schemas package for sync-microservice API responses and requests."""
+
+# Re-export commonly used schemas for convenient imports:
+from .health import HealthCheckResponse
+from .folders import (
+    FolderTaggingInfo,
+    FolderTaggingStatusSuccessResponse,
+    FolderTaggingStatusErrorResponse,
+)
+from .watcher import (
+    WatchedFolder,
+    WatcherStatusResponse,
+    WatcherControlResponse,
+    WatcherErrorResponse,
+)
+
+__all__ = [
+    "HealthCheckResponse",
+    "FolderTaggingInfo",
+    "FolderTaggingStatusSuccessResponse",
+    "FolderTaggingStatusErrorResponse",
+    "WatchedFolder",
+    "WatcherStatusResponse",
+    "WatcherControlResponse",
+    "WatcherErrorResponse",
+]
sync-microservice/requirements.txt (3)

40-40: Add a trailing newline at EOF.

Minor formatting nit that some tooling enforces.

- websockets==15.0.1
+ websockets==15.0.1
+

10-13: Option: consolidate uvicorn extras to simplify maintenance.

Using uvicorn’s standard extras avoids separately pinning httptools/uvloop/websockets and reduces drift.

-httptools==0.6.4
-uvicorn==0.35.0
-uvloop==0.21.0
-websockets==15.0.1
+uvicorn[standard]==0.35.0

Note: uvicorn[standard] already depends on httptools, uvloop, websockets, and watchfiles; ensure version compatibility with your current pins if you adopt this.

Also applies to: 37-39


1-40: Split runtime vs. dev dependencies & consolidate Uvicorn extras

The import scan under sync-microservice shows these packages aren’t directly imported and can be moved into a dev-only requirements file (e.g., dev-requirements.txt):

• fastapi-cli
• fastapi-cloud-cli
• Jinja2
• markdown-it-py
• mdurl
• Pygments
• rich
• rich-toolkit
• rignore
• typer
• python-dotenv
• python-multipart
• PyYAML
• email_validator
• sentry-sdk

Recommendations:

  • In requirements.txt, keep only the direct runtime dependencies (e.g., fastapi, httpx, watchfiles).

  • Consolidate Uvicorn extras by replacing httptools, uvloop, websockets with:

    uvicorn[standard]==<version>
    
  • Pin transitive dependencies (anyio, h11, httpcore, idna, urllib3, etc.) via a constraints or lock file, rather than listing them in the main reqs.

  • Maintain a separate dev requirements file for linters, docs, CLI tools, and test utilities.

sync-microservice/.gitignore (2)

1-1: Top-level ignore for all .db files is broad—confirm this won’t hide checked-in fixtures.

**/*.db will also ignore any test fixtures or sample DBs you might want to commit inside this subtree. If that’s intentional, all good; otherwise, scope it (e.g., to a runtime dir).

Would you like a targeted pattern set (e.g., ignore only the runtime SQLite path configured in settings)?


137-142: Verify ignoring images/ is desired for this package.

Ignoring images/ at the microservice root could hide docs/assets placed here later. If this directory is only for runtime cache/temp content, consider renaming it (e.g., .runtime-images/) and keeping the ignore specific.

sync-microservice/app/schemas/health.py (1)

5-10: Consider Enums for repeated status strings to avoid typos across modules.

Using small Enum classes (e.g., ServiceStatus, DatabaseStatus, WatcherStatus) centralizes allowed values and makes reuse in other schemas/routes safer.

If you want, I can propose a small enums.py with these definitions and wire this schema to use them.

sync-microservice/app/schemas/folders.py (2)

21-36: Schemas look good; consider minor refinements.

  • You can use NonNegativeInt for total_folders for readability (current ge=0 is fine).
  • If you plan JSON examples, add model_config = ConfigDict(json_schema_extra={"examples": [...]}) to improve OpenAPI docs.

I can add examples if you share sample payloads from the /folders/status route.


5-16: Rename Pydantic FolderTaggingInfo to avoid naming collision

  • Definitions found:
    • sync-microservice/app/database/folders.py:11class FolderTaggingInfo(NamedTuple)
    • sync-microservice/app/schemas/folders.py:5class FolderTaggingInfo(BaseModel)
  • No direct imports of FolderTaggingInfo from either module were detected, but sharing the same class name across layers can still lead to subtle bugs if the wrong one is imported.

Two safe options:

  • Rename the schema class to FolderTaggingInfoModel (or similar) and re-export it as FolderTaggingInfo from app.schemas for a consistent public API.
  • Keep the existing name but enforce fully-qualified imports (db.FolderTaggingInfo vs. schemas.FolderTaggingInfo) and add a lint rule or static type check to catch ambiguous references.
sync-microservice/README.md (2)

97-104: Document a robust database path configuration using an environment variable.

Running from a different working directory will break relative paths. Recommend documenting an env override and path resolution relative to the repo root.

Apply this diff:

-```python
-DATABASE_PATH = "../backend/app/database/PictoPy.db"
-```
+```python
+import os
+from pathlib import Path
+
+# Resolve to monorepo root and default DB path
+REPO_ROOT = Path(__file__).resolve().parents[3]
+DATABASE_PATH = os.getenv(
+    "PICTOPY_DB_PATH",
+    str(REPO_ROOT / "backend/app/database/PictoPy.db"),
+)
+```

7-12: Minor copy polish (articles) for better readability.

Examples:

  • “Watches all folders registered in the PictoPy database”
  • “Connects to the main PictoPy database to get folder information”
  • “Provides health check endpoints for monitoring”
sync-microservice/app/config/settings.py (2)

1-5: Allow URL overrides via environment variables for deployment flexibility.

Make PRIMARY_BACKEND_URL and SYNC_MICROSERVICE_URL configurable to support non-localhost deployments.

Apply this diff:

-# Model Exports Path
-MODEL_EXPORTS_PATH = "app/models/ONNX_Exports"
-PRIMARY_BACKEND_URL = "http://localhost:8000"
-SYNC_MICROSERVICE_URL = "http://localhost:8001"
+# Model Exports Path
+MODEL_EXPORTS_PATH = "app/models/ONNX_Exports"
+import os
+PRIMARY_BACKEND_URL = os.getenv("PICTOPY_PRIMARY_URL", "http://localhost:8000")
+SYNC_MICROSERVICE_URL = os.getenv("PICTOPY_SYNC_URL", "http://localhost:8001")

23-25: Consider normalizing the image paths relative to a base directory.

THUMBNAIL_IMAGES_PATH and IMAGES_PATH are CWD-relative and may drift. Consider deriving them from a BASE_DATA_DIR or env var similar to DATABASE_PATH.

sync-microservice/main.py (1)

13-16: Optionally add a root (/) info endpoint or update README to remove it.

To match the README’s “GET /” section, expose a simple service info route; otherwise, remove that doc entry.

Apply this diff:

 app.include_router(health.router, prefix="/api/v1")
 app.include_router(watcher.router, prefix="/api/v1")
 app.include_router(folders.router, prefix="/api/v1")
+
+# Optional: root info endpoint
+@app.get("/")
+def root():
+    return {
+        "name": "PictoPy Sync Microservice",
+        "version": app.version,
+        "docs_url": "/docs",
+        "openapi_url": "/openapi.json",
+        "api_base": "/api/v1",
+    }
sync-microservice/app/routes/health.py (1)

9-17: Return HTTP 503 when unhealthy (without changing the response shape).

Set the response status code to 503 when the DB check fails, keeping the HealthCheckResponse payload intact.

Apply this diff:

-from fastapi import APIRouter
+from fastapi import APIRouter, Response
 
 @router.get("/health", response_model=HealthCheckResponse)
-async def health_check():
+async def health_check(response: Response):
     """Health check endpoint."""
     db_status = db_check_database_connection()
-    return HealthCheckResponse(
+    if not db_status:
+        response.status_code = 503
+    return HealthCheckResponse(
         status="healthy" if db_status else "unhealthy",
         database="connected" if db_status else "disconnected",
         watcher="running" if watcher_util_is_watcher_running() else "stopped",
     )
sync-microservice/app/routes/folders.py (3)

13-20: Use 500 status code on errors while preserving the error envelope.

Currently both success and error return 200. Set the HTTP status code to 500 on exception to aid monitoring/alerting without changing the response model.

Apply this diff:

-from fastapi import APIRouter
+from fastapi import APIRouter, Response
@@
-@router.get(
+@router.get(
@@
-def get_folders_tagging_status():
+def get_folders_tagging_status(response: Response):

And in the except block:

-    except Exception as e:
-        return FolderTaggingStatusErrorResponse(
+    except Exception as e:
+        response.status_code = 500
+        return FolderTaggingStatusErrorResponse(
             status="error",
             message=f"Failed to retrieve tagging progress: {str(e)}",
         )

23-28: Docstring: clarify the response envelope shape.

The endpoint returns a success envelope with data and total_folders (not a bare list).

Apply this diff:

-    Returns:
-        List of folders with their tagging progress information including:
-        - folder_id: Unique identifier for the folder
-        - folder_path: Path to the folder
-        - tagging_percentage: Percentage of images that have been tagged (0-100)
+    Returns:
+        On success:
+        - status: "success"
+        - total_folders: count of folders
+        - data: List of folders with their tagging progress:
+          - folder_id: Unique identifier for the folder
+          - folder_path: Path to the folder
+          - tagging_percentage: Percentage of images tagged (0-100)
+
+        On error:
+        - status: "error"
+        - message: Error details
+        - total_folders: 0
+        - data: []

32-39: Optional: return type hint and result ordering for predictable output.

  • Consider adding a return type hint: Union[FolderTaggingStatusSuccessResponse, FolderTaggingStatusErrorResponse].
  • Optionally sort by folder_path or folder_id for stable responses.
sync-microservice/app/core/lifespan.py (2)

11-13: Stop the watcher unconditionally on shutdown; remove the fragile watcher_started global.

If the watcher is started later via the /watcher/start API, watcher_started will remain False and the shutdown block won't stop the watcher. Stopping is idempotent and already guarded in watcher_util_stop_folder_watcher, so it's safe to call it unconditionally and drop the global flag.

Apply this diff:

-# Global variable to track watcher status
-watcher_started = False
-
@@
-    global watcher_started
+    # no module-level state needed
@@
-        watcher_started = watcher_util_start_folder_watcher()
+        watcher_util_start_folder_watcher()
@@
-        if watcher_started:
-            watcher_util_stop_folder_watcher()
-            print("Folder watcher stopped")
+        watcher_util_stop_folder_watcher()
+        print("Folder watcher stopped")

Also applies to: 20-21, 36-37, 51-54


24-47: Prefer structured logging over print for observability.

Replacing print with logging.getLogger(name).info()/error() provides levels, timestamps, and integration with FastAPI/Uvicorn logs. This will help trace issues in production.

sync-microservice/app/routes/watcher.py (1)

64-69: Consider returning HTTP 409 when the watcher is already running.

Semantic improvement: returning a 409 Conflict for “already running” communicates intent better than a 200 with success=False. Not mandatory, just a UX/API polish.

sync-microservice/app/database/folders.py (1)

6-9: Type alias nit: use FolderPath for readability.

Minor consistency fix to use your FolderPath alias in FolderIdPath.

-FolderIdPath = Tuple[FolderId, str]
+FolderIdPath = Tuple[FolderId, FolderPath]
sync-microservice/app/utils/watcher.py (5)

47-58: Deduplicate deleted folder IDs and avoid repeated API calls.

Use a set to aggregate deleted IDs and cast to list when calling the API.

Apply this diff:

-    deleted_folder_ids = []
+    deleted_folder_ids = set()
@@
-                deleted_folder_ids.append(deleted_folder_id)
+                deleted_folder_ids.add(deleted_folder_id)
@@
-    if deleted_folder_ids:
-        print(f"Calling delete API for {len(deleted_folder_ids)} deleted folders")
-        watcher_util_call_delete_folders_api(deleted_folder_ids)
+    if deleted_folder_ids:
+        print(f"Calling delete API for {len(deleted_folder_ids)} deleted folders")
+        watcher_util_call_delete_folders_api(list(deleted_folder_ids))
         watcher_util_restart_folder_watcher()

Also applies to: 73-77


110-111: Remove debug print.

This noisy debug line will spam logs on every change.

-    print("best match: ", best_match)

98-106: Make parent-folder detection OS-robust.

Using startswith can be brittle across case-insensitive filesystems and symlinks. Consider os.path.commonpath or pathlib.Path.resolve() + is_relative_to (Python 3.9+: emulate with try/except).

Example refactor:

from pathlib import Path

file_path = Path(file_path).resolve()
for folder_id, folder_path in watched_folders:
    folder_resolved = Path(folder_path).resolve()
    try:
        file_path.relative_to(folder_resolved)
        # folder_resolved is a parent
        ...
    except ValueError:
        continue

179-183: Confirm non-recursive watching is intentional.

recursive=False means changes in subdirectories won’t be observed. Given the goal to sync folder and image data, you might want recursive=True, otherwise nested albums won’t trigger syncs.

Would you like me to open a follow-up patch to toggle this (and document the behavior)?


223-271: Minor: folder_id_map is populated but unused.

You build folder_id_map but don’t use it anywhere. Either remove it for clarity or leverage it in watcher_util_get_folder_id_if_watched for O(1) lookups.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd1cb0 and 4f245f1.

📒 Files selected for processing (21)
  • backend/app/database/folders.py (6 hunks)
  • backend/app/database/images.py (1 hunks)
  • backend/app/routes/folders.py (7 hunks)
  • backend/app/utils/folders.py (7 hunks)
  • backend/app/utils/images.py (3 hunks)
  • backend/main.py (4 hunks)
  • sync-microservice/.gitignore (1 hunks)
  • sync-microservice/README.md (1 hunks)
  • sync-microservice/app/config/settings.py (1 hunks)
  • sync-microservice/app/core/lifespan.py (1 hunks)
  • sync-microservice/app/database/folders.py (1 hunks)
  • sync-microservice/app/routes/folders.py (1 hunks)
  • sync-microservice/app/routes/health.py (1 hunks)
  • sync-microservice/app/routes/watcher.py (1 hunks)
  • sync-microservice/app/schemas/__init__.py (1 hunks)
  • sync-microservice/app/schemas/folders.py (1 hunks)
  • sync-microservice/app/schemas/health.py (1 hunks)
  • sync-microservice/app/schemas/watcher.py (1 hunks)
  • sync-microservice/app/utils/watcher.py (1 hunks)
  • sync-microservice/main.py (1 hunks)
  • sync-microservice/requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (11)
sync-microservice/app/routes/health.py (3)
sync-microservice/app/database/folders.py (1)
  • db_check_database_connection (44-68)
sync-microservice/app/utils/watcher.py (1)
  • watcher_util_is_watcher_running (211-213)
sync-microservice/app/schemas/health.py (1)
  • HealthCheckResponse (5-10)
sync-microservice/main.py (2)
sync-microservice/app/core/lifespan.py (1)
  • lifespan (16-54)
backend/main.py (1)
  • lifespan (32-49)
sync-microservice/app/schemas/folders.py (1)
sync-microservice/app/database/folders.py (1)
  • FolderTaggingInfo (11-16)
backend/main.py (2)
backend/app/database/folders.py (1)
  • db_create_folders_table (15-32)
backend/app/database/images.py (1)
  • db_create_images_table (31-64)
sync-microservice/app/core/lifespan.py (3)
sync-microservice/app/database/folders.py (1)
  • db_check_database_connection (44-68)
sync-microservice/app/utils/watcher.py (2)
  • watcher_util_start_folder_watcher (216-271)
  • watcher_util_stop_folder_watcher (274-302)
backend/main.py (1)
  • lifespan (32-49)
sync-microservice/app/routes/watcher.py (2)
sync-microservice/app/utils/watcher.py (5)
  • watcher_util_start_folder_watcher (216-271)
  • watcher_util_stop_folder_watcher (274-302)
  • watcher_util_restart_folder_watcher (305-314)
  • watcher_util_is_watcher_running (211-213)
  • watcher_util_get_watcher_info (317-328)
sync-microservice/app/schemas/watcher.py (2)
  • WatcherStatusResponse (12-19)
  • WatcherControlResponse (22-27)
sync-microservice/app/utils/watcher.py (1)
sync-microservice/app/database/folders.py (1)
  • db_get_all_folders_with_ids (19-41)
sync-microservice/app/routes/folders.py (2)
sync-microservice/app/schemas/folders.py (3)
  • FolderTaggingStatusSuccessResponse (18-23)
  • FolderTaggingStatusErrorResponse (26-36)
  • FolderTaggingInfo (5-15)
sync-microservice/app/database/folders.py (1)
  • FolderTaggingInfo (11-16)
sync-microservice/app/database/folders.py (1)
sync-microservice/app/schemas/folders.py (1)
  • FolderTaggingInfo (5-15)
backend/app/routes/folders.py (1)
backend/app/utils/images.py (2)
  • image_util_process_folder_images (20-70)
  • image_util_process_untagged_images (73-87)
backend/app/utils/images.py (3)
backend/app/database/images.py (5)
  • db_get_images_by_folder_ids (200-232)
  • db_delete_images_by_ids (235-267)
  • db_bulk_insert_images (67-90)
  • db_get_untagged_images (93-134)
  • db_update_image_tagged_status (137-163)
backend/app/models/FaceDetector.py (3)
  • FaceDetector (11-66)
  • detect_faces (22-54)
  • close (56-66)
backend/app/models/ObjectClassifier.py (2)
  • ObjectClassifier (6-30)
  • close (24-30)
🪛 LanguageTool
sync-microservice/README.md

[grammar] ~7-~7: There might be a mistake here.
Context: ...ring**: Watches all folders registered in PictoPy database - 🗄️ **Database Integr...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...olders registered in PictoPy database - 🗄️ Database Integration: Connects to ...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...?️ Database Integration: Connects to main PictoPy database to get folder infor...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ... database to get folder information - 📊 Health Monitoring: Provides health c...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...tarts watching folders on startup ## Quick Start ### Prerequisites - Python 3.8...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ... Start ### Prerequisites - Python 3.8+ - PictoPy backend database should be set u...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...nc-microservice 2. **Create and activate virtual environment:** bash p...

(QB_NEW_EN)


[grammar] ~111-~111: There might be a mistake here.
Context: ...lly: 1. On Startup: - Connects to PictoPy database - Retrieves all fol...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...itors file changes (add, modify, delete) - Logs all detected changes - Maps chan...

(QB_NEW_EN)


[grammar] ~119-~119: There might be a mistake here.
Context: ..., delete) - Logs all detected changes - Maps changes to specific folder IDs -...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ... - Maps changes to specific folder IDs - Prepares for future database sync operat...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...abase sync operations 3. On Shutdown: - Gracefully stops file watcher - Clean...

(QB_NEW_EN)


[grammar] ~131-~131: There might be a mistake here.
Context: ...base Operations**: Add new functions to app/database/folders.py 2. File Processing: Extend `app/utils/fil...

(QB_NEW_EN)


[grammar] ~132-~132: There might be a mistake here.
Context: ...ders.py2. **File Processing**: Extendapp/utils/file_watcher.py3. **API Endpoints**: Add routes tomain.py`...

(QB_NEW_EN)


[grammar] ~139-~139: There might be a mistake here.
Context: ...mage file detection and database updates - Thumbnail generation for new images - AI...

(QB_NEW_EN)


[grammar] ~140-~140: There might be a mistake here.
Context: ...es - Thumbnail generation for new images - AI processing trigger for new images - D...

(QB_NEW_EN)


[grammar] ~141-~141: There might be a mistake here.
Context: ...s - AI processing trigger for new images - Database cleanup for deleted files - Int...

(QB_NEW_EN)


[grammar] ~142-~142: There might be a mistake here.
Context: ...ges - Database cleanup for deleted files - Integration with main PictoPy backend AP...

(QB_NEW_EN)


[grammar] ~165-~165: There might be a mistake here.
Context: ...No Folders to Watch** - Add folders to PictoPy backend first - Check databa...

(QB_NEW_EN)


[grammar] ~169-~169: There might be a mistake here.
Context: ...in filesystem 3. Watcher Not Starting - Check folder permissions - Ensure fol...

(QB_NEW_EN)


[grammar] ~171-~171: There might be a mistake here.
Context: ...r permissions - Ensure folders exist on filesystem - Review logs for specifi...

(QB_NEW_EN)


[grammar] ~178-~178: There might be a mistake here.
Context: ...d console output for: - Startup process - Database connections - Folder discovery ...

(QB_NEW_EN)


[grammar] ~179-~179: There might be a mistake here.
Context: ...- Startup process - Database connections - Folder discovery - File change events - ...

(QB_NEW_EN)


[grammar] ~180-~180: There might be a mistake here.
Context: ... Database connections - Folder discovery - File change events - Error conditions #...

(QB_NEW_EN)


[grammar] ~181-~181: There might be a mistake here.
Context: ... - Folder discovery - File change events - Error conditions ## Integration This m...

(QB_NEW_EN)


[grammar] ~188-~188: There might be a mistake here.
Context: ...: - Port 8000: Main PictoPy backend - Port 8001: Sync microservice (this ser...

(QB_NEW_EN)


[grammar] ~189-~189: There might be a mistake here.
Context: ...8001**: Sync microservice (this service) - Shared Database: Both services use the...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
sync-microservice/README.md

82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Tauri Tests
  • GitHub Check: Backend Tests
🔇 Additional comments (19)
backend/main.py (3)

124-126: Router inclusion LGTM.

New routers are mounted with clear prefixes and tags.


95-101: Docs path verified—no action needed.

The directory docs/backend/backend_python/ and the file openapi.json already exist in the repo, so writing to that location at runtime will succeed without breaking CI or documentation publishing.


31-45: Table creation order in backend/main.py verified safe

Verified that in the lifespan function the calls occur in this sequence:

  • db_create_folders_table()
  • db_create_images_table()
  • db_create_YOLO_classes_table()
  • …followed by all other table setups.

Because db_create_YOLO_classes_table runs immediately after db_create_images_table and before any writes to the image_classes mapping, there are no foreign-key or startup-order issues. No changes needed.

backend/app/utils/folders.py (1)

98-120: Deletion helper reads cleanly; edge cases covered.

Early return on empty input and ID derivation from db_child_folders look correct. Good use of batch delete.

backend/app/database/folders.py (2)

183-194: Foreign keys pragma on delete is correct.

Enabling PRAGMA foreign_keys before deleting folders ensures cascading to images and downstream tables. Good.


237-241: Tiny readability improvement acknowledged.

Inlining the SELECT for existence check is fine; behavior unchanged.

backend/app/routes/folders.py (3)

168-176: Background submit is correct; signature alignment with type hints above.

Submitting post processing to the ProcessPool keeps the request fast. Ensure the function signature matches any type hint changes you apply.


325-348: Sync flow is solid; confirms add/delete, then batches post-sync processing.

  • Comparing filesystem vs DB using sets is clean and efficient.
  • Using batch helpers for add/delete aligns with the new multi-folder image pipeline.
  • Scheduling post_sync_folder_sequence is appropriate.

No functional issues spotted here.


121-123: Validation error path normalization is correct.

Raising a clear validation error early is good. Since later logic assumes absolute paths, you already normalize request.folder_path below (Line 139).

backend/app/utils/images.py (3)

171-190: LGTM: folder traversal and filtering are clear.

Recursive and non-recursive modes are implemented cleanly with a validity filter. Error handling on os.listdir is appropriate.


220-238: LGTM: Obsolete image cleanup is correct and removes stale thumbnails too.

The function scopes deletions to provided folder IDs, and handles thumbnail cleanup with appropriate error handling.


141-156: Review Image ID Generation & Schema Constraints

I couldn’t locate the images table definition (raw SQL or ORM model) in the repo, so please confirm your DB schema to avoid duplicate rows or type mismatches:

• Verify images.id column type (UUID/TEXT vs. INTEGER) matches downstream expectations (e.g., FaceDetector.detect_faces).
• Ensure there’s a UNIQUE constraint or index on images.path so INSERT OR IGNORE actually dedupes.

If path isn’t unique yet, choose one of:

  • Add a UNIQUE constraint on images.path and continue using random UUIDs.
  • Switch to a deterministic UUID (e.g. uuid5 on the absolute file path) so IDs remain stable across runs.

Please confirm via your migration files or ORM models that these constraints and types are correctly defined.

sync-microservice/app/schemas/__init__.py (1)

1-1: Package docstring is fine.

Clear and helpful description for the schemas package.

sync-microservice/app/schemas/health.py (1)

5-10: LGTM: clear, minimal health schema with Literals.

The field set matches the described health endpoint and should generate a tight OpenAPI schema.

sync-microservice/app/schemas/folders.py (1)

8-9: No action needed: folder_id type matches the DB layer

The DB alias FolderId is defined as str in sync-microservice/app/database/folders.py:6, so the Pydantic schema’s folder_id: str is correct.

sync-microservice/main.py (1)

5-16: Router wiring with lifespan looks good.

sync-microservice/app/routes/health.py (1)

1-17: Health endpoint implementation is clean and aligned with the schema.

sync-microservice/app/routes/folders.py (1)

33-37: folder_id is consistently a string in DB and schema

I confirmed that:

  • backend/app/database/folders.py declares folder_id as TEXT (primary key)
  • sync-microservice/app/database/folders.py’s FolderTaggingInfo NamedTuple uses folder_id: str
  • sync-microservice/app/schemas/folders.py’s Pydantic model declares folder_id: str

All layers in this service treat folder_id as a string. Please verify that the main backend’s external API also exposes folder IDs as strings—if it returns integers, consider aligning both services to use the same type.

sync-microservice/app/schemas/watcher.py (1)

5-34: Schemas look solid and consistent.

The models align with the utilities’ payloads and should serialize/validate cleanly.

Comment on lines +200 to +233
def db_get_images_by_folder_ids(folder_ids: List[int]) -> List[Tuple[ImageId, ImagePath, str]]:
"""
Get all images that belong to the specified folder IDs.

Args:
folder_ids: List of folder IDs to search for images

Returns:
List of tuples containing (image_id, image_path, thumbnail_path)
"""
if not folder_ids:
return []

conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()

try:
# Create placeholders for the IN clause
placeholders = ",".join("?" for _ in folder_ids)
cursor.execute(
f"""
SELECT id, path, thumbnailPath
FROM images
WHERE folder_id IN ({placeholders})
""",
folder_ids,
)
return cursor.fetchall()
except Exception as e:
print(f"Error getting images by folder IDs: {e}")
return []
finally:
conn.close()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Type mismatch: folder_ids should be List[FolderId] (str), not List[int].

folders.folder_id is TEXT (UUID). Using List[int] misleads type checkers and readers, and increases risk of incorrect inputs.

-def db_get_images_by_folder_ids(folder_ids: List[int]) -> List[Tuple[ImageId, ImagePath, str]]:
+def db_get_images_by_folder_ids(folder_ids: List[FolderId]) -> List[Tuple[ImageId, ImagePath, str]]:

Optionally align placeholder creation style with the rest of the codebase for consistency:

-        placeholders = ",".join("?" for _ in folder_ids)
+        placeholders = ",".join(["?"] * len(folder_ids))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def db_get_images_by_folder_ids(folder_ids: List[int]) -> List[Tuple[ImageId, ImagePath, str]]:
"""
Get all images that belong to the specified folder IDs.
Args:
folder_ids: List of folder IDs to search for images
Returns:
List of tuples containing (image_id, image_path, thumbnail_path)
"""
if not folder_ids:
return []
conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()
try:
# Create placeholders for the IN clause
placeholders = ",".join("?" for _ in folder_ids)
cursor.execute(
f"""
SELECT id, path, thumbnailPath
FROM images
WHERE folder_id IN ({placeholders})
""",
folder_ids,
)
return cursor.fetchall()
except Exception as e:
print(f"Error getting images by folder IDs: {e}")
return []
finally:
conn.close()
def db_get_images_by_folder_ids(folder_ids: List[FolderId]) -> List[Tuple[ImageId, ImagePath, str]]:
"""
Get all images that belong to the specified folder IDs.
Args:
folder_ids: List of folder IDs to search for images
Returns:
List of tuples containing (image_id, image_path, thumbnail_path)
"""
if not folder_ids:
return []
conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()
try:
# Create placeholders for the IN clause
placeholders = ",".join(["?"] * len(folder_ids))
cursor.execute(
f"""
SELECT id, path, thumbnailPath
FROM images
WHERE folder_id IN ({placeholders})
""",
folder_ids,
)
return cursor.fetchall()
except Exception as e:
print(f"Error getting images by folder IDs: {e}")
return []
finally:
conn.close()
🤖 Prompt for AI Agents
In backend/app/database/images.py around lines 200 to 233, the function
signature and typing use List[int] for folder_ids while folder_id is stored as
TEXT/UUID; change the type annotation to List[FolderId] (alias of str) and
update any local typing imports accordingly so type checkers reflect string
UUIDs; ensure callers pass strings (or convert ints to str) and keep the SQL
logic the same (placeholders can remain generated by ",".join("?" for _ in
folder_ids) or be aligned to your project's placeholder helper for consistency).

Comment on lines +42 to 60
def post_folder_add_sequence(folder_path: str, folder_id: int):
"""
Post-addition sequence for a folder.
This function is called after a folder is successfully added.
It processes images in the folder and updates the database.
"""
try:
# Process images in the folder
image_util_process_folder_images(folder_path)
# Get all folder IDs and paths that match the root path prefix
folder_data = []
folder_ids_and_paths = db_get_folder_ids_by_path_prefix(folder_path)

# Set all folders to non-recursive (False)
for folder_id_from_db, folder_path_from_db in folder_ids_and_paths:
folder_data.append((folder_path_from_db, folder_id_from_db, False))

print("Add folder: ", folder_data)
# Process images in all folders
image_util_process_folder_images(folder_data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Type hint fix: folder_id should be str (UUID), not int. Also note: unused param.

DB uses TEXT UUIDs for folder_id. Update the annotation. The folder_id param isn’t used in this function; either remove it or use it for clarity.

-def post_folder_add_sequence(folder_path: str, folder_id: int):
+def post_folder_add_sequence(folder_path: str, folder_id: str):
@@
-        # Get all folder IDs and paths that match the root path prefix
+        # Get all folder IDs and paths that match the root path prefix

If you decide to drop the unused param, update the submit call in add_folder accordingly.


🏁 Script executed:

#!/bin/bash
# Find all references to post_folder_add_sequence to update call sites if signature changes
rg -nP '\bpost_folder_add_sequence\s*\(' -C2

Length of output: 326


🏁 Script executed:

#!/bin/bash
# Locate the add_folder route to see if post_folder_add_sequence is ever invoked there
rg -P -n -C5 'def\s+add_folder' backend/app/routes/folders.py

Length of output: 519


Update post_folder_add_sequence signature: drop unused folder_id or correct its type

  • The folder_id parameter is never used in post_folder_add_sequence and is mis-typed as int (our UUIDs are stored as str).

  • In backend/app/routes/folders.py (around line 42), update the signature:

    -def post_folder_add_sequence(folder_path: str, folder_id: int):
    +def post_folder_add_sequence(folder_path: str):

    If you actually need the folder’s UUID inside this function, keep the parameter but change its annotation:

    -def post_folder_add_sequence(folder_path: str, folder_id: int):
    +def post_folder_add_sequence(folder_path: str, folder_id: str):
  • Remove or update any downstream calls to match the new signature. (A global search shows no existing invocations in the codebase.)

🤖 Prompt for AI Agents
In backend/app/routes/folders.py around lines 42 to 60, the function
post_folder_add_sequence currently declares a folder_id: int parameter that is
unused and incorrectly typed (UUIDs are strings); either remove the folder_id
parameter from the signature and any callers, or if the UUID is needed keep the
parameter but change its type annotation to str and use it appropriately; then
run a global search to update/remove any downstream calls to match the new
signature and adjust typing/imports as needed.

Comment on lines +82 to +106
def post_sync_folder_sequence(folder_path: str, folder_id: int, added_folders: List[Tuple[str, str]]):
"""
Post-sync sequence for a folder.
This function is called after a folder is synced.
It processes images in the folder and updates the database.
"""
try:
# Create folder data array
folder_data = []

folder_data.append((folder_path, folder_id, False))

for added_folder_id, added_folder_path in added_folders:
folder_data.append((added_folder_path, added_folder_id, False))

print("Sync folder: ", folder_data)
# Process images in all folders
image_util_process_folder_images(folder_data)
image_util_process_untagged_images()
cluster_util_face_clusters_sync()
except Exception as e:
print(f"Error in post processing after folder {folder_path} was synced: {e}")
return False
return True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Type hints: use str for folder_id and ensure added_folders typing matches.

Consistently use str for folder IDs. The added_folders typing already looks correct as List[Tuple[str, str]].

-def post_sync_folder_sequence(folder_path: str, folder_id: int, added_folders: List[Tuple[str, str]]):
+def post_sync_folder_sequence(folder_path: str, folder_id: str, added_folders: List[Tuple[str, str]]):

Also consider replacing prints with the app logger for background task observability.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def post_sync_folder_sequence(folder_path: str, folder_id: int, added_folders: List[Tuple[str, str]]):
"""
Post-sync sequence for a folder.
This function is called after a folder is synced.
It processes images in the folder and updates the database.
"""
try:
# Create folder data array
folder_data = []
folder_data.append((folder_path, folder_id, False))
for added_folder_id, added_folder_path in added_folders:
folder_data.append((added_folder_path, added_folder_id, False))
print("Sync folder: ", folder_data)
# Process images in all folders
image_util_process_folder_images(folder_data)
image_util_process_untagged_images()
cluster_util_face_clusters_sync()
except Exception as e:
print(f"Error in post processing after folder {folder_path} was synced: {e}")
return False
return True
def post_sync_folder_sequence(folder_path: str, folder_id: str, added_folders: List[Tuple[str, str]]):
"""
Post-sync sequence for a folder.
This function is called after a folder is synced.
It processes images in the folder and updates the database.
"""
try:
# Create folder data array
folder_data = []
folder_data.append((folder_path, folder_id, False))
for added_folder_id, added_folder_path in added_folders:
folder_data.append((added_folder_path, added_folder_id, False))
print("Sync folder: ", folder_data)
# Process images in all folders
image_util_process_folder_images(folder_data)
image_util_process_untagged_images()
cluster_util_face_clusters_sync()
except Exception as e:
print(f"Error in post processing after folder {folder_path} was synced: {e}")
return False
return True
🤖 Prompt for AI Agents
In backend/app/routes/folders.py around lines 82 to 106, the function signature
and runtime usage should use str for folder IDs (change folder_id: int to
folder_id: str) and ensure the added_folders entries are treated as
List[Tuple[str, str]] (they already are in typing but use those types when
constructing folder_data); replace plain print calls with the application logger
(e.g., app.logger.debug/info) and replace the generic print in the except block
with logger.exception or logger.error including the exception so stack traces
are captured; keep the rest of the logic the same and return booleans as before.

Comment on lines +33 to +54
all_image_records = []
all_folder_ids = []

def image_util_generate_thumbnail(
image_path: str, thumbnail_path: str, size: Tuple[int, int] = (200, 200)
) -> bool:
"""Generate thumbnail for a single image."""
try:
with Image.open(image_path) as img:
img.thumbnail(size)
# Process each folder in the provided data
for folder_path, folder_id, recursive in folder_data:
try:
# Add folder ID to list for obsolete image cleanup
all_folder_ids.append(folder_id)

# Convert to RGB if the image has an alpha channel or is not RGB
if img.mode in ("RGBA", "P"):
img = img.convert("RGB")
# Step 1: Get all image files from current folder
image_files = image_util_get_images_from_folder(folder_path, recursive)

img.save(thumbnail_path, "JPEG") # Always save thumbnails as JPEG
return True
except Exception as e:
print(f"Error generating thumbnail for {image_path}: {e}")
return False
if not image_files:
continue # No images in this folder, continue to next

# Step 2: Create folder path mapping for this folder
folder_path_to_id = {os.path.abspath(folder_path): folder_id}

def image_util_get_images_from_folder(folder_path: str) -> List[str]:
"""Get all image files from a folder."""
image_files = []
for root, _, files in os.walk(folder_path):
for file in files:
file_path = os.path.join(root, file)
if image_util_is_valid_image(file_path):
image_files.append(file_path)
return image_files
# Step 3: Prepare image records for this folder
folder_image_records = image_util_prepare_image_records(image_files, folder_path_to_id)
all_image_records.extend(folder_image_records)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Prevent duplicate scanning/records across overlapping folders; build a global mapping and dedupe processed files.

If folder_data contains overlapping roots (e.g., a parent recursive and its child), images can be collected multiple times. Since folder_path_to_id is recreated per-iteration, the “most specific folder” mapping is also not honored across all provided folders.

Refactor to:

  • Build one global folder_path_to_id across folder_data.
  • Keep a processed_paths set to dedupe image files.
  • Use the global mapping when preparing image records.

Suggested patch:

-        all_image_records = []
-        all_folder_ids = []
+        all_image_records = []
+        all_folder_ids = []
+        # Build a global mapping once to allow "most specific" folder-ID resolution
+        folder_path_to_id = {os.path.abspath(p): fid for (p, fid, _) in folder_data}
+        processed_paths: set[str] = set()
@@
-        for folder_path, folder_id, recursive in folder_data:
+        for folder_path, folder_id, recursive in folder_data:
             try:
                 # Add folder ID to list for obsolete image cleanup
                 all_folder_ids.append(folder_id)
@@
-                # Step 2: Create folder path mapping for this folder
-                folder_path_to_id = {os.path.abspath(folder_path): folder_id}
-
-                # Step 3: Prepare image records for this folder
-                folder_image_records = image_util_prepare_image_records(image_files, folder_path_to_id)
+                # Step 2: Deduplicate overlapped scans
+                new_files = [p for p in image_files if p not in processed_paths]
+                if not new_files:
+                    continue
+                processed_paths.update(new_files)
+
+                # Step 3: Prepare image records using the global mapping
+                folder_image_records = image_util_prepare_image_records(new_files, folder_path_to_id)
                 all_image_records.extend(folder_image_records)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
all_image_records = []
all_folder_ids = []
def image_util_generate_thumbnail(
image_path: str, thumbnail_path: str, size: Tuple[int, int] = (200, 200)
) -> bool:
"""Generate thumbnail for a single image."""
try:
with Image.open(image_path) as img:
img.thumbnail(size)
# Process each folder in the provided data
for folder_path, folder_id, recursive in folder_data:
try:
# Add folder ID to list for obsolete image cleanup
all_folder_ids.append(folder_id)
# Convert to RGB if the image has an alpha channel or is not RGB
if img.mode in ("RGBA", "P"):
img = img.convert("RGB")
# Step 1: Get all image files from current folder
image_files = image_util_get_images_from_folder(folder_path, recursive)
img.save(thumbnail_path, "JPEG") # Always save thumbnails as JPEG
return True
except Exception as e:
print(f"Error generating thumbnail for {image_path}: {e}")
return False
if not image_files:
continue # No images in this folder, continue to next
# Step 2: Create folder path mapping for this folder
folder_path_to_id = {os.path.abspath(folder_path): folder_id}
def image_util_get_images_from_folder(folder_path: str) -> List[str]:
"""Get all image files from a folder."""
image_files = []
for root, _, files in os.walk(folder_path):
for file in files:
file_path = os.path.join(root, file)
if image_util_is_valid_image(file_path):
image_files.append(file_path)
return image_files
# Step 3: Prepare image records for this folder
folder_image_records = image_util_prepare_image_records(image_files, folder_path_to_id)
all_image_records.extend(folder_image_records)
all_image_records = []
all_folder_ids = []
# Build a global mapping once to allow "most specific" folder-ID resolution
folder_path_to_id = {os.path.abspath(p): fid for (p, fid, _) in folder_data}
processed_paths: set[str] = set()
# Process each folder in the provided data
for folder_path, folder_id, recursive in folder_data:
try:
# Add folder ID to list for obsolete image cleanup
all_folder_ids.append(folder_id)
# Step 1: Get all image files from current folder
image_files = image_util_get_images_from_folder(folder_path, recursive)
if not image_files:
continue # No images in this folder, continue to next
# Step 2: Deduplicate overlapped scans
new_files = [p for p in image_files if p not in processed_paths]
if not new_files:
continue
processed_paths.update(new_files)
# Step 3: Prepare image records using the global mapping
folder_image_records = image_util_prepare_image_records(new_files, folder_path_to_id)
all_image_records.extend(folder_image_records)
🤖 Prompt for AI Agents
In backend/app/utils/images.py around lines 33 to 54, the code creates a
per-iteration folder_path_to_id and can double-scan images when folder_data
contains overlapping roots; refactor to first build a single global mapping of
absolute folder paths to folder IDs (resolve overlaps by preferring the most
specific/longest path for conflicts), initialize a processed_paths set before
the loop, then for each folder call image_util_get_images_from_folder, filter
out any file paths already in processed_paths (add new ones to the set), and
finally call image_util_prepare_image_records using the global mapping (not a
per-iteration mapping) so files are deduplicated and the most specific folder_id
is used for each image.

Comment on lines +258 to +278
def image_util_find_folder_id_for_image(image_path: str, folder_path_to_id: Dict[str, int]) -> int:
"""
Find the most specific folder ID for a given image path.

Args:
image_path: Path to the image file
folder_path_to_id: Dictionary mapping folder paths to IDs

Returns:
Folder ID if found, None otherwise
"""
parent_folder = os.path.dirname(image_path)

current_path = parent_folder
while current_path:
if current_path in folder_path_to_id:
return folder_path_to_id[current_path]
current_path = os.path.dirname(current_path)

return None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix root-traversal bug and incorrect return typing in image_util_find_folder_id_for_image.

  • while current_path: with os.path.dirname("/") never terminates on Unix, causing an infinite loop for images under filesystem root when there’s no mapping match.
  • The function returns None but is typed as -> int.

Apply this diff to make traversal safe and update typing/docstring accordingly:

-def image_util_find_folder_id_for_image(image_path: str, folder_path_to_id: Dict[str, int]) -> int:
+def image_util_find_folder_id_for_image(image_path: str, folder_path_to_id: Dict[str, int]) -> Optional[int]:
@@
-    Returns:
-        Folder ID if found, None otherwise
+    Returns:
+        Optional[int]: Folder ID if found, None otherwise
@@
-    parent_folder = os.path.dirname(image_path)
-
-    current_path = parent_folder
-    while current_path:
-        if current_path in folder_path_to_id:
-            return folder_path_to_id[current_path]
-        current_path = os.path.dirname(current_path)
-
-    return None
+    parent_folder = os.path.dirname(os.path.abspath(image_path))
+
+    current_path = parent_folder
+    while True:
+        if current_path in folder_path_to_id:
+            return folder_path_to_id[current_path]
+        parent = os.path.dirname(current_path)
+        if parent == current_path:
+            # Reached filesystem root
+            return None
+        current_path = parent

Additionally, add Optional to imports at the top of the file:

from typing import List, Tuple, Dict, Optional
🤖 Prompt for AI Agents
In backend/app/utils/images.py around lines 258 to 278, the function
image_util_find_folder_id_for_image can hang when traversing up from root
(os.path.dirname("/") == "/") and its signature/docstring incorrectly claim it
returns int while it can return None; update the function signature to ->
Optional[int] and adjust the docstring to state it returns Optional[int], add
Optional to the typing imports, and fix the traversal to break when current_path
is empty or when os.path.dirname(current_path) == current_path (i.e., reached
filesystem root) so the loop terminates safely; ensure the function still
returns the mapped folder_id when found and None otherwise.

Comment on lines +47 to +55
## API Endpoints

### Core Endpoints

- **`GET /`** - Service information
- **`GET /health`** - Health check with database and watcher status
- **`GET /folders`** - List all folders being watched
- **`GET /watcher/status`** - Current watcher status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify endpoints and file references across the repo to ensure README accuracy.

Check that no stray GET /folders or file_watcher.py exist, and confirm the /api/v1 prefixed routes.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for a GET /folders endpoint..."
rg -nP -C2 --type=py '@router\.(get|post|put|delete)\s*\(\s*"/folders"\s*' || echo "No direct /folders endpoint found."

echo "Searching for file_watcher.py..."
fd -H -a 'file_watcher.py' || echo "No file_watcher.py found."

echo "List defined routers and their prefixes in sync-microservice..."
rg -nP --type=py -C2 'APIRouter\(' sync-microservice

echo "List health/folder/watcher endpoints..."
rg -nP --type=py -C2 '@router\.(get|post|put|delete)\s*\(' sync-microservice/app/routes

Length of output: 3672


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Locating FastAPI app instantiation and router includes..."
rg -nP -C2 "FastAPI\(" -n sync-microservice/app
rg -nP -C2 "include_router" -n sync-microservice/app

Length of output: 194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for FastAPI app instantiation..."
rg -nP --type=py -C2 "FastAPI\(" .

echo "Searching for router inclusion (include_router)..."
rg -nP --type=py -C2 "include_router" .

Length of output: 4377


Update README API endpoints to align with actual routing

Please update sync-microservice/README.md so that the “Core Endpoints” exactly match the routes registered in main.py (and include the /api/v1 prefix):

• Prefix all entries with /api/v1
• Replace

  • GET /folders – List all folders being watched
    with
    GET /api/v1/folders/status (the only folders route defined)
    • Replace
  • GET /watcher/status
    with
    GET /api/v1/watcher/status
    • Add the watcher control endpoints you’ve exposed:
  • POST /api/v1/watcher/start
  • POST /api/v1/watcher/stop
  • POST /api/v1/watcher/restart
    • Confirm whether a root / (service info) endpoint actually exists—if not, remove it from the docs.

Mandatory corrections are needed in sync-microservice/README.md to keep docs and code in sync.

🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 47 to 55, the "Core Endpoints" list
does not match the routes in main.py; update every entry to include the /api/v1
prefix, replace "GET /folders" with "GET /api/v1/folders/status", change "GET
/watcher/status" to "GET /api/v1/watcher/status", add the watcher control
endpoints "POST /api/v1/watcher/start", "POST /api/v1/watcher/stop", and "POST
/api/v1/watcher/restart", and verify whether a root "GET /" service info
endpoint exists in main.py—if it does not, remove that entry from the README.

Comment on lines +49 to +55
### Core Endpoints

- **`GET /`** - Service information
- **`GET /health`** - Health check with database and watcher status
- **`GET /folders`** - List all folders being watched
- **`GET /watcher/status`** - Current watcher status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix endpoint paths and names to match the code (prefix /api/v1; no GET /; folders endpoint is /folders/status).

The README lists:

  • GET / (not implemented)
  • GET /health (actual: /api/v1/health)
  • GET /folders (not implemented)
  • GET /watcher/status (actual: /api/v1/watcher/status)

Align docs with implemented routes registered in main.py.

Apply this diff:

-### Core Endpoints
-
-- **`GET /`** - Service information
-- **`GET /health`** - Health check with database and watcher status
-- **`GET /folders`** - List all folders being watched
-- **`GET /watcher/status`** - Current watcher status
+### Core Endpoints (all under `/api/v1`)
+
+- **`GET /api/v1/health`** - Health check with database and watcher status
+- **`GET /api/v1/folders/status`** - Folder-wise tagging progress
+- **`GET /api/v1/watcher/status`** - Current watcher status
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Core Endpoints
- **`GET /`** - Service information
- **`GET /health`** - Health check with database and watcher status
- **`GET /folders`** - List all folders being watched
- **`GET /watcher/status`** - Current watcher status
### Core Endpoints (all under `/api/v1`)
- **`GET /api/v1/health`** - Health check with database and watcher status
- **`GET /api/v1/folders/status`** - Folder-wise tagging progress
- **`GET /api/v1/watcher/status`** - Current watcher status
🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 49 to 55, update the Core Endpoints
to match the routes registered in main.py: remove the non-implemented GET /
entry, change GET /health to GET /api/v1/health, change GET /watcher/status to
GET /api/v1/watcher/status, and replace GET /folders with GET /folders/status
(or, if implemented under the API prefix, GET /api/v1/folders/status); ensure
all endpoint paths are consistently prefixed with /api/v1 to match the code.

Comment on lines +68 to +78
**Folders List:**

```json
{
"total_folders": 3,
"folders": [
{ "id": "uuid-1", "path": "/path/to/folder1" },
{ "id": "uuid-2", "path": "/path/to/folder2" }
]
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Update example response to actual API shape returned by /folders/status.

The example shows a folders list not implemented. The implemented endpoint returns a success envelope with data items containing tagging_percentage.

Apply this diff:

-**Folders List:**
+**Folder Tagging Status:**
 
 ```json
-{
-  "total_folders": 3,
-  "folders": [
-    { "id": "uuid-1", "path": "/path/to/folder1" },
-    { "id": "uuid-2", "path": "/path/to/folder2" }
-  ]
-}
+{
+  "status": "success",
+  "total_folders": 3,
+  "data": [
+    { "folder_id": "uuid-1", "folder_path": "/path/to/folder1", "tagging_percentage": 42.0 },
+    { "folder_id": "uuid-2", "folder_path": "/path/to/folder2", "tagging_percentage": 87.5 }
+  ]
+}
🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 68 to 78, the example JSON for the
/folders/status response uses an outdated shape (total_folders and folders array
with id/path); update it to the actual API envelope by wrapping with
"status":"success", keep "total_folders": 3, rename the folder entries to
"folder_id" and "folder_path", replace the "folders" array with "data", and add
the "tagging_percentage" numeric field for each item as shown in the suggested
diff.

Comment on lines +82 to +95
```
sync-microservice/
├── app/
│ ├── config/
│ │ └── settings.py # Configuration settings
│ ├── database/
│ │ ├── __init__.py
│ │ └── folders.py # Database operations for folders
│ └── utils/
│ ├── __init__.py
│ └── file_watcher.py # File watching implementation
├── main.py # FastAPI application
└── requirements.txt # Dependencies
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix architecture section (file name and markdownlint MD040).

  • The file is app/utils/watcher.py (not file_watcher.py).
  • Add a fenced code language to satisfy MD040.

Apply this diff:

-```
+```text
 sync-microservice/
 ├── app/
 │   ├── config/
 │   │   └── settings.py          # Configuration settings
 │   ├── database/
 │   │   ├── __init__.py
 │   │   └── folders.py           # Database operations for folders
-│   └── utils/
-│       ├── __init__.py
-│       └── file_watcher.py      # File watching implementation
+│   └── utils/
+│       ├── __init__.py
+│       └── watcher.py           # File watching implementation
 ├── main.py                      # FastAPI application
 └── requirements.txt             # Dependencies
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 82 to 95, the architecture section
lists the wrong utility filename and lacks a fenced code language; update the
utils entry from file_watcher.py to watcher.py and wrap the tree in a fenced
code block with a language tag (use "text") so the block starts with ```text and
ends with ```, preserving the rest of the content exactly as in the provided
diff.

Comment on lines +129 to +134
### Adding New Features

1. **Database Operations**: Add new functions to `app/database/folders.py`
2. **File Processing**: Extend `app/utils/file_watcher.py`
3. **API Endpoints**: Add routes to `main.py`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Correct development guidance to reflect the actual module layout.

Recommend extending watcher.py and adding new endpoints via routers in app/routes rather than directly in main.py.

Apply this diff:

-1. **Database Operations**: Add new functions to `app/database/folders.py`
-2. **File Processing**: Extend `app/utils/file_watcher.py`
-3. **API Endpoints**: Add routes to `main.py`
+1. **Database Operations**: Add new functions to `app/database/folders.py`
+2. **File Processing**: Extend `app/utils/watcher.py`
+3. **API Endpoints**: Create routers under `app/routes/` and include them in `main.py`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Adding New Features
1. **Database Operations**: Add new functions to `app/database/folders.py`
2. **File Processing**: Extend `app/utils/file_watcher.py`
3. **API Endpoints**: Add routes to `main.py`
### Adding New Features
1. **Database Operations**: Add new functions to `app/database/folders.py`
2. **File Processing**: Extend `app/utils/watcher.py`
3. **API Endpoints**: Create routers under `app/routes/` and include them in `main.py`
🧰 Tools
🪛 LanguageTool

[grammar] ~131-~131: There might be a mistake here.
Context: ...base Operations**: Add new functions to app/database/folders.py 2. File Processing: Extend `app/utils/fil...

(QB_NEW_EN)


[grammar] ~132-~132: There might be a mistake here.
Context: ...ders.py2. **File Processing**: Extendapp/utils/file_watcher.py3. **API Endpoints**: Add routes tomain.py`...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In sync-microservice/README.md around lines 129 to 134, the guidance is
incorrect: instead of "Extend app/utils/file_watcher.py" and "Add routes to
main.py", update the instructions to say extend the watcher module at
app/utils/watcher.py (or the actual watcher file used by the project) and add
new API endpoints by creating routers under app/routes (registering them in
main.py only via include_router), so change the README text to reflect those
exact module paths and the preferred router-based approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add secondary backend for syncing folder and image data with the OS

1 participant