feat(assets): align local API with cloud spec #12863
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds an Alembic migration and SQLAlchemy columns.

🚥 Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip: CodeRabbit can approve the review once all CodeRabbit's comments are resolved.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/assets/api/routes.py (1)
397-406: ⚠️ Potential issue | 🟠 Major

The hash-hit upload path drops the new `id` and `preview_id` fields.

Lines 399-406 forward `mime_type`, but not `spec.id` or `spec.preview_id`, into `create_from_hash()`. A hash-only upload can therefore return a different reference id than the caller requested and lose the preview association, even though both fields were added to this endpoint in this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/routes.py` around lines 397 - 406, The fast-path that reuses an existing hash calls create_from_hash(...) but omits forwarding the requested reference id and preview association; update the create_from_hash call in the hash-hit branch to pass spec.id (or id=spec.id) and spec.preview_id (or preview_id=spec.preview_id) so the created AssetReference preserves the caller-supplied id and preview link (refer to the create_from_hash call site and the spec variable).
🧹 Nitpick comments (3)
app/assets/api/schemas_in.py (1)
152-188: Consider extracting shared validators to reduce duplication.
`TagsRefineQuery._split_csv_tags` and `_parse_metadata_json` are identical to the validators in `ListAssetsQuery` (lines 69-98). Consider extracting these into standalone functions that both classes can reuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/schemas_in.py` around lines 152 - 188, The two validators in TagsRefineQuery (methods _split_csv_tags and _parse_metadata_json) are duplicate logic already present on ListAssetsQuery; extract these shared validator functions into top-level helpers (e.g., parse_csv_tags(value) and parse_metadata_json(value)) and have both models use `@field_validator` calling those helpers (or wrap them in small classmethod adapters) so you remove duplication while keeping behavior identical; update TagsRefineQuery._split_csv_tags and _parse_metadata_json to delegate to the new helpers and remove duplicated code.

app/assets/services/ingest.py (1)
363-366: Session commit inside a `with create_session()` block may cause issues.

At line 366, `session.commit()` is called inside the `with create_session()` block. Then at line 368, `_register_existing_asset` is called, which creates its own session. If this is intentional (to ensure the mime_type update is persisted before the subsequent operation), consider adding a comment. Otherwise, the commit could be deferred to the context manager's exit.

Also, note that after `session.commit()`, the `asset` object may be expired/detached, but it's not used again within this block, so this is safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/services/ingest.py` around lines 363 - 366, The explicit session.commit() inside the with create_session() block may be unnecessary or surprising because the context manager will commit/rollback on exit; either remove the explicit session.commit() after calling update_asset_hash_and_mime(session, asset_id=asset.id, mime_type=mime_type) so the commit is handled by create_session(), or if you intentionally need the update persisted before calling _register_existing_asset (which opens its own session), add a clarifying comment above session.commit() stating that the early commit is required to persist the mime_type change before _register_existing_asset is invoked; reference the functions update_asset_hash_and_mime, _register_existing_asset and the create_session context to locate the code.

app/assets/services/schemas.py (1)
28-29: Type hints with `# type: ignore` suppress type safety.

Setting `created_at: datetime = None` violates the type contract. If `ReferenceData` is ever constructed without explicitly passing these fields, downstream code expecting `datetime` objects will receive `None`, potentially causing `AttributeError` at runtime.

Consider using `datetime | None` if `None` is a valid state, or provide a proper default (e.g., a sentinel or factory).

Suggested fix

```diff
-    created_at: datetime = None  # type: ignore[assignment]
-    updated_at: datetime = None  # type: ignore[assignment]
+    created_at: datetime | None = None
+    updated_at: datetime | None = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/services/schemas.py` around lines 28 - 29, The fields created_at and updated_at in ReferenceData are typed as datetime but defaulted to None with "# type: ignore", which suppresses type safety; update their annotations to allow None (e.g., datetime | None or Optional[datetime]) or provide a non-None default/factory so construction cannot produce an unexpected None — locate the ReferenceData dataclass/class in schemas.py and change the annotations for created_at and updated_at to nullable types or initialize them via a default factory that returns datetime.now().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/assets/api/routes.py`:
- Around line 128-130: The code builds preview_url using result.ref.preview_id
(an assets.id FK) which the /api/assets/{id}/content route expects an asset
reference id; replace the wrong identifier by resolving the preview's reference
id (e.g., use the preview object's reference id property — something like
result.ref.preview.reference_id or result.ref.preview.ref_id depending on the
model) when constructing preview_url, or if no preview reference id exists, do
not set preview_url (omit it) until a content route that accepts asset ids is
added; update the preview_url assignment to use the correct reference identifier
or perform a null-check and skip setting preview_url.
- Around line 382-395: The current idempotency shortcut uses spec.id and
get_asset_detail(...) to assume any existing reference is a valid replay;
instead, compare the incoming spec's relevant fields (e.g., content hash / file
signature, tags, metadata, preview settings) against the retrieved existing
asset before treating it as an idempotent retry. In the block using spec.id,
call whatever routine or attributes provide the existing asset's content
hash/tags/metadata/preview (from the value returned by get_asset_detail), and
only if they are equivalent to spec (and parsed.tmp_path content) call
delete_temp_file_if_exists(parsed.tmp_path) and return the existing asset via
_build_asset_response and schemas_out.AssetCreated as before; if they differ, do
not delete the new temp file and instead return an HTTP 409 Conflict indicating
the id is already used for a different asset.
In `@app/assets/services/asset_management.py`:
- Around line 115-124: The code double-updates reference.updated_at because
set_reference_preview already sets and flushes updated_at; modify the logic in
the block where preview_id is handled so that when set_reference_preview(...) is
called (i.e., preview_id is not None) you mark that the updated_at was already
handled and therefore skip the later call to
update_reference_updated_at(session, reference_id=reference_id) when touched is
true and user_metadata is None; specifically, adjust the touched/flag logic
around set_reference_preview and the final conditional to bypass
update_reference_updated_at if preview_id was applied.
In `@app/assets/services/ingest.py`:
- Line 247: The function upload_from_temp_path declares an unused parameter
asset_id; to support idempotent creation, use asset_id inside
upload_from_temp_path to look up an existing asset (e.g., query your asset
store/ORM by id) and if found return or reuse that asset instead of creating a
duplicate, otherwise proceed to create and persist a new asset using the
provided asset_id as the identifier; update any creation/return paths in
upload_from_temp_path to honor the existing asset when asset_id is provided.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 025fa2e3-46ee-439c-8dcf-93ed48af44b2
📒 Files selected for processing (13)
- alembic_db/versions/0003_add_metadata_prompt.py
- app/assets/api/routes.py
- app/assets/api/schemas_in.py
- app/assets/api/schemas_out.py
- app/assets/api/upload.py
- app/assets/database/models.py
- app/assets/database/queries/__init__.py
- app/assets/database/queries/tags.py
- app/assets/services/asset_management.py
- app/assets/services/ingest.py
- app/assets/services/schemas.py
- app/assets/services/tagging.py
- tests-unit/assets_test/test_tags_api.py
Force-pushed from b5b5acf to 7c213bd (Compare)
Actionable comments posted: 2
🧹 Nitpick comments (4)
app/assets/services/tagging.py (1)
81-98: Consider adding limit validation for consistency.
`list_tags` (lines 64-65) clamps `limit` to `max(1, min(1000, limit))`, but `list_tag_histogram` passes `limit` directly without validation. For consistency and to prevent unbounded queries, consider adding similar bounds.

♻️ Proposed fix

```diff
 def list_tag_histogram(
     owner_id: str = "",
     include_tags: Sequence[str] | None = None,
     exclude_tags: Sequence[str] | None = None,
     name_contains: str | None = None,
     metadata_filter: dict | None = None,
     limit: int = 100,
 ) -> dict[str, int]:
+    limit = max(1, min(1000, limit))
     with create_session() as session:
         return list_tag_counts_for_filtered_assets(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/services/tagging.py` around lines 81 - 98, The function list_tag_histogram currently forwards the limit argument directly to list_tag_counts_for_filtered_assets without validation; mirror the behavior in list_tags by clamping limit to a safe range (e.g., limit = max(1, min(1000, limit))) before calling list_tag_counts_for_filtered_assets so unbounded queries are prevented and behavior is consistent with list_tags.

app/assets/api/schemas_in.py (1)
152-187: Consider extracting shared validators to reduce duplication.
`TagsRefineQuery` duplicates the `_split_csv_tags` and `_parse_metadata_json` validators from `ListAssetsQuery`. Consider extracting these to module-level helper functions and reusing them.

♻️ Proposed refactor

```python
# Module-level helpers
def split_csv_tags(v):
    if v is None:
        return []
    if isinstance(v, str):
        return [t.strip() for t in v.split(",") if t.strip()]
    if isinstance(v, list):
        out: list[str] = []
        for item in v:
            if isinstance(item, str):
                out.extend([t.strip() for t in item.split(",") if t.strip()])
        return out
    return v

def parse_metadata_json(v):
    if v is None or isinstance(v, dict):
        return v
    if isinstance(v, str) and v.strip():
        try:
            parsed = json.loads(v)
        except Exception as e:
            raise ValueError(f"metadata_filter must be JSON: {e}") from e
        if not isinstance(parsed, dict):
            raise ValueError("metadata_filter must be a JSON object")
        return parsed
    return None

# Then in classes:
class ListAssetsQuery(BaseModel):
    @field_validator("include_tags", "exclude_tags", mode="before")
    @classmethod
    def _split_csv_tags(cls, v):
        return split_csv_tags(v)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/schemas_in.py` around lines 152 - 187, Extract the duplicate validator logic into two module-level helpers (e.g., split_csv_tags and parse_metadata_json) and have the TagsRefineQuery validators delegate to them: replace the body of TagsRefineQuery._split_csv_tags and _parse_metadata_json with simple calls that return split_csv_tags(v) and parse_metadata_json(v) respectively (do the same in ListAssetsQuery where those validators are duplicated), keeping the same validation behavior and error messages; ensure the helpers live at module scope and are referenced by both classes and that json is available in the module.

app/assets/services/ingest.py (1)
364-366: Session commit in `create_from_hash` may be unnecessary given the subsequent call.

The `session.commit()` at line 366 commits the mime_type update, but immediately after, `_register_existing_asset` is called, which opens its own session. This is correct, but note that if `_register_existing_asset` fails, the mime_type update will have already been committed.

Consider whether atomic behavior across both operations is desired.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/services/ingest.py` around lines 364 - 366, The commit after calling update_asset_hash_and_mime inside create_from_hash causes the MIME update to be persisted before calling _register_existing_asset, risking partial commits if _register_existing_asset fails; either remove the early session.commit() and let the outer transaction commit both changes together, or perform both operations in the same transactional context (e.g., pass the same session into _register_existing_asset or wrap both update_asset_hash_and_mime and _register_existing_asset in a single transaction/try/except that rolls back on failure) so that update_asset_hash_and_mime, create_from_hash, and _register_existing_asset succeed or fail atomically.

app/assets/database/queries/tags.py (1)
17-20: Consider extracting shared filter helpers to a common module.

Importing private functions (`_apply_metadata_filter`, `_apply_tag_filters`) from `asset_reference.py` creates cross-module coupling on internal implementation details. If these helpers are used by multiple query modules, consider moving them to a shared location (e.g., `common.py`) and making them public.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/database/queries/tags.py` around lines 17 - 20, The file currently imports private helpers _apply_metadata_filter and _apply_tag_filters from asset_reference.py which couples modules to internal details; refactor by moving these functions into a new shared module (e.g., common.py or queries/common.py), rename them to public names (apply_metadata_filter, apply_tag_filters), update their definitions and all callers (including functions in asset_reference.py and tags.py) to import the new public symbols, and remove the private imports from asset_reference.py so other query modules use the shared public helpers instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/assets/api/schemas_in.py`:
- Line 124: The tags field currently uses conflicting settings: tags: list[str]
= Field(default_factory=list, min_length=1). Decide intent and fix accordingly:
if tags must be provided (non-empty and required) remove the default_factory and
use Field(..., min_length=1) so Pydantic enforces presence and non-empty list;
if tags may be omitted or empty, keep default_factory=list but remove
min_length=1 so an empty list is valid. Update the Field call on the tags
declaration to reflect the chosen option.
In `@app/assets/services/schemas.py`:
- Around line 28-29: The fields created_at and updated_at are annotated as
datetime but default to None (with type: ignore); either make their types
nullable or ensure they are always set: change the annotations to "datetime |
None" (or "Optional[datetime]" and import typing.Optional) and keep the default
None (remove the type: ignore), or if they must never be None, remove the None
default and use dataclasses.field(default_factory=datetime.now) (or set them
explicitly where the dataclass is constructed) so the annotations and defaults
align with the values returned by created_at/updated_at.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a85291f4-1b7c-4ace-b2de-742a84346b27
📒 Files selected for processing (13)
- alembic_db/versions/0003_add_metadata_prompt.py
- app/assets/api/routes.py
- app/assets/api/schemas_in.py
- app/assets/api/schemas_out.py
- app/assets/api/upload.py
- app/assets/database/models.py
- app/assets/database/queries/__init__.py
- app/assets/database/queries/tags.py
- app/assets/services/asset_management.py
- app/assets/services/ingest.py
- app/assets/services/schemas.py
- app/assets/services/tagging.py
- tests-unit/assets_test/test_tags_api.py
🚧 Files skipped from review as they are similar to previous changes (1)
- alembic_db/versions/0003_add_metadata_prompt.py
Force-pushed from 7c213bd to d115a4d (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/assets/database/queries/asset_reference.py (2)
34-34: ⚠️ Potential issue | 🟡 Minor

Remove unused import `normalize_tags`.

The pipeline linter flagged this as unused (F401). Tag normalization is now handled internally by the shared `apply_tag_filters` helper.

🧹 Proposed fix

```diff
-from app.assets.helpers import escape_sql_like_string, get_utc_now, normalize_tags
+from app.assets.helpers import escape_sql_like_string, get_utc_now
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/database/queries/asset_reference.py` at line 34, The import list in asset_reference.py includes an unused symbol normalize_tags which triggers linter F401; remove normalize_tags from the from-import "from app.assets.helpers import escape_sql_like_string, get_utc_now, normalize_tags" so the line only imports the actually used helpers (escape_sql_like_string, get_utc_now) and ensure no other references to normalize_tags remain in functions within this module (e.g., any code calling normalize_tags should be switched to use apply_tag_filters where applicable).
13-13: ⚠️ Potential issue | 🟡 Minor

Remove unused import `exists`.

The pipeline linter flagged this as unused (F401). The `exists` import is no longer needed since the tag/metadata filtering logic now delegates to the shared helpers in `common.py`.

🧹 Proposed fix

```diff
-from sqlalchemy import delete, exists, select
+from sqlalchemy import delete, select
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/database/queries/asset_reference.py` at line 13, Remove the unused `exists` import from the top-level import statement in asset_reference.py: update the import line that currently reads "from sqlalchemy import delete, exists, select" to no longer include `exists` so it becomes "from sqlalchemy import delete, select", since tag/metadata filtering now uses shared helpers in common.py.
🧹 Nitpick comments (2)
app/assets/services/tagging.py (1)
81-98: Consider clamping `limit` for consistency.

The `list_tags` function (lines 64-65) clamps `limit` to `max(1, min(1000, limit))`. For consistency and to prevent unbounded queries, consider applying the same clamping to `list_tag_histogram`.

♻️ Suggested improvement

```diff
 def list_tag_histogram(
     owner_id: str = "",
     include_tags: Sequence[str] | None = None,
     exclude_tags: Sequence[str] | None = None,
     name_contains: str | None = None,
     metadata_filter: dict | None = None,
     limit: int = 100,
 ) -> dict[str, int]:
+    limit = max(1, min(1000, limit))
     with create_session() as session:
         return list_tag_counts_for_filtered_assets(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/services/tagging.py` around lines 81 - 98, The list_tag_histogram function currently forwards the raw limit to list_tag_counts_for_filtered_assets; clamp limit the same way as list_tags to avoid unbounded queries by computing limit = max(1, min(1000, limit)) before calling list_tag_counts_for_filtered_assets. Update list_tag_histogram to perform this clamping (use the same logic as in list_tags) and then pass the clamped limit into list_tag_counts_for_filtered_assets.

app/assets/api/schemas_in.py (1)
152-187: Consider extracting shared validators to reduce duplication.
`TagsRefineQuery` duplicates the `_split_csv_tags` and `_parse_metadata_json` validators from `ListAssetsQuery`. While functional, consider extracting these as standalone functions to reduce the maintenance burden.

♻️ Optional refactor to share validators

```python
# At module level:
def split_csv_tags(v):
    if v is None:
        return []
    if isinstance(v, str):
        return [t.strip() for t in v.split(",") if t.strip()]
    if isinstance(v, list):
        out: list[str] = []
        for item in v:
            if isinstance(item, str):
                out.extend([t.strip() for t in item.split(",") if t.strip()])
        return out
    return v

def parse_metadata_json(v):
    if v is None or isinstance(v, dict):
        return v
    if isinstance(v, str) and v.strip():
        try:
            parsed = json.loads(v)
        except Exception as e:
            raise ValueError(f"metadata_filter must be JSON: {e}") from e
        if not isinstance(parsed, dict):
            raise ValueError("metadata_filter must be a JSON object")
        return parsed
    return None

# Then in both classes:
@field_validator("include_tags", "exclude_tags", mode="before")
@classmethod
def _split_csv_tags(cls, v):
    return split_csv_tags(v)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/schemas_in.py` around lines 152 - 187, Extract the duplicated validators into shared module-level functions (e.g., split_csv_tags and parse_metadata_json) and update TagsRefineQuery's `@field_validator` methods (_split_csv_tags and _parse_metadata_json) to call those functions instead of duplicating logic; also update ListAssetsQuery to use the same helpers so both classes reference the single implementations and eliminate the repeated code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/assets/api/schemas_in.py`:
- Around line 123-126: The tags Field currently contradicts itself: it uses
default_factory=list (producing an empty list) while also specifying
min_length=1, which will cause validation failures when tags is omitted; update
the tags declaration (the Field for tags in this schema, e.g.,
CreateFromHashBody or the corresponding pydantic model) to be consistent by
either removing default_factory=list if tags are required (so callers must
supply at least one tag) or removing min_length=1 if empty/omitted tags are
allowed; adjust only the tags Field (and any related docstring/comments) so the
default behavior matches the intended requirement.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 36d62780-06c5-44be-81ef-7b7e8dae38c8
📒 Files selected for processing (15)
- alembic_db/versions/0003_add_metadata_prompt.py
- app/assets/api/routes.py
- app/assets/api/schemas_in.py
- app/assets/api/schemas_out.py
- app/assets/api/upload.py
- app/assets/database/models.py
- app/assets/database/queries/__init__.py
- app/assets/database/queries/asset_reference.py
- app/assets/database/queries/common.py
- app/assets/database/queries/tags.py
- app/assets/services/asset_management.py
- app/assets/services/ingest.py
- app/assets/services/schemas.py
- app/assets/services/tagging.py
- tests-unit/assets_test/test_tags_api.py
🚧 Files skipped from review as they are similar to previous changes (4)
- app/assets/services/asset_management.py
- alembic_db/versions/0003_add_metadata_prompt.py
- app/assets/database/models.py
- app/assets/services/schemas.py
♻️ Duplicate comments (1)
app/assets/api/routes.py (1)
127-148: ⚠️ Potential issue | 🟠 Major

`preview_url` is built from the wrong identifier.

Line 131 uses `result.ref.preview_id` (which is an `assets.id` FK) to build the URL `/api/assets/{preview_id}/content`, but the `/api/assets/{id}/content` route at line 237 expects a reference id, not an asset id. This will produce dead links for assets with previews.

You'll need to either:

- Resolve the preview asset's reference id to build the URL, or
- Add a content route that accepts asset ids, or
- Omit `preview_url` until the proper resolution is implemented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/routes.py` around lines 127 - 148, The preview_url in _build_asset_response is built from result.ref.preview_id (an asset.id) but the /api/assets/{id}/content route expects a reference id, so the generated links will be invalid; fix by either resolving the preview's reference id before constructing preview_url (e.g., fetch the preview's reference id from the service or use a field like preview_ref_id on the result if available) and use that id in the URL, or if you cannot resolve it here, stop setting preview_url (set it to None) until the correct reference id resolution is implemented; update only the _build_asset_response logic (the use of result.ref.preview_id and preview_url construction) accordingly.
🧹 Nitpick comments (2)
app/assets/api/schemas_in.py (1)
152-188: Consider extracting shared validators.

The `_split_csv_tags` and `_parse_metadata_json` validators are duplicated from `ListAssetsQuery` (lines 69-98). You could extract them as standalone functions and reuse them in both classes to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/schemas_in.py` around lines 152 - 188, The validators _split_csv_tags and _parse_metadata_json are duplicated between TagsRefineQuery and ListAssetsQuery; extract them as shared utility functions (e.g., parse_csv_tags(value) and parse_metadata_json(value)) in a utilities module and update both classes to call these functions from their `@field_validator` methods (replace the bodies of TagsRefineQuery._split_csv_tags and ._parse_metadata_json to delegate to the new helpers). Ensure the helper functions preserve the same behavior and error types so callers like TagsRefineQuery and ListAssetsQuery continue to validate identically.

app/assets/api/routes.py (1)
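The extraction suggested in the `schemas_in.py` nitpick above might look like the sketch below — plain functions that each model's `@field_validator(..., mode="before")` method delegates to. The helper names come from the reviewer's suggestion, not existing project code:

```python
import json

def parse_csv_tags(value):
    # Pass lists (and None) through untouched; split "a, b ,c" into
    # ["a", "b", "c"], dropping empty segments.
    if value is None or isinstance(value, list):
        return value
    return [part.strip() for part in value.split(",") if part.strip()]

def parse_metadata_json(value):
    # Pass dicts (and None) through untouched; decode a JSON object string,
    # rejecting non-object payloads so both query models fail identically.
    if value is None or isinstance(value, dict):
        return value
    parsed = json.loads(value)
    if not isinstance(parsed, dict):
        raise ValueError("metadata must be a JSON object")
    return parsed
```

Both `ListAssetsQuery` and `TagsRefineQuery` would then keep one-line validator bodies that call these helpers, so behavior and error types stay in sync.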
383-404: Hash validation added, but consider documenting idempotency semantics.

The hash mismatch validation (lines 391-397) addresses the critical case of content mismatch. However, if a client retries with the same `spec.id` but different tags, metadata, or preview_id, the endpoint returns the original asset without surfacing the discrepancy.

If this is intentional (idempotent by ID + content hash only), consider documenting this behavior in the endpoint docstring. If stricter idempotency is desired, you'd need to compare all fields.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/routes.py` around lines 383 - 404, The endpoint currently treats requests with the same spec.id idempotently by only validating content hash (see spec.id, get_asset_detail, spec.hash, existing.asset.hash) and returning the original asset via _build_asset_response and schemas_out.AssetCreated without surfacing differences in tags/metadata/preview_id; update the endpoint to explicitly document this idempotency behavior in the function docstring (state that idempotency is by ID + content hash only and other fields are ignored) or, if stricter semantics are required, implement an additional comparison after fetching existing (compare tags/metadata/preview_id fields on existing vs spec) and return a 409 (or merge/update) when mismatches are found instead of returning the original asset.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2b842e43-066d-4d9d-a1ca-0c72fa563d24
📒 Files selected for processing (4)
- app/assets/api/routes.py
- app/assets/api/schemas_in.py
- app/assets/api/schemas_out.py
- tests-unit/assets_test/services/test_tag_histogram.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/assets/api/routes.py (1)
430-439: ⚠️ Potential issue | 🟠 Major

The hash-dedupe path drops the new `id` and `preview_id` fields.
create_from_hashwithout forwardingspec.idorspec.preview_id. When the content already exists, the upload endpoint will ignore the caller-supplied reference id and preview link, which breaks deterministic ids on the dedupe path and makes the next retry miss the idempotency shortcut.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/routes.py` around lines 430 - 439, The dedupe branch calling create_from_hash does not forward the caller-supplied spec.id and spec.preview_id, causing dropped deterministic ids; update the call in the upload handler so create_from_hash receives id=spec.id and preview_id=spec.preview_id (and if create_from_hash signature lacks those parameters, add them and propagate them through its implementation so the created AssetReference preserves the provided id and preview_id).
♻️ Duplicate comments (1)
app/assets/api/routes.py (1)
407-428: ⚠️ Potential issue | 🟠 Major

Only short-circuit on exact idempotent replays.

Line 413 currently turns any existing reference id into a 200, and Line 422 deletes the freshly uploaded temp file even when `name`, `tags`, `user_metadata`, `mime_type`, or `preview_id` differ. That makes conflicting creates look successful and can discard the caller's new upload; this branch needs a full-equivalence check or a 409 instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/assets/api/routes.py` around lines 407 - 428, The current idempotent-create branch short-circuits to 200 for any existing reference id and deletes the uploaded temp file even when fields differ; update the logic in the spec.id handling (the get_asset_detail / existing branch) to perform a full-equivalence check of the incoming spec versus the existing asset (compare name, tags, user_metadata, mime_type, preview_id, and hash if provided) and only treat it as an idempotent replay when all fields match exactly—if any field differs, do not delete the uploaded temp file and return a 409 conflict (e.g., "ID_CONFLICT" or similar); keep the current behavior of returning the existing asset and deleting the temp file only when the equivalence check passes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6a82c0d4-d0b6-47ee-97d8-17bc2f27f5e3
📒 Files selected for processing (1)
app/assets/api/routes.py
caa1f12 to dc8cd3a
8ee4f1a to a8d524a
Unify response models, add missing fields, and align input schemas with the cloud OpenAPI spec at cloud.comfy.org/openapi.

- Replace AssetSummary/AssetDetail/AssetUpdated with single Asset model
- Add is_immutable, metadata (system_metadata), prompt_id fields
- Support mime_type and preview_id in update endpoint
- Make CreateFromHashBody.name optional, add mime_type, require >=1 tag
- Add id/mime_type/preview_id to upload, relax tags to optional
- Rename total_tags → tags in tag add/remove responses
- Add GET /api/assets/tags/refine histogram endpoint
- Add DB migration for system_metadata and prompt_id columns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019ce023-3384-7560-bacf-de40b0de0dd2 Co-authored-by: Amp <amp@ampcode.com>
Extract resolve_hash_to_path() into asset_management.py and remove _resolve_blake3_to_path from server.py. Also revert loopback origin check to original logic. Amp-Thread-ID: https://ampcode.com/threads/T-019ce023-3384-7560-bacf-de40b0de0dd2 Co-authored-by: Amp <amp@ampcode.com>
Enforce non-empty tags at the Pydantic validation layer so uploads with no tags are rejected with a 400 before reaching ingest. Adds test_upload_empty_tags_rejected to cover this case. Amp-Thread-ID: https://ampcode.com/threads/T-019ce377-8bde-7048-bc28-a9df063409f9 Co-authored-by: Amp <amp@ampcode.com>
Filter asset references by owner visibility so the /view endpoint only resolves hashes for assets the requesting user can access. Adds table-driven tests for owner visibility cases. Amp-Thread-ID: https://ampcode.com/threads/T-019ce377-8bde-7048-bc28-a9df063409f9 Co-authored-by: Amp <amp@ampcode.com>
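The owner-visibility filter this commit describes might be sketched as follows. Treating a `None` owner as public is an assumption for illustration, not the project's actual visibility rule:

```python
from types import SimpleNamespace

def visible_references(refs, requesting_owner_id):
    # Only resolve a hash through references the requesting user can see:
    # unowned (public) references, or ones owned by the requester.
    return [
        ref for ref in refs
        if ref.owner_id is None or ref.owner_id == requesting_owner_id
    ]
```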
Remove None defaults and type: ignore comments. Move fields before optional fields to satisfy dataclass ordering. Amp-Thread-ID: https://ampcode.com/threads/T-019ce377-8bde-7048-bc28-a9df063409f9 Co-authored-by: Amp <amp@ampcode.com>
Move mime_type update into _register_existing_asset so it shares a single transaction with reference creation. Log a warning when the hash is not found instead of silently returning None. Amp-Thread-ID: https://ampcode.com/threads/T-019ce377-8bde-7048-bc28-a9df063409f9 Co-authored-by: Amp <amp@ampcode.com>
Align with get/update/list endpoints for consistent JSON output. Amp-Thread-ID: https://ampcode.com/threads/T-019ce377-8bde-7048-bc28-a9df063409f9 Co-authored-by: Amp <amp@ampcode.com>
Clients receive preview_id in API responses but could not dereference it through public routes (which use reference IDs). Now preview_id is a self-referential FK to asset_references.id so the value is directly usable in the public API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
list_references_by_asset_id and list_tags_with_usage were not filtering out deleted_at/is_missing refs, allowing /view?filename=blake3:... to serve files through hidden references and inflating tag usage counts. Add list_all_file_paths_by_asset_id for orphan cleanup which intentionally needs unfiltered access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The duplicate-content upload path and hash-based creation paths were silently dropping preview_id and mime_type. This wires both fields through _register_existing_asset, create_from_hash, and all route call sites so behavior is consistent regardless of whether the asset content already exists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `id` field on UploadAssetSpec was advertised for idempotent creation but never actually honored when creating new references. Remove it rather than implementing the feature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents cross-tenant metadata mutation when multiple references share the same content-addressed Asset row. mime_type can now only be set when NULL (first ingest); subsequent attempts to change it are silently ignored. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
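A minimal sketch of the write-once `mime_type` rule this commit describes; the helper name is illustrative:

```python
def apply_mime_type(current, proposed):
    # mime_type may be set only while it is still NULL (first ingest);
    # later attempts to change it are silently ignored.
    return proposed if current is None else current
```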
The /view endpoint was discarding the content_type computed by resolve_hash_to_path() and re-guessing from the filename, which produced wrong results for extensionless files or mismatched extensions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
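Why re-guessing from the filename is lossy can be seen with the standard-library `mimetypes` module, which returns nothing for extensionless names — exactly the case where the content_type already computed by resolve_hash_to_path() stays correct:

```python
import mimetypes

def guess_from_name(filename):
    # Filename-based guessing: fails for extensionless or mislabeled files.
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type
```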
Extract rebuild_metadata_projection() to build AssetReferenceMeta rows
from {**system_metadata, **user_metadata}, so system-generated metadata
is queryable via metadata_filter and user keys override system keys.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
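The merge order described above can be sketched in one expression; `merged_metadata` is an illustrative helper, not the project's actual function:

```python
def merged_metadata(system_metadata, user_metadata):
    # System keys first, user keys second, so user-supplied values
    # override system-generated ones while system keys stay queryable.
    return {**(system_metadata or {}), **(user_metadata or {})}
```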
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reject 'id' field in multipart upload with 400 UNSUPPORTED_FIELD instead of silently ignoring it
- Build preview URL from the preview asset's own metadata rather than the parent asset's
- Rename 'tags' to 'total_tags' in TagsAdd/TagsRemove response schemas for clarity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add naming_convention to Base.metadata so Alembic batch-mode reflection can match unnamed FK constraints created by migration 0002. Pass naming_convention and render_as_batch=True through env.py online config. Add migration roundtrip tests (upgrade/downgrade/cycle from baseline). Amp-Thread-ID: https://ampcode.com/threads/T-019ce466-1683-7471-b6e1-bb078223cda0 Co-authored-by: Amp <amp@ampcode.com>
…otal_tags field

- Allow is_missing=True references to be counted in list_tags_with_usage when the tag is 'missing', so the missing tag count reflects all references that have been tagged as missing
- Add update_is_missing_by_asset_id query helper for bulk updates by asset
- Update test_add_and_remove_tags to use 'total_tags' matching the API schema

Amp-Thread-ID: https://ampcode.com/threads/T-019ce482-05e7-7324-a1b0-a56a929cc7ef
Co-authored-by: Amp <amp@ampcode.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename the column in the DB model, migration, and service schemas. The API response emits both job_id and prompt_id (deprecated alias) for backward compatibility with the cloud API. Amp-Thread-ID: https://ampcode.com/threads/T-019cef41-60b0-752a-aa3c-ed7f20fda2f7 Co-authored-by: Amp <amp@ampcode.com>
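The dual-field emission this commit describes might look like the following illustrative helper (not the actual serializer):

```python
def id_fields(job_id):
    # Emit the renamed field plus its deprecated alias: prompt_id mirrors
    # job_id for backward compatibility with the cloud API.
    return {"job_id": job_id, "prompt_id": job_id}
```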
Amp-Thread-ID: https://ampcode.com/threads/T-019cef45-a4d2-7548-86d2-d46bcd3db419 Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cef49-f94e-7348-bf23-9a19ebf65e0d Co-authored-by: Amp <amp@ampcode.com>
…n write

- convert_metadata_to_rows returns [] for None values instead of an all-null row
- Remove dead None branch from _scalar_to_row
- Simplify null filter in common.py to just check for row absence
- Add CHECK constraint ck_asset_reference_meta_has_value to model and migration 0003

Amp-Thread-ID: https://ampcode.com/threads/T-019cef4e-5240-7749-bb25-1f17fcf9c09c
Co-authored-by: Amp <amp@ampcode.com>
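A sketch of the conversion rules described in these commits (None values produce no rows; None items inside lists are skipped, so no all-NULL row can violate the has_value CHECK constraint). The row dictionaries are illustrative, not the actual AssetReferenceMeta schema:

```python
def convert_metadata_to_rows(key, value):
    # None values produce no rows at all, instead of an all-null row.
    if value is None:
        return []
    # Skip None items inside lists, flattening the rest.
    if isinstance(value, list):
        return [row for item in value if item is not None
                for row in convert_metadata_to_rows(key, item)]
    # bool must be checked before int/float (bool subclasses int).
    if isinstance(value, bool):
        return [{"key": key, "bool_value": value}]
    if isinstance(value, (int, float)):
        return [{"key": key, "num_value": value}]
    return [{"key": key, "text_value": str(value)}]
```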
register_file_in_place guarantees a non-None asset, so the 'if result.asset else None' checks were unreachable. Amp-Thread-ID: https://ampcode.com/threads/T-019cef5b-4cf8-723c-8a98-8fb8f333c133 Co-authored-by: Amp <amp@ampcode.com>
Clients can no longer modify mime_type after asset creation via the
PUT /api/assets/{id} endpoint. This reduces the risk of mime_type
spoofing. The internal update_asset_hash_and_mime function remains
available for server-side use (e.g., enrichment).
Amp-Thread-ID: https://ampcode.com/threads/T-019cef5d-8d61-75cc-a1c6-2841ac395648
Co-authored-by: Amp <amp@ampcode.com>
…ata lists

- Use fully-rendered constraint names in migration 0003 to avoid the naming convention doubling the ck_ prefix on batch operations.
- Add table_args to downgrade so SQLite batch mode can find the CHECK constraint (not exposed by SQLite reflection).
- Fix model CheckConstraint name to use bare 'has_value' (convention auto-prefixes).
- Skip None items when converting metadata lists to rows, preventing all-NULL rows that violate the has_value check constraint.

Amp-Thread-ID: https://ampcode.com/threads/T-019cef87-94f9-7172-a6af-c6282290ce4f
Co-authored-by: Amp <amp@ampcode.com>
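The naming-convention interaction behind the first and third bullets can be sketched with the convention dict itself. The token strings follow SQLAlchemy's naming_convention format; the table and constraint names below are taken from the commit, and the exact convention used by the project is an assumption:

```python
# A typical SQLAlchemy naming_convention: the "ck" template adds the ck_
# prefix itself, so the model names the constraint bare ("has_value");
# pre-prefixing it in the model would double up as "ck_..._ck_...".
NAMING_CONVENTION = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}

def rendered_ck_name(table_name, constraint_name):
    # How a CHECK constraint name is rendered under the convention;
    # migrations in batch mode must use this fully-rendered form.
    return NAMING_CONVENTION["ck"] % {
        "table_name": table_name,
        "constraint_name": constraint_name,
    }
```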
63bae49 to b6e0df3
Summary
- Replace `AssetSummary`, `AssetDetail`, `AssetUpdated` with a single unified `Asset` response model and a `_build_asset_response` helper used by all handlers
- Add `is_immutable`, `metadata` (from `system_metadata` DB column), `prompt_id`, `preview_url` fields to asset responses; add DB migration for new columns
- Support `mime_type` and `preview_id` in update endpoint; make `CreateFromHashBody.name` optional; add `id`/`mime_type`/`preview_id` to upload endpoint; relax upload tags to optional
- Rename `total_tags` → `tags` in tag add/remove responses; add `GET /api/assets/tags/refine` histogram endpoint

Test plan
- Unit tests pass (`python -m pytest tests-unit/assets_test/ -x`)
- Run with `--enable-assets`, exercise endpoints via curl
- `/api/assets/tags/refine` endpoint returns correct histogram
- `id` field enables idempotent creation
- `mime_type` and `preview_id` updates work on PUT endpoint

🤖 Generated with Claude Code