Conversation
There was a problem hiding this comment.
1 issue found across 1 file
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="pkg/filesystem/vnode/tasks.go">
<violation number="1" location="pkg/filesystem/vnode/tasks.go:20">
P2: This still advertises a fixed sentinel size when the real task log size is unknown. Clients that rely on stat size can still stop reads early; the feedback calls out using the actual S2 content size instead of a sentinel. Consider fetching/logging the true object size (e.g., S2 metadata) for Getattr/Readdir rather than returning the sentinel.
(Based on your team's feedback about advertising actual S2-backed file sizes rather than using a sentinel size.) [FEEDBACK_USED]</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
luke-lombardi
left a comment
There was a problem hiding this comment.
@cooper-grc I think this branch may need to get cleaned up / integrated into the new agent task model
luke-lombardi
left a comment
There was a problem hiding this comment.
Overall: Clean, minimal alternative to the #49/#50 approach. The lazy size-caching strategy — sentinel until a task is terminal, then cache the real size — is well-reasoned: running tasks' sizes change constantly so a 5s content cache (PR #50 approach) adds complexity without much benefit, while terminal tasks are stable forever. This feels like the right tradeoff for a size-only cache.
Relationship to #49 and #50:
This PR targets main directly and takes a different approach than the #49→#50 chain. Clarification on which branch will land is needed — are these competing approaches or is #88 meant to be the replacement? The naming (cg/tasks-files-3) suggests iteration rather than competition.
Running tasks still show 10MB sentinel in listings:
This is the deliberate tradeoff. If a user does ls -la /tasks and sees 10MB for all in-progress tasks, that's misleading but technically acceptable since they need to cat the file to see actual content anyway. A comment on taskFileSentinelSize documenting this behavior for reviewers/maintainers would help.
Race in sizeCacheMu vs cacheMu:
These are two separate locks — one for the task list cache and one for the size cache. This is correct, but make sure no code holds both locks simultaneously (a quick scan shows they don't — each critical section only acquires one, which is fine).
task.IsTerminal() is a domain boundary:
The IsTerminal() call in Read is the right semantic gate for caching permanence. Worth confirming that IsTerminal() covers all final states (complete, failed, cancelled) and doesn't omit any edge cases like timeout or preempted states.
No tests:
This PR doesn't include tests. Adding a test analogous to TestTasksVNode_GetattrUsesPlaceholderUntilContentCached from PR #50 — verifying that a terminal task's size is cached after Read and returned by subsequent Getattr without re-fetching — would strengthen confidence.
luke-lombardi
left a comment
There was a problem hiding this comment.
PR Review: Get Task File Sizes From S2
Summary: Solid improvement over PR #49. Rather than permanently lying about file size, this caches the real rendered size after the first read and uses the sentinel only as a temporary placeholder. Correct behavior for FUSE/NFS clients.
Positives:
taskFileSizeis clean: double-checked read lock → sentinel fallback pattern is idiomatic Go.- Caching only on terminal tasks (
IsTerminal()) is the right scope — non-terminal tasks can still change. - Both
GetattrandReaddiruse the same helper consistently. sizeCacheinitialized in both constructors (NewTasksVNodeandNewTasksVNodeGRPC) — no nil map panics.
Observations / Suggestions:
-
Cache grows unbounded —
sizeCacheis never pruned. For long-running processes with many distinct task IDs this could leak memory. A simple TTL-based eviction or bounding by max entries would help. -
Lock ordering —
ReadacquiressizeCacheMu.Lockinside an existing write lock ondata(depending on howReadis structured upstream). Worth a quick audit to ensure no lock inversion withcacheMu. -
Write under write lock is fine, but consider
sync.Map— since entries are written once and read many times,sync.Mapcould reduce contention in high-read scenarios. Not critical. -
No eviction when task is deleted — if tasks can be removed from the backend, stale size entries will persist in the cache. Low risk since the cache is per-process-lifetime, but worth noting.
-
Relationship to PR #49 — this PR duplicates the sentinel constant from #49. If both are open, one should be closed or they should be merged in the right order to avoid a conflict.
Verdict: Approved with minor caveats. The unbounded cache is the main concern for long-running deployments — a simple max-size cap would address it.
luke-lombardi
left a comment
There was a problem hiding this comment.
PR #88: Get Task File Sizes From S2
Improves on the pure-sentinel approach from #49 by caching the actual rendered size after the first Read for terminal tasks, then serving that cached value from Getattr and Readdir going forward.
What looks good:
- The two-phase approach (sentinel → real size after read) is pragmatic: it avoids a blocking fetch on every
Getattr/Readdirwhile still converging to the correct size once the file has been read once. - Restricting size caching to terminal tasks (
task.IsTerminal()) is the right call — in-progress task content changes, so caching its size would produce stale listings. - Separate
sizeCacheMufrom the task-listcacheMuis clean; avoids lock contention between list operations and size lookups. - Constructor initialization of
sizeCachein bothNewTasksVNodeandNewTasksVNodeGRPCis consistent.
Observations / Nits:
- Non-terminal tasks will always report the 10MB sentinel on every
Getattr/Readdircall until they complete. If a tool does astatloop watching an in-progress task, it will always see 10MB. This is probably acceptable, but worth documenting in the function-level comment ontaskFileSize. taskFileSizeuses early-return on cache hit and falls through to return the sentinel — readable, but a single-expression ternary or named return might make the intent clearer. Minor style point.- No tests added in this PR. PR #50 adds a test file for this layer; if #50 is the intended follow-up, that's fine — but if this PR targets main independently, consider adding at least one test covering the terminal-task caching path.
No blocking issues.
luke-lombardi
left a comment
There was a problem hiding this comment.
Overall: Good incremental improvement over PR #49. Caching the real rendered size after first read removes the permanent sentinel inaccuracy for terminal tasks, which matters for clients that rely on reported size (e.g., for progress bars or pre-allocation).
Issues / observations:
-
Unbounded
sizeCachegrowth —sizeCacheis a plainmap[string]int64with no eviction. For a long-running gateway that processes many tasks over time, every terminal task ID will accumulate in memory indefinitely. Consider either capping the cache size (LRU with a fixed bound) or piggybacking on the existingtasksCacheTTL/cacheExpirymechanism to clear stale entries. -
sizeCachenot initialized inNewTasksVNode's GRPCless path — Wait, looking again:NewTasksVNodedoes initializesizeCache: make(map[string]int64)— this is fine. ButNewTasksVNodeGRPCalso does it correctly. No issue here. -
Supersedes PR #49 — Both PRs define
taskFileSentinelSizeand modify the same lines intasks.go. They will conflict at merge time. PR #49 should be closed once this one is merged. -
Non-terminal tasks always report sentinel size — This is correct and expected. However, if a non-terminal task's content changes between reads (e.g., live streaming output), the sentinel will remain 10MB even though the actual content may be smaller or larger. This is an inherent limitation of the sentinel approach and is acceptable given the current design.
-
RLock → check → RUnlock pattern — The double-lock pattern in
taskFileSize(RLock, check, RUnlock) is correct. No TOCTOU issue here since the write path uses a separateLock(). Good.
Looks solid overall. Resolve the potential cache growth concern and close PR #49.






Summary by cubic
Return accurate sizes for task files in /tasks using a 10MB sentinel until the real size is known, then cache the actual size after the first read for terminal tasks. This fixes size reporting in listings and helps FUSE/NFS issue full reads.
Written for commit 3c6d422. Summary will update on new commits.