Skip to content

fix: eliminate redundant metadata_dict fetch in loglines() and _log_s…#3035

Open
sakshisemalti wants to merge 4 commits intoNetflix:masterfrom
sakshisemalti:fix/log-metadata-redundant-fetch
Open

fix: eliminate redundant metadata_dict fetch in loglines() and _log_s…#3035
sakshisemalti wants to merge 4 commits intoNetflix:masterfrom
sakshisemalti:fix/log-metadata-redundant-fetch

Conversation

@sakshisemalti
Copy link

PR Type

  • Bug fix
  • New feature
  • Core Runtime change (higher bar -- see CONTRIBUTING.md)
  • Docs / tooling
  • Refactoring

Summary

loglines() and _log_size() called self.current_attempt internally even when meta_dict was already fetched and passed in by _load_log(), triggering a redundant metadata_dict HTTP request on every log access.

Issue

Fixes #3034

Reproduction

Runtime:

from metaflow import Flow

task = list(list(Flow('TestLogFlow').latest_run)[0])[0]

# Bug: single task.stdout call fetches metadata_dict TWICE
# _load_log() fetches meta_dict (call 1)
# loglines() calls self.current_attempt → metadata_dict again (call 2 - redundant)

print(task.stdout)  # before fix: 2 HTTP requests, after fix: 1 HTTP request

Commands to run:

python test_log_fix.py

Where evidence shows up:

Before:

_load_log() fetches self.metadata_dict            (HTTP call 1)
  → loglines(meta_dict=meta_dict)                 - passed correctly
      → self.current_attempt                      
          → self.metadata_dict                    (HTTP call 2 - redundant)

After:

_load_log() fetches self.metadata_dict            (HTTP call 1)
  → loglines(meta_dict=meta_dict)                 
      → attempt read from meta_dict directly      (no extra HTTP call)

PASS - task.stdout returns correctly

Root Cause

loglines() and _log_size() both call self.current_attempt to get the attempt number. current_attempt is a property that calls self.metadata_dict, which issues an HTTP request to the metadata service.

However, _load_log() already fetches metadata_dict and passes it as meta_dict to both methods. The attempt number is available in meta_dict under the key "attempt", making the self.current_attempt call completely redundant.

Why This Fix Is Correct

The fix reads attempt from meta_dict when it is available:

    attempt = int(meta_dict.get("attempt", 0)) if meta_dict else self.current_attempt

This uses the meta_dict passed into the method for the call. The fallback to self.current_attempt preserves full backward compatibility for any caller that does not pass meta_dict.

The fix is minimal. One line changed in two methods, no new imports, no interface changes.

Failure Modes Considered

  1. Backward compatibility: callers that pass meta_dict=None fall back to self.current_attempt, behavior unchanged.

Tests

  • Unit tests added/updated
  • Reproduction script provided (required for Core Runtime)
  • CI passes
  • If tests are impractical: explain why below and provide manual evidence above

Non-Goals

AI Tool Usage

  • No AI tools were used in this contribution
  • AI tools were used (describe below)
    AI tools were used to generate polished comments + for generating the reproducing issue test
  • Did you review, understand, and test all generated code?
    yes

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR eliminates a redundant metadata_dict HTTP request in loglines() and _log_size() by reading the attempt number directly from the meta_dict that _load_log() / _get_logsize() already fetches and passes in, rather than re-invoking self.current_attempt (which internally calls self.metadata_dict and triggers another round-trip to the metadata service).

Key points:

  • loglines() fix is correct: the new expression self._attempt if self._attempt is not None else int(meta_dict.get("attempt", 0)) faithfully mirrors the logic inside current_attempt, including the explicit-attempt short-circuit for tasks initialized with Task(..., attempt=N).
  • _log_size() fix is inconsistent: attempt = int(meta_dict.get("attempt", 0)) skips the self._attempt check that loglines() and current_attempt both perform. A task created with an explicit attempt whose metadata hadn't been recorded yet would silently fall back to attempt 0 for log-size queries. Adding the same guard (self._attempt if self._attempt is not None else int(meta_dict.get("attempt", 0))) would make both methods consistent.
  • PR description is outdated: The description still shows the earlier form int(meta_dict.get("attempt", 0)) if meta_dict else self.current_attempt, which was superseded by the current implementation and doesn't reflect the actual code.
  • No unit tests added: the test checkboxes in the PR description are unchecked; a regression test covering the attempt-override case and verifying the number of metadata fetches would strengthen confidence in the fix.

Confidence Score: 3/5

  • Mostly safe but _log_size() still skips the self._attempt guard, making it inconsistent with loglines() for explicitly-initialized Task objects.
  • The loglines() half of the fix is well-implemented and correctly mirrors current_attempt semantics. The _log_size() half omits the self._attempt check, which could return an incorrect attempt index for tasks initialized with an explicit attempt that failed before recording metadata. No unit tests were added to cover either the happy path or the edge case.
  • metaflow/client/core.py — specifically the _log_size() method at line 1878 where self._attempt is not consulted before falling back to meta_dict.

Important Files Changed

Filename Overview
metaflow/client/core.py Two one-line changes in loglines() and _log_size() eliminate a redundant metadata_dict HTTP fetch. The loglines() fix correctly mirrors current_attempt semantics by checking self._attempt first. The _log_size() fix omits the self._attempt guard, leaving that method inconsistent with loglines(). No unit tests were added.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Task
    participant MetadataService

    Note over Caller,MetadataService: BEFORE (2 HTTP requests)
    Caller->>Task: task.stdout
    Task->>Task: _load_log("stdout")
    Task->>MetadataService: metadata_dict (HTTP call 1)
    MetadataService-->>Task: meta_dict
    Task->>Task: loglines(stream, meta_dict=meta_dict)
    Task->>Task: self.current_attempt
    Task->>MetadataService: metadata_dict (HTTP call 2 – redundant)
    MetadataService-->>Task: meta_dict (duplicate)
    Task-->>Caller: log content

    Note over Caller,MetadataService: AFTER (1 HTTP request)
    Caller->>Task: task.stdout
    Task->>Task: _load_log("stdout")
    Task->>MetadataService: metadata_dict (HTTP call 1)
    MetadataService-->>Task: meta_dict
    Task->>Task: loglines(stream, meta_dict=meta_dict)
    Task->>Task: attempt = self._attempt ?? meta_dict["attempt"]
    Note right of Task: No extra HTTP call
    Task-->>Caller: log content
Loading

Comments Outside Diff (1)

  1. metaflow/client/core.py, line 1793-1798 (link)

    meta_dict parameter undocumented in public docstring

    The meta_dict parameter was added to the public signature of loglines() but is absent from the method's docstring. Because loglines() is part of the public API (used directly in test/core/tests/large_mflog.py and visible to external users), callers have no guidance on what to pass, what its type should be, or what effect it has.

    Consider adding a parameter entry, for example:

Last reviewed commit: 6532f3b

sakshisemalti and others added 2 commits March 15, 2026 00:10
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redundant metadata_dict fetch in loglines() and _log_size() log loading path

1 participant