[For discussion only]tolerate empty __init__.py in namespace package discovery#3007

Open
liuyinglao wants to merge 1 commit into master from fix/tolerate-empty-init-namespace-pkgs

@liuyinglao liuyinglao commented Mar 11, 2026

Build systems like Bazel's rules_python auto-generate empty (0-byte) __init__.py files for every directory, including those that are meant to be implicit namespace packages. This causes metaflow's extension discovery to reject the metaflow_extensions namespace package with:

RuntimeError: Package '...' providing 'metaflow_extensions' is not
an implicit namespace package as required

An empty __init__.py is functionally equivalent to no __init__.py for namespace package purposes, so we can safely skip it.

Two changes:

  1. At the metaflow_extensions/ top level: skip an empty __init__.py instead of raising RuntimeError, since the file carries no package initialization code.

  2. At extension point directories (e.g. alias/, config/): do not treat an empty __init__.py as a configuration file. Only __init__.py files with actual content should be considered config modules.

Fixes compatibility with Bazel's rules_python, which sets enable_implicit_namespace_pkgs=True but still generates empty __init__.py files in extracted pip wheels.

Made-with: Cursor

Here's the filled-in PR template:


PR Type

  • Bug fix

Summary

Tolerate empty (0-byte) __init__.py files in metaflow_extensions namespace package discovery. Build systems like Bazel's rules_python auto-generate these for every directory, breaking metaflow's extension loading with RuntimeError: Package '...' providing 'metaflow_extensions' is not an implicit namespace package as required.

Issue

No existing issue. This affects any metaflow user running under Bazel with rules_python pip integration.

Fixes compatibility with Bazel's rules_python (tested with v1.8.4) which generates empty __init__.py files in extracted pip wheels even when enable_implicit_namespace_pkgs=True.

Reproduction

Runtime: local (Bazel rules_python 1.8.4, Python 3.10)

Commands to run:

# Given a flow that depends on @genai_pip//nflx_metaflow and @genai_pip//metaflow
# via Bazel's pip.parse with enable_implicit_namespace_pkgs=True:
./bazel run //path/to:flow -- package info

Where evidence shows up: parent console

Before (error / log snippet)
File ".../metaflow/extension_support/__init__.py", line 535, in process_file
    raise RuntimeError(
RuntimeError: Package '_pythonpath_0' providing 'metaflow_extensions' is not an implicit namespace package as required

Additionally, even after patching only the top-level check, a second error surfaces at extension point directories (e.g. alias/, config/) where empty __init__.py is incorrectly treated as a configuration file:

RuntimeError: Package '_pythonpath_0[nflx]' defines more than one configuration file for 'alias':
  'metaflow_extensions.nflx.alias.__init__' and 'metaflow_extensions.nflx.alias.mfextinit_nflxfastdata'

After (evidence that fix works)
Metaflow 2.19.18+nflxfastdata(2.22.1) executing TestContentQAInferenceResults for user:yinglaol
Package size: 12.09 MB
Number of files: 621

Metaflow version: 2.19.18+nflxfastdata(2.22.1)

Metaflow extensions packaged:
  - nflx-fastdata (<unk>) @ 2.22.1
  - nflx-metaflow (<unk>) @ 2.22.1
  - _pythonpath_0 (nflxfastdata) @ _local_

No workaround code needed in the flow file -- plain from metaflow import ... works.

Root Cause

Bazel's rules_python extracts each pip wheel into its own isolated site-packages directory. For namespace packages like metaflow_extensions, it generates empty (0-byte) __init__.py files even when enable_implicit_namespace_pkgs=True is set (this is a known rules_python issue).

In _get_extension_packages() → process_file():

  1. Line 534: Any metaflow_extensions/__init__.py triggers an unconditional RuntimeError, even if the file is empty and carries no package initialization code.

  2. Lines 597-599: __init__.py in extension point directories (e.g. alias/) is unconditionally treated as a configuration file. When Bazel puts nflx-metaflow and nflx-fastdata in separate sys.path entries, each has its own alias/__init__.py (auto-generated, empty). This causes a "more than one configuration file" conflict, since the empty __init__.py is counted alongside the real mfextinit_*.py config.

In a standard pip install, both packages merge into a single site-packages/metaflow_extensions/ directory with no __init__.py at the namespace levels, so neither issue surfaces.
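
The second conflict can be reproduced in isolation. The sketch below is illustrative only and does not call Metaflow: `CONFIG_RE` is an assumed stand-in for EXT_CONFIG_REGEXP (the real pattern may differ), and `config_candidates` is a hypothetical helper that mimics how an extension point directory could be scanned for config modules.

```python
import os
import re
import tempfile

# Assumption: stand-in for Metaflow's EXT_CONFIG_REGEXP; the real pattern may differ.
CONFIG_RE = re.compile(r"^mfextinit_[A-Za-z0-9_-]+\.py$")


def config_candidates(ext_point_dir, skip_empty_init):
    """Hypothetical helper: list file names that would count as config modules."""
    found = []
    for name in sorted(os.listdir(ext_point_dir)):
        path = os.path.join(ext_point_dir, name)
        if CONFIG_RE.match(name):
            found.append(name)
        elif name == "__init__.py":
            # With the proposed behavior, a 0-byte __init__.py is not a config module.
            if skip_empty_init and os.path.getsize(path) == 0:
                continue
            found.append(name)
    return found


with tempfile.TemporaryDirectory() as tmp:
    alias_dir = os.path.join(tmp, "metaflow_extensions", "nflx", "alias")
    os.makedirs(alias_dir)
    # Auto-generated empty __init__.py next to the real config module.
    open(os.path.join(alias_dir, "__init__.py"), "w").close()
    with open(os.path.join(alias_dir, "mfextinit_nflxfastdata.py"), "w") as f:
        f.write("# real extension config\n")

    before = config_candidates(alias_dir, skip_empty_init=False)
    after = config_candidates(alias_dir, skip_empty_init=True)

print("before:", before)  # two candidates, which is the reported conflict
print("after:", after)    # only the mfextinit_* module remains
```

With the empty file counted, two candidates are found for 'alias'; with it skipped, only the mfextinit_* module is left.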

Why This Fix Is Correct

An empty __init__.py is functionally equivalent to no __init__.py for namespace package purposes -- it contains no initialization code and Python treats the directory identically. The fix preserves the existing invariant (reject non-empty __init__.py that would make metaflow_extensions a regular package) while tolerating the harmless empty files that build tools generate.

The fix is minimal: two os.path.getsize() == 0 checks, no behavioral change for any non-empty __init__.py.

Failure Modes Considered

  1. Non-empty __init__.py still rejected: If someone intentionally creates a metaflow_extensions/__init__.py with content (which would break namespace package merging), the RuntimeError still fires because getsize() > 0.

  2. Missing file on disk: If __init__.py appears in a directory listing but doesn't exist on disk (e.g. broken symlink), os.path.isfile() returns False and we skip it, which is the safe default -- a missing file can't break namespace package semantics.

  3. Backward compatibility with pip installs: In a standard pip/uv install, metaflow_extensions/__init__.py doesn't exist at all, so the new code path is never reached. Behavior is identical to before.

  4. Extension config file detection: Empty __init__.py in extension point dirs (e.g. alias/) is no longer treated as a config module. Only __init__.py with actual content and files matching EXT_CONFIG_REGEXP are considered. This matches the intended behavior since empty __init__.py was never meant to be a config file.
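
As a rough illustration of cases 1 and 2 above, the check can be reduced to a small predicate. This is a minimal sketch, not the code in the PR; `is_nonempty_init` is a hypothetical name, and the temporary layout exists only for the demonstration.

```python
import os
import tempfile


def is_nonempty_init(path: str) -> bool:
    """Return True only if `path` is an existing file with at least one byte."""
    # os.path.isfile follows symlinks, so a broken symlink yields False here
    # and os.path.getsize is never called on a missing file.
    return os.path.isfile(path) and os.path.getsize(path) > 0


with tempfile.TemporaryDirectory() as tmp:
    empty_init = os.path.join(tmp, "empty", "__init__.py")
    real_init = os.path.join(tmp, "real", "__init__.py")
    missing_init = os.path.join(tmp, "missing", "__init__.py")

    os.makedirs(os.path.dirname(empty_init))
    os.makedirs(os.path.dirname(real_init))
    open(empty_init, "w").close()        # 0-byte file, as a build tool might generate
    with open(real_init, "w") as f:      # file with real content
        f.write("X = 1\n")

    results = {
        "empty": is_nonempty_init(empty_init),
        "non_empty": is_nonempty_init(real_init),
        "missing": is_nonempty_init(missing_init),
    }

print(results)
```

Only the file with content is reported as non-empty; the 0-byte file and the missing path both fall through to the skip branch.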

Tests

Applied the fix locally and confirmed that it works:

(genai_venv) algo-dev yinglaol ~/algo $ ./bazel run genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results:flow -- package info

INFO: Invocation ID: 5a10a6fe-a9ba-426f-998d-567d784fb288
INFO: Streaming build results to: https://netflix.buildbuddy.io/invocation/5a10a6fe-a9ba-426f-998d-567d784fb288
INFO: Analyzed target //genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results:flow (876 packages loaded, 61406 targets configured).
INFO: Found 1 target...
Target //genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results:flow up-to-date:
  bazel-bin/genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results/flow
INFO: Elapsed time: 4.255s, Critical Path: 0.23s
INFO: 1 process: 27 action cache hit, 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results/flow <args omitted>
INFO: Streaming build results to: https://netflix.buildbuddy.io/invocation/5a10a6fe-a9ba-426f-998d-567d784fb288
Metaflow 2.19.18+nflxfastdata(2.22.1) executing TestContentQAInferenceResults for user:yinglaol
Package size: 12.09 MB
Number of files: 621

Metaflow version: 2.19.18+nflxfastdata(2.22.1)

Metaflow extensions packaged:
  - nflx-fastdata (<unk>) @ 2.22.1
  - nflx-metaflow (<unk>) @ 2.22.1
  - _pythonpath_0 (nflxfastdata) @ _local_

User code in flow flow TestContentQAInferenceResults:
  - Packaged from directory /home/coder/.local/share/tmp/rrv2-bazel/bzl/bazel__home_coder_algo/execroot/_main/bazel-out/k8-fastbuild/bin/genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results/flow.runfiles/_main/genai/medc_experiment_workflows/metaflow_assets/medc/content_qa/inference_results/
  - Filtered by suffixes: .py, .RDS, .R
  - Excluded directories: .mf_code, .mf_meta, _escape_trampolines
(genai_venv) algo-dev yinglaol ~/algo $ 

Non-Goals

AI Tool Usage

  • No AI tools were used in this contribution
  • AI tools were used (describe below)

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes Metaflow's extension discovery to tolerate empty (0-byte) __init__.py files at the metaflow_extensions namespace-package level and at extension-point directories, restoring compatibility with Bazel's rules_python which auto-generates empty __init__.py files for every directory even when enable_implicit_namespace_pkgs=True is set.

Key changes:

  • metaflow/extension_support/__init__.py (Change 1, ~line 536): Instead of unconditionally raising RuntimeError when metaflow_extensions/__init__.py exists, the code now skips empty files and early-returns.
  • metaflow/extension_support/__init__.py (Change 2, ~line 603): An empty __init__.py in an extension-point directory is no longer treated as a config module.

Issues found:

  • Both changes use os.path.join(root_dir, file) / os.path.join(root_dir, *parts) to construct the on-disk path for os.path.isfile / os.path.getsize. This is correct only for wheel-distribution packages (call site at line 723) where root_dir is the parent of metaflow_extensions. For the .pth/editable-install additional-dirs path (line 756) and the PYTHONPATH walk (line 871), root_dir already ends with metaflow_extensions, so the join duplicates that component and produces a non-existent path.
  • Consequence for Change 1: a non-empty __init__.py in a PYTHONPATH/pth package silently passes instead of raising the appropriate RuntimeError.
  • Consequence for Change 2: a non-empty __init__.py used as an extension-point config in a PYTHONPATH/pth package is silently ignored, causing a later RuntimeError about a missing configuration file — a breaking regression for any non-wheel consumer.
  • No unit tests are included with this change.

Confidence Score: 2/5

  • Fix is correct for wheel-distribution packages (the Bazel target use-case) but introduces silent regressions for PYTHONPATH / pth-based package installs.
  • The path construction os.path.join(root_dir, file) is only correct for the distribution call site; for the two non-distribution call sites the result is a duplicate metaflow_extensions segment, causing os.path.isfile to return False. This silently suppresses the RuntimeError in Change 1 and silently drops a valid config module in Change 2 for pth/PYTHONPATH packages. No tests were added to validate the fix or guard against these regressions.
  • metaflow/extension_support/__init__.py: specifically the process_file inner function and both os.path.join path-construction sites introduced by this PR.

Important Files Changed

Filename | Overview
metaflow/extension_support/__init__.py | Two targeted changes to tolerate empty __init__.py files: the top-level NS-package guard and the extension-point config-module detection. Both changes introduce filesystem checks using os.path.join(root_dir, file) / os.path.join(root_dir, *parts), which produce the correct path only for wheel-distribution packages; for PYTHONPATH/pth-based packages the path includes a duplicate metaflow_extensions component, causing os.path.isfile to return False and silently suppressing the RuntimeError (change 1) or silently skipping a valid __init__.py config module (change 2).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[process_file called] --> B{parts is EXT_PKG\nand len greater than 1?}
    B -- No --> Z[Ignore file]
    B -- Yes --> C{Top-level file?\nlen == 2 and parts-1 == init}
    C -- Yes --> D[Build init_path via\nos.path.join root_dir + file]
    D --> E{isfile and size > 0?}
    E -- Yes, non-empty --> F[raise RuntimeError\nnot a NS package]
    E -- No, empty or bad path --> G[return early]
    C -- No --> H{Meta regexp match?}
    H -- Yes --> I[Record meta_module]
    H -- No --> J[Add to state files]
    J --> K{Extension point depth?\nconfig file check}
    K -- EXT_CONFIG_REGEXP --> L[Set config_module]
    K -- init.py + isfile + size > 0 --> L
    K -- init.py + isfile returns False\nbad path for pth or PYTHONPATH --> M[config_module stays None]
    L --> N[Register extension package]
    M --> O[RuntimeError later:\nno config file defined]

Last reviewed commit: c8db26e

Comment on lines +537 to +538
init_path = os.path.join(root_dir, file)
if os.path.isfile(init_path) and os.path.getsize(init_path) > 0:

Incorrect path construction for non-distribution (PYTHONPATH / .pth) packages

os.path.join(root_dir, file) produces the wrong absolute path when process_file is called from the additional-dirs walk (call site 2, line 756) or the PYTHONPATH walk (call site 3, line 871).

In those two call sites, root_dir already ends with EXT_PKG (e.g. /path/to/metaflow_extensions), while file starts with EXT_PKG (e.g. metaflow_extensions/__init__.py). The join therefore becomes /path/to/metaflow_extensions/metaflow_extensions/__init__.py, a path that does not exist. os.path.isfile(init_path) returns False, the RuntimeError is never raised, and the function silently returns — even when the __init__.py is non-empty. This is a regression: the original code would have raised the error unconditionally.

The rest of the function consistently uses os.path.join(root_dir, *parts[1:]) (stripping the leading metaflow_extensions component) when it needs a real on-disk path (full_path_files, line 571). The same pattern is correct here:

init_path = os.path.join(root_dir, *parts[1:])

With root_dir = "/path/to/metaflow_extensions" and parts = ["metaflow_extensions", "__init__.py"]:

  • os.path.join(root_dir, *parts[1:]) → /path/to/metaflow_extensions/__init__.py
  • os.path.join(root_dir, file) → /path/to/metaflow_extensions/metaflow_extensions/__init__.py

Note that for the distribution call site (line 723), root_dir is the parent of metaflow_extensions, so os.path.join(root_dir, *parts[1:]) would give /dist_root/__init__.py which is also wrong for that call site. A safe approach is to probe both candidates and pick whichever exists:

init_path_v1 = os.path.join(root_dir, file)       # correct for dist
init_path_v2 = os.path.join(root_dir, *parts[1:])  # correct for addl_dirs / PYTHONPATH
init_path = init_path_v1 if os.path.isfile(init_path_v1) else init_path_v2

or alternatively, accept the absolute path as a parameter from the call sites that already walk the filesystem.
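
The probing approach can be exercised outside Metaflow. The sketch below is an illustration under assumptions: `resolve_on_disk` is a hypothetical helper, not a function in the codebase, and the temporary directory stands in for a real installation. It checks that both root_dir conventions resolve to the same file while the naive join from the package root does not.

```python
import os
import tempfile

EXT_PKG = "metaflow_extensions"


def resolve_on_disk(root_dir, parts):
    """Hypothetical helper: return the first existing candidate path, or None."""
    candidates = (
        os.path.join(root_dir, *parts),      # root_dir is the parent of EXT_PKG
        os.path.join(root_dir, *parts[1:]),  # root_dir already ends with EXT_PKG
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    pkg_root = os.path.join(tmp, EXT_PKG)
    os.makedirs(pkg_root)
    real_path = os.path.join(pkg_root, "__init__.py")
    open(real_path, "w").close()

    parts = [EXT_PKG, "__init__.py"]
    # Naive join when root_dir already ends with EXT_PKG duplicates the component.
    naive_exists = os.path.isfile(os.path.join(pkg_root, *parts))
    from_parent = resolve_on_disk(tmp, parts)
    from_pkg_root = resolve_on_disk(pkg_root, parts)

print(naive_exists, from_parent == real_path, from_pkg_root == real_path)
```

One caveat of probing: if a directory literally named metaflow_extensions is nested inside the package root, the first candidate could match the wrong file, which is one argument for passing the absolute path from the call sites instead.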

Comment on lines +607 to +608
and os.path.isfile(os.path.join(root_dir, *parts))
and os.path.getsize(os.path.join(root_dir, *parts)) > 0

Same incorrect path for non-distribution packages in the config-module check

os.path.join(root_dir, *parts) has exactly the same double-EXT_PKG issue as the change above. When root_dir ends with metaflow_extensions, the join produces /path/to/metaflow_extensions/metaflow_extensions/X/alias/__init__.py instead of /path/to/metaflow_extensions/X/alias/__init__.py.

os.path.isfile(...) returns False, so a non-empty extension-point __init__.py in a PYTHONPATH/pth-installed package is no longer recognized as a configuration module. The package then hits the "does not define a configuration file" RuntimeError at line 933 — a breaking regression for any non-distribution consumer that uses __init__.py as the extension-point init file.

The correct path (consistent with line 571's full_path_files) is:

os.path.join(root_dir, *parts[1:])

Because parts[0] is always EXT_PKG, stripping it yields the path relative to the EXT_PKG root, regardless of how root_dir was constructed.

Comment on lines +607 to +609
and os.path.isfile(os.path.join(root_dir, *parts))
and os.path.getsize(os.path.join(root_dir, *parts)) > 0
)

Duplicate path computation

os.path.join(root_dir, *parts) is evaluated twice — once for isfile and again for getsize. Consider computing it once:

Suggested change
- and os.path.isfile(os.path.join(root_dir, *parts))
- and os.path.getsize(os.path.join(root_dir, *parts)) > 0
- )
+ # Bind the path once with an assignment expression (Python 3.8+)
+ and os.path.isfile(p := os.path.join(root_dir, *parts))
+ and os.path.getsize(p) > 0
+ )

Or more readably, extract a local variable before the outer if:

_init_abs = os.path.join(root_dir, *parts[1:])
if len(parts) == len(ext_list) + 3 and (
    EXT_CONFIG_REGEXP.match(parts[-1]) is not None
    or (
        parts[-1] == "__init__.py"
        and os.path.isfile(_init_abs)
        and os.path.getsize(_init_abs) > 0
    )
):

@liuyinglao liuyinglao changed the title fix: tolerate empty __init__.py in namespace package discovery [For discussion only]tolerate empty __init__.py in namespace package discovery Mar 12, 2026