Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 16 additions & 6 deletions metaflow/extension_support/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -530,12 +530,18 @@ def process_file(state: Dict[str, Any], root_dir: str, file: str):
# the extension and __init__.py file)
if len(parts) == 2:
# Ensure that we don't have a __init__.py to force this package to
# be a NS package
# be a NS package. Tolerate empty (0-byte) __init__.py files
# because build systems like Bazel's rules_python auto-generate
# them for every directory, even namespace packages.
if parts[1] == "__init__.py":
raise RuntimeError(
"Package '%s' providing '%s' is not an implicit namespace "
"package as required" % (state["name"], EXT_PKG)
)
init_path = os.path.join(root_dir, file)
if os.path.isfile(init_path) and os.path.getsize(init_path) > 0:
Comment on lines +537 to +538
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect path construction for non-distribution (PYTHONPATH / .pth) packages

os.path.join(root_dir, file) produces the wrong absolute path when process_file is called from the additional-dirs walk (call site 2, line 756) or the PYTHONPATH walk (call site 3, line 871).

In those two call sites, root_dir already ends with EXT_PKG (e.g. /path/to/metaflow_extensions), while file starts with EXT_PKG (e.g. metaflow_extensions/__init__.py). The join therefore becomes /path/to/metaflow_extensions/metaflow_extensions/__init__.py, a path that does not exist. os.path.isfile(init_path) returns False, the RuntimeError is never raised, and the function silently returns — even when the __init__.py is non-empty. This is a regression: the original code would have raised the error unconditionally.

The rest of the function consistently uses os.path.join(root_dir, *parts[1:]) (stripping the leading metaflow_extensions component) when it needs a real on-disk path (full_path_files, line 571). The same pattern is correct here:

init_path = os.path.join(root_dir, *parts[1:])

With root_dir = "/path/to/metaflow_extensions" and parts = ["metaflow_extensions", "__init__.py"]:

  • os.path.join(root_dir, *parts[1:])/path/to/metaflow_extensions/__init__.py
  • os.path.join(root_dir, file)/path/to/metaflow_extensions/metaflow_extensions/__init__.py

Note that for the distribution call site (line 723), root_dir is the parent of metaflow_extensions, so os.path.join(root_dir, *parts[1:]) would give /dist_root/__init__.py which is also wrong for that call site. A safe approach is to probe both candidates and pick whichever exists:

init_path_v1 = os.path.join(root_dir, file)       # correct for dist
init_path_v2 = os.path.join(root_dir, *parts[1:])  # correct for addl_dirs / PYTHONPATH
init_path = init_path_v1 if os.path.isfile(init_path_v1) else init_path_v2

or alternatively, accept the absolute path as a parameter from the call sites that already walk the filesystem.

raise RuntimeError(
"Package '%s' providing '%s' is not an implicit "
"namespace package as required"
% (state["name"], EXT_PKG)
)
return
# Check for any metadata; we can only have one metadata per
# distribution at most
if EXT_META_REGEXP.match(parts[1]) is not None:
Expand Down Expand Up @@ -596,7 +602,11 @@ def process_file(state: Dict[str, Any], root_dir: str, file: str):

if len(parts) == len(ext_list) + 3 and (
EXT_CONFIG_REGEXP.match(parts[-1]) is not None
or parts[-1] == "__init__.py"
or (
parts[-1] == "__init__.py"
and os.path.isfile(os.path.join(root_dir, *parts))
and os.path.getsize(os.path.join(root_dir, *parts)) > 0
Comment on lines +607 to +608
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same incorrect path for non-distribution packages in the config-module check

os.path.join(root_dir, *parts) has exactly the same double-EXT_PKG issue as the change above. When root_dir ends with metaflow_extensions, the join produces /path/to/metaflow_extensions/metaflow_extensions/X/alias/__init__.py instead of /path/to/metaflow_extensions/X/alias/__init__.py.

os.path.isfile(...) returns False, so a non-empty extension-point __init__.py in a PYTHONPATH/pth-installed package is no longer recognized as a configuration module. The package then hits the "does not define a configuration file" RuntimeError at line 933 — a breaking regression for any non-distribution consumer that uses __init__.py as the extension-point init file.

The correct path (consistent with line 571's full_path_files) is:

os.path.join(root_dir, *parts[1:])

Because parts[0] is always EXT_PKG, stripping it yields the path relative to the EXT_PKG root, regardless of how root_dir was constructed.

)
Comment on lines +607 to +609
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate path computation

os.path.join(root_dir, *parts) is evaluated twice — once for isfile and again for getsize. Consider computing it once:

Suggested change
and os.path.isfile(os.path.join(root_dir, *parts))
and os.path.getsize(os.path.join(root_dir, *parts)) > 0
)
or (
parts[-1] == "__init__.py"
and os.path.getsize(
p := os.path.join(root_dir, *parts)
) > 0
if os.path.isfile(
p := os.path.join(root_dir, *parts)
)
else False
)

Or more readably, extract a local variable before the outer if:

_init_abs = os.path.join(root_dir, *parts[1:])
if len(parts) == len(ext_list) + 3 and (
    EXT_CONFIG_REGEXP.match(parts[-1]) is not None
    or (
        parts[-1] == "__init__.py"
        and os.path.isfile(_init_abs)
        and os.path.getsize(_init_abs) > 0
    )
):

):
parts[-1] = parts[-1][:-3] # Remove the .py
config_module = ".".join(parts)
Expand Down
Loading