Fix #3011: Handle overlapping submodule promotions gracefully in metaflow_extensions#3013
Conversation
- Detect when multiple extension packages promote the same alias via __mf_promote_submodules__ and emit a UserWarning with full details (conflicting packages, alias name, and concrete target modules). - Apply deterministic resolution: the last-loaded package wins, based on the existing topological load order. - Same-package re-promotions are silently allowed (no false positive). - Add _promoted_aliases tracking dict and get_promoted_aliases() accessor. - Add comprehensive unit tests (15 test cases). - Add practical examples in examples/overlapping_submodule_promotions/.
Greptile SummaryThis PR fixes issue #3011 by adding explicit overlap detection to Key findings:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["multiload_all(modules, extension_point, dst_globals)"] --> B["for each MFExtModule m"]
B --> C["alias_submodules(m.module, m.tl_package, extension_point)"]
C --> D{{"__mf_promote_submodules__ present?"}}
D -- No --> K["return lazy_load_custom_modules (empty)"]
D -- Yes --> E["Build new_promotions dict\n(alias → target_module strings)"]
E --> F["for alias, target in new_promotions"]
F --> G{{"alias in _promoted_aliases?"}}
G -- No --> I["_promoted_aliases[alias] = (tl_package, target)"]
G -- Yes --> H{{"prev_tl_package != tl_package?"}}
H -- No (same pkg) --> I
H -- Yes (conflict) --> J["warnings.warn(UserWarning)\n_ext_debug(overlap message)"]
J --> I
I --> F
F --> L["lazy_load_custom_modules.update(new_promotions)"]
L --> M["return lazy_load_custom_modules"]
M --> N["lazy_load_aliases(aliases)\nprepend _LazyFinder to sys.meta_path"]
N --> B
|
| @pytest.fixture(autouse=True) | ||
| def _reset_promoted_aliases(): | ||
| """Ensure each test starts with a clean ``_promoted_aliases`` state.""" | ||
| saved = dict(_promoted_aliases) | ||
| _promoted_aliases.clear() | ||
| yield | ||
| _promoted_aliases.clear() | ||
| _promoted_aliases.update(saved) | ||
|
|
There was a problem hiding this comment.
Test fixture leaks _aliased_modules global state
The fixture saves and restores _promoted_aliases, but alias_submodules also appends to the module-level _aliased_modules list (via _aliased_modules.extend(lazy_load_custom_modules.keys())). This list is never cleared between tests, causing it to grow unboundedly over the test run. While none of the tests in this file check _aliased_modules directly, other tests in the suite that rely on get_aliased_modules() returning a clean list could be affected.
Consider also saving and restoring _aliased_modules:
@pytest.fixture(autouse=True)
def _reset_promoted_aliases():
"""Ensure each test starts with a clean ``_promoted_aliases`` and
``_aliased_modules`` state."""
from metaflow.extension_support import _aliased_modules
saved_aliases = dict(_promoted_aliases)
saved_modules = list(_aliased_modules)
_promoted_aliases.clear()
_aliased_modules.clear()
yield
_promoted_aliases.clear()
_promoted_aliases.update(saved_aliases)
_aliased_modules.clear()
_aliased_modules.extend(saved_modules)|
|
||
| def main(): | ||
| # Clean slate | ||
| _promoted_aliases.clear() | ||
|
|
There was a problem hiding this comment.
Examples import and mutate a private internal variable
All five example scripts import the private _promoted_aliases dict directly and call .clear() on it to reset state between demonstrations. This pattern:
- Teaches users that it is safe/normal to manipulate private module internals directly.
- Mutates global interpreter state, which is fragile if the examples are ever run in the same process or adapted into notebooks.
- Bypasses the public
get_promoted_aliases()accessor that was introduced specifically for safe access.
The same concern applies to the same import pattern in 02_two_extensions_overlap_warning.py, 03_partial_overlap.py, 04_three_way_overlap.py, and 05_same_package_repromotion.py.
For reading the final state, the examples should use the public accessor. For clearing state between demos, consider either structuring each demo as a fresh subprocess call, or at minimum adding a prominent comment explaining that this is internal API only and must not be done in user code.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
| if prev is not None: | ||
| prev_tl_package, prev_target = prev | ||
| if prev_tl_package != tl_package: | ||
| warnings.warn( | ||
| "Overlapping submodule promotion detected for alias " | ||
| "'%s': package '%s' (target '%s') is overriding " | ||
| "package '%s' (target '%s'). The last-loaded package " | ||
| "wins. To silence this warning, ensure only one " | ||
| "extension promotes this alias via " | ||
| "__mf_promote_submodules__." | ||
| % (alias, tl_package, target, prev_tl_package, prev_target), | ||
| UserWarning, | ||
| stacklevel=2, | ||
| ) |
There was a problem hiding this comment.
stacklevel=2 points into Metaflow internals rather than user extension code
With stacklevel=2, the warning's reported origin is the caller of alias_submodules. In normal production use that caller is multiload_all (line 365 of this same file), so users will see a location inside Metaflow's own source rather than anything pointing at their extension config module. This makes it hard to use warnings.filterwarnings("ignore", …, module="…") to silence just one specific extension's conflict.
Using stacklevel=1 (the default) would at least point to the for alias, target loop inside alias_submodules, which is still an internal location but keeps the reported frame inside the function that detected the issue. Alternatively, keeping stacklevel=2 is fine because the message itself is fully self-contained (alias name + both package names), but consider adding guidance on silencing via warnings.filterwarnings to the message itself, for example:
"To silence this warning for this specific alias, use: "
"warnings.filterwarnings('ignore', message=\"Overlapping.*%s.*\")" % alias,Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
|
||
| # Tracks which package promoted each alias so overlapping promotions can be | ||
| # detected and warned about. Mapping: alias -> (tl_package, target_module) | ||
| _promoted_aliases = {} # type: Dict[str, Tuple[str, str]] |
There was a problem hiding this comment.
Tuple is not imported — type comment will fail mypy
The type comment # type: Dict[str, Tuple[str, str]] references Tuple, but line 15 only imports Any and Dict from typing:
from typing import Any, DictTuple is not in scope, so any mypy run over this file will report Name "Tuple" is not defined. A previous thread noted the annotation was imprecise and assumed Tuple was already imported — it isn't.
Fix by adding Tuple to the import:
| _promoted_aliases = {} # type: Dict[str, Tuple[str, str]] | |
| _promoted_aliases = {} # type: Dict[str, Tuple[str, str]] |
And update line 15:
from typing import Any, Dict, TupleAlternatively, use the lowercase built-in (Python 3.9+):
_promoted_aliases = {} # type: dict[str, tuple[str, str]]or simply use Dict[str, tuple] with the existing imports if adding Tuple is not desirable.
| ) | ||
| _promoted_aliases[alias] = (tl_package, target) | ||
|
|
||
| lazy_load_custom_modules.update(new_promotions) |
There was a problem hiding this comment.
lazy_load_custom_modules always updated even when conflicting alias previously registered
lazy_load_custom_modules.update(new_promotions) unconditionally overwrites any prior alias in the local map, which is fine — the intent is last-loaded wins. However, note that the global resolution of metaflow.plugins.datatools ultimately depends on the order entries are prepended to sys.meta_path in lazy_load_aliases. When two packages both promote the same alias:
- Package A's
_LazyFinderis prepended first. - Package B's
_LazyFinderis prepended second (so it appears earlier insys.meta_path).
Python's import machinery queries finders in order, so Package B's finder is hit first and wins — consistent with "last-loaded wins." But if metaflow.plugins.datatools was already resolved and cached in sys.modules by Package A's finder before Package B's finder was registered, Package B's finder would never be consulted and Package A would silently win — the opposite of what is documented and what _promoted_aliases records.
In multiload_all the calls are sequential (lazy_load_aliases(alias_submodules(...))) so a lazy module that has not been accessed yet should be safe. However, any code path that imports metaflow.plugins.datatools between the two multiload_all iterations would produce the undocumented behavior. A defensive guard or a note in the docstring about this assumption would prevent future surprises.
Closes #3011.
What is happening?
The
__mf_promote_submodules__mechanism inmetaflow/extension_support/__init__.pydid not handle the case where multiple extension packages promote the same submodule alias. The existingTODOcomment (around line 258) said "For now, don't do this", but there was no proactive handling — conflicting promotions would silently overwrite each other.Proposed Solution
This PR modifies the
alias_submodulesfunction to:_promoted_aliasestracking dictionary.UserWarningthat clearly identifies:metaflow.plugins.datatools)Additional changes
get_promoted_aliases()public accessor (returns a copy of the internal tracking dict).get_promoted_aliasesto__all__.Tests
15 comprehensive unit tests in
test/unit/test_extension_overlap.py:TestSinglePackagePromotionTestOverlappingPromotionsTestSamePackageRepromotionTestNoPromotionSubmodulesTestWarningContentTestGetPromotedAliasesTestExtraIndentFormattingExamples
5 runnable example scripts in
examples/overlapping_submodule_promotions/:Files Changed
metaflow/extension_support/__init__.py— core fixtest/unit/test_extension_overlap.py— unit testsexamples/overlapping_submodule_promotions/— 5 examples + README