-
-
Notifications
You must be signed in to change notification settings - Fork 33.8k
gh-69605: Check for already imported modules in PyREPL module completion #139461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-69605: Check for already imported modules in PyREPL module completion #139461
Conversation
|
Ok, one of the new test fails on Windows x64/32 (but pass on arm) I just setup and built Python on a Win 11 x64 to try to reproduce, the test pass 😵💫 (and the fix works) |
|
I'm going to convert it into a draft until you deem this PR ready. That will also reduce the probability of someone clicking on the PR to review it (like me...) while it's not yet ready for. |
Thanks, I just fixed the failing test, this should be ready for review! Sorry again for the noise, I was dealing with what turned out to be a importer cache issue, which I couldn't reproduce outside of the CI buildbots 😓 [edit] I just noticed I can convert to draft / mark ready myself, will do it next times! |
…ules' of https://github.com/loic-simon/cpython into pyrepl-module-completion-check-for-already-imported-modules
…ready-imported-modules
Lib/_pyrepl/_module_completer.py
Outdated
| if os.path.basename(imported_path) == "__init__.py": # package | ||
| imported_path = os.path.dirname(imported_path) | ||
| import_location = os.path.dirname(imported_path) | ||
| modules = list(pkgutil.iter_modules([import_location])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is correct, it's going to find submodules but we actually want this to be the top-level modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's OK, because os.path.dirname make us search in the folder containing the module origin, not on the origin itself (that's why we need to do it twice for a package):
>>> import os, pkgutil, typing, concurrent
>>>
>>> typing.__spec__.origin # single-file module
'<venv>/lib/python3.14/typing.py'
>>> concurrent.__spec__.origin # package
'<venv>/lib/python3.14/concurrent/__init__.py'
>>>
>>> loc = os.path.dirname(typing.__spec__.origin) # or do it twice for concurrent
>>> [mod for mod in pkgutil.iter_modules([loc]) if mod.name in ("typing", "concurrent")]
[ModuleInfo(module_finder=FileFinder('<venv>/lib/python3.14'), name='concurrent', ispkg=True),
ModuleInfo(module_finder=FileFinder('<venv>/lib/python3.14'), name='typing', ispkg=False)]While refactoring this into a separate function I ended up rewriting the whole thing, it should be more explicit now (and it checks module.__package__ instead of the os.path.basename(imported_path) == "__init__.py" hack)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also realized we will need the same logic for my other PR about attributes completion, so I factored the logic into a private method, I suggest we merge this one first so I can rebase merge main into the other and call it!
EDIT: ok that's not true, since attributes completion checks directly the module object, I forgot 😅 but the refactor is a good thing anyway I think
Lib/_pyrepl/_module_completer.py
Outdated
| if os.path.basename(imported_path) == "__init__.py": # package | ||
| imported_path = os.path.dirname(imported_path) | ||
| import_location = os.path.dirname(imported_path) | ||
| modules = list(pkgutil.iter_modules([import_location])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's OK, because os.path.dirname make us search in the folder containing the module origin, not on the origin itself (that's why we need to do it twice for a package):
>>> import os, pkgutil, typing, concurrent
>>>
>>> typing.__spec__.origin # single-file module
'<venv>/lib/python3.14/typing.py'
>>> concurrent.__spec__.origin # package
'<venv>/lib/python3.14/concurrent/__init__.py'
>>>
>>> loc = os.path.dirname(typing.__spec__.origin) # or do it twice for concurrent
>>> [mod for mod in pkgutil.iter_modules([loc]) if mod.name in ("typing", "concurrent")]
[ModuleInfo(module_finder=FileFinder('<venv>/lib/python3.14'), name='concurrent', ispkg=True),
ModuleInfo(module_finder=FileFinder('<venv>/lib/python3.14'), name='typing', ispkg=False)]While refactoring this into a separate function I ended up rewriting the whole thing, it should be more explicit now (and it checks module.__package__ instead of the os.path.basename(imported_path) == "__init__.py" hack)
Lib/_pyrepl/_module_completer.py
Outdated
| return None | ||
| if not spec.has_location: | ||
| if spec.origin == "frozen": # See Tools/build/freeze_modules.py | ||
| return os.path.join(self._stdlib_path, f"{spec.name}.py") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not always going to be a real path right? e.g. _frozen_importlib.. in any case it does not seem that useful to create the full path when we strip it away with os.path.dirname right after.
Perhaps we can have a smaller diff if instead of reverse-engineering the location, we simply look for the right ModuleInfo in the global cache. After all if the module has already been imported, it should be visible to pkgutil.iter_modules. Something like this:
modules: Iterable[pkgutil.ModuleInfo] = self.global_cache
imported_module = sys.modules.get(path.split('.')[0])
if imported_module is not None:
# Filter modules to those whose name and spec matches the imported module
modules = ...We can compare the module name and origin (maybe with something like mod_info.module_finder.find_spec(mod_info.name)) to make sure we have the right one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can compare the module name and origin (maybe with something like mod_info.module_finder.find_spec(mod_info.name)) to make sure we have the right one.
Clever! That will however provide no completions if the imported module exists, but is not the one returned by pkgutil.iter_modules anymore. But it's a quite degenerate case, and already half-supported because of the global_cache not updating... So I think having no suggestions in this case is fine, and the diff is much smaller indeed! I was probably a bit over zealous 😅
(frozen packages are also left out (__phello__.__spec__.origin == 'frozen' vs mod_info.find_spec('__phello__').origin == '<venv>/Lib/__phello__/__init__.py'), but that's an even more niche issue -- I'm not ever sure there is frozen packages except __phello__!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will however provide no completions if the imported module exists, but is not the one returned by pkgutil.iter_modules anymore
I considered it, but I think that's an edge case not worth trying to work around..
frozen packages are also left out
damn, that's unfortunate, we might need some special handling for frozen modules then. Note that there's also another special origin for built-in modules:
>>> import builtins
>>> builtins.__spec__.origin
'built-in'I'm not ever sure there is frozen packages except phello
There are quite a bit. Here's an example of those that are imported at startup:
>>> frozen = {name for name, m in sys.modules.items() if m.__spec__ and m.__spec__.origin == 'frozen'}
>>> frozen
{'posixpath', 'site', 'importlib._bootstrap_external', 'importlib._bootstrap', 'io', 'zipimport', '_sitebuiltins', '__phello__', '_frozen_importlib', 'collections.abc', '_collections_abc', 'os', 'stat', 'genericpath', 'abc', 'os.path', '_frozen_importlib_external', 'codecs', 'importlib.util', 'importlib.machinery'}
This PR handles an edge-case when suggesting module imports in PyREPL (see this comment):
If a module is already imported, the import machinery will only get submodules from it, even if a module of the same name is higher in the path. This can happen if eg. sys.path has been updated since the module was originaly imported.
The easiest way to reproduce this issue (and how I stumbled on it) is with stdlib modules imported during interpreter startup (before site customisation, so ignoring potential shadowing):
This PR add a special case in
_pyrepl.ModuleCompleter._find_modulesthatIt was a little tricky to get right, I'm not sure about everything 😅 but it pass all tests I could think off!
cc @tomasr8