Skip to content

Rewrite package data reading#1150

Merged
kabeor merged 7 commits intoqilingframework:devfrom
nullableVoidPtr:pkgutil
May 10, 2022
Merged

Rewrite package data reading#1150
kabeor merged 7 commits intoqilingframework:devfrom
nullableVoidPtr:pkgutil

Conversation

@nullableVoidPtr
Copy link
Contributor

Checklist

Which kind of PR do you create?

  • This PR only contains minor fixes.

Coding convention?

  • The new code conforms to Qiling Framework naming convention.
  • The imports are arranged properly.

Extra tests?

  • No extra tests are needed for this PR.

Changelog?

  • This PR doesn't need to update Changelog.

Target branch?

  • The target branch is dev branch.

One last thing


Using __file__ to load internal resources is not recommended, as it may not work with import hooks or Python Eggs. Here, I am using the standard pkgutil (a wrapper around another standard library, importlib.resources) to retrieve load the data.

Note: pkgutil.get_data unfortunately returns bytes not strings or a file object, so I have to decode it first before passing it on. The alternative here is to use importlib.resources.files which returns a Path object (so, ilr.files("qiling") / "profiles" / f"{os}.ql") which you can then call open() to return a TextIO, but it is only supported in 3.9.

@nullableVoidPtr nullableVoidPtr changed the base branch from master to dev May 7, 2022 12:30
@elicn
Copy link
Member

elicn commented May 7, 2022

I agree that __file__ may not be the right way to go, but neither is pkgutil. We could use inspect.getfile (see here), which is the preferred way to replace __file__ and requires less code modifications.

@nullableVoidPtr
Copy link
Contributor Author

nullableVoidPtr commented May 8, 2022

but neither is pkgutil

Would it be because of how the code would need to be restructured? Yeah, it's not ideal, but if either 3.9 is supported (unlikely I think) or if the backported importlib_resources packages is also added as a dependency (also unlikely), this is the way to do it (besides using the complicated ResourceReader)

inspect.getfile

As I understand it, it's basically a drop-in replacement for __file__, which can then be passed onto os functions, which may not be aware of non-standard loaders. importlib (and subsequently pkgutil) are, and calls the loader to handle file loading for you.

Further reading here:
Setuptools
Docs on importlib (backport, but compatible with standard Python Lib and explains rationale)

@elicn
Copy link
Member

elicn commented May 8, 2022

inspect.getfile is a good replacement for os.path.abspath(__file__), since it just replaces the usage of __file__ without requiring too much modifications. It follows the object passed to it (class, method, etc.) and returns the filename where it was defined. I find it very comfortable when you need to retreive the filename in which a specific class was defined - or a local class, if you want the __file__ equivalent.

I tend to prefer pkgutil.get_data less since it mixes the filename resolution and content retrieval in one statement and relies on exceptions to indicate that the resouce was not found. Also, as far as I understand it requires the code to be appear as a package (not too bad for itself, but will perhaps require annoying empty __init__.py files only for this purpose).

I didn't follow your comment on the non-standard loaders; do we really need this, or this is just a 'nice to have' feature we gain there..? To me it looks like the latter (.. unless I am missing it?) and not a strong reasoning for itself.

@nullableVoidPtr
Copy link
Contributor Author

I didn't follow your comment on the non-standard loaders; do we really need this, or this is just a 'nice to have' feature we gain there..? To me it looks like the latter (.. unless I am missing it?) and not a strong reasoning for itself.

More of a nice-to-have, don't worry too much about it. There is a weird-but-standard(?) case in where file and inspect.getfile does work with Python's ZipImporter, but not when the path is then passed onto open (open('pkg.zip/__init__.py')).

I tend to prefer pkgutil.get_data less since it mixes the filename resolution and content retrieval in one statement and relies on exceptions to indicate that the resouce was not found.

Good call, and good catch! get_data actually returns None, but it seems that my leftover code from using importlib.resources.files snuck in. It was far more convenient (returning a Path-like object), but it failed on CI as 3.8 didn't support it.

Also, as far as I understand it requires the code to be appear as a package (not too bad for itself, but will perhaps require annoying empty init.py files only for this purpose).

pkgutil.get_data can actually "dive deeper" into packages, that is, it can go into a non-package subdirectory that is physically located within another package (see the diffs to the gdb code where it relies on the arch and further the xfercmd file). This is something that

Nevertheless, I can update this PR to use inspect.getfiles shortly.

As per elicn's comments.
TODO: If Qiling allows 3.9 standard Libs, move everything to
importlib.resources.files
@nullableVoidPtr
Copy link
Contributor Author

@elicn: made changes you wanted but wanted to bring this up:
It seems that inspect.getfile is a drop-in replacement for __file__ because it is semantically the same.
If you pass a frame object to inspect.getfile, it is the same as current_frame_function.co_filename.
Alternatively, if you pass a class object to inspect.getfile, it is the same as module_that_defined_class.__file__.
(Again, properly accounting for import hooks is just a nice-to-have.)

I've also elected to use the Path class to traverse into the package dir. Let me know what you think.

@elicn
Copy link
Member

elicn commented May 8, 2022

Thanks for that; I always appreciate a good technical discussion (:

A few petty notes though:

  • Since currentframe may be somewhat problematic, I prefer to use a local class or method as an anchor
  • Where only a path is needed without any filesystem shenanigans, I prefer PurePath rather than Path [even if the file needs to be opened]
  • The changes in gdb are about to become irrelevant as I recently re-wrote the entire module

@kabeor
Copy link
Member

kabeor commented May 10, 2022

Cool, thanks!

@kabeor kabeor merged commit 561f3a8 into qilingframework:dev May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants