feat: accessors #2290

Merged
flying-sheep merged 90 commits into main from pa/acc
Feb 12, 2026

Conversation

@flying-sheep
Member

@flying-sheep flying-sheep commented Jan 8, 2026

User API design

  1. Create AdPath instances by descending into a central accessor constant A:
    (A[:, :], lambda ad: ad.X),
    (A[:, "gene-3"], lambda ad: ad[:, "gene-3"].X.flatten()),
    (A["cell-5", :], lambda ad: ad["cell-5"].X.flatten()),
    (A.obs["type"], lambda ad: ad.obs["type"]),
    (A.obs.index, lambda ad: ad.obs.index.values),
    (A.layers["a"][:, :], lambda ad: ad.layers["a"].copy().toarray()),
    (
        A.layers["a"][:, "gene-18"],
        lambda ad: ad[:, "gene-18"].layers["a"].copy().toarray().flatten(),
    ),
    (
        A.layers["a"]["cell-77", :],
        lambda ad: ad["cell-77"].layers["a"].copy().toarray().flatten(),
    ),
    (A.obsm["umap"][0], lambda ad: ad.obsm["umap"][:, 0]),
    (A.obsm["umap"][1], lambda ad: ad.obsm["umap"][:, 1]),
    (A.varp["cons"]["gene-46", :], lambda ad: ad.varp["cons"][46, :].toarray()),
    (A.varp["cons"][:, "gene-46"], lambda ad: ad.varp["cons"][:, 46].toarray()),
  2. Inspect AdPath instance, e.g. to figure out which axes the resulting vector spans:
            pytest.param(A.obsm["c"][:, 0], {"obs"}, id="obsm"),
            pytest.param(A.varp["d"][:, :], ("var", "var"), id="varp"),
            pytest.param(A.varp["d"][:, "c2"], {"var"}, id="varp-col"),
        ],
    )
    def test_axes(ad_path: AdPath, axes: Collection[Literal["obs", "var"]]) -> None:
        assert ad_path.axes == axes
  3. Call AdPath instance to extract a vector (see 1. for examples)
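
The three steps above can be sketched with a toy stand-in. This is not the anndata implementation: `ToyPath`, `ToyAcc`, the helper classes, and the dict-based `toy_adata` are all hypothetical names made up for illustration, and only the shape of the workflow (create, inspect, call) is taken from the description.

```python
from dataclasses import dataclass
from typing import Callable

# Toy stand-in for an AnnData object (plain dicts and lists, not the real class).
toy_adata = {
    "obs": {"type": ["a", "b", "a"]},
    "obsm": {"umap": [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]]},
}


@dataclass(frozen=True)
class ToyPath:
    """Hypothetical miniature of AdPath: a label, the axes it spans, an extractor."""

    label: str
    axes: frozenset
    extract: Callable[[dict], list]

    def __call__(self, data: dict) -> list:
        return self.extract(data)


class ToyObs:
    def __getitem__(self, col: str) -> ToyPath:
        return ToyPath(f'A.obs["{col}"]', frozenset({"obs"}), lambda d: list(d["obs"][col]))


class ToyObsmKey:
    def __init__(self, key: str) -> None:
        self.key = key

    def __getitem__(self, col: int) -> ToyPath:
        key = self.key
        return ToyPath(
            f'A.obsm["{key}"][{col}]',
            frozenset({"obs"}),
            lambda d: [row[col] for row in d["obsm"][key]],
        )


class ToyObsm:
    def __getitem__(self, key: str) -> ToyObsmKey:
        return ToyObsmKey(key)


class ToyAcc:
    obs = ToyObs()
    obsm = ToyObsm()


A = ToyAcc()

# 1. create paths by descending into A
type_path = A.obs["type"]
umap_1 = A.obsm["umap"][1]
# 2. inspect them without touching any data
print(umap_1.label, sorted(umap_1.axes))
# 3. call them on a data object to extract a vector
print(type_path(toy_adata), umap_1(toy_adata))
```

The point of the split is that a path is just a description until it is called, so plotting code can look at `axes` or the label before any array is read.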

Subclassing

Subclassing is a direct goal of this PR, as people should be able to use their own AdPath subclasses. This means that

  1. the AdPath API is minimal and inspection can be done through .acc (could be even more minimal by just putting it all into a container?):
    acc: VecAcc[Self, I]
    idx: I
    @cached_property
    def axes(self) -> Axes:
  2. It’s trivial to create a new AdAcc constant that produces your own AdPath subclass:
    A = AdAcc(path_class=AdPath)
  3. The data flow is easy to understand:
    1. User uses VecAcc.__getitem__ to get an AdPath or a list of them. In that process,
      1. __getitem__ calls process_idx which verifies and simplifies the index
      2. axes, __repr__, and idx_repr get called too to validate things work
    2. The public API of AdPath can be used, which basically just delegates to VecAcc’s axes, __repr__, idx_repr, and __call__

So in the end, everything except for __call__ is validated up front, i.e. VecAcc.__getitem__ raises exceptions on misuse.
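
A minimal sketch of that data flow, under stated assumptions: `ToyPath`, `LabelledPath`, and `ToyObsAcc` are hypothetical toy classes (no anndata import), and the data object is a plain dict. It shows a custom path class being plugged in via `path_class`, and a bad index failing in `__getitem__` while a missing column only fails once the path is called.

```python
class ToyPath:
    """Hypothetical miniature of AdPath: holds (acc, idx) and delegates everything."""

    def __init__(self, acc, idx):
        self.acc = acc
        self.idx = idx

    @property
    def axes(self):
        return self.acc.axes(self.idx)

    def __repr__(self):
        return f"{self.acc!r}[{self.acc.idx_repr(self.idx)}]"

    def __call__(self, data):
        return self.acc(data, self.idx)


class LabelledPath(ToyPath):
    """A user-defined subclass that adds one convenience property."""

    @property
    def label(self):
        return repr(self).removeprefix("A.")


class ToyObsAcc:
    """Toy accessor for obs columns; path_class decides what __getitem__ returns."""

    def __init__(self, path_class=ToyPath):
        self.path_class = path_class

    def __getitem__(self, idx):
        path = self.path_class(self, idx)
        path.axes  # exercise axes and repr now, so misuse fails here
        repr(path)
        return path

    def axes(self, idx):
        if not isinstance(idx, str):
            raise TypeError(f"obs columns are addressed by name, got {idx!r}")
        return {"obs"}

    def __repr__(self):
        return "A.obs"

    def idx_repr(self, idx):
        return repr(idx)

    def __call__(self, data, idx):
        return data["obs"][idx]


obs = ToyObsAcc(path_class=LabelledPath)
path = obs["type"]
print(type(path).__name__, path, path.label)
```

Because the path only stores `acc` and `idx`, a subclass like `LabelledPath` needs no extra wiring: the accessor constructs whatever class it was given.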

class VecAcc[P: AdPath[I], I](abc.ABC):  # type: ignore
    path_class: type[P]

    def process_idx(self, idx: Any, /) -> I:
        self.axes(idx)
        return idx

    def __getitem__(self, idx: Any, /) -> P:
        idx = self.process_idx(idx)
        return self.path_class(self, idx)  # type: ignore

    @abc.abstractmethod
    def axes(self, idx: I, /) -> Axes: ...

    @abc.abstractmethod
    def __repr__(self, /) -> str: ...

    @abc.abstractmethod
    def idx_repr(self, idx: I, /) -> str: ...

    @abc.abstractmethod
    def __call__(self, adata: AnnData, idx: I, /) -> Vector: ...
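
To make the four abstract methods concrete, here is a hedged sketch of what one implementation could look like. `ToyVecAcc` mirrors the base class above without the generics, and `ToyObsmAcc`, `ToyPath`, and the dict-based data are hypothetical. Normalizing `[:, i]` to `i` in `process_idx` is an assumption chosen to illustrate "verifies and simplifies", not necessarily what the PR does.

```python
import abc


class ToyPath:
    """Hypothetical minimal path: stores the accessor and the processed index."""

    def __init__(self, acc, idx):
        self.acc, self.idx = acc, idx

    def __call__(self, data):
        return self.acc(data, self.idx)


class ToyVecAcc(abc.ABC):
    """Untyped mirror of the VecAcc base class shown above."""

    path_class = ToyPath

    def process_idx(self, idx):
        self.axes(idx)  # raises on an invalid index
        return idx

    def __getitem__(self, idx):
        idx = self.process_idx(idx)
        return self.path_class(self, idx)

    @abc.abstractmethod
    def axes(self, idx): ...

    @abc.abstractmethod
    def __repr__(self): ...

    @abc.abstractmethod
    def idx_repr(self, idx): ...

    @abc.abstractmethod
    def __call__(self, data, idx): ...


class ToyObsmAcc(ToyVecAcc):
    """Toy accessor for one column of a 2D obsm-like entry."""

    def __init__(self, key):
        self.key = key

    def process_idx(self, idx):
        # Assumption for illustration: accept both [i] and [:, i], store i.
        if isinstance(idx, tuple) and len(idx) == 2 and idx[0] == slice(None):
            idx = idx[1]
        return super().process_idx(idx)

    def axes(self, idx):
        if isinstance(idx, bool) or not isinstance(idx, int):
            raise TypeError(f"expected a column number, got {idx!r}")
        return {"obs"}

    def __repr__(self):
        return f"A.obsm[{self.key!r}]"

    def idx_repr(self, idx):
        return f"[:, {idx}]"

    def __call__(self, data, idx):
        return [row[idx] for row in data["obsm"][self.key]]


pca = ToyObsmAcc("pca")
second = pca[:, 1]
print(second.idx, second({"obsm": {"pca": [[1, 2], [3, 4]]}}))
```

Only `__call__` touches data; the other three methods work on the index alone, which is what lets `__getitem__` reject misuse before any AnnData object exists.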

TODO:

  • serialization
  • docs (especially data flow)
    • nomenclature: “vector” vs “array”, “accessor”, “vector accessor”, “reference”, “path”, … get it straight!
      Especially since we already call the instance namespaces from #1869 (“Instance namespaces (adata.custom)”) accessors!
  • Array types: fix typing and test with all array types we support
  • improve __contains__ so it doesn’t try to access the array
  • add JSON schema
    • also add it to docs (linked from from_json/to_json)
    • and test it
  • terminology: axis → dim
  • (maybe) runtime autocompletion (__dir__)

@codecov

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 93.17269% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.24%. Comparing base (5c22d7b) to head (266e6c6).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines           Patch %   Lines
src/anndata/acc/__init__.py        96.59%    12 Missing ⚠️
src/anndata/acc/_parse_json.py     81.39%    8 Missing ⚠️
src/anndata/acc/_parse_str.py      84.21%    6 Missing ⚠️
src/anndata/_core/anndata.py       84.37%    5 Missing ⚠️
src/anndata/_core/index.py         87.50%    2 Missing ⚠️
src/anndata/tests/helpers.py       50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2290      +/-   ##
==========================================
- Coverage   86.78%   85.24%   -1.55%     
==========================================
  Files          46       49       +3     
  Lines        7365     7785     +420     
==========================================
+ Hits         6392     6636     +244     
- Misses        973     1149     +176     
Files with missing lines           Coverage Δ
src/anndata/__init__.py            81.81% <100.00%> (ø)
src/anndata/_core/raw.py           75.35% <100.00%> (-4.51%) ⬇️
src/anndata/_io/specs/methods.py   91.10% <ø> (-0.36%) ⬇️
src/anndata/types.py               100.00% <100.00%> (ø)
src/anndata/typing.py              100.00% <100.00%> (ø)
src/anndata/tests/helpers.py       83.83% <50.00%> (-9.14%) ⬇️
src/anndata/_core/index.py         94.54% <87.50%> (-0.10%) ⬇️
src/anndata/_core/anndata.py       82.04% <84.37%> (+0.06%) ⬆️
src/anndata/acc/_parse_str.py      84.21% <84.21%> (ø)
src/anndata/acc/_parse_json.py     81.39% <81.39%> (ø)
... and 1 more

... and 5 files with indirect coverage changes

@flying-sheep flying-sheep added this to the 0.13.0 milestone Jan 15, 2026
flying-sheep and others added 2 commits February 12, 2026 15:27
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
@flying-sheep flying-sheep enabled auto-merge (squash) February 12, 2026 15:25
@flying-sheep flying-sheep merged commit 1e87b17 into main Feb 12, 2026
23 checks passed
@flying-sheep flying-sheep deleted the pa/acc branch February 12, 2026 15:41
@patcon

patcon commented Feb 13, 2026

Yay! I thiiiiink this is cool and exciting, but it's written at a deeper level of abstraction than I can make easy sense of, and that makes me unsure.. :)

Is there a place where I can read what the user-level interface will be for the thing this is working toward? I'd love to provide comment on the user-level API 🙏🏻

Maybe to cut to the chase: how do you envision this will allow someone to color a sc.pl.embedding() by the third principal component? Or draw 4 plots colored by Leiden, and the first 3 principal components?

thanks! Also pls feel welcome to direct me elsewhere

@flying-sheep
Member Author

flying-sheep commented Feb 13, 2026

Don’t worry! While there are quite a few usage examples scattered throughout this PR, the anndata-plot/hv-anndata/scanpy APIs will of course be documented with more focused examples.

There is currently a prototype for the new scanpy plotting here, but beware as it doesn’t use the final accessor API as defined in this PR.

So to answer your questions:

color a sc.pl.embedding() by the third principal component

probably one of these

import scanpy as sc
from hv_anndata import A  # maybe this will also live on `sc`, idk yet

sc.pl.scatter(adata, A.obsm["umap"], color=A.obsm["pca"][:, 2]).opts(cmap="tab10")
# or more explicitly
sc.pl.scatter(adata, A.obsm["umap"][:, [0, 1]], color=A.obsm["pca"][:, 2]).opts(cmap="tab10")

4 plots colored by Leiden, and the first 3 principal components

probably just replace the color argument above with

color=[A.obs["leiden"], *A.obsm["pca"][:, [0, 1, 2]]]
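
The `*` works because indexing with a list yields a list of paths (one per column) rather than a single path. A toy sketch of that fan-out, assuming nothing about the real classes: `ToyPath`, `toy_obs`, and `ToyObsmAcc` are hypothetical, and the data is a plain dict.

```python
class ToyPath:
    """Hypothetical miniature path: a label plus a function that extracts a vector."""

    def __init__(self, label, extract):
        self.label = label
        self.extract = extract

    def __repr__(self):
        return self.label

    def __call__(self, data):
        return self.extract(data)


def toy_obs(col):
    """Toy equivalent of A.obs[col]."""
    return ToyPath(f'A.obs["{col}"]', lambda d: d["obs"][col])


class ToyObsmAcc:
    """Toy equivalent of A.obsm[key]: a scalar column gives one path, a list gives many."""

    def __init__(self, key):
        self.key = key

    def _path(self, col):
        key = self.key
        return ToyPath(
            f'A.obsm["{key}"][:, {col}]',
            lambda d: [row[col] for row in d["obsm"][key]],
        )

    def __getitem__(self, idx):
        rows, cols = idx
        if rows != slice(None):
            raise IndexError("this toy only supports [:, cols]")
        if isinstance(cols, list):
            return [self._path(c) for c in cols]
        return self._path(cols)


pca = ToyObsmAcc("pca")
color = [toy_obs("leiden"), *pca[:, [0, 1, 2]]]
print(color)
```

Each element of `color` is an independent path, so a plotting function can draw one panel per element without knowing which ones came from the same array.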

@ilan-gold
Contributor

@patcon Check out our latest docs: https://anndata.readthedocs.io/en/latest/accessors.html which point at the main branch

@patcon

patcon commented Feb 15, 2026

Nice! Thanks! I'll probably add my own syntactic sugar, but this will make it really simple :)

sc.pl.embedding(adata, basis="umap", color="X_pca[2]")

sc.pl.embedding(adata, basis="umap", color=["leiden", "X_pca[0, 1, 2]"])

I have found it quite easy to wrap scanpy and have the scripts be quite legible to new coders at a weekly event I host, especially with some visuals of the data structure. The above was what I felt seemed most legible -- but I can always do a thin wrapper with the excellent code you've created 🙏🏻

@flying-sheep
Member Author

flying-sheep commented Feb 15, 2026

Don’t make your own, there is already a shortcut syntax: https://anndata.readthedocs.io/en/latest/generated/anndata.acc.AdAcc.html#anndata.acc.AdAcc.resolve

If you want to extend it or think it could be better, please present improvements to the design in an issue before we release 0.13, then we can still change it.

@ilan-gold ilan-gold mentioned this pull request Feb 23, 2026
3 tasks

Development

Successfully merging this pull request may close these issues.

Reference paths / accessors

3 participants