Investigate New Indexes for AnnData

## Goal

I'd like to bring new index types to `anndata` to support a few use cases, so that the following works:
```python
adata = AnnData(obs=pd.DataFrame(index=some_new_index...)...)
adata[some_subset_of_the_index] # works
```
This should work [without the index being converted to strings](https://github.com/scverse/anndata/blob/e49e1272f122360a41b710e0cfd495b0b77662c1/src/anndata/_core/aligned_df.py#L91-L94), as is usually done, and subsequent operations should then work smoothly.
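
To make the cost of the string conversion concrete, here is a minimal sketch in plain pandas. It assumes the linked lines amount to an `astype(str)` cast on the index; the `RangeIndex` is only a stand-in for a richer index type.

```python
import pandas as pd

# Stand-in for a "new" index type; any non-string index behaves similarly.
index = pd.RangeIndex(3)

# Assumption: this mirrors what the linked conversion does to the index.
as_strings = index.astype(str)

print(1 in index)         # True: integer labels are found
print(1 in as_strings)    # False: the original labels no longer match
print("1" in as_strings)  # True: only the string form can be looked up
```

After the cast, lookups with the original labels no longer match, which is the behaviour the new index types would need to avoid.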

## Use cases

- **Geometry-based indexes**: Points or shapes that represent "segmentations" or "observations" in an image can be used as indices. This is similar to [GeoPandas](https://geopandas.org/en/stable/), but instead of designating a column as the "geometry" and then doing operations on that, we'd make the geometry a first-class citizen. So we need to investigate why GeoPandas made the decision _not_ to use indexes (i.e., is it because ["An Index instance can only contain hashable objects"](https://pandas.pydata.org/docs/reference/api/pandas.Index.html)? Are geoarrow objects not hashable? Are shapely objects?).
- **"Anonymous" indexes**: A Cartesian product of coordinates can be used for pixel-based annotation in proteomics; see https://github.com/complextissue/spatiomic. Alternatively, https://icb-pandas-uuid.readthedocs-hosted.com/en/latest/ could be used to save space on string indexes that lack semantics.
- **Multi-level annotations**: See https://github.com/scverse/mudata/issues/111 for the proteomics use case. This may or may not be a `pandas.MultiIndex`; that is not yet clear.
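
The hashability question for the geometry case can be checked empirically. The helper below is a hypothetical probe, not part of any library; it is shown with plain Python stand-ins, and the same call could be pointed at a shapely geometry or a geoarrow scalar.

```python
def is_hashable(obj) -> bool:
    """Return True if `hash(obj)` succeeds (hypothetical helper for this investigation)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Stand-ins only: a tuple of floats plays the role of an immutable point,
# a list plays the role of a mutable coordinate buffer.
print(is_hashable((1.0, 2.0)))    # True
print(is_hashable([1.0, 2.0]))    # False
print(is_hashable(([1.0], 2.0)))  # False: a tuple holding a list cannot be hashed
```

Calling `hash` directly, rather than checking for a `__hash__` attribute, also catches containers that look hashable but hold unhashable members.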

## Getting started

I've started a branch that should, ideally, allow _declaring_ a new `AnnData` object with these indexes and performing basic operations on it: https://github.com/scverse/anndata/tree/ig/custom_index_objects

We'll need to work out the specifics of constructing a `pandas.Index` object in each of the above cases.

- **Geometry-based indexes**: We should probably start with [`GeometryArray`](https://github.com/geopandas/geopandas/blob/3231b24807e85aa8e4c973ba6984619626d0d2c2/geopandas/array.py#L346), which is private in GeoPandas, and restrict ourselves to arrow inputs only (or maybe shapely, but we should start with just one of the two; I don't currently see why you would use shapely as the "backing format" for the array).
- **"Anonymous" indexes**: This one is probably simple and can be done via a `pandas.Index` wrapping a custom [`ExtensionArray`](https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray) whose `__getitem__` just generates X-Y coordinates in row-major order based on the initializing shape, e.g., `XYArray(shape=[5, 5])[11] == (2, 1)`, if I got that right.
- **Multi-level annotations**: I think starting with a `MultiIndex` and then seeing whether it meets the needs makes sense before moving to something more complicated.
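
The row-major lookup described for the "anonymous" case can be sketched without any pandas machinery. `XYCoordinates` below is a hypothetical stand-in for `XYArray`: it is a plain Python sequence, not an `ExtensionArray`, and it only illustrates the index arithmetic a real `__getitem__` might use.

```python
from collections.abc import Sequence


class XYCoordinates(Sequence):
    """Lazily yields (row, column) pairs in row-major order (hypothetical sketch)."""

    def __init__(self, shape):
        self.n_rows, self.n_cols = shape

    def __len__(self):
        return self.n_rows * self.n_cols

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self[j] for j in range(*i.indices(len(self)))]
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Row-major: the quotient is the row, the remainder is the column.
        return divmod(i, self.n_cols)


coords = XYCoordinates(shape=(5, 5))
print(coords[11])   # (2, 1)
print(len(coords))  # 25
```

Nothing is stored beyond the shape, which is the space saving the anonymous index is after. Under this convention the `(2, 1)` in the example reads as row 2, column 1.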

Then, once we are confident about the `pandas.Index` object, we can try putting it inside an `AnnData` object.
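
Before involving `AnnData`, a candidate index can be exercised on a plain `DataFrame`. The sketch below does this for the `MultiIndex` case; the level names (`protein`, `peptide`) and the values are made up for illustration and are not taken from the linked mudata issue.

```python
import pandas as pd

# Hypothetical two-level annotation: peptides nested under proteins.
index = pd.MultiIndex.from_tuples(
    [("P1", "pepA"), ("P1", "pepB"), ("P2", "pepC")],
    names=["protein", "peptide"],
)
obs = pd.DataFrame({"score": [0.9, 0.8, 0.7]}, index=index)

# Selecting on the outer level returns the nested entries, indexed by peptide.
p1 = obs.loc["P1"]
print(list(p1.index))  # ['pepA', 'pepB']

# A boolean mask keeps both levels, closer to how a subset is usually taken.
mask = obs.index.get_level_values("protein") == "P1"
print(len(obs[mask]))  # 2
```

If operations like these already cover the proteomics needs, the more complicated options may not be necessary.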

## Limitations

See https://github.com/pandas-dev/pandas/issues/64889 for potential hiccups with unhashable arrow objects

Furthermore, this says nothing about serializability: none of these would be writable to disk, and all would need custom I/O handling. Luckily, I don't think that's needed to create immediately useful things.

- **Geometry-based indexes**: These are written with parquet anyway in the spatialdata format, so they just need to be read into the `AnnData` object. We can then prevent writing on the `AnnData` side via a new setting or something like https://github.com/scverse/anndata/pull/2372.
- **"Anonymous" indexes**: These are totally anonymous, so you probably wouldn't want them serialized anyway.
- **Multi-level annotations**: It is not clear to me how these are currently constructed. From databases? So do they need to be serializable, or just _reconstructable_ from some simple metadata?
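
As an illustration of "reconstructable from simple metadata", the sketch below rebuilds a coordinate index from nothing but a stored shape. The JSON layout, the level names `x` and `y`, and the `rebuild_index` helper are all assumptions made for this example. `MultiIndex.from_product` materializes the index, so this shows the round trip rather than the lazy variant.

```python
import json

import pandas as pd

# Hypothetical metadata: the only thing that would need to be written to disk.
metadata = json.dumps({"shape": [5, 5], "names": ["x", "y"]})


def rebuild_index(serialized: str) -> pd.MultiIndex:
    """Recreate the Cartesian-product index from its stored shape (hypothetical helper)."""
    spec = json.loads(serialized)
    return pd.MultiIndex.from_product(
        [range(n) for n in spec["shape"]], names=spec["names"]
    )


index = rebuild_index(metadata)
print(len(index))            # 25
print(index[11] == (2, 1))   # True: same row-major ordering as the XYArray example
```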
