netcdf4 backend claims **all** remote files - preventing reading zarr #10801

@ianhi

What happened?

If you point xr.open_datatree at a remote URL with netcdf4 installed, then the netcdf4 backend claims the file, so the zarr backend is never tried.

What did you expect to happen?

Open with the zarr backend rather than the netcdf4 backend (in this case I expect the zarr backend to fail).

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
#   "zarr>=2.18.0",
#   "numpy>=1.24.0",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

"""Download and test loading OME-Zarr example data."""

import xarray as xr

xr.show_versions()

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
remote_dt = xr.open_datatree(url)
print(remote_dt)

Steps to reproduce

Run the script above with `uv run`.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>idr</BucketName><RequestId>tx0000000000000131f743f-0068dbfc97-7518e06c-default</RequestId><HostId>7518e06c-default-default</HostId></Error>
Traceback (most recent call last):
  File "/Users/ian/Documents/dev/xarray/xarray/backends/file_manager.py", line 219, in _acquire_with_cache_info
    file = self._cache[self._key]
           ~~~~~~~~~~~^^^^^^^^^^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
            ~~~~~~~~~~~^^^^^
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '6a4d8a9f-b9b0-44b1-8fed-a0d8f5bd69bb']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ian/Documents/dev/xarray/repro.py", line 22, in <module>
    remote_dt = xr.open_datatree(url)
  File "/Users/ian/Documents/dev/xarray/xarray/backends/api.py", line 1066, in open_datatree
    backend_tree = backend.open_datatree(
        filename_or_obj,
    ...<2 lines>...
        **kwargs,
    )
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 792, in open_datatree
    groups_dict = self.open_groups_as_dict(
        filename_or_obj,
    ...<15 lines>...
        **kwargs,
    )
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 839, in open_groups_as_dict
    store = NetCDF4DataStore.open(
        filename_or_obj,
    ...<7 lines>...
        autoclose=autoclose,
    )
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 524, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 428, in __init__
    self.format = self.ds.data_model
                  ^^^^^^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 533, in ds
    return self._acquire()
           ~~~~~~~~~~~~~^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 527, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/Users/ian/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/contextlib.py", line 141, in __enter__
    return next(self.gen)
  File "/Users/ian/Documents/dev/xarray/xarray/backends/file_manager.py", line 207, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/file_manager.py", line 225, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "src/netCDF4/_netCDF4.pyx", line 2521, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2158, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -90] NetCDF: file not found: 'https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr'

Anything else we need to know?

This seems to be caused by these lines in the netcdf4 backend:

def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
    if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
        return True
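
To illustrate why this matters, here is a minimal, self-contained sketch of first-match backend selection. It does not use xarray itself: `pick_engine`, `netcdf4_guess`, `netcdf4_guess_narrow`, and `zarr_guess` are hypothetical stand-ins, the local `is_remote_uri` is a simplified imitation of xarray's helper, and the `.zarr` suffix check is only one possible way to narrow the guess, not a proposed fix.

```python
import re
from typing import Callable, Dict, Optional


def is_remote_uri(path: str) -> bool:
    """Simplified stand-in (assumption): treat any 'scheme://' string as remote."""
    return bool(re.match(r"^[a-z][a-z0-9]*://", path))


def netcdf4_guess(path: str) -> bool:
    """Mirrors the lines quoted above: every remote string is claimed."""
    return is_remote_uri(path)


def netcdf4_guess_narrow(path: str) -> bool:
    """Hypothetical narrower guess: decline remote paths that look like Zarr stores."""
    if not is_remote_uri(path):
        return False
    return not path.rstrip("/").lower().endswith(".zarr")


def zarr_guess(path: str) -> bool:
    """Hypothetical Zarr guess based only on the '.zarr' suffix."""
    return path.rstrip("/").lower().endswith(".zarr")


def pick_engine(path: str, backends: Dict[str, Callable[[str], bool]]) -> Optional[str]:
    """Return the first backend whose guess accepts the path (first match wins)."""
    for name, guess in backends.items():
        if guess(path):
            return name
    return None


url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"

# netcdf4 is checked first and claims the URL, so zarr is never consulted.
current = pick_engine(url, {"netcdf4": netcdf4_guess, "zarr": zarr_guess})

# With the narrower guess, netcdf4 declines and zarr gets its turn.
narrowed = pick_engine(url, {"netcdf4": netcdf4_guess_narrow, "zarr": zarr_guess})

print(current, narrowed)
```

In this sketch, any backend that is checked before zarr and returns True for every remote string will shadow zarr, regardless of what the URL points to.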

Environment

Details
