Reading NetCDF files is slow if there are unlimited dimensions #3357

Reading arrays from NetCDF files is slow if one dimension is unlimited.

An example: reading an array of shape `(5000, 50)` takes ~7 s if the first dimension is unlimited. If both dimensions are fixed, it takes ~0.02 s. This is a major bottleneck when hundreds of such files need to be processed. The time dimension is often declared unlimited in files generated by circulation models.

Test case:
```
import iris
import time

f = 'example_dataset.nc'
var = 'sea_water_practical_salinity'

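# Accessing cube.data triggers the actual (lazy) read, so it is part of the timing.
# Note: process_time() counts CPU time only, not time spent waiting on I/O.
tic = time.process_time()
cube = iris.load_cube(f, var)
cube.data
duration = time.process_time() - tic
print('Duration {:.3f} s'.format(duration))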
```

The input NetCDF file can be generated with:
```
import iris
import numpy
import datetime

ntime = 5000
nz = 50
dt = 600.
time = numpy.arange(ntime, dtype=float)*dt
date_zero = datetime.datetime(2000, 1, 1)
date_epoch = datetime.datetime.utcfromtimestamp(0)
time_epoch = time + (date_zero - date_epoch).total_seconds()
z = numpy.linspace(0, 10, nz)
# Sinusoidal signal in time, repeated at every depth level: shape (ntime, nz)
values = 5*numpy.sin(time/(14*24*3600.))
values = numpy.tile(values, (nz, 1)).T

time_dim = iris.coords.DimCoord(time_epoch, standard_name='time',
                                units='seconds since 1970-01-01 00:00:00-00')
z_dim = iris.coords.DimCoord(z, standard_name='depth', units='m')
cube = iris.cube.Cube(values)
cube.standard_name = 'sea_water_practical_salinity'
cube.units = '1'
cube.add_dim_coord(time_dim, 0)
cube.add_dim_coord(z_dim, 1)
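# Save with 'time' declared unlimited; for the fixed-dimension comparison,
# unlimited_dimensions=[] can be passed instead.
iris.fileformats.netcdf.save(cube, 'example_dataset.nc',
                             unlimited_dimensions=['time'])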
```

Profiling suggests that in the unlimited case each time slice is read separately, i.e. `NetCDFDataProxy.__getitem__` is called 5000 times.

Tested with: iris version 2.2.0, Anaconda3 2019.03
