Reading arrays from NetCDF files is slow if one dimension is unlimited.
For example, reading an array of shape (5000, 50) takes ~7 s if the first dimension is unlimited, but only ~0.02 s if both dimensions are fixed. This is a major bottleneck when hundreds of such files need to be processed. The time dimension is often declared unlimited in files generated by circulation models.
Test case:
import iris
import time
f = 'example_dataset.nc'
var = 'sea_water_practical_salinity'
tic = time.process_time()
cube = iris.load_cube(f, var)
cube.data  # touching .data forces the lazily loaded array to be read from disk
duration = time.process_time() - tic
print('Duration {:.3f} s'.format(duration))
The input NetCDF file can be generated with:
import iris
import numpy
import datetime
ntime = 5000
nz = 50
dt = 600.
time = numpy.arange(ntime, dtype=float)*dt
date_zero = datetime.datetime(2000, 1, 1)
date_epoch = datetime.datetime.utcfromtimestamp(0)
time_epoch = time + (date_zero - date_epoch).total_seconds()
z = numpy.linspace(0, 10, nz)
values = 5*numpy.sin(time/(14*24*3600.))
values = numpy.tile(values, (nz, 1)).T
time_dim = iris.coords.DimCoord(time_epoch, standard_name='time',
                                units='seconds since 1970-01-01 00:00:00-00')
z_dim = iris.coords.DimCoord(z, standard_name='depth', units='m')
cube = iris.cube.Cube(values)
cube.standard_name = 'sea_water_practical_salinity'
cube.units = '1'
cube.add_dim_coord(time_dim, 0)
cube.add_dim_coord(z_dim, 1)
iris.fileformats.netcdf.save(cube, 'example_dataset.nc',
                             unlimited_dimensions=['time'])
Profiling suggests that in the unlimited case each time slice is read separately, i.e. NetCDFDataProxy.__getitem__ is called 5000 times.
Tested with: iris version 2.2.0, Anaconda3 2019.03
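The call count mentioned above can be checked with the standard-library profiler. The following is a minimal sketch, not the exact profiling setup used for this report: count_calls, SliceProxy and read_by_slice are hypothetical helper names introduced here for illustration, and SliceProxy only imitates an object that is indexed once per time slice so that the snippet runs without iris or a NetCDF file.

```python
import cProfile
import pstats


def count_calls(func, funcname, path_fragment=''):
    """Run func() under cProfile and return the total number of calls to
    Python functions named `funcname` whose source file path contains
    `path_fragment` (hypothetical helper, not part of iris)."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    total = 0
    # stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, total time, cumulative time, callers)
    for (filename, _lineno, name), entry in stats.stats.items():
        if name == funcname and path_fragment in filename:
            total += entry[1]
    return total


class SliceProxy:
    """Hypothetical stand-in for a data proxy that is indexed per slice."""

    def __getitem__(self, index):
        return index


def read_by_slice(proxy, nslices):
    # Imitates the suspected access pattern: one __getitem__ call per slice
    return [proxy[i] for i in range(nslices)]


n_calls = count_calls(lambda: read_by_slice(SliceProxy(), 5000), '__getitem__')
print('__getitem__ calls:', n_calls)
```

With iris installed, the same helper could presumably be pointed at the test case, e.g. count_calls(lambda: iris.load_cube(f, var).data, '__getitem__', 'netcdf'). The 'netcdf' path fragment is an assumption about where NetCDFDataProxy is defined, and the filter matches every Python-level __getitem__ in matching files, so the result is only an estimate.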