user_defined_transform() fails when numpy functions are given in transform #45

@bobby-payne

Description

When running preprocessing with a user-defined transform that includes a numpy function, the job fails with: NameError: name 'np' is not defined.

The error originates from the function user_defined_transform() in computations.py, specifically where eval(transform) is called. When executed under Dask, np is not defined in the evaluation context because the worker process does not inherit the main process namespace.

Fix: add an explicit namespace to eval. For instance, replace eval(transform) with eval(transform, {"np": np, "x": x}). (Note that numpy is imported as np at the top of the file.)

    for transform in var.transform:
        try:
            x = 1.0  # noqa: F841
            eval(transform)  # x is implicitly a variable from the config
        except SyntaxError:
            raise SyntaxError(f"Invalid transform in config {transform}.")

        def func(x):
            return eval(transform)

        logging.info(f"🧮 Applying transform {transform} to {var.name}...")
        ds[var.name] = xr.apply_ufunc(func, ds[var.name], dask="parallelized")

    return ds

becomes...

    for transform in var.transform:
        try:
            x = 1.0
            eval(transform, {"np": np, "x": x})  # dry run: np and x are passed in explicitly
        except SyntaxError:
            raise SyntaxError(f"Invalid transform in config {transform}.")

        def func(x):
            return eval(transform, {"np": np, "x": x})

        logging.info(f"🧮 Applying transform {transform} to {var.name}...")
        ds[var.name] = xr.apply_ufunc(func, ds[var.name], dask="parallelized")

    return ds
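
To see the mechanism in isolation, here is a minimal sketch of evaluating a transform string against an explicit namespace. It is not the project's code: apply_transform is a hypothetical helper, and the standard-library math module stands in for numpy so the snippet runs without extra dependencies. The same pattern would apply with {"np": np}.

```python
import math


def apply_transform(transform, x, namespace):
    """Evaluate a transform string with x and the given names in scope.

    Hypothetical helper for illustration only. The names the expression
    may use are passed in explicitly, so the result does not depend on
    whatever globals happen to exist where the function is executed.
    """
    # Build a fresh globals dict per call; eval adds __builtins__ to it.
    return eval(transform, {**namespace, "x": x})


# "math" plays the role that "np" plays in the issue.
namespace = {"math": math}
result = apply_transform("math.sqrt(x) + 1", 9.0, namespace)

# Without the module in the namespace, the lookup fails in the same way
# as the reported error.
try:
    apply_transform("math.sqrt(x) + 1", 9.0, {})
    missing_name_error = None
except NameError as exc:
    missing_name_error = str(exc)

print(result)
print(missing_name_error)
```

Because the namespace is an ordinary dict built inside the function, the lookup of math (or np) no longer relies on the globals of whichever process runs the call.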
