| 1 | +(migration-guide)= |
| 2 | + |
| 3 | +# Migration Guide |
| 4 | + |
| 5 | +This page is meant to help migrate your codebase to an Array API compliant |
| 6 | +implementation. The guide is divided into two parts and, depending on your |
| 7 | +exact use-case, you should look thoroughly into at least one of them. |
| 8 | + |
| 9 | +The first part is dedicated to {ref}`array-producers`. If your library |
| 10 | +mimics the functionality of, for example, NumPy or Dask, this part provides |
| 11 | +additional instructions and guidance on how to ensure downstream users can |
| 12 | +easily pick your solution as an array provider for their system/algorithm. |
| 13 | + |
| 14 | +The second part delves into the details of Array API compatibility for |
| 15 | +{ref}`array-consumers`. This pertains to any software that performs |
| 16 | +multidimensional array manipulation in Python, such as scikit-learn, SciPy, |
| 17 | +or statsmodels. If your software relies on a certain array-producing library, |
| 18 | +such as NumPy or JAX, then here you can learn how to make it library-agnostic |
| 19 | +and swap array libraries with far less friction. |
| 20 | + |
| 21 | +## Ecosystem |
| 22 | + |
| 23 | +Apart from the documented standard, the Array API ecosystem also provides |
| 24 | +a set of tools and packages to help you with the migration process: |
| 25 | + |
| 26 | + |
| 27 | +(array-api-compat)= |
| 28 | + |
| 29 | +### Array API Compat |
| 30 | + |
| 31 | +GitHub: [array-api-compat](https://github.com/data-apis/array-api-compat) |
| 32 | + |
| 33 | +Although NumPy, Dask, CuPy, and PyTorch support the Array API Standard, there |
| 34 | +are still some corner cases where their behavior diverges from the standard. |
| 35 | +`array-api-compat` provides a compatibility layer that covers these cases. |
| 36 | +It also ships a few utility functions for easier introspection of array |
| 37 | +objects. |
| 38 | + |
| 39 | + |
| 40 | +(array-api-strict)= |
| 41 | + |
| 42 | +### Array API Strict |
| 43 | + |
| 44 | +GitHub: [array-api-strict](https://github.com/data-apis/array-api-strict) |
| 45 | + |
| 46 | +`array-api-strict` is a library that provides a strict and minimal |
| 47 | +implementation of the Array API Standard. It is designed to be used as |
| 48 | +a reference implementation for testing and development purposes. By comparing |
| 49 | +your API calls with their `array-api-strict` counterparts, you can check that |
| 50 | +your library complies with the standard and can serve as a reliable |
| 51 | +reference for other developers in the ecosystem. |
| 52 | + |
| 53 | + |
| 54 | +(array-api-tests)= |
| 55 | + |
| 56 | +### Array API Tests |
| 57 | + |
| 58 | +GitHub: [array-api-tests](https://github.com/data-apis/array-api-tests) |
| 59 | + |
| 60 | +`array-api-tests` is a collection of tests that can be used to verify the |
| 61 | +compliance of your library with the Array API Standard. It includes tests |
| 62 | +for array producers, covering a wide range of functionalities and use cases. |
| 63 | +By running these tests, you can ensure that your library adheres to the |
| 64 | +standard and can be used with compatible array consumer libraries. |
| 65 | + |
| 66 | + |
| 67 | +(array-api-extra)= |
| 68 | + |
| 69 | +### Array API Extra |
| 70 | + |
| 71 | +GitHub: [array-api-extra](https://github.com/data-apis/array-api-extra) |
| 72 | + |
| 73 | +`array-api-extra` is a collection of additional utilities and tools that are |
| 74 | +missing from the Array API Standard but can be useful for compliant array |
| 75 | +consumers. It includes additional array manipulation and statistical functions. |
| 76 | +It is already used by SciPy and scikit-learn. |
| 77 | + |
| 78 | +The sections below mention when and how to use them. |
| 79 | + |
| 80 | + |
| 81 | +(array-producers)= |
| 82 | + |
| 83 | +## Array Producers |
| 84 | + |
| 85 | +For array producers, the central task during the development/migration process |
| 86 | +is making the user-facing API adhere to the Array API Standard. |
| 87 | + |
| 88 | +The complete API of the standard is documented on the |
| 89 | +[API specification](https://data-apis.org/array-api/latest/API_specification/index.html) |
| 90 | +page. |
| 91 | + |
| 92 | +There, each function, constant, and object is described with details |
| 93 | +on parameters, return values, and special cases. |
| 94 | + |
| 95 | +### Testing against Array API |
| 96 | + |
| 97 | +There are two main ways to test your API for compliance: either using the |
| 98 | +`array-api-tests` suite or testing your API manually against the |
| 99 | +`array-api-strict` reference implementation. |
| 100 | + |
| 101 | +#### Array API Test suite (Recommended) |
| 102 | + |
| 103 | +{ref}`array-api-tests` is a test suite that verifies whether your API |
| 104 | +adheres to the standard. For each function or method it confirms that |
| 105 | +it's importable, verifies the signature, generates multiple test cases |
| 106 | +with the Hypothesis package, and asserts on the outputs. |
| 107 | + |
| 108 | +The setup details are described in the GitHub repository, so here we |
| 109 | +cover only the minimal workflow: |
| 110 | + |
| 111 | +1. Install your package, for example in editable mode. |
| 112 | +2. Clone `array-api-tests`, and set the `ARRAY_API_TESTS_MODULE` environment |
| 113 | + variable to your package import name. |
| 114 | +3. Inside the `array-api-tests` directory run the `pytest` command. The test |
| 115 | + suite provides multiple useful options; a few worth mentioning: |
| 116 | + - `--max-examples=2` - maximum number of test cases that Hypothesis |
| 117 | + generates. This allows you to balance the execution time of the test |
| 118 | + suite against the thoroughness of the testing. |
| 119 | + - With the `--xfails-file` option you can describe which tests are expected |
| 120 | + to fail. It's impossible to get the whole API perfectly implemented on |
| 121 | + the first try, so tracking what still fails gives you more control over |
| 122 | + the state of your API. |
| 123 | + - `-o xfail_strict=<bool>` is often used with the previous one. If a test |
| 124 | + expected to fail actually passes (`XPASS`), you can decide whether |
| 125 | + to ignore that fact or raise it as an error. |
| 126 | + - `--skips-file` for skipping tests. At times some failing tests might stall |
| 127 | + the test suite; in that case the most convenient option is to skip them |
| 128 | + for the time being. |
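
The workflow above can also be scripted. The following is a minimal sketch in Python and is not part of `array-api-tests` itself: the helper name `build_pytest_run`, the module name `my_array_lib`, and the file name `xfails.txt` are hypothetical placeholders, while the flags and the environment variable are the ones listed above.

```python
import os


def build_pytest_run(module, max_examples=2, xfails_file=None,
                     skips_file=None, xfail_strict=True):
    """Return (command, environment) for running the test suite.

    `module` is the import name of the package under test.
    """
    # Copy the current environment and point the suite at the package.
    env = dict(os.environ)
    env["ARRAY_API_TESTS_MODULE"] = module

    cmd = ["pytest", f"--max-examples={max_examples}"]
    if xfails_file is not None:
        cmd += ["--xfails-file", xfails_file]
        # Decide whether an unexpected pass (XPASS) counts as an error.
        cmd += ["-o", f"xfail_strict={xfail_strict}"]
    if skips_file is not None:
        cmd += ["--skips-file", skips_file]
    return cmd, env


cmd, env = build_pytest_run("my_array_lib", xfails_file="xfails.txt")
print(" ".join(cmd))
# To actually run it from inside the cloned `array-api-tests` checkout
# (assumed here to live in ./array-api-tests):
#   import subprocess
#   subprocess.run(cmd, env=env, cwd="array-api-tests", check=False)
```

Keeping the command construction separate from its execution makes it easy to reuse the same options locally and in CI.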
| 129 | + |
| 130 | +We strongly advise you to embed this setup in your CI as well. This will allow |
| 131 | +you to monitor the coverage continuously and make sure new changes don't break the existing |
| 132 | +API. For reference, see the [NumPy Array API Tests CI setup](https://github.com/numpy/numpy/blob/581d10f43b539a189a2d37856e5130464de9e5f6/.github/workflows/linux.yml#L296). |
| 133 | + |
| 134 | + |
| 135 | +#### Array API Strict |
| 136 | + |
| 137 | +A simpler, but more manual, way of testing the Array API coverage is to |
| 138 | +run your API calls alongside the {ref}`array-api-strict` Python implementation. |
| 139 | + |
| 140 | +This way you can check that the outputs coming from your API match the minimal |
| 141 | +reference implementation. Bear in mind that you need to write the test cases |
| 142 | +yourself, so you also need to take the edge cases into account. |
| 143 | + |
| 144 | + |
| 145 | +(array-consumers)= |
| 146 | + |
| 147 | +## Array Consumers |
| 148 | + |
| 149 | +For array consumers, the main premise is that your **array manipulation |
| 150 | +operations should not be locked in to a particular array-producing |
| 151 | +library**. For instance, if you use NumPy for arrays, then your code could |
| 152 | +contain: |
| 153 | + |
| 154 | +```python |
| 155 | +import numpy as np |
| 156 | + |
| 157 | +# ... |
| 158 | +b = np.full(shape, val, dtype=dtype) @ a |
| 159 | +c = np.mean(a, axis=0) |
| 160 | +return np.dot(c, b) |
| 161 | +``` |
| 162 | + |
| 163 | +The first step should be as simple as assigning the `np` namespace to a dedicated |
| 164 | +namespace variable; the convention in the ecosystem is to name it `xp`. Then |
| 165 | +make sure that each method and function call is something the Array API |
| 166 | +supports (we will get to that soon); here `np.dot` becomes `xp.tensordot`: |
| 167 | + |
| 168 | +```python |
| 169 | +import numpy as np |
| 170 | + |
| 171 | +xp = np |
| 172 | + |
| 173 | +# ... |
| 174 | +b = xp.full(shape, val, dtype=dtype) @ a |
| 175 | +c = xp.mean(a, axis=0) |
| 176 | +return xp.tensordot(c, b, axes=1) |
| 177 | +``` |
| 178 | + |
| 179 | +Replacing one backend with another then comes down to providing a different |
| 180 | +namespace, such as `xp = torch`, selected e.g. via an environment variable. This is |
| 181 | +useful if you're writing a script or custom software. The alternatives are: |
| 182 | + |
| 183 | +- If you are building a library where the backend is determined by input arrays |
| 184 | + passed by the end-user, then the recommended way is to ask your input arrays for the |
| 185 | + namespace to use: `xp = arr.__array_namespace__()` |
| 186 | +- Each function you implement can take a namespace `xp` as a parameter in its |
| 187 | + signature. Inputs can then be converted to the array type of the provided |
| 188 | + backend with `arg1 = xp.asarray(arg1)` for each input array. |
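
The three ways of obtaining `xp` described above can be sketched as follows. This is an illustrative sketch, not an API of any library: the helper names `load_backend`, `resolve_namespace`, and `column_means`, as well as the `ARRAY_BACKEND` environment variable, are hypothetical.

```python
import importlib
import os


def load_backend(env_var="ARRAY_BACKEND", default="numpy"):
    """Script-style selection: import the namespace named by an environment variable."""
    return importlib.import_module(os.environ.get(env_var, default))


def resolve_namespace(arr, xp=None):
    """Library-style selection: prefer an explicit `xp`, otherwise ask the array."""
    if xp is not None:
        return xp
    if not hasattr(arr, "__array_namespace__"):
        raise TypeError(
            f"{type(arr).__name__} does not expose __array_namespace__"
        )
    return arr.__array_namespace__()


def column_means(a, xp=None):
    """Mean over the first axis, written only against the namespace `xp`."""
    xp = resolve_namespace(a, xp)
    a = xp.asarray(a)  # coerce the input to the backend's array type
    return xp.mean(a, axis=0)
```

With an array that implements the standard, `column_means(a)` picks the namespace from the array itself, while `column_means(data, xp=load_backend())` uses whichever backend the environment variable names. Passing `xp` explicitly also allows inputs that `xp.asarray` can convert, such as nested lists.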
| 189 | + |
| 190 | +If you're relying on NumPy, CuPy, PyTorch, Dask, or JAX then |
| 191 | +{ref}`array-api-compat` can come in handy for the transition. The compat layer |
| 192 | +allows you to keep relying on your chosen array-producing library, while |
| 193 | +making sure you're already using a standard-compatible API. Additionally, it |
| 194 | +offers a set of useful utility functions, such as: |
| 195 | + |
| 196 | +- [array_namespace()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace) |
| 197 | + for fetching the namespace based on input arrays. |
| 197 | +- [is_array_api_obj()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.is_array_api_obj) |
| 198 | + for checking whether a given object is Array API compatible. |
| 199 | +- [device()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.device) |
| 200 | + to get the device the array resides on. |
| 202 | + |
| 203 | +For now, migrating from a specific library (e.g. NumPy) to a standard-compatible |
| 204 | +setup requires manual intervention for each failing API call, but in the future |
| 205 | +we plan to provide some automation tools for it. |