Add SRU Exception for NVIDIA CUDA #517

Open
antlassagne wants to merge 9 commits into ubuntu:main from antlassagne:main

@antlassagne

This PR will require approval from an SRU team member

I will also update the [TBD] sections if an SRU team member approves.

Description

Add an SRU exception for the CUDA packages that we are newly delivering in 26.04. These are sets of 37+ prebuilt packages whose patches we will want to backport.

antlassagne and others added 8 commits March 11, 2026 15:57
* Add changelog instructions in the template
Co-authored-by: Andreas Glinserer <46827306+aglinserer@users.noreply.github.com>
Co-authored-by: Andreas Glinserer <46827306+aglinserer@users.noreply.github.com>
Co-authored-by: Andreas Glinserer <46827306+aglinserer@users.noreply.github.com>
@github-actions github-actions bot added the SRU For the attention of the SRU team label Mar 30, 2026
Collaborator

@basak basak left a comment

Hi!

This isn't a full review, but some quick feedback for faster progress. Broadly this seems OK to me, with some things to clarify please:

  1. This should explicitly apply to only the multiverse archive component.
  2. SRUs into Ubuntu must not change existing behaviour. Like we did with the Intel graphics stack previously, we should have a public statement from upstream as to what "stable" means for them, ensure that this statement is compatible with our policy, and then document that with links to upstream's statement.
  3. SRUs into Ubuntu must not regress hardware support, regardless of upstream's view on support lifetimes (Ubuntu has its own). Again, like we did with the Intel graphics stack previously, we should have a public statement from upstream as to what "stable" means for them, ensure that this statement is compatible with our policy, and then document that with links to upstream's statement.

example for AI/ML. Canonical has a redistribution agreement with NVIDIA to
redistribute the CUDA libraries in the Ubuntu archive. Per the agreement, Canonical
must deliver the prebuilt binaries from NVIDIA without modifications, and follow
the same schedule than NVIDIA.
Collaborator

Please could you expand on "follow the same schedule"? SRUs do not happen according to SLAs, so may lag.

Collaborator

And further, if we find a future SRU to not comply with our policies or user expectations, an SRU may become blocked, at which point it would fall behind "schedule" by definition.

Author

Thank you for your message. I can add a little more context about the schedule. The timeline is vague and I wrongly used the term 'schedule' for the versions: we committed to delivering 13-1, 13-2, 13-3, etc.

The good news is that minor versions are new source packages. Only the patches are subject to SRUs. CUDA patches are minor, and a lag is expected, especially if more discussion is required for a given patch set.
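
To make the distinction above concrete, here is a minimal sketch of how an update could be classified, assuming CUDA versions follow a `MAJOR.MINOR[.PATCH]` scheme (e.g. `13.1`, `13.1.1`, `13.2`). The function names and the returned labels are hypothetical and are not part of any Ubuntu or NVIDIA tooling; a "sru-candidate" result only means the update stays within the same minor series, not that it passes SRU review.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR' or 'MAJOR.MINOR.PATCH'; a missing patch level counts as 0."""
    parts = [int(p) for p in version.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected version format: {version!r}")
    while len(parts) < 3:
        parts.append(0)
    return parts[0], parts[1], parts[2]


def classify_update(current: str, candidate: str) -> str:
    """Classify a candidate version relative to the version already shipped.

    Hypothetical labels:
      'sru-candidate'      - same MAJOR.MINOR, higher patch level (e.g. 13.1 -> 13.1.1)
      'new-source-package' - different MAJOR.MINOR (e.g. 13.1.1 -> 13.2)
      'not-an-update'      - candidate is not newer than the current version
    """
    cur = parse_version(current)
    new = parse_version(candidate)
    if new <= cur:
        return "not-an-update"
    if new[:2] == cur[:2]:
        return "sru-candidate"
    return "new-source-package"
```

For example, `classify_update("13.1", "13.1.1")` returns `"sru-candidate"`, while `classify_update("13.1.1", "13.2")` returns `"new-source-package"`.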

@antlassagne
Author

Thank you for your review @basak

  1. Indeed, I'll mention it in the doc

  2. Can you link me to such a commitment from Intel? I was not able to find the right link when looking in the Intel-Graphics-Update exception.
    I think NVIDIA makes such commitments already:

  3. The above links are very much a commitment to that point, too.

A little example might be helpful.

Example:

The 13.1.1 release notes can be found here. 13.2 is a new source package; only 13.1.1 (referred to as Update 1) would have been subject to an SRU.
Looking at libcusparse specifically, we can read:

3.4.1. cuSPARSE: Release 13.1 Update 1

New Features

    Added a new cusparseSpMVOp_bufferSize API that returns the size of the workspace buffer required for SpMVOp computations. Users provide this buffer when creating cusparseSpMVOpDescr_t, removing internal memory allocations.

    Improved SpMVOp performance on B200. [CUSPARSE-2931] [CUSPARSE-2932] [CUSPARSE-2933]

Resolved Issues

    Fixed an accuracy issue in mixed-precision CSR/COO SpMM computations. [CUSPARSE-2349]

    Fixed an issue in CSR SpMM computations when the input dense matrix has a high number of columns. [CUSPARSE-2301]

I assume this is OK in terms of SRU policy for closed-source software? In any case, if a given SRU one day does not comply, we can expect discussions. We can state here that this exception covers SRUs of patches and bug fixes, not blindly backporting anything that comes up under the CUDA name.

2 participants