
Move PCA and TSVD from cuml to raft #2952

Open

aamijar wants to merge 7 commits into rapidsai:main from aamijar:move-pca-from-cuml



@aamijar
Member

@aamijar aamijar commented Feb 13, 2026

Required for rapidsai/cuvs#1207.

This PR moves pca.cuh, tsvd.cuh, and their gtests into raft.

@aamijar aamijar requested review from a team as code owners February 13, 2026 09:09
@aamijar aamijar self-assigned this Feb 13, 2026
@aamijar aamijar added non-breaking Non-breaking change feature request New feature or request labels Feb 13, 2026

template <typename math_t, typename enum_solver = solver>
void truncCompExpVars(const raft::handle_t& handle,
math_t* in,
Member


Need to use mdspan here; we've deprecated all the pointer APIs.
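A minimal, host-only sketch of why view-based (mdspan-style) parameters are preferred over raw `math_t*` pointers: the view carries its extents, so the callee can validate shapes instead of trusting separately passed sizes. `matrix_view` and `center_columns` are hypothetical stand-ins invented for illustration; in raft the real parameter type would be something like `raft::device_matrix_view`, and this code does not use raft or the GPU.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an mdspan-style row-major matrix view:
// the data pointer travels together with its extents.
template <typename T>
struct matrix_view {
  T* data;
  std::size_t rows;
  std::size_t cols;
  T& operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Pointer-style equivalent (for contrast only):
//   void center_columns(T* in, std::size_t n_rows, std::size_t n_cols, T* mu);
// Nothing ties n_rows/n_cols to the buffers that were actually allocated.

// View-style API: subtracts each column mean in place and writes the means
// to `mu`, checking that the shapes agree before touching any data.
template <typename T>
void center_columns(matrix_view<T> in, matrix_view<T> mu)
{
  if (in.rows == 0) { throw std::invalid_argument("input must have at least one row"); }
  if (mu.rows != 1 || mu.cols != in.cols) {
    throw std::invalid_argument("mu must be 1 x n_cols");
  }
  for (std::size_t j = 0; j < in.cols; ++j) {
    T sum = 0;
    for (std::size_t i = 0; i < in.rows; ++i) sum += in(i, j);
    mu(0, j) = sum / static_cast<T>(in.rows);
    for (std::size_t i = 0; i < in.rows; ++i) in(i, j) -= mu(0, j);
  }
}
```

The shape mismatch that a pointer API would silently turn into an out-of-bounds access becomes a checked error here.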

math_t* input,
math_t* components,
math_t* explained_var,
math_t* explained_var_ratio,
Member


Input order should match the other (newer) APIs: handle, params, input, output, free params. Also, "stream" is in the handle now, and we use device_resources, not raft::handle_t.
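A minimal sketch of the argument order the reviewer describes (handle, params, input, output, then optional free params), with the stream obtained from the handle instead of being passed separately. All names here are hypothetical: `fake_resources` only stands in for `raft::device_resources`, and `tsvd_params` / `tsvd_fit_sketch` are illustrative, not the signatures in this PR. No GPU work is done.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for raft::device_resources: the stream lives in the
// handle, so functions no longer take a separate `stream` argument.
struct fake_resources {
  int stream_id = 0;
  int get_stream() const { return stream_id; }
};

// Hypothetical parameter struct.
struct tsvd_params {
  std::size_t n_rows;
  std::size_t n_cols;
  std::size_t n_components;
};

// Order: handle, params, input, output(s), optional free params.
// Returns the stream it would enqueue work on, so the sketch is observable.
int tsvd_fit_sketch(const fake_resources& handle,
                    const tsvd_params& prms,
                    const std::vector<float>& input,
                    std::vector<float>& components,
                    std::optional<float> tol = std::nullopt)
{
  if (input.size() != prms.n_rows * prms.n_cols) {
    throw std::invalid_argument("input size does not match params");
  }
  int stream = handle.get_stream();  // taken from the handle, not an argument
  components.assign(prms.n_components * prms.n_cols, 0.0f);  // placeholder output
  (void)tol;  // a real solver would use the tolerance here
  return stream;
}
```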

Contributor

@jinsolp jinsolp left a comment


Thanks @aamijar! Just a minor comment.
Question: will this be imported in cuvs and exposed as a python API?

/**
* @brief perform fit operation for the pca. Generates eigenvectors, explained vars, singular vals,
* etc.
* @param[in] handle: cuml handle object
Contributor


The doc mentions the cuml handle, not raft device_resources! (Same for the other docs too.)

@aamijar
Member Author

aamijar commented Feb 14, 2026

Question: will this be imported in cuvs and exposed as a python API?

We will still have the same Python and C++ APIs in cuml too!
On the cuVS side, I think the plan is to expose a C++ API.

@cjnolet
Member

cjnolet commented Feb 14, 2026

@aamijar, we will probably expose a preprocessing API through Python for users who need to write scripts (for example, Jinsol's new dataset generation requires PCA, and including cuml in cuVS would create a circular dependency) or who have databases written in Python.

But, like I mentioned to Simon, the users of the two libraries are very different. Same thing with k-means: k-means clustering is the equivalent of "lexicographic ordering" in the vector world. PCA is another way to reduce the footprint of vectors without losing much quality.
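To make the footprint point concrete: projecting each d-dimensional vector onto its top k principal components stores k values instead of d (for example, hypothetically reducing 768-dimensional float32 vectors to 128 components shrinks each from 3072 to 512 bytes). The explained variance ratio, which appears among the outputs of `truncCompExpVars` in this PR, indicates how much of the data's variance survives. The sketch below is a host-only illustration that finds just the top component by power iteration on the sample covariance matrix; `pca_top1` is a hypothetical name, and this is not the raft/cuml implementation, which chooses its solver through the `enum_solver` template parameter.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Leading unit-length component, its eigenvalue (explained variance),
// and the fraction of total variance it explains.
struct top_component {
  std::vector<double> direction;
  double explained_var;
  double explained_var_ratio;
};

// x is row-major n x d. Assumes n >= 2 and d >= 1.
top_component pca_top1(const std::vector<double>& x, std::size_t n, std::size_t d,
                       int iters = 200)
{
  // 1. Column means.
  std::vector<double> mu(d, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < d; ++j) mu[j] += x[i * d + j];
  for (double& m : mu) m /= static_cast<double>(n);

  // 2. Sample covariance C = (X - mu)^T (X - mu) / (n - 1).
  std::vector<double> cov(d * d, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t a = 0; a < d; ++a)
      for (std::size_t b = 0; b < d; ++b)
        cov[a * d + b] += (x[i * d + a] - mu[a]) * (x[i * d + b] - mu[b]);
  for (double& c : cov) c /= static_cast<double>(n - 1);

  // 3. Power iteration: v <- C v / ||C v|| converges to the top eigenvector
  //    when the top eigenvalue is strictly largest and v0 is not orthogonal to it.
  std::vector<double> v(d, 1.0), w(d);
  for (int it = 0; it < iters; ++it) {
    for (std::size_t a = 0; a < d; ++a) {
      w[a] = 0.0;
      for (std::size_t b = 0; b < d; ++b) w[a] += cov[a * d + b] * v[b];
    }
    double norm = 0.0;
    for (double wa : w) norm += wa * wa;
    norm = std::sqrt(norm);
    if (norm == 0.0) break;  // zero covariance: nothing to explain
    for (std::size_t a = 0; a < d; ++a) v[a] = w[a] / norm;
  }

  // 4. Rayleigh quotient v^T C v gives the eigenvalue; trace(C) is total variance.
  double lambda = 0.0, trace = 0.0;
  for (std::size_t a = 0; a < d; ++a) {
    double cv = 0.0;
    for (std::size_t b = 0; b < d; ++b) cv += cov[a * d + b] * v[b];
    lambda += v[a] * cv;
    trace += cov[a * d + a];
  }
  return {v, lambda, trace > 0.0 ? lambda / trace : 0.0};
}
```

For points lying exactly on a line, a single component explains all of the variance (ratio 1), so storing one coordinate per vector loses nothing; for isotropic data the ratio drops toward 1/d and the reduction costs more quality.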

Data science users will continue to use cuml; vector databases will continue to use cuVS. It's important that we don't duplicate code across the two, and since cuml already uses cuVS, it can keep using the C++ API like you mentioned.

