cpp/include/raft/linalg/pca.cuh
template <typename math_t, typename enum_solver = solver>
void truncCompExpVars(const raft::handle_t& handle,
                      math_t* in,
Need to use mdspan here; we've deprecated all the pointer APIs.
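To make the request concrete, here is a minimal host-side sketch of the difference between a pointer signature and a view signature. It assumes nothing about RAFT itself: `matrix_view` is a hypothetical stand-in for an mdspan-style type (not RAFT's real view types), and `column_means` is a made-up function chosen only because it is small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an mdspan-style 2D view: non-owning, row-major.
// This is NOT RAFT's actual type; it only illustrates the idea.
template <typename T>
struct matrix_view {
  T* data;
  std::size_t n_rows;
  std::size_t n_cols;
  T& operator()(std::size_t i, std::size_t j) const { return data[i * n_cols + j]; }
};

// Old shape (for contrast): extents travel separately from the data.
//   void column_means(const T* in, std::size_t n_rows, std::size_t n_cols, T* out);

// View shape: extents travel with the data, so mismatches can be checked.
template <typename T>
void column_means(matrix_view<const T> in, matrix_view<T> out)
{
  assert(in.n_rows > 0);
  assert(out.n_rows == 1 && out.n_cols == in.n_cols);
  for (std::size_t j = 0; j < in.n_cols; ++j) {
    T sum = T(0);
    for (std::size_t i = 0; i < in.n_rows; ++i) { sum += in(i, j); }
    out(0, j) = sum / static_cast<T>(in.n_rows);
  }
}
```

The point of the view form is that the shape check lives inside the function instead of being the caller's responsibility.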
cpp/include/raft/linalg/pca.cuh
math_t* input,
math_t* components,
math_t* explained_var,
math_t* explained_var_ratio,
Input order should match the other (newer) APIs: handle, params, input, output, free params. Also, "stream" is in the handle now, and we use device_resources, not raft::handle_t.
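A rough sketch of that ordering, assuming nothing beyond what the comment says. Every type and name below (`resources`, `pca_params`, `vector_view`, `scale_explained_var`) is a hypothetical host-side stand-in, not RAFT's real definition; the function only illustrates argument order and the absence of a separate stream parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, not RAFT's real types.
struct resources { int stream_id = 0; };  // plays the role of device_resources; carries the stream
struct pca_params { std::size_t n_components = 1; };

template <typename T>
struct vector_view {
  T* data;
  std::size_t size;
};

// Argument order: handle, params, input(s), output(s), free parameters.
// There is no stream argument; a real implementation would take it from the handle.
template <typename T>
void scale_explained_var(const resources& res,
                         const pca_params& params,
                         vector_view<const T> explained_var,  // input
                         vector_view<T> explained_var_ratio,  // output
                         T total_var)                         // free parameter
{
  (void)res;  // unused in this host-only sketch
  assert(explained_var.size >= params.n_components);
  assert(explained_var_ratio.size >= params.n_components);
  for (std::size_t i = 0; i < params.n_components; ++i) {
    explained_var_ratio.data[i] = explained_var.data[i] / total_var;
  }
}
```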
/**
 * @brief perform fit operation for the pca. Generates eigenvectors, explained vars, singular vals,
 * etc.
 * @param[in] handle: cuml handle object
The doc mentions a cuml handle, not raft device_resources! (Same for the other docs too.)
We will still have the same Python and C++ APIs in cuML too!
@aamijar we will probably expose a preprocessing API through Python for users who need to write scripts (for example, Jinsol's new dataset generation requires PCA, and it would be a circular dependency if we included cuML in cuVS) or who have databases written in Python. But, like I mentioned to Simon, the users are very different between the two. Same thing with k-means: k-means clustering is the equivalent of "lexicographic ordering" in the vector world. PCA is another way to reduce the footprint of vectors without losing quality. Data science users will continue to use cuML. Vector databases will continue to use cuVS. It's important we don't duplicate code across the two, and since cuML is already using cuVS, it can continue to use the C++ API like you mentioned.
Required for rapidsai/cuvs#1207.
This PR moves pca.cuh, tsvd.cuh, and gtests into raft.