Skip to content

New Bonsai.ML.Pca.Torch Package for Batched and Online PCA#85

Open
ncguilbeault wants to merge 19 commits intobonsai-rx:mainfrom
ncguilbeault:dev/pca
Open

New Bonsai.ML.Pca.Torch Package for Batched and Online PCA#85
ncguilbeault wants to merge 19 commits intobonsai-rx:mainfrom
ncguilbeault:dev/pca

Conversation

@ncguilbeault
Copy link
Collaborator

This PR introduces a new PCA (Principal Component Analysis) module to the solution. The changes include new combinators for creating, fitting, and transforming PCA models, which include standard PCA, probablistic PCA, and Generalized Hebbian PCA for fitting online.

@glopesdev glopesdev self-requested a review March 12, 2026 07:26
Copy link
Member

@glopesdev glopesdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General looks good, I have some comments on the overall organization for abstracting the various operators which would be worth discussing to make possibly for a more consistent design across all packages.

/// <summary>
/// Gets a value indicating whether the model has been fitted to data.
/// </summary>
public bool IsFitted { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious why do we need this? Have you picked up on this pattern in torch or elsewhere? I would consider a model to always be a model, even if all parameters are initialized with zero or random values.

Whether a fit has been run to improve those parameters or not should not make much difference. A cursory search through the PR also didn't reveal any uses other than to throw an exception if Fit has not been called.

Is there any other reason? If not, my gut feeling is we should remove it.

/// <summary>
/// Gets or sets the PCA model.
/// </summary>
public IPcaBaseModel? Model { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to allow IPcaBaseModel to be nullable? Feels like a provider should always return a non-null model.

Also curious whether we really need the setter to be in the interface. We can have it in the concrete classes, but feels like the abstract interface might benefit from a more functional approach.

/// <summary>
/// Represents an operator that reconstructs the input data using a PCA model.
/// </summary>
[ResetCombinator]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the property requiring a reset is in the base class, we might want to move the ResetCombinator attribute there so all derived classes get it automatically.

/// <summary>
/// Represents an operator that transforms the input data using a PCA model.
/// </summary>
[ResetCombinator]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above for this operator.


if (_instance is CreatePca createPca)
{
var modelProperties = new HashSet<string>(createPca.GetModelProperties());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the reason for dynamically filtering these properties. Could we not simply filter out the ones we don't need by using [Browsable(false)]?

/// </summary>
[XmlIgnore]
[Description("The PCA model.")]
public IPcaBaseModel? Model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to expose the model as an operator, we could simply inherit here from CombinatorBuilder and let that class take care of the ground work. This would also make sure the model properties are serialized into the workflow XML, which may or may not be something desirable.

@glopesdev glopesdev added this to the v0.5.0 milestone Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants