New Bonsai.ML.Pca.Torch Package for Batched and Online PCA#85
New Bonsai.ML.Pca.Torch Package for Batched and Online PCA#85ncguilbeault wants to merge 19 commits intobonsai-rx:mainfrom
Bonsai.ML.Pca.Torch Package for Batched and Online PCA#85Conversation
… and move reorthogonalization logic directly inside the `Fit` method
… PCA models to use this
… in model subclasses
…) and improved support for validation
glopesdev
left a comment
There was a problem hiding this comment.
General looks good, I have some comments on the overall organization for abstracting the various operators which would be worth discussing to make possibly for a more consistent design across all packages.
| /// <summary> | ||
| /// Gets a value indicating whether the model has been fitted to data. | ||
| /// </summary> | ||
| public bool IsFitted { get; } |
There was a problem hiding this comment.
I was curious why do we need this? Have you picked up on this pattern in torch or elsewhere? I would consider a model to always be a model, even if all parameters are initialized with zero or random values.
Whether a fit has been run to improve those parameters or not should not make much difference. A cursory search through the PR also didn't reveal any uses other than to throw an exception if Fit has not been called.
Is there any other reason? If not, my gut feeling is we should remove it.
| /// <summary> | ||
| /// Gets or sets the PCA model. | ||
| /// </summary> | ||
| public IPcaBaseModel? Model { get; set; } |
There was a problem hiding this comment.
Is there a reason to allow IPcaBaseModel to be nullable? Feels like a provider should always return a non-null model.
Also curious whether we really need the setter to be in the interface. We can have it in the concrete classes, but feels like the abstract interface might benefit from a more functional approach.
| /// <summary> | ||
| /// Represents an operator that reconstructs the input data using a PCA model. | ||
| /// </summary> | ||
| [ResetCombinator] |
There was a problem hiding this comment.
Given the property requiring a reset is in the base class, we might want to move the ResetCombinator attribute there so all derived classes get it automatically.
| /// <summary> | ||
| /// Represents an operator that transforms the input data using a PCA model. | ||
| /// </summary> | ||
| [ResetCombinator] |
There was a problem hiding this comment.
Same as above for this operator.
|
|
||
| if (_instance is CreatePca createPca) | ||
| { | ||
| var modelProperties = new HashSet<string>(createPca.GetModelProperties()); |
There was a problem hiding this comment.
I'm not sure I understand the reason for dynamically filtering these properties. Could we not simply filter out the ones we don't need by using [Browsable(false)]?
| /// </summary> | ||
| [XmlIgnore] | ||
| [Description("The PCA model.")] | ||
| public IPcaBaseModel? Model |
There was a problem hiding this comment.
If the goal is to expose the model as an operator, we could simply inherit here from CombinatorBuilder and let that class take care of the ground work. This would also make sure the model properties are serialized into the workflow XML, which may or may not be something desirable.
This PR introduces a new PCA (Principal Component Analysis) module to the solution. The changes include new combinators for creating, fitting, and transforming PCA models, which include standard PCA, probablistic PCA, and Generalized Hebbian PCA for fitting online.