🚀 Issue: Add Redundant Validation for Normalization Stats and Post-Inference Un-Normalization
Summary:
To ensure consistent and traceable data transformations in the ML workflow, we will implement redundant storage and validation of normalization statistics (min/max). These statistics will be used to un-normalize the model output after inference, ensuring consistency with the training-time standardization.
The `normalization.json` file will also be consumed by a separate pre-processing pipeline responsible for preparing LR inference inputs. This issue covers only the un-normalization and validation logic.
📋 Tasks
✅ Preprocessing Step (in nc2pt):
- Save normalization statistics to a `normalization.json` file for each variable during training preprocessing
  - Fields: `min`, `max`, `variable`, `method`, `created`, etc.
- Compute and include a hash (e.g. SHA256) of the JSON content to allow validation
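The two preprocessing tasks above could look roughly like this (a minimal sketch; the exact field set, file layout, and `nc2pt` integration point are assumptions):

```python
import hashlib
import json
from datetime import datetime, timezone

def write_normalization_json(path: str, variable: str, vmin: float, vmax: float) -> dict:
    """Write per-variable min/max stats plus a SHA256 digest of the stats payload."""
    stats = {
        "variable": variable,
        "method": "minmax",
        "min": vmin,
        "max": vmax,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    # Hash a canonical serialization (sorted keys) so the digest is stable
    # regardless of key ordering when the file is re-read later.
    payload = json.dumps(stats, sort_keys=True)
    stats["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    with open(path, "w") as f:
        json.dump(stats, f, indent=2)
    return stats

record = write_normalization_json("normalization.json", "pr", 0.0, 12.5)
```

A consumer can then recompute the digest over everything except the `sha256` field and compare it to the stored value.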
✅ Model Export (TorchScript):
- Embed a copy of the normalization stats (and/or JSON hash) into the saved TorchScript model
  - Either via a metadata dict or as attributes on a scripted module
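Both embedding options can be sketched with standard TorchScript mechanisms: a string attribute on the scripted module, and the `_extra_files` side channel of `torch.jit.save`/`torch.jit.load` (the wrapper class and file names here are hypothetical):

```python
import json
import torch

class ModelWithStats(torch.nn.Module):
    """Wrapper that carries the normalization stats as a TorchScript-visible string."""
    def __init__(self, model: torch.nn.Module, stats_json: str):
        super().__init__()
        self.model = model
        self.stats_json = stats_json

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

stats = {"variable": "pr", "method": "minmax", "min": 0.0, "max": 12.5}
scripted = torch.jit.script(ModelWithStats(torch.nn.Linear(1, 1), json.dumps(stats)))

# Option A: bundle a copy of the JSON into the model archive itself
torch.jit.save(scripted, "model.pt", _extra_files={"normalization.json": json.dumps(stats)})

# Option B: recover either the extra file or the module attribute after loading
extra = {"normalization.json": ""}
loaded = torch.jit.load("model.pt", _extra_files=extra)
recovered = json.loads(extra["normalization.json"])
```

Either channel survives serialization, so inference code can read the stats without any file alongside the model.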
✅ Inference Step:
- Load the `normalization.json` file used for the variable
- Load the normalization metadata from the TorchScript model
- Validate that the loaded stats match those embedded in the model
  - Value check or hash comparison
- Apply un-normalization to the model-generated output:
  `output_real_scale = output * (max - min) + min`
- Save the un-normalized output as `.zarr`
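The validate-then-denormalize steps above might look like this (a pure-Python sketch; real code would operate on the loaded arrays before writing `.zarr`, and the `sha256` field name is an assumption):

```python
import hashlib
import json

def stats_digest(stats: dict) -> str:
    """SHA256 over a canonical serialization, ignoring any embedded hash field."""
    body = {k: v for k, v in stats.items() if k != "sha256"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def unnormalize(values, stats):
    """Invert min/max normalization: x_real = x * (max - min) + min."""
    lo, hi = stats["min"], stats["max"]
    return [v * (hi - lo) + lo for v in values]

file_stats = {"variable": "pr", "min": 0.0, "max": 12.5}   # from normalization.json
model_stats = {"variable": "pr", "min": 0.0, "max": 12.5}  # from the TorchScript model

# Hash comparison; a direct value check on min/max would work equally well
if stats_digest(file_stats) != stats_digest(model_stats):
    raise ValueError("normalization.json does not match stats embedded in the model")

real = unnormalize([0.0, 0.5, 1.0], file_stats)  # → [0.0, 6.25, 12.5]
```

Failing loudly on a mismatch is the point of the redundancy: output should never be un-normalized with stats from a different training run.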
✅ Utilities & Docs:
- Add a `Standardizer` class or utility with `.denormalize()` and `.validate_against_model()` methods
- Document the expected `normalization.json` format and validation logic
- Mention that standardization of LR input is handled in a separate preprocessing codebase
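One possible shape for the proposed `Standardizer` utility (the method names come from the task list above; the constructor, field names, and error handling are assumptions):

```python
import json

class Standardizer:
    """Holds per-variable min/max stats; inverts normalization and checks consistency."""

    def __init__(self, stats: dict):
        self.stats = stats

    @classmethod
    def from_json(cls, path: str) -> "Standardizer":
        with open(path) as f:
            return cls(json.load(f))

    def denormalize(self, x: float) -> float:
        lo, hi = self.stats["min"], self.stats["max"]
        return x * (hi - lo) + lo

    def validate_against_model(self, model_stats: dict) -> None:
        """Raise if the model's embedded stats disagree with the loaded file."""
        for key in ("variable", "min", "max"):
            if self.stats.get(key) != model_stats.get(key):
                raise ValueError(f"normalization mismatch on {key!r}")

std = Standardizer({"variable": "pr", "min": 0.0, "max": 12.5})
std.validate_against_model({"variable": "pr", "min": 0.0, "max": 12.5})
value = std.denormalize(0.2)  # → 2.5
```

Keeping validation inside the class means inference code cannot accidentally denormalize before the consistency check has run.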
🧠 Notes
- Preprocessing of LR input (e.g., standardization and windowing) is handled externally in a separate codebase due to memory constraints.
- This issue strictly handles:
  - Emitting reusable normalization metadata
  - Ensuring safe reuse during inference
  - Applying post-inference un-normalization for saving HR results