
Add Redundant Validation for Normalization Stats and Post-Inference Un-Normalization #39

@SBeairsto

Description


🚀 Issue: Add Redundant Validation for Normalization Stats and Post-Inference Un-Normalization

Summary:
To ensure consistent and traceable data transformations in the ML workflow, we will implement redundant storage and validation of normalization statistics (min/max). These statistics will be used to un-normalize the model output after inference, ensuring consistency with the training-time normalization.

The normalization.json file will also be consumed by a separate pre-processing pipeline responsible for preparing LR inference inputs. This issue covers only the un-normalization and validation logic.


📋 Tasks

Preprocessing Step (in nc2pt):

  • Save normalization statistics to a normalization.json file for each variable during training preprocessing

    • Fields: min, max, variable, method, created, etc.
  • Compute and include a hash (e.g. SHA256) of the JSON content to allow validation
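A minimal sketch of the preprocessing step above; the function name, file layout, and illustrative values are assumptions, but the fields (min, max, variable, method, created) and the SHA-256 hash follow the issue:

```python
import hashlib
import json
from datetime import datetime, timezone

def save_normalization_stats(path, variable, vmin, vmax, method="minmax"):
    """Write per-variable normalization stats plus a SHA-256 hash of the payload.

    The hash covers a canonical JSON encoding of the stats themselves, so any
    consumer can re-hash the payload and detect tampering or drift.
    """
    stats = {
        "variable": variable,
        "min": vmin,
        "max": vmax,
        "method": method,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(stats, sort_keys=True)  # canonical encoding for hashing
    stats["sha256"] = hashlib.sha256(payload.encode()).hexdigest()
    with open(path, "w") as f:
        json.dump(stats, f, indent=2)
    return stats

# Illustrative values for a hypothetical precipitation variable
stats = save_normalization_stats("normalization.json", "pr", 0.0, 187.4)
```

Sorting the keys before hashing makes the hash independent of dict insertion order, so the same stats always produce the same digest.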

Model Export (TorchScript):

  • Embed a copy of the normalization stats (and/or JSON hash) into the saved TorchScript model

    • Either via metadata dict or as attributes on a scripted module
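One way to embed the stats is PyTorch's `_extra_files` hook on `torch.jit.save`/`torch.jit.load`; the toy module, stats values, and file names below are illustrative. (The issue's alternative, storing the stats as attributes on the scripted module, would also work.)

```python
import json
import torch

class Scale(torch.nn.Module):
    """Stand-in for the real model; only the save/load mechanics matter here."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

stats = {"variable": "pr", "min": 0.0, "max": 187.4, "method": "minmax"}

# Embed the stats JSON alongside the serialized graph.
scripted = torch.jit.script(Scale())
torch.jit.save(scripted, "model.pt",
               _extra_files={"normalization.json": json.dumps(stats)})

# At load time, request the same file back; torch fills in the dict values.
extra = {"normalization.json": ""}
loaded = torch.jit.load("model.pt", _extra_files=extra)
embedded = json.loads(extra["normalization.json"])
```

`torch.jit.load` may return the extra-file contents as bytes rather than str; `json.loads` accepts both.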

Inference Step:

  • Load the normalization.json file used for the variable

  • Load the normalization metadata from the TorchScript model

  • Validate that the loaded stats match those embedded in the model

    • Value check or hash comparison
  • Apply un-normalization to the model-generated output:

    output_real_scale = output * (max - min) + min
  • Save the un-normalized output as .zarr
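The inference-time checks and the un-normalization formula above can be sketched as follows; the function names and the choice of a value check (rather than a hash comparison) are assumptions, and writing the result to .zarr is left out to keep the sketch self-contained:

```python
def validate_stats(json_stats: dict, model_stats: dict) -> None:
    """Fail loudly if the on-disk stats disagree with those embedded in the model."""
    for key in ("variable", "min", "max", "method"):
        if json_stats.get(key) != model_stats.get(key):
            raise ValueError(
                f"Normalization mismatch on '{key}': "
                f"{json_stats.get(key)!r} != {model_stats.get(key)!r}"
            )

def denormalize(x: float, vmin: float, vmax: float) -> float:
    """Invert min/max scaling: output_real_scale = output * (max - min) + min."""
    return x * (vmax - vmin) + vmin

# Illustrative stats; in practice json_stats comes from normalization.json
# and model_stats from the TorchScript model's embedded metadata.
json_stats = {"variable": "pr", "min": 0.0, "max": 187.4, "method": "minmax"}
model_stats = dict(json_stats)
validate_stats(json_stats, model_stats)
real = denormalize(0.5, json_stats["min"], json_stats["max"])
```

Raising on mismatch (rather than warning) is deliberate: silently un-normalizing with the wrong stats would produce plausible-looking but wrong HR output.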

Utilities & Docs:

  • Add a Standardizer class or utility with .denormalize() and .validate_against_model() methods

  • Document expected normalization.json format and validation logic

  • Mention that standardization of LR input is handled in a separate preprocessing codebase
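A minimal sketch of the proposed Standardizer utility; only the method names `.denormalize()` and `.validate_against_model()` come from the issue, while the constructor, `from_json` helper, and the set of compared fields are assumptions:

```python
import json

class Standardizer:
    """Wraps the normalization stats for one variable."""

    def __init__(self, stats: dict):
        self.stats = stats

    @classmethod
    def from_json(cls, path: str) -> "Standardizer":
        """Load stats from a normalization.json file."""
        with open(path) as f:
            return cls(json.load(f))

    def validate_against_model(self, model_stats: dict) -> bool:
        """Value check: each stored field must match the model's embedded copy."""
        keys = ("variable", "min", "max", "method")
        if any(self.stats.get(k) != model_stats.get(k) for k in keys):
            raise ValueError("normalization stats do not match model metadata")
        return True

    def denormalize(self, x):
        """output_real_scale = output * (max - min) + min"""
        vmin, vmax = self.stats["min"], self.stats["max"]
        return x * (vmax - vmin) + vmin

# Illustrative usage with hypothetical stats
s = Standardizer({"variable": "pr", "min": 0.0, "max": 187.4, "method": "minmax"})
```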


🧠 Notes

  • Preprocessing of LR input (e.g., standardization and windowing) is handled externally in a separate codebase due to memory constraints.

  • This issue strictly handles:

    • Emitting reusable normalization metadata

    • Ensuring safe reuse during inference

    • Applying post-inference un-normalization for saving HR results

Metadata

Labels: enhancement (New feature or request)