██████╗ ██╗██╗ ██╗ █████╗ ██╗
██╔══██╗██║╚██╗██╔╝██╔══██╗██║
██████╔╝██║ ╚███╔╝ ███████║██║
██╔═══╝ ██║ ██╔██╗ ██╔══██║██║
██║ ██║██╔╝ ██╗██║ ██║███████╗
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝
PIXAL – PIXel-based Anomaly Locator
PIXAL (PIXel-based Anomaly Locator) is a modular deep learning framework designed for image-based anomaly detection in high-resolution scientific data. Currently applied to identifying defects in detector hardware components for the ATLAS experiment, PIXAL supports training and validation of deep neural networks, with a focus on Autoencoder-based architectures.
The framework includes tools for:
- Image preprocessing, including background removal, alignment, zero-pruning, and ML input processing
- Flexible training with optional one-hot labels and configurable architectures
- Modular validation and anomaly visualization (heatmaps, ROC, loss histograms)
- Metadata tracking and reproducibility for experimental pipelines
PIXAL is highly extensible — other model types and preprocessing pipelines can be added with minimal changes.
- Setup
- Input Data Formatting
- Configuration System and Parameters
- Preprocessing Pipeline
- Model Training
- Validation and Detection
- How to Run
PIXAL is tested and works best with Python 3.10.9. For consistent results, we recommend creating a clean virtual environment with this version.
git clone https://github.com/OSU-HEP-HDL/pixal.git
cd pixal
source setup.sh
This script will:
- Detect your platform (Linux, Windows via WSL or Git Bash, or macOS)
- Create a Python virtual environment in .venv/
- Activate the environment
- Install required packages from requirements.txt or requirements-cpu.txt (macOS fallback)
- Set up base configuration files
Note
For GPU training, ensure you have a compatible NVIDIA driver and CUDA/cuDNN stack installed. The framework is tested with TensorFlow 2.15+.
Important
Note for Windows users: Native Windows is not officially supported. Use WSL2 (Windows Subsystem for Linux) or Git Bash for best results.
Warning
Note for macOS users: Due to hardware and driver limitations, TensorFlow and related tools will run in CPU-only mode. Training and inference will be slower, but fully functional.
Check that the PIXAL framework was set up properly by running the help command:
pixal -h
Because a component can have several types of images, each type should be placed in its own, appropriately named directory. The framework parses nested folders and reuses the folder names when naming its output.
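As a rough illustration of this layout, the sketch below walks a component directory and groups image files by the sub-folder they sit in. The folder names (`front`, `back`), the accepted file extensions, and the helper `find_image_types` are assumptions made for this example; they are not PIXAL's own parser.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def find_image_types(component_dir):
    """Map each sub-folder (relative to the component) to its image file names."""
    root = Path(component_dir)
    types = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            key = path.parent.relative_to(root).as_posix()
            types.setdefault(key, []).append(path.name)
    return types

# Build a throwaway directory tree to demonstrate the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    component = Path(tmp) / "R0_Triplet_Data_Flex_F1"
    for rel in ("front/img_001.png", "front/img_002.png",
                "back/img_001.png", "back/notes.txt"):
        target = component / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = find_image_types(component)

print(found)
```

Non-image files such as `notes.txt` are skipped, and each folder name becomes the key under which its images are grouped.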
PIXAL uses modular YAML-based configuration files to define preprocessing steps, model training parameters, and all path resolutions. This design enables reproducibility, clarity, and easy experimentation.
There are two main configuration files in the /configs folder: parameters.yaml and paths.yaml.
The parameters.yaml file contains all high-level control flags. The file is split into three sections: preprocessing, model_training, and plotting.
Defines how images are cleaned and transformed:
- remove_background: max_workers sets the number of threads used for parallel processing when removing backgrounds from the images.
- alignment: parameters for KNN and RANSAC-based image alignment. Includes additional metric and image flags.
- preprocessor: controls pooling, zero pruning, color channels, and .npz output.
- rename_images: optionally renames images to folder-consistent names.
Covers everything needed to build and train the neural network:
- Memory handling: GPU/CPU flags, threading, memory growth, and hybrid options.
- Architecture: latent layer size, encoder/decoder depth, label encoding, one-hot encoding flag.
- Training control: batch size, learning rate, optimizer settings, loss functions.
- Regularization: supports l1, l2, or combined with tunable coefficients.
- Early stopping: using patience and min_delta.
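To make the roles of patience and min_delta concrete, here is a minimal sketch of the usual early-stopping rule: an epoch counts as an improvement only if the validation loss drops by more than min_delta, and training stops after patience consecutive epochs without one. The function `early_stopping_epoch` and the loss values are hypothetical and written for this example; PIXAL's own training loop may differ in detail.

```python
def early_stopping_epoch(val_losses, patience, min_delta):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses that plateau around 0.40.
history = [0.90, 0.50, 0.40, 0.399, 0.398, 0.41, 0.30]
stop_epoch = early_stopping_epoch(history, patience=3, min_delta=0.01)
never_stops = early_stopping_epoch(history, patience=10, min_delta=0.01)
print(stop_epoch, never_stops)
```

With these values the improvements at epochs 3 and 4 are smaller than min_delta, so they count toward the patience budget and training halts before the late drop to 0.30 is seen. A larger patience trades extra epochs for the chance to catch such late improvements.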
Choose what diagnostic plots to generate after training:
- ROC/Recall, pixel-wise MSE/MAE, distribution comparisons, confusion matrix, etc.
- Log-based vs absolute loss plotting.
- Loss cut value used as the anomaly threshold.
PIXAL resolves all data inputs/outputs relative to a few base directories. There are two main base paths: all preprocessing and model training outputs are written to /out, and all validation and detection outputs are written to /validate. This YAML allows centralized control of:
- component_model_path: where trained models and logs are saved.
- component_validate_path: path used during validation and detection.
These two entries are the only names the user should alter. Each other section (like remove_background_path, aligned_images_path, etc.) defines a name and a base, which are combined at runtime using PIXAL’s recursive path resolution system.
aligned_images_path:
  aligned_images: "aligned_images"
  base: *preprocessed_images_path
This lets PIXAL dynamically build:
out/R0_Triplet_Data_Flex_F1_pink_prune_2pool_rgb/preprocessed_images/aligned_images
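The name-plus-base scheme can be sketched as a small recursive lookup. The dictionary below is a simplified, hypothetical stand-in for paths.yaml (the keys `name` and `base` and the function `resolve` are assumptions for illustration); in the real file the base is expressed through YAML anchors and PIXAL's own resolver does the joining.

```python
from pathlib import PurePosixPath

# Hypothetical, simplified view of paths.yaml: each entry has its own folder
# name and, optionally, the key of the entry it is nested under.
PATHS = {
    "component_model_path": {
        "name": "out/R0_Triplet_Data_Flex_F1_pink_prune_2pool_rgb",
        "base": None,
    },
    "preprocessed_images_path": {
        "name": "preprocessed_images",
        "base": "component_model_path",
    },
    "aligned_images_path": {
        "name": "aligned_images",
        "base": "preprocessed_images_path",
    },
}

def resolve(key, paths=PATHS):
    """Join an entry's name onto its (recursively resolved) base path."""
    entry = paths[key]
    if entry["base"] is None:
        return PurePosixPath(entry["name"])
    return resolve(entry["base"], paths) / entry["name"]

resolved = resolve("aligned_images_path")
print(resolved)
```

Because every entry only knows its own name and its parent, renaming the component in one place changes every derived path.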
- Hierarchical Namespacing: All configurations are parsed into nested Python namespaces (config.preprocessing.preprocessor.pool_size, etc.) for intuitive access.
- Metadata: PIXAL automatically stores and saves parameters, including bounding box crop data from zero-pruning, as metadata for use in validation.
- Multi-file Merging: PIXAL merges multiple metadata YAMLs in a directory into one logical config object. This gives users separate, reusable preprocessing.yaml, model_training.yaml, and plotting.yaml files while still combining them at runtime.
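The merging and namespacing ideas can be sketched with plain dictionaries standing in for the parsed YAML files. The helpers `merge` and `to_namespace` are hypothetical names written for this example; the values are taken from the configuration snippets elsewhere in this README.

```python
from types import SimpleNamespace

def merge(base, extra):
    """Recursively merge `extra` into a copy of `base` (later values win)."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def to_namespace(obj):
    """Turn nested dicts into nested namespaces for attribute-style access."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj

# Stand-ins for the parsed contents of three separate YAML files.
preprocessing_yaml = {"preprocessing": {"preprocessor": {"pool_size": 2, "channels": ["R", "G", "B"]}}}
model_training_yaml = {"model_training": {"one_hot_encoding": False}}
plotting_yaml = {"plotting": {"loss_cut": 0.7, "use_log_loss": False}}

merged = {}
for part in (preprocessing_yaml, model_training_yaml, plotting_yaml):
    merged = merge(merged, part)

config = to_namespace(merged)
print(config.preprocessing.preprocessor.pool_size)
```

Keeping the files separate while exposing one `config` object means each stage can be edited in isolation without changing how the code reads its settings.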
PIXAL includes a modular and efficient preprocessing pipeline designed to prepare image data for machine learning-based anomaly detection. The example image used throughout this pipeline is the front of the R0 Triplet Data Flex Flavor 1, taken with a Tagarno microscope. Below are the key stages:
Removes the background from each input image to isolate the object of interest. This is done using the rembg library with optional multithreaded support.
Purpose: Reduce noise and standardize input for feature extraction.
Config settings:
preprocessing:
  remove_background:
    max_workers: 8
  rename_images: true
Output:
component/preprocessed_images/background_removed/
Aligns each background-removed image to a reference using feature matching (KNN, RANSAC). Ensures consistent orientation and spatial scale.
Purpose: Standardize object placement across the dataset.
Config settings:
preprocessing:
  alignment:
    knn_ratio: 0.8
    number_of_points: 5
    ransac_threshold: 7.0
    MIN_SCORE_THRESHOLD: 0.5
    MAX_MSE_THRESHOLD: 10.0
    MIN_GOOD_MATCHES: 20
    draw_matches: true
    save_metrics: true
    save_overlays: true
Output:
preprocessed_images/aligned_images/
figures/aligned_metrics/
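The knn_ratio setting is assumed here to play the role of the standard nearest-neighbour ratio test: a feature match is kept only when its best candidate is clearly closer than the second best. The sketch below shows that test alone, on toy 2-D descriptors, using NumPy; the function `ratio_test_matches` is hypothetical, and the RANSAC stage that would normally follow is not shown.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, knn_ratio=0.8):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test."""
    # Pairwise Euclidean distances between every descriptor in a and b.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    # Indices of the two nearest neighbours in b for each descriptor in a.
    nearest = np.argsort(dists, axis=1)[:, :2]
    rows = np.arange(len(desc_a))
    best, second = nearest[:, 0], nearest[:, 1]
    keep = dists[rows, best] < knn_ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]

# Toy descriptors: two clear matches and one ambiguous point
# that sits exactly between two reference descriptors.
reference = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
candidate = np.array([[0.1, 9.9], [0.2, 0.1], [5.0, 0.0]])

matches = ratio_test_matches(candidate, reference, knn_ratio=0.8)
print(matches)
```

The ambiguous third descriptor is rejected because its two nearest neighbours are equally far away, which is the kind of match that would otherwise destabilize the fitted transform.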
Cropping step that removes zero-valued background pixels after alignment. The system finds the tightest bounding box around the non-zero pixels (with configurable padding) and crops all images to the same region.
Purpose: Reduce input dimensionality while preserving relevant information.
Config settings:
preprocessing:
  preprocessor:
    zero_pruning: true
    zero_pruning_padding: 5
Output:
Images are processed internally; the cropping dimensions are saved in:
metadata/preprocessing.yaml
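A minimal NumPy sketch of the zero-pruning idea follows: find the tightest box around all non-zero pixels across the dataset, widen it by the padding, and apply the same crop to every image. The functions `find_crop_box` and `apply_crop` and the (top, bottom, left, right) box layout are assumptions for this example, not PIXAL's stored metadata format.

```python
import numpy as np

def find_crop_box(images, padding=5):
    """Return (top, bottom, left, right) around non-zero pixels in any image."""
    stack = np.stack(images)                      # (N, H, W, C)
    occupied = np.any(stack != 0, axis=(0, 3))    # (H, W): True where any image has content
    rows = np.flatnonzero(occupied.any(axis=1))
    cols = np.flatnonzero(occupied.any(axis=0))
    if rows.size == 0:
        raise ValueError("all images are empty; nothing to crop around")
    height, width = occupied.shape
    top = max(int(rows[0]) - padding, 0)
    bottom = min(int(rows[-1]) + 1 + padding, height)
    left = max(int(cols[0]) - padding, 0)
    right = min(int(cols[-1]) + 1 + padding, width)
    return top, bottom, left, right

def apply_crop(image, box):
    top, bottom, left, right = box
    return image[top:bottom, left:right]

# Two aligned toy images whose objects occupy slightly different regions.
img_a = np.zeros((20, 30, 3), dtype=np.uint8)
img_a[8:12, 10:20] = 255
img_b = np.zeros((20, 30, 3), dtype=np.uint8)
img_b[6:10, 12:22] = 200

box = find_crop_box([img_a, img_b], padding=5)
cropped = [apply_crop(img, box) for img in (img_a, img_b)]
print(box, cropped[0].shape)
```

Storing the box once and reusing it at validation time keeps new images pixel-aligned with the training inputs.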
Converts aligned (and optionally pruned) images into normalized ML-ready inputs. This includes:
- Channel selection: any combination of (R, G, B, H, S, V)
- Average pooling to reduce resolution
- Per-channel normalization
- .npz output containing data, labels (if applicable), and shape
preprocessing:
  preprocessor:
    file_name: "out.npz"
    pool_size: 2
    channels: ["R", "G", "B"]
Output:
out/<component>/<type>/out.npz
Important parameters like crop_box, input_dim, and processing shapes are saved to:
out/<component>/<type>/metadata/preprocessing.yaml
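The pooling and normalization steps can be sketched in a few lines of NumPy. Two assumptions are made here and may not match PIXAL exactly: ragged edges that do not fill a full pooling window are dropped, and per-channel normalization is taken to be min-max scaling. The function names are hypothetical.

```python
import numpy as np

def average_pool(image, pool_size):
    """Average non-overlapping pool_size x pool_size blocks of an (H, W, C) image."""
    h, w, c = image.shape
    h_trim, w_trim = h - h % pool_size, w - w % pool_size  # assumption: drop ragged edges
    blocks = image[:h_trim, :w_trim].reshape(
        h_trim // pool_size, pool_size, w_trim // pool_size, pool_size, c
    )
    return blocks.mean(axis=(1, 3))

def normalize_per_channel(image, eps=1e-8):
    """Min-max scale each channel independently (assumed normalization scheme)."""
    lo = image.min(axis=(0, 1), keepdims=True)
    hi = image.max(axis=(0, 1), keepdims=True)
    return (image - lo) / (hi - lo + eps)

image = np.arange(4 * 6 * 3, dtype=float).reshape(4, 6, 3)
pooled = average_pool(image, pool_size=2)
normalized = normalize_per_channel(pooled)
vector = normalized.reshape(-1)   # flattened ML input for one image

print(pooled.shape, vector.shape)
```

With pool_size 2 the pixel count drops by a factor of four, which is why the pooled shape has to be stored in the metadata for later reconstruction.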
PIXAL supports flexible and modular training of deep learning models (currently autoencoders) for anomaly detection in pixel-aligned image data.
An Autoencoder is a type of neural network that learns to compress and reconstruct its input. It's structured into three parts:
- Encoder: Compresses the input image into a smaller latent representation. This part captures the most essential features of the data.
- Latent Space: The compressed representation. It’s the "bottleneck" that forces the network to learn meaningful features.
- Decoder: Attempts to reconstruct the original image from the latent representation.
In the context of PIXAL, this model learns to reproduce defect-free components. During validation, poor reconstruction (i.e., higher pixel-wise loss) indicates anomalous or defective regions.
Before training, images must be preprocessed and converted into .npz files using the preprocessing pipeline (see previous section). Each .npz file contains:
- data: flattened, normalized image vectors
- labels: (only if using one-hot encoding)
- shape: original image shape post-pooling or zero-pruning
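For orientation, the following sketch writes and reads an archive with the same keys using NumPy's `savez` and `load`. The random arrays are placeholders for real preprocessed images, an in-memory buffer is used instead of out.npz so the example leaves no file behind, and the labels key is omitted as in the per-type case.

```python
import io
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))            # 4 placeholder images, already pooled

data = images.reshape(len(images), -1)       # one flattened vector per image
shape = np.array(images.shape[1:])           # per-image shape needed to undo the flattening

buffer = io.BytesIO()                        # stands in for a file such as out.npz
np.savez(buffer, data=data, shape=shape)
buffer.seek(0)

with np.load(buffer) as archive:
    keys = sorted(archive.files)
    per_image_shape = tuple(int(n) for n in archive["shape"])
    restored = archive["data"].reshape(-1, *per_image_shape)

print(keys, restored.shape)
```

Saving the shape alongside the flattened vectors is what allows reconstructions to be reshaped back into images for heatmaps.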
PIXAL supports two training modes:
Trains a separate model for each image type (e.g. component variant or class). Each .npz file corresponds to a single type.
model_training:
  one_hot_encoding: False
- Benefits: Higher performance, more specific models
- Model Output: The model is saved both as a .keras file and its weights as <model_name>.weights.h5. These can be found in: out/<component>/<type>/model/<model_name>.weights.h5. Currently, models are loaded and rebuilt using the <model_name>.weights.h5 file for validation.
Trains a single model on all types of images, with one-hot encoded class labels appended to the latent space.
model_training:
  one_hot_encoding: True
- Benefits: Generalized model across types
- Model Output: As in the per-type mode, the model is saved both as a .keras file and its weights as <model_name>.weights.h5. These can be found in: out/<component>/model/<model_name>.weights.h5. Currently, models are loaded and rebuilt using the <model_name>.weights.h5 file for validation.
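The idea of appending one-hot class labels to the latent space can be sketched with NumPy alone. The type names, the latent size of 16, and the zero-filled latent array (a stand-in for real encoder output) are all assumptions made for illustration.

```python
import numpy as np

def one_hot(indices, num_classes):
    """Return one row per index with a single 1 in the matching class column."""
    return np.eye(num_classes, dtype=np.float32)[indices]

type_names = ["front", "back", "side"]         # hypothetical image types
labels = np.array([0, 2, 1, 0])                # type index of each image in the batch
latent = np.zeros((4, 16), dtype=np.float32)   # placeholder for encoder output

# The decoder would receive the latent vector plus the class indicator.
conditioned = np.concatenate([latent, one_hot(labels, len(type_names))], axis=1)
print(conditioned.shape)
```

The extra columns tell the decoder which image type it is reconstructing, which is what lets one model serve several types.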
Once a model is trained, PIXAL performs validation and anomaly detection by comparing reconstructed images to their input counterparts. Deviations between the input and reconstruction indicate potential anomalies (e.g., damaged hardware regions).
The validation process mirrors the preprocessing and training workflow:
- A new directory of unseen images (e.g., from a production batch) is passed into the validation routine.
- These images are organized in per-type folders (if one_hot_encoding=False) or as a flat directory (if True).
- Background removal
- Image alignment (using previously saved reference images)
- Zero pruning using pre-saved crop box metadata
- Normalization & pooling
- Conversion into .npz format
- Each .npz file is paired with its trained model and metadata (architecture, crop box, etc.).
- Model is rebuilt and weights are loaded.
- The model reconstructs the input image(s).
- The reconstruction is compared to the original input to compute pixel-wise reconstruction errors.
PIXAL uses the Mean Squared Error (MSE) between input and reconstruction to assess anomalies.
- Low MSE → normal reconstruction
- High MSE → possible anomaly
You can configure:
plotting:
  loss_cut: 0.7 # Threshold for anomaly
  use_log_loss: False # Use log-scale loss when computing anomaly mask
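A minimal sketch of how a pixel-wise MSE map and a threshold might combine into an anomaly mask is shown below. The function `anomaly_mask`, the log base, and the threshold values are assumptions for this example; the scale of loss_cut in a real run depends on how PIXAL normalizes its loss.

```python
import numpy as np

def anomaly_mask(original, reconstruction, loss_cut, use_log_loss=False, eps=1e-12):
    """Return the per-pixel MSE map and a boolean mask of pixels above the cut."""
    pixel_mse = ((original - reconstruction) ** 2).mean(axis=-1)   # average over channels
    score = np.log10(pixel_mse + eps) if use_log_loss else pixel_mse
    return pixel_mse, score > loss_cut

# Toy example: a perfect reconstruction except for one pixel.
original = np.full((4, 4, 3), 0.5)
reconstruction = original.copy()
reconstruction[1, 2] = 0.0

pixel_mse, mask = anomaly_mask(original, reconstruction, loss_cut=0.1)
_, log_mask = anomaly_mask(original, reconstruction, loss_cut=-1.0, use_log_loss=True)
print(int(mask.sum()), bool(mask[1, 2]))
```

On a log scale the cut has to be chosen in log units, so the same pixel is flagged here with a cut of -1.0 rather than 0.1.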
For each validated image type, PIXAL saves:
validate/
└── <component>/
└── <type>/
├── logs/
├── metadata/
├── figures/
│ ├── anomaly_overlay_*.png
│ ├── pixel_loss_histogram.png
│ └── ...
└── aligned_metrics/
Visual outputs include:
| Output | Description |
|---|---|
| anomaly_overlay_*.png | Heatmap of pixel-wise anomaly regions |
| pixel_loss_histogram.png | Histogram of MSE across all pixels |
| combined_distribution_log.png | Overlay of predicted and true pixel values |
| roc_curve, pr_curve | ROC/PR curve using pixel-wise MSE scores |
| confusion_matrix.png | Optional confusion matrix (if thresholds used) |
The PIXAL commands are streamlined to minimize user input. Command arguments can be supplied manually; if they are omitted, PIXAL follows the paths.yaml configuration file to find the relevant files for the process.
Important
Prior to preprocessing your dataset, alter the section component_model_path: &component_model_path in the paths.yaml file to match your component name
The commands included in the PIXAL framework can be listed using the -h flag:
Pixel-based Anomaly Detection CLI
positional arguments:
{preprocess,remove_bg,align,make_input,train,validate,detect}
preprocess Run all preprocessing steps on input images
remove_bg Remove background from images
align Align images
make_input Uses ImagePreprocessor to make ML input
train Train autoencoder model(s)
validate Run validation (preprocess + detect) on new images
detect Run anomaly detection on new images
options:
-h, --help show this help message and exit
The preprocessing pipeline is included in a single command, but each step can be run separately if needed. Ensure the dataset and the nested directories are properly named prior to running. To run the entire pipeline:
pixal preprocess -i /path/to/component/
Loading bars are shown for each preprocessing step.
If individual steps need to be run separately, make sure to pass the proper input for the argument.
pixal align -i /path/to/remove_bg/images/
The train command can take an input path, or it can assume you're training a model on the preprocessed input defined by the paths.yaml configuration file. If you're using this preprocessed data, you can just run:
pixal train
Otherwise,
pixal train -i /path/to/preprocessed/data/
Validation preprocesses the image that needs to be validated, then runs detection and produces the detection plots.
Important
Prior to validating your image, alter the section component_validate_path: &component_validate_path in the paths.yaml file to match your component name.
To run the validation pipeline, run:
pixal validate -i /path/to/image/
PIXAL currently writes trained models to the out/.../model/ directory in two forms:
- Full Keras checkpoint file (.keras): this is the checkpoint file written by the Keras ModelCheckpoint callback during training. It is suitable for restoring model weights via model.load_weights(...) or for Keras to read back a saved model, depending on how it was written.
- Weights file (<model_name>.weights.h5): the project currently also calls save_weights(...) after training; validation and downstream code rebuilds the model architecture and then calls load_weights(...) to restore weights.
Example file locations:
out/<component>/<type>/model/<model_name>.keras
out/<component>/<type>/model/<model_name>.weights.h5
Loading the model for validation (current behavior)
from pixal.train_model.autoencoder import Autoencoder

# build same `params` dict used for training;
# `weights_path` should point to out/<component>/<type>/model/<model_name>.weights.h5
model = Autoencoder(params)
model.build_model(input_dim=params['input_dim'])
model.load_weights(str(weights_path))
MLflow quick-start (optional)
MLflow is now installed with setup.sh.
By default, MLflow creates an mlruns/ directory in the current working directory. To use a remote tracking server, set the MLFLOW_TRACKING_URI environment variable to the server's URL.
PIXAL includes a small, best-effort integration helper at pixal/mlflow_utils.py. When mlflow is present, training runs (via the pixal train entrypoints) will:
- Start an MLflow run and log the params dictionary as run parameters
- Log per-epoch metrics (loss/val_loss) to MLflow
- After training, log the saved model/weights and the metadata YAML as artifacts
To start a local server, run:
mlflow ui
This automatically runs an MLflow server on port 5000.
Go to "http://your-mlflow-server:5000", or wherever port 5000 is reachable from your machine.
The preprocessing framework supports multiple representations of image pixel values. You can specify a subset (e.g., ["H","S","V"]) or request all variables by passing channels="ALL". Each variable is normalized to a consistent range (typically [0,1]) to simplify downstream training.
Below is a detailed description of the currently available feature variables:
R, G, B — Red, Green, Blue channels (linear or gamma-corrected depending on preprocessing). Each channel contains per-pixel intensity values normalized to [0, 1].
H, S, V — Hue, Saturation, Value from HSV colorspace. H is represented as degrees mapped to [0,1] (or optionally as sin/cos pairs in derived channels), S and V are normalized to [0,1].
Y, Cr, Cb — YCbCr colorspace channels: luminance (Y) and chroma components (Cr, Cb). Useful for separating brightness from color information.
LAB_L, LAB_a, LAB_b — CIE LAB color space channels: lightness (L) and opponent color axes (a, b). These are perceptually uniform channels useful when color differences should match human perception.
LCh_C, LCh_sinH, LCh_cosH — Polar form of Lab: chroma (C) and hue angle encoded as sin(H) and cos(H) for continuous, wrap-safe representation of hue.
r_chroma, g_chroma — Simple chroma-derived channels computed relative to the R and G channels (e.g., R / (R+G+B + eps)). These emphasize the contribution of a single color channel relative to total intensity.
Opp_O1, Opp_O2, Opp_O3 — Color-opponent channels (e.g., variants of R-G, R-B, G-B or other opponent transforms). Opponent channels help highlight color contrasts that may indicate defects.
GradMag — Gradient magnitude of the luminance or chosen channel (e.g., Sobel magnitude). Useful for edge and texture information.
Laplacian — Laplacian filter response (second derivative) capturing blob-like intensity variations and helping detect local defects or spots.
LocalStd — Local (neighborhood) standard deviation of intensity (texture measure). Useful to capture local texture variance and noise.
Notes and tips
- You can request specific channels using the channels preprocessing option, e.g. channels: ["H","S","V","GradMag"].
- For hue information, prefer LCh_sinH/LCh_cosH or H wrapped as sin/cos to avoid discontinuities near 0/360 degrees.
- If using multiple chroma/opponent channels, normalize each channel independently (already performed by PIXAL) so the network can weigh them fairly.
- Derived channels (GradMag, Laplacian, LocalStd) add texture/structure information and are especially helpful for detecting small or low-contrast defects.
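The hue wrap-around issue mentioned in the tips can be seen with two nearly identical reds. The sketch below uses Python's standard `colorsys` module to get the HSV hue in [0, 1), encodes it as a (sin, cos) pair, and also shows the `R / (R+G+B + eps)` chroma ratio quoted above. The function names are hypothetical and the code works on single pixels for clarity; it is not PIXAL's vectorized implementation.

```python
import colorsys
import math

def hue_sin_cos(r, g, b):
    """Encode the HSV hue of an RGB pixel (values in [0, 1]) as (sin, cos)."""
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)   # hue in [0, 1)
    angle = 2.0 * math.pi * hue
    return math.sin(angle), math.cos(angle)

def r_chroma(r, g, b, eps=1e-8):
    """Share of the red channel in the total intensity."""
    return r / (r + g + b + eps)

# Two reds that differ only by a slight tint toward blue or toward green.
red_bluish = (1.0, 0.0, 0.02)
red_greenish = (1.0, 0.02, 0.0)

hue_a = colorsys.rgb_to_hsv(*red_bluish)[0]
hue_b = colorsys.rgb_to_hsv(*red_greenish)[0]
raw_gap = abs(hue_a - hue_b)                   # large: the raw hue wraps around
encoded_gap = math.dist(hue_sin_cos(*red_bluish), hue_sin_cos(*red_greenish))  # small

print(round(raw_gap, 3), round(encoded_gap, 3))
print(round(r_chroma(0.2, 0.2, 0.2), 3))
```

The raw hue values sit at opposite ends of the [0, 1) range even though the colors are almost the same, while the sin/cos pair keeps them close, which is what a network trained on reconstruction error needs.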





