Jacobecrosby/pixal

 ██████╗ ██╗██╗  ██╗ █████╗ ██╗   
 ██╔══██╗██║╚██╗██╔╝██╔══██╗██║     
 ██████╔╝██║ ╚███╔╝ ███████║██║     
 ██╔═══╝ ██║ ██╔██╗ ██╔══██║██║     
 ██║     ██║██╔╝ ██╗██║  ██║███████╗ 
 ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝
  PIXAL – PIXel-based Anomaly Locator

PIXAL (PIXel-based Anomaly Locator) is a modular deep learning framework designed for image-based anomaly detection in high-resolution scientific data. Currently applied to identifying defects in detector hardware components for the ATLAS experiment, PIXAL supports training and validation of deep neural networks, with a focus on Autoencoder-based architectures.

The framework includes tools for:

  • Image preprocessing, including background removal, alignment, zero-pruning, and ML input processing
  • Flexible training with optional one-hot labels and configurable architectures
  • Modular validation and anomaly visualization (heatmaps, ROC, loss histograms)
  • Metadata tracking and reproducibility for experimental pipelines

PIXAL is highly extensible — other model types and preprocessing pipelines can be added with minimal changes.

Setup

PIXAL is tested and works best with Python 3.10.9. For consistent results, we recommend creating a clean virtual environment with this version.

1. Clone the Repository

git clone https://github.com/OSU-HEP-HDL/pixal.git
cd pixal

2. Setup the Environment

source setup.sh

This script will:

  • Detect your platform (Linux, Windows via WSL or Git Bash, or macOS)

  • Create a Python virtual environment in .venv/

  • Activate the environment

  • Install required packages from requirements.txt or requirements-cpu.txt (macOS fallback)

  • Set up base configuration files

Note

For GPU training, ensure you have a compatible NVIDIA driver and CUDA/cuDNN stack installed. The framework is tested with TensorFlow 2.15+.

Important

Note for Windows users: Native Windows is not officially supported. Use WSL2 (Windows Subsystem for Linux) or Git Bash for best results.

Warning

Note for macOS users: Due to hardware and driver limitations, TensorFlow and related tools will run in CPU-only mode. Training and inference will be slower, but fully functional.

3. Verify the Environment

Check that the PIXAL framework was set up properly by running the help command.

pixal -h

Input Data Formatting

Since components have different types of images, they should be separated in different directories that are labeled accordingly. The framework parses through nested folders and uses the naming convention for the output.

Diagram of nested directories for the R0 Triplet Data Flex F1, showing input directory for preprocessing
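
As a rough illustration of how such a nested input directory might be traversed, the sketch below groups image files by the sub-folder they live in. The helper name collect_images_by_type and the list of file extensions are assumptions made for this example; they are not part of PIXAL's API.

```python
from pathlib import Path

# Hypothetical helper, not part of PIXAL. The extension list is an assumption.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def collect_images_by_type(component_dir):
    """Map each sub-folder (path relative to the root) to its image file names."""
    root = Path(component_dir)
    groups = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # The relative folder path doubles as the "type" label.
            key = path.parent.relative_to(root).as_posix()
            groups.setdefault(key, []).append(path.name)
    return groups
```

Keying on the relative folder path keeps the naming convention of the input tree available for naming the outputs.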

Configuration System and Parameters

PIXAL uses modular YAML-based configuration files to define preprocessing steps, model training parameters, and all path resolutions. This design enables reproducibility, clarity, and easy experimentation. There are two main configuration files in the /configs folder: parameters.yaml and paths.yaml.

Parameters

The parameters.yaml file contains all high-level control flags. The file is split into three sections: preprocessing, model_training, and plotting.

Preprocessing

Defines how images are cleaned and transformed:

  • remove_background: max_workers sets the number of threads used for parallel background removal.
  • alignment: parameters for KNN and RANSAC-based image alignment. Includes additional metric and image flags.
  • preprocessor: controls pooling, zero pruning, color channels, and .npz output.
  • rename_images: optionally renames images to folder-consistent names.

Model Training

Covers everything needed to build and train the neural network:

  • Memory handling: GPU/CPU flags, threading, memory growth, and hybrid options.
  • Architecture: latent layer size, encoder/decoder depth, label encoding, one-hot encoding flag.
  • Training control: batch size, learning rate, optimizer settings, loss functions.
  • Regularization: supports l1, l2, or combined with tunable coefficients.
  • Early stopping: using patience and min_delta.
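
To make the patience and min_delta settings concrete, here is a minimal sketch of the usual early-stopping rule: training stops once the validation loss has failed to improve by more than min_delta for patience consecutive epochs. The function name is hypothetical, and PIXAL's own training loop may implement this differently (for example through a Keras callback).

```python
def early_stopping_epoch(val_losses, patience, min_delta):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Improvements after epoch 1 are smaller than min_delta, so training stops at epoch 3.
print(early_stopping_epoch([1.0, 0.8, 0.79, 0.795, 0.79], patience=2, min_delta=0.05))
```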

Plotting

Choose what diagnostic plots to generate after training:

  • ROC/Recall, pixel-wise MSE/MAE, distribution comparisons, confusion matrix, etc.
  • Log-based vs absolute loss plotting.
  • Loss cut threshold to define anomaly threshold

Paths

PIXAL resolves all data inputs and outputs relative to a few base directories. There are two main base paths: preprocessing and model training write to /out, while validation and detection write to /validate. The paths.yaml file gives centralized control of:

  • component_model_path: where trained models and logs are saved.
  • component_validate_path: path used during validation and detection.

The names of these two sections are the only values the user should alter. Each section (like remove_background_path, aligned_images_path, etc.) defines a name and a base, which are combined at runtime using PIXAL’s recursive path resolution system.

Example

aligned_images_path:
  aligned_images: "aligned_images"
  base: *preprocessed_images_path

This lets PIXAL dynamically build:

out/R0_Triplet_Data_Flex_F1_pink_prune_2pool_rgb/preprocessed_images/aligned_images
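
The recursive resolution can be pictured with the following sketch. It assumes that, once the YAML aliases are expanded, each section is a mapping with one directory-name entry plus an optional base that is either another section or a plain string. The resolve_path function and the layout of the component and preprocessed sections are hypothetical, chosen only to reproduce the path above.

```python
from pathlib import PurePosixPath

def resolve_path(section):
    """Recursively join a section's base path with its own directory name."""
    if isinstance(section, str):
        return PurePosixPath(section)
    # Every key other than "base" is treated as this section's directory name.
    names = [value for key, value in section.items() if key != "base"]
    base = resolve_path(section["base"]) if "base" in section else PurePosixPath()
    return base.joinpath(*names)

# Hypothetical sections, mimicking YAML aliases that have already been expanded.
component = {"component": "R0_Triplet_Data_Flex_F1_pink_prune_2pool_rgb", "base": "out"}
preprocessed = {"preprocessed_images": "preprocessed_images", "base": component}
aligned = {"aligned_images": "aligned_images", "base": preprocessed}

print(resolve_path(aligned))
```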

Advanced Behavior

  • Hierarchical Namespacing: All configurations are parsed into nested Python namespaces (config.preprocessing.preprocessor.pool_size, etc.) for intuitive access.

  • Metadata: PIXAL automatically stores and saves parameters, including bounding box crop data from zero-pruning as metadata for use in validation.

  • Multi-file Merging: PIXAL merges multiple metadata YAMLs in a directory into one logical config object. This gives users separate, reusable preprocessing.yaml, model_training.yaml, and plotting.yaml files while still combining them at runtime.
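
A minimal sketch of these two ideas, assuming the YAML files have already been loaded into plain dictionaries: the dictionaries are deep-merged and then converted into nested namespaces for attribute-style access. The helper names to_namespace and merge_configs are illustrative and are not taken from PIXAL's source.

```python
from types import SimpleNamespace

def to_namespace(value):
    """Recursively convert nested dicts into attribute-accessible namespaces."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value

def merge_configs(*configs):
    """Deep-merge dicts from left to right; later values win on conflicts."""
    merged = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_configs(merged[key], value)
            else:
                merged[key] = value
    return merged

# Stand-ins for the contents of separate YAML files.
preprocessing = {"preprocessing": {"preprocessor": {"pool_size": 2}}}
plotting = {"plotting": {"loss_cut": 0.7}}

config = to_namespace(merge_configs(preprocessing, plotting))
print(config.preprocessing.preprocessor.pool_size)
```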

Preprocessing Pipeline

PIXAL includes a modular and efficient preprocessing pipeline designed to prepare image data for machine learning-based anomaly detection. The example used throughout this section is the front of the R0 Triplet Data Flex Flavor 1, imaged with a Tagarno microscope. Below are the key stages:

R0 Triplet Data Flex Flavor 1 front with no preprocessing

Background Removal

Removes the background from each input image to isolate the object of interest. This is done using the rembg library with optional multithreaded support.

Purpose: Reduce noise and standardize input for feature extraction.

Config settings:

preprocessing:
  remove_background:
    max_workers: 8
  rename_images: true

Output: component/preprocessed_images/background_removed/

R0 Triplet Data Flex Flavor 1 front with its background removed
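
The role of max_workers can be sketched with a generic thread pool. In the example below the worker is a stand-in: in practice it would be a function that reads one image, removes its background (for instance with rembg's remove function), and writes the result. The helper name process_in_parallel is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def process_in_parallel(items, worker, max_workers=8):
    """Apply `worker` to every item using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))

# Stand-in worker: a real one would load an image and strip its background.
results = process_in_parallel(["a.png", "b.png"], lambda name: name.upper(), max_workers=2)
print(results)
```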

Image Alignment

Aligns each background-removed image to a reference using feature matching (KNN, RANSAC). Ensures consistent orientation and spatial scale.

Purpose: Standardize object placement across the dataset.

Config settings:

preprocessing:
  alignment:
    knn_ratio: 0.8
    number_of_points: 5
    ransac_threshold: 7.0
    MIN_SCORE_THRESHOLD: 0.5
    MAX_MSE_THRESHOLD: 10.0
    MIN_GOOD_MATCHES: 20
  draw_matches: true
  save_metrics: true
  save_overlays: true

Output: preprocessed_images/aligned_images/ and figures/aligned_metrics/

Two R0 Triplet Data Flexes Flavor 1 showing 10 matching points found using KNN
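
The README does not spell out how knn_ratio is applied. A common reading, assumed here, is Lowe's ratio test: a feature match is kept only if its best distance is clearly smaller than the distance to the second-best candidate. The function below is a hypothetical sketch of that filter, not PIXAL's implementation.

```python
def ratio_test(neighbor_distances, knn_ratio=0.8):
    """Return indices of features whose best match passes the ratio test.

    `neighbor_distances` holds one (best, second_best) distance pair per feature.
    """
    return [
        index
        for index, (best, second_best) in enumerate(neighbor_distances)
        if best < knn_ratio * second_best
    ]

# Only the first feature has a best match that is distinctly closer than the runner-up.
print(ratio_test([(0.2, 1.0), (0.9, 1.0), (0.5, 0.6)]))
```

The surviving matches would then feed a RANSAC fit, which discards correspondences that disagree with the dominant transform.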

Zero Pruning (Optional)

Cropping step that removes zero-valued background pixels after alignment. The system finds the tightest bounding box around the non-zero pixels (with configurable padding) and crops all images to the same region.

Purpose: Reduce input dimensionality while preserving relevant information.

Config settings:

preprocessing:
  preprocessor:
    zero_pruning: true
    zero_pruning_padding: 5

Output: Internally processed images; cropping dimensions are saved in metadata/preprocessing.yaml

R0 Triplet Data Flexes Flavor 1 after zero pruning
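
The cropping step can be sketched with NumPy as follows. The sketch assumes the aligned images are stacked in an array of shape (N, H, W, C) and that the crop box is the union of non-zero pixels over all images, expanded by the padding and clipped to the image bounds. The function names and the box layout are illustrative, not PIXAL's actual metadata format.

```python
import numpy as np

def find_crop_box(images, padding=5):
    """Return (top, bottom, left, right) bounding all non-zero pixels in a stack.

    `images` has shape (N, H, W, C); bottom and right are exclusive.
    """
    nonzero = np.any(images != 0, axis=(0, 3))  # (H, W) mask over all images
    rows = np.flatnonzero(nonzero.any(axis=1))
    cols = np.flatnonzero(nonzero.any(axis=0))
    if rows.size == 0:
        raise ValueError("all images are empty")
    height, width = nonzero.shape
    top = max(int(rows[0]) - padding, 0)
    bottom = min(int(rows[-1]) + 1 + padding, height)
    left = max(int(cols[0]) - padding, 0)
    right = min(int(cols[-1]) + 1 + padding, width)
    return top, bottom, left, right

def crop(images, box):
    """Crop every image in the stack to the same box."""
    top, bottom, left, right = box
    return images[:, top:bottom, left:right, :]
```

Saving the box lets the same crop be reapplied to new images at validation time, so the model always sees inputs of the shape it was trained on.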

Preprocessor -> ML Input Conversion

Converts aligned (and optionally pruned) images into normalized ML-ready inputs. This includes:

  • Channel selection can be any combination of (R, G, B, H, S, V)
  • Average pooling to reduce resolution
  • Per-channel normalization
  • .npz output containing data, labels (if applicable), and shape

Config settings:

preprocessing:
  preprocessor:
    file_name: "out.npz"
    pool_size: 2
    channels: ["R", "G", "B"]

Output: out/<component>/<type>/out.npz
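
The pooling, normalization, and flattening steps might look like the sketch below. It assumes 8-bit input images of shape (H, W, C) and uses a simple division by 255 as the normalization; PIXAL's actual per-channel normalization may differ. The function names are hypothetical.

```python
import numpy as np

def average_pool(image, pool_size=2):
    """Average non-overlapping pool_size x pool_size blocks of an (H, W, C) image."""
    height, width, channels = image.shape
    h, w = height // pool_size, width // pool_size
    trimmed = image[: h * pool_size, : w * pool_size]  # drop ragged edges
    blocks = trimmed.reshape(h, pool_size, w, pool_size, channels)
    return blocks.mean(axis=(1, 3))

def make_ml_input(images, pool_size=2):
    """Pool 8-bit images, scale to [0, 1], and flatten each one to a vector."""
    pooled = np.stack(
        [average_pool(img.astype(np.float32), pool_size) for img in images]
    )
    pooled /= 255.0  # assumption: 8-bit inputs
    data = pooled.reshape(len(images), -1)
    return data, np.array(pooled.shape[1:])
```

The resulting arrays could then be written with np.savez("out.npz", data=data, shape=shape), matching the keys described above.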

Metadata Output

Important parameters like crop_box, input_dim, and processing shapes are saved to: out/<component>/<type>/metadata/preprocessing.yaml

Model Training

PIXAL supports flexible and modular training of deep learning models (currently autoencoders) for anomaly detection in pixel-aligned image data.

The Autoencoder Architecture

An Autoencoder is a type of neural network that learns to compress and reconstruct its input. It's structured into three parts:

  • Encoder: Compresses the input image into a smaller latent representation. This part captures the most essential features of the data.
  • Latent Space: The compressed representation. It’s the "bottleneck" that forces the network to learn meaningful features.
  • Decoder: Attempts to reconstruct the original image from the latent representation.

In the context of PIXAL, this model learns to reproduce defect-free components. During validation, poor reconstruction (i.e., higher pixel-wise loss) indicates anomalous or defective regions.

R0 Triplet Data Flexes Flavor 1 after zero pruning

Input Format

Before training, images must be preprocessed and converted into .npz files using the preprocessing pipeline (see previous section). Each .npz file contains:

  • data: flattened, normalized image vectors
  • labels: (only if using one-hot encoding)
  • shape: original image shape post-pooling or zero-pruning

Training Modes

PIXAL supports two training modes:

1. Per-Type Model (default)

Trains a separate model for each image type (e.g. component variant or class). Each .npz file corresponds to a single type.

model_training:
  one_hot_encoding: False

  • Benefits: Higher performance, more specific models
  • Model Output: The model is saved both as a .keras file and as a weights file, which can be found at out/<component>/<type>/model/<model_name>.weights.h5. Currently, validation rebuilds the model and loads <model_name>.weights.h5.

2. One-Hot Encoding Mode

Trains a single model on all types of images, with one-hot encoded class labels appended to the latent space.

model_training:
  one_hot_encoding: True

  • Benefits: Generalized model across types
  • Model Output: As in per-type mode, the model is saved both as a .keras file and as a weights file, which can be found at out/<component>/model/<model_name>.weights.h5. Currently, validation rebuilds the model and loads <model_name>.weights.h5.
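
Appending one-hot labels to the latent space amounts to a concatenation along the feature axis, as in this NumPy sketch. The function names are hypothetical, and the README does not show how PIXAL wires the labels into the decoder, so treat this only as an illustration of the data layout.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (N, num_classes) one-hot matrix for integer class labels."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def append_labels(latent, labels, num_classes):
    """Concatenate one-hot labels onto each latent vector."""
    return np.concatenate([latent, one_hot(labels, num_classes)], axis=1)

latent = np.ones((2, 4), dtype=np.float32)   # two samples, latent size 4
conditioned = append_labels(latent, [0, 2], num_classes=3)
print(conditioned.shape)
```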

Validation and Anomaly Detection

Once a model is trained, PIXAL performs validation and anomaly detection by comparing reconstructed images to their input counterparts. Deviations between the input and reconstruction indicate potential anomalies (e.g., damaged hardware regions).

Validation Workflow

The validation process mirrors the preprocessing and training workflow:

1. New Image Set

  • A new directory of unseen images (e.g., from a production batch) is passed into the validation routine.
  • These images are organized in per-type folders (if one_hot_encoding=False) or as a flat directory (if True).

2. Preprocessing

  • Background removal
  • Image alignment (using previously saved reference images)
  • Zero pruning using pre-saved crop box metadata
  • Normalization & pooling
  • Conversion into .npz format

3. Model Selection

  • Each .npz file is paired with its trained model and metadata (architecture, crop box, etc.).
  • Model is rebuilt and weights are loaded.

4. Prediction

  • The model reconstructs the input image(s).
  • The reconstruction is compared to the original input to compute pixel-wise reconstruction errors.

Detection Logic

PIXAL uses the Mean Squared Error (MSE) between input and reconstruction to assess anomalies.

  • Low MSE → normal reconstruction
  • High MSE → possible anomaly

You can configure:

plotting:
  loss_cut: 0.7              # Threshold for anomaly
  use_log_loss: False        # Use log-scale loss when computing anomaly mask
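
A minimal sketch of the thresholding idea, assuming the input and reconstruction are arrays of shape (H, W, C) and that loss_cut is compared against the raw per-pixel squared error averaged over channels. The README does not state the exact scale on which the cut is applied (for example when use_log_loss is enabled), so the function below is illustrative only.

```python
import numpy as np

def anomaly_mask(original, reconstruction, loss_cut=0.7):
    """Return the per-pixel squared error and a boolean mask of errors above loss_cut.

    Both inputs have shape (H, W, C); the error is averaged over channels.
    """
    error = np.mean((original - reconstruction) ** 2, axis=-1)
    return error, error > loss_cut

original = np.zeros((2, 2, 3))
reconstruction = np.zeros((2, 2, 3))
reconstruction[0, 0] = 1.0  # one badly reconstructed pixel
error, mask = anomaly_mask(original, reconstruction)
print(mask)
```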

Detection Output

For each validated image type, PIXAL saves:

validate/
  └── <component>/
      └── <type>/
          ├── logs/
          ├── metadata/
          ├── figures/
          │   ├── anomaly_overlay_*.png
          │   ├── pixel_loss_histogram.png
          │   └── ...
          └── aligned_metrics/

Visual outputs include:

Output                          Description
anomaly_overlay_*.png           Heatmap of pixel-wise anomaly regions
pixel_loss_histogram.png        Histogram of MSE across all pixels
combined_distribution_log.png   Overlay of predicted and true pixel values
roc_curve, pr_curve             ROC/PR curve using pixel-wise MSE scores
confusion_matrix.png            Optional confusion matrix (if thresholds used)

How to Run PIXAL

The PIXAL commands are streamlined to minimize the input required from the user. Command arguments can be supplied manually; if they are omitted, PIXAL uses the paths.yaml configuration file to find the files relevant to the process.

Important

Prior to preprocessing your dataset, alter the section component_model_path: &component_model_path in the paths.yaml file to match your component name.

The commands included in the PIXAL framework can be listed using the -h flag:

Pixel-based Anomaly Detection CLI

positional arguments:
  {preprocess,remove_bg,align,make_input,train,validate,detect}
    preprocess          Run all preprocessing steps on input images
    remove_bg           Remove background from images
    align               Align images
    make_input          Uses ImagePreprocessor to make ML input
    train               Train autoencoder model(s)
    validate            Run validation (preprocess + detect) on new images
    detect              Run anomaly detection on new images

options:
  -h, --help            show this help message and exit

Preprocessing

The preprocessing pipeline runs as a single command, but each step can be run separately if needed. Ensure the dataset and its nested directories are properly named before running. To run the entire pipeline:

pixal preprocess -i /path/to/component/

Loading bars are shown for each preprocessing step.

If you need to run steps separately, make sure each command receives the correct input path. For example, align takes the background-removed images:

pixal align -i /path/to/remove_bg/images/

Training

The train command accepts an explicit input path; without one, it trains on the preprocessed data located via the paths.yaml configuration file. If you are using that preprocessed data, simply run:

pixal train

Otherwise,

pixal train -i /path/to/preprocessed/data/

Validation

Validation preprocesses the image to be validated, runs detection, and produces the detection plots.

Important

Prior to validating your image, alter the section component_validate_path: &component_validate_path in the paths.yaml file to match your component name.

To run the validation pipeline, run:

pixal validate -i /path/to/image/

Model saving & MLflow integration

PIXAL currently writes trained models to the out/.../model/ directory in two forms:

  • Full Keras checkpoint file (.keras) — this is the checkpoint file written by the Keras ModelCheckpoint callback during training. It is suitable for restoring model weights via model.load_weights(...) or for Keras to read back a saved model depending on how it was written.
  • Weights file (<model_name>.weights.h5) — the project currently also calls save_weights(...) after training; validation and downstream code rebuilds the model architecture and then calls load_weights(...) to restore weights.

Example file locations:

out/<component>/<type>/model/<model_name>.keras
out/<component>/<type>/model/<model_name>.weights.h5

Loading the model for validation (current behavior)

from pixal.train_model.autoencoder import Autoencoder

# `params` must be the same dict used for training (including 'input_dim');
# `weights_path` points to the saved <model_name>.weights.h5 file.
model = Autoencoder(params)
model.build_model(input_dim=params['input_dim'])
model.load_weights(str(weights_path))

MLflow quick-start (optional)

MLflow is installed by setup.sh.

By default, MLflow creates an mlruns/ directory in the current working directory. To use a remote tracking server, set the MLFLOW_TRACKING_URI environment variable to the server's URL.

PIXAL includes a small, best-effort integration helper at pixal/mlflow_utils.py. When mlflow is present, training runs (via the pixal train entrypoints) will:

  • Start an MLflow run and log the params dictionary as run parameters
  • Log per-epoch metrics (loss/val_loss) to MLflow
  • After training, log the saved model/weights and the metadata YAML as artifacts

To start a local server, run:

mlflow ui

By default, this serves the MLflow UI on port 5000.

Then open http://localhost:5000 in a browser, or http://your-mlflow-server:5000 (or however you reach port 5000) for a remote host.

Available Variables for ML Input

The preprocessing framework supports multiple representations of image pixel values. You can specify a subset (e.g., ["H","S","V"]) or request all variables by passing channels="ALL". Each variable is normalized to a consistent range (typically [0,1]) to simplify downstream training.

Below is a detailed description of the currently available feature variables:

  • R, G, B: Red, Green, and Blue channels (linear or gamma-corrected depending on preprocessing). Each channel contains per-pixel intensity values normalized to [0, 1].
  • H, S, V: Hue, Saturation, and Value from the HSV colorspace. H is represented as degrees mapped to [0,1] (or optionally as sin/cos pairs in derived channels); S and V are normalized to [0,1].
  • Y, Cr, Cb: YCbCr colorspace channels, namely luminance (Y) and chroma components (Cr, Cb). Useful for separating brightness from color information.
  • LAB_L, LAB_a, LAB_b: CIE LAB color space channels, namely lightness (L) and opponent color axes (a, b). These are perceptually uniform channels, useful when color differences should match human perception.
  • LCh_C, LCh_sinH, LCh_cosH: Polar form of Lab, with chroma (C) and hue angle encoded as sin(H) and cos(H) for a continuous, wrap-safe representation of hue.
  • r_chroma, g_chroma: Simple chroma-derived channels computed relative to the R and G channels (e.g., R / (R+G+B + eps)). These emphasize the contribution of a single color channel relative to total intensity.
  • Opp_O1, Opp_O2, Opp_O3: Color-opponent channels (e.g., variants of R-G, R-B, G-B or other opponent transforms). Opponent channels help highlight color contrasts that may indicate defects.
  • GradMag: Gradient magnitude of the luminance or chosen channel (e.g., Sobel magnitude). Useful for edge and texture information.
  • Laplacian: Laplacian filter response (second derivative) capturing blob-like intensity variations and helping detect local defects or spots.
  • LocalStd: Local (neighborhood) standard deviation of intensity (texture measure). Useful to capture local texture variance and noise.

Notes and tips

  • You can request specific channels using the channels preprocessing option, e.g. channels: ["H","S","V","GradMag"].
  • For hue information, prefer LCh_sinH/LCh_cosH or H wrapped as sin/cos to avoid discontinuities near 0/360 degrees.
  • If using multiple chroma/opponent channels, normalize each channel independently (already performed by PIXAL) so the network can weigh them fairly.
  • Derived channels (GradMag, Laplacian, LocalStd) add texture/structure information and are especially helpful for detecting small or low-contrast defects.
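
To illustrate the hue-wrapping tip and the chroma ratio, the sketch below computes a few of these values for a single RGB pixel using only the standard library. It uses the HSV hue (not the Lab-based hue behind LCh_sinH and LCh_cosH), and the function name and dictionary keys are hypothetical rather than PIXAL channel names.

```python
import colorsys
import math

def pixel_features(r, g, b, eps=1e-8):
    """Compute a few illustrative channel values for one RGB pixel in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h is already scaled to [0, 1)
    angle = 2.0 * math.pi * h
    total = r + g + b + eps
    return {
        "H": h, "S": s, "V": v,
        # sin/cos encoding keeps hues near 0 and near 1 close together.
        "sinH": math.sin(angle), "cosH": math.cos(angle),
        "r_chroma": r / total, "g_chroma": g / total,
    }

pure_red = pixel_features(1.0, 0.0, 0.0)
almost_red = pixel_features(1.0, 0.0, 0.01)  # hue wraps around to just below 1
print(pure_red["H"], almost_red["H"])
print(pure_red["cosH"], almost_red["cosH"])
```

The two pixels look nearly identical, yet their raw H values sit at opposite ends of the [0, 1) range, while the sin/cos values stay close; this is the discontinuity the tip above warns about.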
