CK Tile Comprehensive Tutorial Index

This tutorial series introduces the Composable Kernel (CK) Tile programming model in depth. Each tutorial is an enhanced Python implementation that shows the corresponding C++ code, visualizes the concepts, and explains them step by step.

Overview

The CK Tile programming model is a high-performance abstraction for GPU kernel development that provides:

Tile-based data distribution across GPU compute units
Efficient memory access patterns with coalescing and vectorization
Flexible tensor operations for complex algorithms
Hardware-optimized implementations for AMD GPUs

Tutorial Structure

1. Tile Distribution Tutorial (`tile_distribution_tutorial.py`)

Learn how tensor data is distributed across GPU processing elements.

Key Concepts:

Coordinate systems (X, Y, P, R)
Thread-to-data mapping
Warp and block-level distribution
Space-filling curves and swizzling
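To make thread-to-data mapping concrete, here is a minimal NumPy sketch. It is not taken from the tutorial module: the function name `thread_owner_map` and the simple blocked layout (one thread per tile, no warp hierarchy, no swizzling) are assumptions chosen for illustration.

```python
import numpy as np

def thread_owner_map(rows, cols, tile_rows, tile_cols):
    """Return a (rows, cols) array whose entries are the id of the owning thread.

    Hypothetical helper: each thread owns one tile_rows x tile_cols block,
    and thread ids are assigned to blocks in row-major order.
    """
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("tile shape must evenly divide the tensor shape")
    tiles_per_row = cols // tile_cols
    r, c = np.indices((rows, cols))  # element coordinates
    return (r // tile_rows) * tiles_per_row + (c // tile_cols)

# A 4x8 tensor split into 2x4 tiles is covered by 4 threads.
owner = thread_owner_map(rows=4, cols=8, tile_rows=2, tile_cols=4)
print(owner)
```

Each entry of `owner` names the thread responsible for that element, which is the same information the distribution visualizations convey graphically.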

Highlights:

Interactive visualizations of distribution patterns
C++ code snippets showing actual CK implementations
Common patterns: GEMM, convolution, reduction
Performance optimization strategies

Example Usage:

from pytensor.tile_distribution_tutorial import run_interactive_tutorial
run_interactive_tutorial()

2. Tile Window Tutorial (`tile_window_tutorial.py`)

Master the tile window abstraction for efficient tensor access.

Key Concepts:

Window views into tensor memory
Load/store operations
Memory coalescing
Boundary handling
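The window and boundary-handling ideas can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the tutorial's API: `load_window` and its `pad_value` parameter are hypothetical names, and the sketch assumes a 2D tensor and a non-negative window origin.

```python
import numpy as np

def load_window(tensor, origin, shape, pad_value=0.0):
    """Copy a fixed-size window out of `tensor`, padding past the edges.

    Hypothetical helper: `origin` is the (row, col) of the window's top-left
    corner and must be non-negative; `shape` is the window size.
    """
    out = np.full(shape, pad_value, dtype=tensor.dtype)
    r0, c0 = origin
    # Clamp the window's end to the tensor bounds.
    r1 = min(r0 + shape[0], tensor.shape[0])
    c1 = min(c0 + shape[1], tensor.shape[1])
    if r1 > r0 and c1 > c0:
        out[: r1 - r0, : c1 - c0] = tensor[r0:r1, c0:c1]
    return out

tensor = np.arange(12, dtype=np.float32).reshape(3, 4)
# A 2x2 window at (2, 2) hangs one row over the bottom edge.
window = load_window(tensor, origin=(2, 2), shape=(2, 2))
print(window)
```

The window always has the requested shape, so the consuming code does not need a special case for edge tiles; only the in-bounds part carries tensor data.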

Highlights:

Step-by-step memory access visualization
Vectorized I/O demonstrations
Data layout impact on performance
Complete GEMM example with tile windows

Example Usage:

from pytensor.tile_window_tutorial import run_tile_window_tutorial
run_tile_window_tutorial()

3. Tensor Operations Tutorial (`tensor_operations_tutorial.py`)

Explore the complete set of tensor operations available in CK.

Key Concepts:

load_tile / store_tile
shuffle_tile (inter-thread communication)
update_tile (element-wise operations)
sweep_tile (reductions and scans)
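As a rough picture of how these operations compose, the sketch below walks a matrix one tile at a time: it loads a tile, applies an element-wise update, and folds the result into a running reduction. It uses plain NumPy slicing as a stand-in; the function name `tiled_row_sum_of_squares` is hypothetical, and none of this is the CK or tutorial API.

```python
import numpy as np

def tiled_row_sum_of_squares(x, tile_cols):
    """Compute sum(x**2, axis=1) by visiting column tiles one at a time."""
    acc = np.zeros(x.shape[0], dtype=x.dtype)
    for c0 in range(0, x.shape[1], tile_cols):
        tile = x[:, c0:c0 + tile_cols]  # "load": take one tile of columns
        tile = tile * tile              # element-wise update (a new array)
        acc += tile.sum(axis=1)         # reduction over the tile
    return acc

x = np.arange(12.0).reshape(3, 4)
result = tiled_row_sum_of_squares(x, tile_cols=2)
print(result)
```

Doing the update and the reduction while the tile is still "loaded" is the basic idea behind the fusion strategies listed in the highlights: the intermediate squared values are never written back as a full matrix.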

Highlights:

Operation lifecycle visualization
Fusion strategies for performance
Real-world examples (GEMM, LayerNorm)
Performance optimization checklist

Example Usage:

from pytensor.tensor_operations_tutorial import run_tensor_operations_tutorial
run_tensor_operations_tutorial()

Learning Path

Beginner Path

Start with tile_distribution_tutorial.py - Section 1 (Core Concepts)
Move to tile_window_tutorial.py - Section 1 (Basic Operations)
Try simple examples in tensor_operations_tutorial.py - Section 1 (Load/Store)

Intermediate Path

Study GEMM distribution patterns in tile_distribution_tutorial.py
Understand memory coalescing in tile_window_tutorial.py
Learn operation fusion in tensor_operations_tutorial.py

Advanced Path

Master hierarchical tiling and swizzling patterns
Optimize memory access with vectorization
Implement custom kernels using the full operation set
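For a feel of what swizzling achieves, here is a small sketch of a generic XOR swizzle. It is one common technique and not necessarily the pattern CK uses. The model is an assumption: a square tile stored row-major in shared memory, with as many banks as the tile is wide, and a width that is a power of two.

```python
def bank_of(row, col, width, swizzle):
    """Bank index of element (row, col) in a width x width row-major tile.

    Assumed model: `width` banks, one element per bank slot, width a power
    of two. With swizzling, the column is XOR-ed with the row before storing.
    """
    stored_col = col ^ (row % width) if swizzle else col
    address = row * width + stored_col
    return address % width

WIDTH = 8
# Reading column 3 means one access per row.
plain_banks = {bank_of(r, 3, WIDTH, swizzle=False) for r in range(WIDTH)}
swizzled_banks = {bank_of(r, 3, WIDTH, swizzle=True) for r in range(WIDTH)}
print(len(plain_banks), len(swizzled_banks))  # prints "1 8"
```

Without the swizzle every access in the column lands in the same bank; with it, the eight accesses spread over eight different banks under this model.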

C++ Integration

Each tutorial module includes extensive C++ code snippets that show:

Direct CK Library Usage

// Simplified from the CK headers (the actual template takes more parameters)
template <typename TileDistribution>
struct tile_window_with_static_distribution {
    // Implementation details with explanations
};

Kernel Implementation Patterns

// Complete kernel examples
template <typename TileShape>
__global__ void gemm_kernel(...) {
    // Step-by-step implementation
}

Performance Optimizations

// Vectorized access, shuffle operations, etc.
using float4 = vector_type<float, 4>::type;

Key Features of Enhanced Tutorials

1. Progressive Complexity

Start with simple 2D examples
Build up to complex 3D tensor operations
Real-world kernel implementations

2. Interactive Visualizations

Matplotlib-based diagrams
Thread-to-memory mapping
Performance comparisons

3. Comprehensive Explanations

Detailed docstrings
Step-by-step execution traces
Common pitfalls and solutions

4. Performance Focus

Bandwidth utilization analysis
Optimization checklists
Hardware-specific considerations

Prerequisites

Software Requirements

Python 3.8+
NumPy
Matplotlib
(Optional) ROCm SDK for running actual C++ code

Knowledge Requirements

Basic understanding of GPU architecture
Familiarity with parallel programming concepts
C++ knowledge helpful but not required

Getting Started

Import the tutorials:

import sys
sys.path.append('path/to/pythonck')

from pytensor import tile_distribution_tutorial
from pytensor import tile_window_tutorial
from pytensor import tensor_operations_tutorial

Run interactive tutorials:

# Start with tile distribution
tile_distribution_tutorial.run_interactive_tutorial()

Explore specific concepts:

# Deep dive into memory coalescing
from pytensor.tile_window_tutorial import MemoryAccessPatterns
MemoryAccessPatterns.demonstrate_coalescing()
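Independently of the tutorial module, the effect of coalescing can be estimated with a few lines of plain Python that count how many memory segments a group of threads touches. The 64-byte segment size, the 4-byte element size, and the group of 64 threads are assumptions for illustration, not properties of a specific GPU.

```python
def segments_touched(element_indices, element_bytes=4, segment_bytes=64):
    """Count distinct memory segments covered by the given element indices."""
    return len({(i * element_bytes) // segment_bytes for i in element_indices})

THREADS = 64
# Thread t reads element t: neighbouring threads share segments.
contiguous = segments_touched(range(THREADS))
# Thread t reads element t * 64: every thread lands in its own segment.
strided = segments_touched(t * 64 for t in range(THREADS))
print(contiguous, strided)  # prints "4 64"
```

Under these assumptions the strided pattern touches 16 times as many segments to deliver the same amount of useful data.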

Advanced Topics

Custom Kernel Development

After completing the tutorials, you'll be able to:

Design efficient tile distributions for your algorithms
Implement high-performance kernels using CK abstractions
Optimize memory access patterns
Debug and profile GPU kernels

Integration with CK Library

The Python tutorials directly correspond to C++ CK usage:

# Python tutorial code
dist = TileDistributionTutorial(...)
window = TileWindowTutorial(...)

// Corresponds to C++ CK code
tile_distribution<...> dist{...};
tile_window<...> window{...};

Contributing

To extend these tutorials:

Add New Operations
- Implement in Python following the existing pattern
- Include C++ correspondence
- Add visualizations
Create Domain-Specific Examples
- Machine learning operations
- Scientific computing kernels
- Image processing algorithms
Improve Visualizations
- Add animation support
- 3D visualizations for complex patterns
- Performance profiling graphs

Resources

CK Documentation

Summary

These enhanced tutorials provide a comprehensive learning experience for the CK Tile programming model by:

Bridging Theory and Practice - Python implementations with C++ code
Visual Learning - Extensive visualizations and diagrams
Hands-on Experience - Interactive examples and exercises
Performance Focus - Optimization strategies and best practices

Start with the tile distribution tutorial, then work through the tile window and tensor operations tutorials to build up your understanding of high-performance GPU kernel development with CK.

Note: These tutorials are designed to complement the official CK documentation and provide an accessible learning path for developers new to the CK programming model.
