This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This repository contains three machine learning research projects spanning two domains, traffic prediction and seismic data analysis:
- ST-LLM+ - Graph Enhanced Spatio-Temporal Large Language Models for Traffic Prediction
- ST-LLM - Original Spatial-Temporal Large Language Model for Traffic Prediction
- SeisMoLLM - Seismic Motion Large Language Model for earthquake/seismic data analysis
Each project has its own environment requirements:
ST-LLM and ST-LLM+:
- Python 3.8.19, PyTorch 2.4.1, CUDA 11.7, torchvision 0.19.1
- Setup: conda env create -f env_ubuntu.yaml (run in the respective project directory)
- Dependencies: transformers, torch-geometric, pandas, numpy, scipy

SeisMoLLM:
- Python environment (specific version not explicitly documented)
- PyTorch-based with seismic data processing capabilities
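
After creating an environment, it can help to confirm which of the listed packages are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the dependency list above.
    wanted = ["torch", "torchvision", "transformers",
              "torch-geometric", "pandas", "numpy", "scipy"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the ones documented for the project you are working on.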

Training commands:

# ST-LLM+ training
CUDA_VISIBLE_DEVICES=0 nohup python train_plus.py --data taxi_pick > your_log_name.log &

# ST-LLM training
CUDA_VISIBLE_DEVICES=0 nohup python train.py --data taxi_pick > your_log_name.log &

# SeisMoLLM distributed training example
torchrun --nnodes 1 --nproc_per_node 4 --master_port 10000 main.py \
  --seed 0 --mode "train" --model-name "SeisGPT_baz" \
  --data "/datasets/DiTing330km" --dataset-name "diting_light" \
  --epochs 200 --batch-size 128 --base-lr 0.0005

ST-LLM+ key features:
- Partially Frozen Graph Attention (PFGA) module for capturing localized dependencies
- LoRA-augmented training strategy for efficient fine-tuning
- Proximity-based adjacency matrix integration for traffic network modeling
- Core model: model_ST_LLM_plus.py
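
One common way to build a proximity-based adjacency matrix is a thresholded Gaussian kernel over pairwise distances. The sketch below assumes that approach purely for illustration; the function name, `sigma`, and `threshold` are hypothetical and are not taken from model_ST_LLM_plus.py.

```python
import math


def gaussian_adjacency(coords, sigma=1.0, threshold=0.1):
    """Build a symmetric adjacency matrix from 2-D node coordinates.

    Assumed weighting: w_ij = exp(-d_ij^2 / sigma^2), where d_ij is the
    Euclidean distance. Weights below `threshold` are set to 0 so that
    distant nodes are treated as unconnected.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.dist(coords[i], coords[j])
            weight = math.exp(-(dist ** 2) / (sigma ** 2))
            adj[i][j] = weight if weight >= threshold else 0.0
    return adj


# Three hypothetical regions: two close together, one far away.
adj = gaussian_adjacency([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)])
```

Here the first two regions receive a non-zero edge weight, while the distant third region is connected only to itself.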

ST-LLM key features:
- Spatial-temporal embedding for learning location and temporal patterns
- Fusion convolution for unified spatial-temporal representation
- Partially frozen attention strategy for LLM adaptation
- Core model: model_ST_LLM.py
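
A partially frozen strategy can be expressed as a rule over parameter names. The sketch below assumes GPT-2 style names as used by Hugging Face Transformers (`h.<i>.attn...`, `h.<i>.ln_1...`, `wpe...`) and an assumed rule: layer norms and positional embeddings always train, attention weights train only in the last few blocks, and everything else stays frozen. The function name and the exact rule are illustrative assumptions and may differ from model_ST_LLM.py.

```python
import re


def trainable_param_names(param_names, num_layers, unfrozen_last):
    """Return the parameter names that stay trainable.

    Assumed rule (hypothetical, for illustration):
      - layer norms ("ln") and positional embeddings ("wpe") always train;
      - attention weights train only in the last `unfrozen_last` blocks;
      - all other parameters are frozen.
    """
    first_unfrozen = num_layers - unfrozen_last
    trainable = []
    for name in param_names:
        match = re.match(r"h\.(\d+)\.", name)
        layer = int(match.group(1)) if match else None
        if "ln" in name or "wpe" in name:
            trainable.append(name)
        elif layer is not None and layer >= first_unfrozen and ".attn." in name:
            trainable.append(name)
    return trainable


# Hypothetical two-block model with GPT-2 style parameter names.
names = [
    "wte.weight", "wpe.weight",
    "h.0.ln_1.weight", "h.0.attn.c_attn.weight", "h.0.mlp.c_fc.weight",
    "h.1.ln_1.weight", "h.1.attn.c_attn.weight", "h.1.mlp.c_fc.weight",
    "ln_f.weight",
]
selected = trainable_param_names(names, num_layers=2, unfrozen_last=1)
```

With a PyTorch model, the result could be applied by iterating over `model.named_parameters()` and setting `param.requires_grad` according to whether the name was selected.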

SeisMoLLM key features:
- Multi-task seismic analysis, including phase picking, magnitude estimation, and azimuth prediction
- Configurable model architecture via config.py with regex-based model matching
- Multiple loss functions (BCE, Focal, Huber, Combination losses)
- Distributed training support with torchrun
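
Of the loss functions listed above, focal and Huber losses have standard textbook definitions. The sketch below shows scalar versions in plain Python to make the formulas concrete; the default values for `gamma`, `alpha`, and `delta` are assumptions, and the repository's own implementations (including how the combination losses are weighted) may differ.

```python
import math


def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one predicted probability and a 0/1 target.

    FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability
    assigned to the true class. Larger gamma down-weights easy examples.
    """
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)
```

Focal loss is typically used for imbalanced classification targets such as phase picks, and Huber loss for regression targets that may contain outliers.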
- ST-LLM/ST-LLM+: Traffic datasets (taxi_pick, taxi_drop, bike_pick, bike_drop) with adjacency matrices
- SeisMoLLM: Seismic datasets (DiTing, STEAD) with various input channels (z, n, e components)
The config.py file contains comprehensive model configurations using regex patterns for model names:
- Input/output channel definitions
- Loss function assignments
- Evaluation metrics specification
- Transform functions for data preprocessing
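
Regex-keyed configuration can be sketched as a dictionary whose keys are patterns matched against the model name. The registry contents, field names, and `get_config` helper below are hypothetical and only illustrate the lookup idea; they do not reproduce the actual config.py.

```python
import re

# Hypothetical registry: regex pattern -> settings (illustrative values only).
MODEL_CONFIGS = {
    r".*_baz": {
        "in_channels": ["z", "n", "e"],
        "labels": ["baz"],
        "loss": "huber",
        "metrics": ["mae"],
    },
    r".*_mag": {
        "in_channels": ["z", "n", "e"],
        "labels": ["magnitude"],
        "loss": "huber",
        "metrics": ["mae", "rmse"],
    },
}


def get_config(model_name):
    """Return the settings of the first pattern that fully matches the name."""
    for pattern, settings in MODEL_CONFIGS.items():
        if re.fullmatch(pattern, model_name):
            return settings
    raise KeyError(f"no configuration matches model name {model_name!r}")


config = get_config("SeisGPT_baz")
```

Using `re.fullmatch` rather than `re.search` avoids accidental matches on substrings of longer model names.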

Typical training settings:
- Learning rates: 1e-4 to 1e-3
- Batch sizes: 64 to 128
- Early stopping: patience of roughly 30 to 100 epochs
- Weight decay: 1e-4
- CUDA memory optimization: various split size configurations
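
Patience-based early stopping, as referenced in the settings above, can be sketched as a small counter over validation losses. The class name and `min_delta` parameter are hypothetical; this is not the training scripts' own implementation.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=30, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Example with a short patience so the stop is easy to follow.
stopper = EarlyStopping(patience=2)
decisions = [stopper.step(loss) for loss in [1.0, 0.9, 0.95, 0.92]]
```

In this example the loss stops improving after the second epoch, so the stop signal is raised on the fourth.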
All projects generate training logs in ./logs/ directories with timestamped filenames and model-specific naming conventions.
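
A timestamped, model-specific log path can be built as in the sketch below. The filename pattern `<model>_<YYYYmmdd_HHMMSS>.log` and the helper name are assumptions for illustration; check each project's training script for the actual naming scheme.

```python
from datetime import datetime
from pathlib import Path


def build_log_path(model_name, log_dir="./logs", now=None):
    """Return a log path of the form <log_dir>/<model_name>_<timestamp>.log."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(log_dir) / f"{model_name}_{stamp}.log"


# A fixed timestamp keeps the example reproducible.
log_path = build_log_path("ST_LLM_plus", now=datetime(2024, 1, 2, 3, 4, 5))
```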