CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Repository Overview

This repository contains three machine learning research projects, each targeting a different domain:

  1. ST-LLM+ - Graph Enhanced Spatio-Temporal Large Language Models for Traffic Prediction
  2. ST-LLM - Original Spatial-Temporal Large Language Model for Traffic Prediction
  3. SeisMoLLM - Seismic Motion Large Language Model for earthquake/seismic data analysis

Environment Setup

Each project has its own environment requirements:

ST-LLM and ST-LLM+

  • Python 3.8.19, PyTorch 2.4.1, CUDA 11.7, torchvision 0.19.1
  • Setup: conda env create -f env_ubuntu.yaml (in respective project directories)
  • Dependencies: transformers, torch-geometric, pandas, numpy, scipy

SeisMoLLM

  • Python environment (specific version not explicitly documented)
  • PyTorch-based with seismic data processing capabilities

Training Commands

ST-LLM+

CUDA_VISIBLE_DEVICES=0 nohup python train_plus.py --data taxi_pick > your_log_name.log &

ST-LLM

CUDA_VISIBLE_DEVICES=0 nohup python train.py --data taxi_pick > your_log_name.log &

SeisMoLLM

# Distributed training example
torchrun --nnodes 1 --nproc_per_node 4 --master_port 10000 main.py \
    --seed 0 --mode "train" --model-name "SeisGPT_baz" \
    --data "/datasets/DiTing330km" --dataset-name "diting_light" \
    --epochs 200 --batch-size 128 --base-lr 0.0005
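torchrun launches one worker process per GPU and passes each worker its rank through environment variables (RANK, LOCAL_RANK, WORLD_SIZE). As a minimal, stdlib-only sketch of how a script can read them (the variable names follow torchrun's documented behavior; SeisMoLLM's actual startup code may organize this differently):

```python
import os

def torchrun_context():
    """Read the rank variables that torchrun exports to every worker.

    Defaulting to a single-process layout lets the same script also run
    without torchrun during local debugging.
    """
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
    }

ctx = torchrun_context()
```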

Key Architecture Components

ST-LLM+

  • Partially Frozen Graph Attention (PFGA) module for capturing localized dependencies
  • LoRA-augmented training strategy for efficient fine-tuning
  • Proximity-based adjacency matrix integration for traffic network modeling
  • Core model: model_ST_LLM_plus.py
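The exact LoRA ranks used by ST-LLM+ are not documented here, but the reason LoRA makes fine-tuning cheap is easy to show: a frozen weight W of shape (d_out, d_in) gets a trainable low-rank update B @ A, so the trainable count drops from d_out * d_in to r * (d_out + d_in). A small arithmetic sketch (function name and the 768/rank-8 example are illustrative, not taken from the repo):

```python
def lora_param_counts(d_in: int, d_out: int, r: int):
    """Compare trainable parameters for one weight matrix W (d_out x d_in):
    full fine-tuning trains W itself; LoRA freezes W and trains only
    B (d_out x r) and A (r x d_in)."""
    full = d_out * d_in
    lora = r * (d_out + d_in)
    return full, lora

# A GPT-2-sized 768 x 768 projection with rank 8:
full, lora = lora_param_counts(768, 768, 8)
```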

ST-LLM

  • Spatial-temporal embedding for learning location and temporal patterns
  • Fusion convolution for unified spatial-temporal representation
  • Partially frozen attention strategy for LLM adaptation
  • Core model: model_ST_LLM.py

SeisMoLLM

  • Multi-task seismic analysis including phase picking, magnitude estimation, azimuth prediction
  • Configurable model architecture via config.py with regex-based model matching
  • Multiple loss functions (BCE, Focal, Huber, Combination losses)
  • Distributed training support with torchrun
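A "combination" loss in a multi-task setting like this typically means a weighted sum of per-task terms, e.g. a classification term for phase picking plus a regression term for magnitude. One plausible shape, in plain Python (the actual definitions and weights live in the repo's loss code, not here):

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability/label pair."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic near zero, linear for large errors."""
    err = abs(pred - target)
    return 0.5 * err ** 2 if err <= delta else delta * (err - 0.5 * delta)

def combination_loss(pick_prob, pick_label, mag_pred, mag_true, w=(1.0, 1.0)):
    """Weighted sum of a picking (BCE) term and a magnitude (Huber) term."""
    return w[0] * bce(pick_prob, pick_label) + w[1] * huber(mag_pred, mag_true)
```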

Data Structure

  • ST-LLM/ST-LLM+: Traffic datasets (taxi_pick, taxi_drop, bike_pick, bike_drop) with adjacency matrices
  • SeisMoLLM: Seismic datasets (DiTing, STEAD) with various input channels (z, n, e components)

Model Configuration (SeisMoLLM)

The config.py file contains comprehensive model configurations using regex patterns for model names:

  • Input/output channel definitions
  • Loss function assignments
  • Evaluation metrics specification
  • Transform functions for data preprocessing
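The regex-based matching works by scanning a table of (pattern, config) pairs and returning the first config whose pattern matches the model name, so a whole family of model names (e.g. versioned variants) shares one entry. A sketch of that lookup; the pattern strings and config fields below are hypothetical (only SeisGPT_baz appears in the training command above), and config.py's real table is richer:

```python
import re

# Hypothetical pattern table in the spirit of config.py: each regex
# maps a family of model names to shared settings.
MODEL_CONFIGS = [
    (r"^SeisGPT_baz.*", {"out": "back-azimuth", "loss": "huber"}),
    (r"^SeisGPT_mag.*", {"out": "magnitude", "loss": "huber"}),
    (r"^SeisGPT_pick.*", {"out": "phase-pick", "loss": "bce"}),
]

def lookup_config(model_name):
    """Return the first config whose regex matches the model name."""
    for pattern, cfg in MODEL_CONFIGS:
        if re.match(pattern, model_name):
            return cfg
    raise KeyError(f"no config matches {model_name!r}")
```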

Training Parameters

Common patterns across projects:

  • Learning rates: 1e-3 to 1e-4 range
  • Batch sizes: 64-128
  • Early stopping: patience around 30-100 epochs
  • Weight decay: 1e-4
  • CUDA memory optimization: various allocator split-size settings (commonly tuned via PyTorch's PYTORCH_CUDA_ALLOC_CONF, e.g. max_split_size_mb)
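Patience-based early stopping, as used across these projects, just counts epochs since the best validation loss and halts when the count reaches the patience threshold. A minimal, framework-free sketch of that bookkeeping (class name is illustrative; each project implements its own variant):

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs (these projects use values in the
    30-100 range)."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```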

Log Management

All projects generate training logs in ./logs/ directories with timestamped filenames and model-specific naming conventions.
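A timestamped, model-specific log filename of that kind can be built like this; the helper name and exact filename layout are assumptions, since each project defines its own convention:

```python
from datetime import datetime

def log_path(model_name, log_dir="./logs"):
    """Build a timestamped log filename under ./logs/, in the style of
    <model_name>_<YYYYmmdd_HHMMSS>.log."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{log_dir}/{model_name}_{stamp}.log"
```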