feat(skill): ship first-party timesfm-forecasting Agent Skill (agentskills.io)#369

Merged
siriuz42 merged 4 commits into google-research:master from borealBytes:feat/timesfm-forecasting-skill
Mar 19, 2026

Conversation

@borealBytes
Contributor

@borealBytes borealBytes commented Feb 22, 2026

What this adds

A timesfm-forecasting/ directory — a compliant Agent Skill that teaches AI agents how to use the TimesFM API correctly.

Agents that support the open Agent Skills standard (OpenCode, Cursor, Codex, and others) discover and install skills like this:

cp -r timesfm-forecasting/ ~/.cursor/skills/
cp -r timesfm-forecasting/ ~/.claude/skills/

Once installed, the agent reads SKILL.md at startup and gets accurate, production-ready knowledge of the TimesFM API — correct quantile indices, mandatory system check before model load, full ForecastConfig reference — before writing a single line of code.


Structure

timesfm-forecasting/          ← skill root (name matches SKILL.md `name` field per spec)
├── SKILL.md                  ← required: frontmatter + instructions
├── scripts/
│   ├── check_system.py       ← mandatory preflight: RAM / GPU / disk / Python / package check
│   └── forecast_csv.py       ← CLI: CSV in → forecast CSV out, any horizon, any columns
├── references/
│   ├── api_reference.md      ← ForecastConfig full docs, output shapes, model options
│   ├── data_preparation.md   ← input formats, NaN handling, CSV loading, covariate setup
│   └── system_requirements.md ← hardware tiers, memory estimation formulas
└── examples/
    ├── global-temperature/   ← basic forecast: NOAA CSV → PNG + animated GIF
    ├── anomaly-detection/    ← two-phase: detrend + Z-score + quantile PI
    └── covariates-forecasting/ ← forecast_with_covariates() XReg demo
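
The tree notes that the skill root's directory name matches the `name` field in SKILL.md. A minimal sketch of how that could be checked mechanically, assuming the frontmatter is a `---`-delimited block of simple `key: value` lines. `read_frontmatter_name` and `name_matches_directory` are hypothetical helpers written for illustration, not part of this PR.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_frontmatter_name(skill_md_text: str) -> Optional[str]:
    """Return the top-level `name` value from a '---'-delimited frontmatter block."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip()
    return None


def name_matches_directory(skill_dir: Path) -> bool:
    """True when SKILL.md's `name` equals the skill directory's own name."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return read_frontmatter_name(text) == skill_dir.name


# Demo against a throwaway directory (contents are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "timesfm-forecasting"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: timesfm-forecasting\nlicense: Apache-2.0\n---\n# Instructions\n",
        encoding="utf-8",
    )
    frontmatter_ok = name_matches_directory(skill_dir)

print(frontmatter_ok)  # True
```

A real validator would use a YAML parser; the line-based parsing here only handles the simple case shown.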

AGENTS.md at repo root is a lightweight entry point for agents working directly in this repo (points to the skill and provides install instructions).


SKILL.md covers

  • When to use TimesFM vs. statsmodels / aeon / scikit-learn
  • Mandatory preflight system check (RAM, GPU, disk, Python version)
  • Hardware requirements for all model versions (1.0, 2.0, 2.5)
  • Full ForecastConfig parameter reference with "when to change" guidance
  • Three complete, runnable workflows (single series, batch, evaluation)
  • GPU acceleration + batch size tuning + memory-constrained chunking
  • Model version history with all HuggingFace checkpoint names
  • Quality checklist — 10 items to verify before declaring a task done
  • Common mistakes — 8 documented bugs with fixes
  • Regression verification commands for all three examples

No existing files modified

Everything lives under timesfm-forecasting/ and AGENTS.md. The existing src/, v1/, notebooks/, and README.md are untouched.


Testing

All three examples verified (see comments below for output images):

| Example | Acceptance criteria |
| ------- | ------------------- |
| examples/global-temperature/ | point_forecast has 12 values; PNG shows context + forecast + PI bands; GIF animates 25 frames |
| examples/anomaly-detection/ | Sep 2023 flagged CRITICAL (z ≥ 3.0); injected anomalies detected in forecast window |
| examples/covariates-forecasting/ | 108-row CSV (3 stores × 36 weeks); distinct price arrays per store |

Regression commands in SKILL.md under Validation & Verification.

Add a self-contained AI agent skill for TimesFM that teaches coding
agents (Claude Code, OpenCode, Cursor, Codex) how to use the TimesFM
API correctly — safe model loading, zero-shot forecasting, covariate
workflows, anomaly detection, and the most common pitfalls.

Files added:
- AGENTS.md             — auto-loaded skill document (root of repo)
- claude-skill/scripts/check_system.py    — mandatory preflight RAM/GPU/disk checker
- claude-skill/scripts/forecast_csv.py   — CLI wrapper for CSV forecasting
- claude-skill/references/               — ForecastConfig API ref, data prep, HW reqs
- claude-skill/examples/global-temperature/   — basic forecast + PNG/GIF pipeline
- claude-skill/examples/anomaly-detection/    — two-phase detrend+Z-score + quantile PI
- claude-skill/examples/covariates-forecasting/ — forecast_with_covariates() XReg demo
- .gitattributes        — Git LFS rules for PNG/GIF binary outputs

Contributed by Clayton Young / Superior Byte Works LLC (@borealBytes)
Apache 2.0 — same license as this repository
…ndard

Replace AGENTS.md / claude-skill/ with a proper agentskills.io-compliant
skill directory. Any AI agent that supports the open Agent Skills standard
(Claude Code, OpenCode, Cursor, Codex, etc.) can now install and use this
skill generically.

Changes:
- Remove AGENTS.md (was Claude-specific convention)
- Remove claude-skill/ directory (was Claude-specific naming)
- Add timesfm-forecasting/SKILL.md with compliant frontmatter:
    name: timesfm-forecasting
    description: ...
    license: Apache-2.0
    metadata: author, version
- Rename claude-skill/examples/ → timesfm-forecasting/examples/
- Rename claude-skill/scripts/  → timesfm-forecasting/scripts/
- Rename claude-skill/references/ → timesfm-forecasting/references/
- Update .gitattributes paths to match new directory

Skill installs via:
  cp -r timesfm-forecasting/ ~/.claude/skills/
  cp -r timesfm-forecasting/ ~/.cursor/skills/
  # or any agent that supports agentskills.io

Spec: https://agentskills.io/specification
Short pointer for agents working directly in this repo.
Points to timesfm-forecasting/SKILL.md and provides
install commands for the first-party Agent Skill.
@borealBytes
Contributor Author

🌡️ Example 1 — Global Temperature Forecast

📁 examples/global-temperature/

The baseline example. Loads 564 rows of NOAA global temperature anomaly data (2022–2024), runs a zero-shot 12-month forecast with TimesFM 1.0, and outputs a static visualization plus a 25-frame animated GIF showing how the forecast evolves as more historical context is added.

📊 Forecast Visualization

Global Temperature Anomaly Forecast


🎬 Forecast Evolution Animation (25 frames)

Each frame adds one month of context (12 → 36 months). Watch the forecast tighten as the model sees more of the warming trend.
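
For readers wondering where 25 frames come from: growing the context one month at a time from 12 to 36 months gives 36 − 12 + 1 = 25 windows. A minimal sketch of that windowing, assuming a plain Python list of monthly values. `expanding_contexts` is a hypothetical helper, not the code in generate_animation_data.py, and the values are placeholders rather than NOAA data.

```python
def expanding_contexts(series, min_len=12, max_len=36):
    """Yield one context window per frame, each one observation longer than the last."""
    if not (0 < min_len <= max_len <= len(series)):
        raise ValueError("need 0 < min_len <= max_len <= len(series)")
    for end in range(min_len, max_len + 1):
        yield series[:end]


monthly_values = list(range(36))  # placeholder values
frames = list(expanding_contexts(monthly_values))

print(len(frames), len(frames[0]), len(frames[-1]))  # 25 12 36
```

Each window would then be passed to the model separately to produce the forecast drawn in that frame.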

TimesFM Forecast Evolution


Key metrics

| Metric | Value |
| ------ | ----- |
| Context window | 36 months (2022-01 → 2024-12) |
| Forecast horizon | 12 months |
| Final context value | +1.24 °C (Dec 2024) |
| Point forecast range | ~1.1–1.3 °C |
| PI bands shown | 80% and 60% |

Run it

cd timesfm-forecasting/examples/global-temperature
python run_forecast.py       # → output/forecast_output.json
python visualize_forecast.py # → output/forecast_visualization.png
python generate_animation_data.py && python generate_gif.py # → output/forecast_animation.gif

@borealBytes
Contributor Author

🔍 Example 2 — Anomaly Detection (Two-Phase Method)

📁 examples/anomaly-detection/

TimesFM has no built-in anomaly detection, but its calibrated quantile intervals make it a natural fit. This example uses a two-phase approach combining classical detrending with TimesFM's prediction intervals.

📊 Anomaly Detection Output

Anomaly Detection — Two-Phase Method


How it works

Phase 1 — Context (historical 36 months, 2022–2024):

  • Linear detrend via np.polyfit → compute residuals
  • Z-score the residuals (σ ≈ 0.114 °C)
  • Flag: WARNING if |z| ≥ 2.0, CRITICAL if |z| ≥ 3.0
  • Sep 2023 correctly flagged CRITICAL (actual = +1.47 °C, z = +3.03) — the record-breaking heat spike
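
The Phase 1 steps can be sketched as follows. This is an illustration on a synthetic series, not the code in detect_anomalies.py: `zscore_flags` is a hypothetical helper, and using the population standard deviation of the residuals is an assumption, since the thread does not say which estimator the example uses.

```python
import numpy as np


def zscore_flags(values, warn=2.0, crit=3.0):
    """Linear detrend with np.polyfit, z-score the residuals, label each point."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, deg=1)   # fit a straight-line trend
    residuals = y - (slope * t + intercept)      # what the trend does not explain
    z = residuals / residuals.std()              # assumption: population std (ddof=0)
    abs_z = np.abs(z)
    labels = np.where(abs_z >= crit, "CRITICAL",
                      np.where(abs_z >= warn, "WARNING", "NORMAL"))
    return z, labels


# Synthetic 36-point series: mild upward trend, small wiggle, one injected spike.
t = np.arange(36)
series = 1.0 + 0.01 * t + 0.03 * np.sin(t)
series[20] += 0.5

z, labels = zscore_flags(series)
print(labels[20])  # CRITICAL
```

Note that a single large spike inflates the residual standard deviation it is judged against, so very short windows cap how large |z| can get.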

Phase 2 — Forecast (12 months):

  • 4 synthetic anomalies injected into the forecast window
  • Flagged using TimesFM 80%/90% quantile prediction intervals
  • Points outside the 90% PI → CRITICAL; outside the 80% PI → WARNING
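
A rough sketch of the Phase 2 flagging logic, assuming the quantile layout described elsewhere in this PR: a `(horizon, 10)` array per series where index 0 is the mean and indices 1 to 9 are q10 to q90. Which quantile pairs the example maps to its interval labels is not shown in this thread, so the choice below (q20–q80 as the inner band, q10–q90 as the outer band) is an assumption. With deciles only, q10–q90 is a nominal 80% interval and q20–q80 a nominal 60% interval. `flag_against_bands` is a hypothetical helper and the numbers are made up.

```python
import numpy as np

# Index 0 is the mean, per the PR description; these constants are quoted from it.
IDX_Q10, IDX_Q20, IDX_Q80, IDX_Q90 = 1, 2, 8, 9


def flag_against_bands(actuals, quantiles):
    """Label each point by the widest quantile band it falls outside of."""
    a = np.asarray(actuals, dtype=float)
    q = np.asarray(quantiles, dtype=float)  # shape (horizon, 10)
    outside_outer = (a < q[:, IDX_Q10]) | (a > q[:, IDX_Q90])
    outside_inner = (a < q[:, IDX_Q20]) | (a > q[:, IDX_Q80])
    return np.where(outside_outer, "CRITICAL",
                    np.where(outside_inner, "WARNING", "NORMAL"))


# Made-up forecast: 3 steps, mean 1.0, deciles q10..q90 = 0.6, 0.7, ..., 1.4.
row = np.concatenate([[1.0], np.linspace(0.6, 1.4, 9)])
quantiles = np.tile(row, (3, 1))
actuals = [1.0, 1.35, 1.6]

labels = flag_against_bands(actuals, quantiles)
print(labels.tolist())  # ['NORMAL', 'WARNING', 'CRITICAL']
```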

Results

| Window | Normal | Warning | Critical |
| ------ | ------ | ------- | -------- |
| Context (36 mo) | 35 | 0 | 1 (Sep 2023) |
| Forecast (12 mo) | 8 | 2 | 2 |

Run it

cd timesfm-forecasting/examples/anomaly-detection
python detect_anomalies.py
# → output/anomaly_detection.json
# → output/anomaly_detection.png

@borealBytes
Contributor Author

📈 Example 3 — Covariates / XReg Forecasting

📁 examples/covariates-forecasting/

Demonstrates the forecast_with_covariates() API introduced in TimesFM 2.5. Uses synthetic 3-store weekly retail data with price as a dynamic numerical covariate, day-of-week as a dynamic categorical covariate, and store type as a static categorical covariate.

📊 Covariate Decomposition (2×2 layout, shared x-axis)

Covariates Forecast — 3 Stores × 36 Weeks


Dataset

| Store | Type | Base Price | Avg Weekly Sales |
| ----- | ---- | ---------- | ---------------- |
| store_A | premium | $12.00 | ~1,060 units |
| store_B | standard | $10.00 | ~815 units |
| store_C | discount | $7.50 | ~550 units |

Output CSV: 108 rows (3 stores × 36 weeks = 24 context + 12 horizon)

Covariate types used

| Type | Field | Values |
| ---- | ----- | ------ |
| dynamic_numerical | price | per-store weekly prices (known future) |
| dynamic_categorical | day_of_week | 0–6 (Mon–Sun) |
| static_categorical | store_type | premium / standard / discount |
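
The context/horizon split for a known-future covariate can be sketched like this: the target is cut at 24 weeks, while a dynamic covariate such as price spans all 36 weeks (24 context + 12 horizon). The sketch only prepares and length-checks the data and does not call `forecast_with_covariates()`, whose exact argument layout is not shown in this thread. `build_inputs` is a hypothetical helper and the sales values are placeholders; only the store types and base prices come from the table above.

```python
CONTEXT_LEN = 24
HORIZON_LEN = 12
TOTAL_LEN = CONTEXT_LEN + HORIZON_LEN  # 36 weeks per store


def build_inputs(sales, price, store_type):
    """Split one store's data into context target, held-out target, and covariates."""
    if len(sales) != TOTAL_LEN or len(price) != TOTAL_LEN:
        raise ValueError(f"expected {TOTAL_LEN} weekly values per series")
    return {
        "target_context": list(sales[:CONTEXT_LEN]),   # history only
        "held_out_target": list(sales[CONTEXT_LEN:]),  # kept for evaluation
        "price": list(price),                          # history + known future
        "store_type": store_type,                      # one static value per series
    }


base_prices = {
    "store_A": ("premium", 12.00),
    "store_B": ("standard", 10.00),
    "store_C": ("discount", 7.50),
}

stores = {}
for name, (store_type, base_price) in base_prices.items():
    sales = [100.0 + week for week in range(TOTAL_LEN)]  # placeholder sales
    price = [base_price] * TOTAL_LEN                     # flat price for simplicity
    stores[name] = build_inputs(sales, price, store_type)

total_rows = sum(len(s["price"]) for s in stores.values())
print(total_rows)  # 108
```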

⚠️ Requires TimesFM 2.5 and pip install timesfm[xreg]. forecast_with_covariates() does not exist in TimesFM 1.0 or 2.0.

Run it

pip install timesfm[xreg]
cd timesfm-forecasting/examples/covariates-forecasting
python demo_covariates.py
# → output/sales_with_covariates.csv  (108 rows)
# → output/covariates_data.png
# → output/covariates_metadata.json

@borealBytes
Contributor Author

borealBytes commented Feb 22, 2026

Why this PR exists — and why it matters

Saw Nic Borensztein's post a couple weeks back and it crystallized something I'd been thinking about:

"The CLI gives agents access. The skill gives them competence."

That's exactly right. Without a skill, agents act sloppily — wrong API calls, wrong quantile indices, OOM crashes on first model load. The documentation exists, but agents don't read it the way humans do. A SKILL.md is the bridge.

I'd just submitted a similar skill to K-Dense AI's scientific skills repo — currently the largest collection of research-focused agent skills I can find, covering 140+ scientific Python packages and databases. I was already building skills there (scientific writing standards, markdown/mermaid documentation pipelines), so TimesFM was a natural next addition. But a first-party skill belongs here, not just in a third-party library.

The man page analogy is real. Early Linux shipped full documentation with every tool — man grep gave you the exact docs for your exact version, offline, authoritative, no Googling. Fast internet killed that discipline. SKILL.md is how we get it back: documentation that ships with the code, versioned together, always in sync.

There's also a security angle nobody talks about: if you don't ship your own skill, someone else will. Third-party skills carry no license provenance, no safety guarantees, no official boundaries for what agents should and shouldn't do with your API. First-party skills let you own that contract.

This is me nudging things forward on free trials and leftover credits. Happy to do more — better skills, skill scaling patterns, security/deployment ideas — but I'm running on fumes. Open to a conversation, AI credits, or even just a PR merge or comment from the team. Either way, I hope this is useful. Take it and make it better.

— Clayton Young / @borealBytes / Clayton@SuperiorByteWorks.com

@borealBytes
Contributor Author

borealBytes commented Feb 22, 2026

@rajatsen91 @siriuz42 — hey, wanted to flag this directly rather than just let it sit in the queue.

I only laid eyes on TimesFM yesterday. I pushed through it until the API made sense to me, but I'm not deeply familiar with the model internals and I haven't done this exact forecasting work in a while. For anything going into an official repo, I'd really appreciate a human sanity check on the math before this gets any closer to merge.

Specific areas I'd love eyes on:

  • Anomaly detection two-phase method — Z-score on linear-detrended residuals for the context window, then quantile PI thresholds for the forecast period (CRITICAL = outside 90% PI, WARNING = outside 80% PI). Wanted to make sure the threshold logic is sound and the phase boundary is handled correctly.
  • Quantile index constants: IDX_Q10=1, IDX_Q20=2, IDX_Q80=8, IDX_Q90=9. Index 0 is the mean, not q0; at least that's my read of the output. This is the single most common mistake I've seen in TimesFM usage and I want to make sure I didn't bake in an off-by-one.
  • Covariates example — the forecast_with_covariates() API pattern, covariate decomposition, and whether the context/horizon split on dynamic covariates is correct. TimesFM 2.5 specific, so extra scrutiny welcome.

I'm doing a lot of solo work right now and a real human review from someone who knows this codebase means a lot. Please don't merge until someone's had a proper look. Happy to walk through the logic or answer any questions.


Also — if anyone finds this useful and wants to see more, AI credits are genuinely the best way to make that happen. I'm currently grinding on free-tier rate limits and whatever I can piece together — OpenRouter free models, NVIDIA NIM, and OpenCode Zen are my current top three, supplemented by whatever free trials I can find. More credits = better code, more examples, faster turnaround.

Any credits from any provider are welcome — I'll put them to work.

And if anyone's interested in collaborating or has opportunities — paid or otherwise — feel free to reach out: Clayton@SuperiorByteWorks.com | LinkedIn. Happy to talk.

@rajatsen91 rajatsen91 self-assigned this Feb 25, 2026
- **Use case**: Light exploration, single-series forecasting, prototyping
- **Model**: TimesFM 2.5 (200M) only
- **Batch size**: `per_core_batch_size=4`
- **Context**: Limit `max_context=512`
Collaborator

How did we arrive at this limit? Is it a memory constraint or a time constraint?

Collaborator

@rajatsen91 rajatsen91 left a comment

Thanks a lot. This seems very interesting. Left some minor comments.

- **Use case**: Batch forecasting (dozens of series), evaluation, production prototypes
- **Model**: TimesFM 2.5 (200M)
- **Batch size**: `per_core_batch_size=32` (CPU) or `64` (GPU)
- **Context**: `max_context=1024`
Collaborator

ditto

name: timesfm-forecasting
description: >
Zero-shot time series forecasting with Google's TimesFM foundation model. Use this
skill when forecasting ANY univariate time series — sales, sensor readings, stock prices,
Collaborator

Should this also include a description of the XReg mode?


Use this skill when:

- Forecasting **any univariate time series** (sales, demand, sensor, vitals, price, weather)
Collaborator

ditto about xreg mode.

@rajatsen91 rajatsen91 removed their assignment Feb 25, 2026
@rajatsen91 rajatsen91 requested a review from siriuz42 February 25, 2026 17:41
…preflight

- Add context limit rationale to system_requirements.md with memory formula
- Update SKILL.md to include XReg/covariates in description and usage sections
- Add dataset-aware memory estimation to check_system.py with new CLI args
- Document memory estimation in api_reference.md with Mermaid diagram
- Add dataset preflight section to SKILL.md with examples

Resolves review comments about:
- How context limits (512/1024) were determined
- Including XReg mode description in skill documentation

Bonus enhancement: Dataset preflight checking prevents OOM before loading data.
@borealBytes
Contributor Author

Thanks for the question! The max_context values (512 for Tier 1, 1024 for Tier 2) are conservative recommendations based on memory-performance tradeoffs, not hard limits.

Rationale:

  1. TimesFM 2.5 supports up to 16,384 context — these are recommendations, not maximums
  2. Memory-driven: Based on the formula shown below
  3. Performance-driven: Smaller contexts = faster inference, less memory pressure
  4. Use-case aligned:
    • 512 = ~1-2 years of daily data (prototyping)
    • 1024 = ~2-3 years of daily data (standard production)

Memory Formula (now in api_reference.md):

block-beta
    columns 3
    ram["Total RAM Required"] model["Model Weights<br/>~0.8 GB"] overhead["Runtime Overhead<br/>~0.5 GB"] buffers["I/O Buffers<br/>~0.2 MB per series<br/>per 1000 context"]
    
    ram --> model
    ram --> overhead
    ram --> buffers
  • 512 context = ~100 MB per 1000 series
  • 1024 context = ~200 MB per 1000 series

Changes made:

  • Added "How Context Limits Are Determined" section to system_requirements.md
  • Added memory formula and tradeoff table
  • Clarified that users can use larger contexts if hardware supports it

The limits are designed to provide a good out-of-box experience on the specified hardware while leaving headroom for actual data processing.
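
The memory breakdown above can be turned into a back-of-the-envelope estimator. This sketch uses the per-series buffer cost implied by the two bullets (~100 MB per 1000 series at 512 context, i.e. about 0.2 MB per series per 1000 context points); that rate is my reading of the bullets, not a measured value. `estimate_ram_gb` and `buffer_mb` are hypothetical names, not the functions added to check_system.py.

```python
MODEL_WEIGHTS_GB = 0.8       # from the comment above
RUNTIME_OVERHEAD_GB = 0.5    # from the comment above
BUFFER_MB_PER_SERIES_PER_1000_CONTEXT = 0.2  # assumption implied by the bullets


def buffer_mb(n_series: int, context_len: int) -> float:
    """Approximate I/O buffer size in MB for a batch of series."""
    return BUFFER_MB_PER_SERIES_PER_1000_CONTEXT * n_series * (context_len / 1000)


def estimate_ram_gb(n_series: int, context_len: int) -> float:
    """Model weights + runtime overhead + I/O buffers, in GB."""
    return MODEL_WEIGHTS_GB + RUNTIME_OVERHEAD_GB + buffer_mb(n_series, context_len) / 1024


for context_len in (512, 1024):
    print(context_len,
          round(buffer_mb(1000, context_len), 1),
          round(estimate_ram_gb(1000, context_len), 2))
```

Treat the result as a lower bound for planning; it ignores the dataset itself and anything else the process allocates.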

@borealBytes
Contributor Author

Absolutely! Updated the SKILL.md description to include XReg/covariates as a core capability.

Changes made:

  1. Updated skill description (frontmatter) to mention:

    • "Supports both basic forecasting and advanced covariate forecasting (XReg)"
    • "Automatically validates dataset fit before processing"
  2. Updated "When to Use This Skill" section to include:

    • "You need covariate forecasting with exogenous variables → use forecast_with_covariates()"
  3. Updated "Do not use" section to include:

    • "You cannot install optional dependencies → XReg requires scikit-learn and JAX"
  4. Existing documentation already covers:

    • Full XReg API documentation in references/api_reference.md
    • Complete working example in examples/covariates-forecasting/
    • Installation note: "Requires pip install timesfm[xreg]"

The skill now properly presents XReg as a significant capability alongside basic forecasting, while making clear it's an optional advanced feature.

@rajatsen91 rajatsen91 requested review from abhidas and removed request for siriuz42 February 26, 2026 05:35
@borealBytes
Contributor Author

@abhidas @rajatsen91 Happy to answer any other questions from my end.

@rajatsen91 rajatsen91 requested review from siriuz42 and removed request for abhidas March 2, 2026 22:02
@rajatsen91
Collaborator

@siriuz42 can you take a look? Overall, this looks good to me.

Collaborator

@siriuz42 siriuz42 left a comment

Hi @borealBytes - It overall looks good! Thanks for the contribution.

My biggest suggestion is to instruct the agent to use only 2.5. There is little value in using older versions, given their lower quality.

See the detailed comments.

- You need time series classification or clustering → use `aeon`
- You need multivariate vector autoregression or Granger causality → use `statsmodels`
- Your data is tabular (not temporal) → use `scikit-learn`
- You cannot install optional dependencies → XReg requires scikit-learn and JAX
Collaborator

Maybe

You need to run XReg but cannot install optional dependencies → XReg requires scikit-learn and JAX


hparams = timesfm.TimesFmHparams(horizon_len=HORIZON)
checkpoint = timesfm.TimesFmCheckpoint(
huggingface_repo_id="google/timesfm-1.0-200m-pytorch"
Collaborator

Wrong version 1.0 --> 2.5, in which case the model initialization code also needs minor revision.

print(f"\n🤖 Loading TimesFM 1.0 (200M) PyTorch (horizon={MAX_HORIZON})...")
hparams = timesfm.TimesFmHparams(horizon_len=MAX_HORIZON)
checkpoint = timesfm.TimesFmCheckpoint(
huggingface_repo_id="google/timesfm-1.0-200m-pytorch"
Collaborator

Ditto

```python
hparams = timesfm.TimesFmHparams(horizon_len=12)
checkpoint = timesfm.TimesFmCheckpoint(
huggingface_repo_id="google/timesfm-1.0-200m-pytorch"
Collaborator

ditto


hparams = timesfm.TimesFmHparams(horizon_len=12)
checkpoint = timesfm.TimesFmCheckpoint(
huggingface_repo_id="google/timesfm-1.0-200m-pytorch"
Collaborator

ditto

| Version | Params | Context | Status | HuggingFace checkpoint |
| ------- | ------ | ------- | ------ | ---------------------- |
| **2.5** | 200M | 16,384 | **Latest** | `google/timesfm-2.5-200m-pytorch` |
| 2.0 | 500M | 2,048 | Archived | `google/timesfm-2.0-500m-pytorch` |
Collaborator

Should we emphasize the latest version (2.5) due to quality reasons? I am fine with not referencing 1.0 or 2.0 at all.


- [ ] **Output shape** — `point_fc` is `(n_series, horizon)`, `quant_fc` is `(n_series, horizon, 10)`
- [ ] **Quantile indices** — index 0 = mean, 1 = q10 ... 9 = q90. NOT 0 = q0.
- [ ] **Frequency flag** — TimesFM 1.0/2.0: pass `freq=[0]` for monthly. TimesFM 2.5: omit.
Collaborator

ditto

upper_80 = q[:, :, 9] # 90th percentile
median = q[:, :, 5]
```

Collaborator

nit: maybe mention that the agent can further extrapolate these quantiles using a method of its choice, e.g., a half-sided Gaussian approximation?
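
One possible reading of the half-sided Gaussian suggestion, sketched with the standard library only: fit a separate Gaussian scale to each side of the median from the outermost available decile, then read off a quantile further into that tail. This interpretation is an assumption on my part, not a method defined by TimesFM or by this PR, and `extrapolate_upper` / `extrapolate_lower` are hypothetical helpers.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1
_Z90 = _STD_NORMAL.inv_cdf(0.90)


def extrapolate_upper(median: float, q90: float, target: float = 0.95) -> float:
    """Upper-tail quantile from a Gaussian fitted to the upper half (median, q90) only."""
    if not 0.5 < target < 1.0:
        raise ValueError("target must be in (0.5, 1.0)")
    sigma_upper = (q90 - median) / _Z90
    return median + sigma_upper * _STD_NORMAL.inv_cdf(target)


def extrapolate_lower(median: float, q10: float, target: float = 0.05) -> float:
    """Lower-tail quantile from a Gaussian fitted to the lower half (q10, median) only."""
    if not 0.0 < target < 0.5:
        raise ValueError("target must be in (0.0, 0.5)")
    sigma_lower = (median - q10) / _Z90
    return median + sigma_lower * _STD_NORMAL.inv_cdf(target)  # inv_cdf(target) < 0


# Made-up forecast step with an asymmetric distribution around the median.
q05 = extrapolate_lower(median=1.0, q10=0.8, target=0.05)
q95 = extrapolate_upper(median=1.0, q90=1.4, target=0.95)
print(q05 < 0.8, q95 > 1.4)  # True True
```

Fitting each side separately preserves skew in the forecast distribution, which a single symmetric Gaussian would discard.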

@siriuz42
Collaborator

Let's merge the PR for now. Let's address the deprecation of v1 in a later PR.

@siriuz42 siriuz42 merged commit 2c1052b into google-research:master Mar 19, 2026
2 checks passed