Releases: microsoft/Foundry-Local

v1.0.0 Foundry Local - General Availability

09 Apr 23:09
3504910

We are excited to announce the General Availability of Foundry Local, a unified on-device AI runtime that brings generative AI directly into your applications. All inference runs locally: user data never leaves the device, responses are instant with zero network latency, and everything works offline. No per-token costs, no backend infrastructure.

SDKs

Foundry Local ships production SDKs for C#, JavaScript, Python, and Rust, each providing a consistent API surface for model management, chat completions, audio transcription, and tool calling.

| SDK | Package |
| --- | --- |
| C# | `Microsoft.AI.Foundry.Local` |
| JavaScript | `foundry-local-sdk` |
| 🐍 Python | `foundry-local-sdk` |
| 🦀 Rust | `foundry-local-sdk` |

WinML Variants

Each SDK also ships a WinML variant that unlocks more GPU and NPU devices on Windows, available through the Windows ML execution provider catalog.

| SDK | Package |
| --- | --- |
| C# | `Microsoft.AI.Foundry.Local.WinML` |
| JavaScript | `foundry-local-sdk-winml` |
| 🐍 Python | `foundry-local-sdk-winml` |
| 🦀 Rust | `foundry-local-sdk` with the `winml` feature flag |

Platform Support

| OS | Architectures |
| --- | --- |
| Windows | x64, ARM64 |
| macOS | ARM64 |
| Linux | x64 |

What You Can Build

Chat Completions

A full OpenAI-compatible chat completions API with multi-turn conversations and configurable inference parameters (temperature, top-k, top-p, max tokens, frequency/presence penalty, and random seed).
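
The parameter set above maps directly onto an OpenAI-style request body. A minimal sketch in Python; the model alias and parameter values are illustrative, not defaults:

```python
import json

# A minimal sketch of an OpenAI-compatible chat completions request body,
# covering the inference parameters listed above.
payload = {
    "model": "phi-3.5-mini",   # hypothetical model alias
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize ONNX Runtime in one sentence."},
    ],
    "temperature": 0.7,        # sampling temperature
    "top_k": 40,               # top-k sampling
    "top_p": 0.9,              # nucleus (top-p) sampling
    "max_tokens": 256,         # cap on generated tokens
    "frequency_penalty": 0.0,  # penalize tokens by how often they appear
    "presence_penalty": 0.0,   # penalize tokens already present
    "seed": 42,                # random seed for reproducible sampling
}

body = json.dumps(payload)     # ready to POST to /v1/chat/completions
```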

Audio Transcription

On-device speech-to-text. Transcribe audio files with language selection and temperature control.
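
As a sketch, a transcription request in the OpenAI REST style carries the two tunables mentioned above; the field names follow the OpenAI convention, and the model id and file name are placeholders:

```python
# Hedged sketch of the fields in an OpenAI-style audio transcription request.
transcription_request = {
    "model": "whisper-small",  # hypothetical on-device speech-to-text model
    "file": "meeting.wav",     # audio file to transcribe (placeholder name)
    "language": "en",          # optional language selection
    "temperature": 0.0,        # decoding temperature (0 = most deterministic)
}
```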

Embedded Web Server

Start an OpenAI-compatible HTTP server from your application with a single call. Useful for multi-process architectures or bridging to tools that speak the OpenAI REST protocol.
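
Once the embedded server is up, anything that speaks the OpenAI REST protocol can call it. A hedged Python sketch that builds (but does not send) such a request; the port here is an assumption, so query the SDK for the real endpoint at runtime:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5273/v1"  # hypothetical local endpoint

def chat_request(messages, model="phi-3.5-mini"):
    """Build an OpenAI-style chat completions request for the local server."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request([{"role": "user", "content": "Hello!"}])
# To actually send it (requires the server to be running):
#   urllib.request.urlopen(req)
```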

Hardware Acceleration

Powered by ONNX Runtime, Foundry Local automatically detects available hardware and selects the best execution provider, with zero hardware detection code needed in your application.

Supported execution providers:

| Execution Provider | Hardware | Platform |
| --- | --- | --- |
| CPU | Universal fallback | All platforms |
| WebGPU | GPU acceleration | Windows x64, macOS ARM64 |
| CUDA | NVIDIA GPUs | Windows x64, Linux x64 |
| OpenVINO | Intel GPUs and NPUs | Windows x64 |
| QNN | Qualcomm NPUs | Windows ARM64 |
| TensorRT RTX | NVIDIA GPUs | Windows x64 |
| VitisAI | AMD NPUs | Windows x64 |

Execution providers can be discovered, downloaded, and registered at runtime through the SDK's discoverEps() and downloadAndRegisterEps() APIs, with per-provider progress callbacks.
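
The flow can be sketched as follows; discoverEps() and downloadAndRegisterEps() are the SDK's JavaScript names, and the snake_case stand-ins, provider list, and progress steps below are illustrative only:

```python
# Illustrative stand-in for the discover -> download -> register flow with
# per-provider progress callbacks. Not the SDK's actual implementation.

def on_progress(provider: str, percent: int) -> str:
    """Per-provider progress callback: return a formatted progress line."""
    return f"[{provider}] {percent}%"

def download_and_register_eps(providers, callback=on_progress):
    """Simulate registering each discovered provider, reporting progress."""
    log = []
    for ep in providers:
        for pct in (0, 50, 100):  # stand-in for real download progress
            log.append(callback(ep, pct))
    return log

log = download_and_register_eps(["CUDA", "QNN"])
```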

Model Catalog & Management

Foundry Local includes a built-in model catalog with popular open-source models, optimized with state-of-the-art quantization and compression for on-device performance.

Model management features:

  • Browse & search the catalog programmatically
  • Multi-variant models - each alias maps to multiple variants optimized for different hardware (CPU, GPU, NPU)
  • Automatic variant selection - the SDK picks the best variant based on what's cached and what hardware is available, with manual override via selectVariant()
  • Download with progress tracking - real-time percentage callbacks
  • Load / unload lifecycle - explicit control over which models are in memory
  • Version management - query the catalog for the latest version of any model

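The automatic variant selection bullet above can be sketched as a two-pass rule: prefer an already-cached variant the hardware can run, otherwise take the first runnable variant in catalog order. The variant records and device names below are illustrative, not the SDK's schema:

```python
# Hedged sketch of automatic variant selection: cached-and-runnable first,
# then the first runnable variant in catalog (priority) order.

def select_variant(variants, available_devices, cached):
    for v in variants:  # pass 1: any cached variant the hardware can run
        if v["id"] in cached and v["device"] in available_devices:
            return v["id"]
    for v in variants:  # pass 2: first runnable variant in catalog order
        if v["device"] in available_devices:
            return v["id"]
    return None

variants = [
    {"id": "phi-npu", "device": "NPU"},
    {"id": "phi-gpu", "device": "GPU"},
    {"id": "phi-cpu", "device": "CPU"},
]
choice = select_variant(variants, {"GPU", "CPU"}, cached={"phi-cpu"})
# choice == "phi-cpu": the cached CPU variant wins over the uncached GPU one
```

A manual override, as with selectVariant(), would simply bypass both passes.
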
Get Started

| Language | Cross-platform | Windows ML |
| --- | --- | --- |
| JavaScript | `npm install foundry-local-sdk` | `npm install foundry-local-sdk-winml` |
| C# | `dotnet add package Microsoft.AI.Foundry.Local` | `dotnet add package Microsoft.AI.Foundry.Local.WinML` |
| Python | `pip install foundry-local-sdk` | `pip install foundry-local-sdk-winml` |
| Rust | `cargo add foundry-local-sdk` | `cargo add foundry-local-sdk --features winml` |

Foundry Local Release 0.8.119

22 Jan 01:26
449bd19

Pre-release

Foundry Local 0.8.119 Release Notes 🚀

This release is an incremental build targeting tool calling scenarios.

🐛 Bug fixes

#373 Function specs without parameters cause server error
#372 Tools not indexed in streaming mode

Foundry Local Release 0.8.117

23 Dec 18:58
495b266

Pre-release

Foundry Local 0.8.117 Release Notes 🚀

This release is an incremental build targeting tool calling scenarios.

🐛 Bug fixes

#346 Tool calling doesn't return tool_calls results in streaming mode
#341 Exception when network is disconnected

📝 Known issues

#363 Tool calling fails on NVIDIA GPUs.

Foundry Local Release 0.8.115

12 Dec 22:53
fdcce52

Pre-release

Foundry Local Release Notes: v0.8.115 🚀

This release is an incremental build targeting tool calling scenarios.

🐛 Bug fixes

#335 Guidance error when tool_choice=required
#336 Foundry Local enforcing "required" field of function parameters

📝 Known issues

#346 Tool calling doesn't return tool_calls results in streaming mode

Foundry Local Release 0.8.113

26 Nov 20:00
111b6cd

Pre-release

Foundry Local Release Notes: v0.8.113 🚀

✨ New Features

Add support for tool calling. Models that support tool calling have the supportsToolCalling tag, which is also exposed via the SDKs.
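
Since the tag is exposed through the SDKs, callers can filter the catalog for tool-capable models. A sketch with illustrative catalog entries (the real entry schema may differ):

```python
# Hedged sketch: select models carrying the supportsToolCalling tag.
# The catalog entries below are illustrative, not actual catalog data.
catalog = [
    {"alias": "model-a", "tags": ["supportsToolCalling"]},
    {"alias": "model-b", "tags": []},
]

tool_capable = [m["alias"] for m in catalog
                if "supportsToolCalling" in m["tags"]]
```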

🐛 Bug fixes

Fix a crash on context-length exhaustion. The CLI now exits cleanly when the context length is exhausted, and the REST API returns an error if the request requires more tokens than the max_length configuration allows.
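
The REST-side behavior amounts to a budget check before generation. A sketch, with illustrative token counts and error payload:

```python
# Hedged sketch of the guard: reject requests whose prompt plus requested
# generation budget cannot fit within the configured max_length.

def check_context(prompt_tokens: int, max_new_tokens: int, max_length: int):
    """Return an error payload if the request cannot fit, else None."""
    if prompt_tokens + max_new_tokens > max_length:
        return {"error": "request exceeds the configured max_length"}
    return None

assert check_context(4000, 200, 4096) is not None  # over budget -> error
assert check_context(3000, 200, 4096) is None      # fits -> proceed
```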

📝 Known issues

This release only allows one tool call per request.

Foundry Local Release 0.8.103

12 Nov 18:32
5366b90

Pre-release

Foundry Local Release Notes: v0.8.103 🚀

🔨 Filter out automatic speech recognition models from `foundry model list`

These models can still be listed using the /foundry/list endpoint and run using the standalone SDK.

Sign up for the Foundry Local SDK vNext Private Preview by filling in the form.

Foundry Local Release 0.8.101

07 Nov 01:18
87d50b7

Pre-release

Foundry Local Release Notes: v0.8.101 🚀

✨ New Features

Improve performance for multi-turn conversations on macOS, especially time to first token, with the new continuous decoding feature: only new tokens are sent to the model instead of the entire conversation, while previous inputs and responses are kept by the model in the KV-cache.
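
The mechanism can be illustrated with a toy token stream: with the prefix held in the KV-cache, only the suffix past the cached length is fed to the model on each turn. The token values here are illustrative:

```python
# Hedged sketch of continuous decoding's input savings.

def tokens_to_send(conversation_tokens, kv_cache_len):
    """Only the tokens not already covered by the KV-cache are sent."""
    return conversation_tokens[kv_cache_len:]

history = list(range(100))   # 100 tokens already processed and cached
history += [100, 101, 102]   # 3 new tokens from the latest user turn
```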

📝 Known issues

When the context length is exhausted (set by the max_length value), an exception is thrown instead of a warning or error message.

Foundry Local Release 0.8.94

22 Oct 22:11
11a3fb1

Pre-release

Foundry Local Release Notes: v0.8.94 🚀

✨ New Features

Improve performance for multi-turn conversations, especially time to first token, with the new continuous decoding feature: only new tokens are sent to the model instead of the entire conversation, while previous inputs and responses are kept by the model in the KV-cache.

Website showing full model list with hardware variants: https://foundrylocal.ai/models

🐛 Bug fixes

  • Foundry Local now defaults to the --default-log-level value instead of Information when --log-level is not provided, and elevates several errors that were previously written at Information level to Error.
  • #265
  • #263
  • #71

📝 Known issues

  • This version is not supported on macOS; please use the previous release on macOS. Support is coming soon!
  • If a model is not found in the catalog, an exception is thrown instead of a warning / suggestion message and a graceful exit.
  • When the context length is exhausted (set by the max_length value), an exception is thrown instead of a warning or error message.

Foundry Local Release 0.7.120

01 Oct 16:51
992c4ee

Pre-release

Foundry Local Release Notes: v0.7.120 🚀

✨ New Features

Improvements to NPU accelerator (execution provider) download and registration user experience

🐛 Bug fixes

#257
#259
#263
#264

Foundry Local Release 0.7.117

23 Sep 18:20
992c4ee

Pre-release

Foundry Local Release Notes: v0.7.117 🚀

✨ New Features

  • Support for AMD and Intel NPUs, and more Qualcomm NPU models coming very soon
  • Pluggable execution providers, downloaded at runtime on Windows for AMD NPUs, Intel NPUs, NVIDIA GPUs, and Qualcomm NPUs
  • Filter models by device and provider

🐛 Bug fixes