Merged
3 changes: 3 additions & 0 deletions README.md
@@ -376,7 +376,10 @@ You can refine your search by selecting the task you're interested in (e.g., [te
1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://huggingface.co/papers/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://huggingface.co/papers/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
1. **[Mimi](https://huggingface.co/docs/transformers/model_doc/mimi)** (from Kyutai) released with the paper [Moshi: a speech-text foundation model for real-time dialogue](https://huggingface.co/papers/2410.00037) by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.
1. **[Ministral](https://huggingface.co/docs/transformers/model_doc/ministral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team.
1. **[Ministral3](https://huggingface.co/docs/transformers/model_doc/ministral3)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team.
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[Mistral3](https://huggingface.co/docs/transformers/model_doc/mistral3)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team.
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://huggingface.co/papers/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://huggingface.co/papers/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **MobileCLIP** (from Apple) released with the paper [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://huggingface.co/papers/2311.17049) by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
3 changes: 3 additions & 0 deletions docs/snippets/6_supported-models.snippet
@@ -90,7 +90,10 @@
1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://huggingface.co/papers/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://huggingface.co/papers/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
1. **[Mimi](https://huggingface.co/docs/transformers/model_doc/mimi)** (from Kyutai) released with the paper [Moshi: a speech-text foundation model for real-time dialogue](https://huggingface.co/papers/2410.00037) by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.
1. **[Ministral](https://huggingface.co/docs/transformers/model_doc/ministral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team.
1. **[Ministral3](https://huggingface.co/docs/transformers/model_doc/ministral3)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team.
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[Mistral3](https://huggingface.co/docs/transformers/model_doc/mistral3)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team.
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://huggingface.co/papers/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://huggingface.co/papers/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **MobileCLIP** (from Apple) released with the paper [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://huggingface.co/papers/2311.17049) by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
8 changes: 4 additions & 4 deletions package-lock.json
2 changes: 1 addition & 1 deletion package.json
@@ -55,7 +55,7 @@
},
"homepage": "https://github.com/huggingface/transformers.js#readme",
"dependencies": {
- "@huggingface/jinja": "^0.5.1",
+ "@huggingface/jinja": "^0.5.3",
"onnxruntime-node": "1.21.0",
"onnxruntime-web": "1.22.0-dev.20250409-89f8206ba4",
"sharp": "^0.34.1"
3 changes: 3 additions & 0 deletions src/configs.js
@@ -75,6 +75,7 @@ function getNormalizedConfig(config) {
case 'voxtral':
case 'smolvlm':
case 'gemma3n':
case 'mistral3':
// @ts-expect-error TS2339
init_normalized_config = getNormalizedConfig(config.text_config);
break;
@@ -145,6 +146,8 @@ function getNormalizedConfig(config) {
case 'glm':
case 'helium':
case 'ernie4_5':
case 'ministral':
case 'ministral3':
mapping['num_heads'] = 'num_key_value_heads';
mapping['num_layers'] = 'num_hidden_layers';
mapping['dim_kv'] = 'head_dim';
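
As a rough illustration of what these `mapping` entries express, the sketch below copies values from a model config into normalized keys. The helper `applyMapping` and the sample config values are hypothetical and are not taken from `src/configs.js`; only the three key pairs come from the hunk above.

```javascript
// Hypothetical helper: copy values from a model config into normalized keys.
// This is a simplified sketch, not the library's getNormalizedConfig.
function applyMapping(config, mapping) {
    const normalized = {};
    for (const [normalizedKey, sourceKey] of Object.entries(mapping)) {
        normalized[normalizedKey] = config[sourceKey];
    }
    return normalized;
}

// Key pairs from the hunk above (the case group that now includes
// 'ministral' and 'ministral3').
const mapping = {
    num_heads: 'num_key_value_heads',
    num_layers: 'num_hidden_layers',
    dim_kv: 'head_dim',
};

// Illustrative values only; a real config.json will differ.
const sampleConfig = {
    model_type: 'ministral',
    num_key_value_heads: 8,
    num_hidden_layers: 4,
    head_dim: 64,
};

const normalizedConfig = applyMapping(sampleConfig, mapping);
console.log(normalizedConfig);
```

Under these assumed values, `num_heads` is read from `num_key_value_heads` rather than from the attention-head count.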
21 changes: 21 additions & 0 deletions src/models.js
@@ -3866,6 +3866,8 @@ export class LlavaQwen2ForCausalLM extends LlavaPreTrainedModel {
}
}

export class Mistral3ForConditionalGeneration extends LlavaQwen2ForCausalLM { }

export class Gemma3nPreTrainedModel extends PreTrainedModel {
forward_params = [
'input_ids',
@@ -6948,6 +6950,20 @@ export class MistralModel extends MistralPreTrainedModel { }
export class MistralForCausalLM extends MistralPreTrainedModel { }
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// Ministral models
export class MinistralPreTrainedModel extends PreTrainedModel { }
export class MinistralModel extends MinistralPreTrainedModel { }
export class MinistralForCausalLM extends MinistralPreTrainedModel { }
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// Ministral3 models
export class Ministral3PreTrainedModel extends PreTrainedModel { }
export class Ministral3Model extends Ministral3PreTrainedModel { }
export class Ministral3ForCausalLM extends Ministral3PreTrainedModel { }
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// ERNIE-4.5 models
export class Ernie4_5PreTrainedModel extends PreTrainedModel { }
@@ -8041,6 +8057,8 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
['mpt', ['MptModel', MptModel]],
['opt', ['OPTModel', OPTModel]],
['mistral', ['MistralModel', MistralModel]],
['ministral', ['MinistralModel', MinistralModel]],
['ministral3', ['Ministral3Model', Ministral3Model]],
['ernie4_5', ['Ernie4_5Model', Ernie4_5Model]],
['starcoder2', ['Starcoder2Model', Starcoder2Model]],
['falcon', ['FalconModel', FalconModel]],
@@ -8155,6 +8173,8 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
['opt', ['OPTForCausalLM', OPTForCausalLM]],
['mbart', ['MBartForCausalLM', MBartForCausalLM]],
['mistral', ['MistralForCausalLM', MistralForCausalLM]],
['ministral', ['MinistralForCausalLM', MinistralForCausalLM]],
['ministral3', ['Ministral3ForCausalLM', Ministral3ForCausalLM]],
['ernie4_5', ['Ernie4_5ForCausalLM', Ernie4_5ForCausalLM]],
['starcoder2', ['Starcoder2ForCausalLM', Starcoder2ForCausalLM]],
['falcon', ['FalconForCausalLM', FalconForCausalLM]],
@@ -8228,6 +8248,7 @@ const MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES = new Map([
['paligemma', ['PaliGemmaForConditionalGeneration', PaliGemmaForConditionalGeneration]],
['llava_qwen2', ['LlavaQwen2ForCausalLM', LlavaQwen2ForCausalLM]],
['gemma3n', ['Gemma3nForConditionalGeneration', Gemma3nForConditionalGeneration]],
['mistral3', ['Mistral3ForConditionalGeneration', Mistral3ForConditionalGeneration]],
]);

const MODEL_FOR_AUDIO_TEXT_TO_TEXT_MAPPING_NAMES = new Map([
1 change: 1 addition & 0 deletions src/models/image_processors.js
@@ -27,6 +27,7 @@ export * from './nougat/image_processing_nougat.js'
export * from './owlv2/image_processing_owlv2.js'
export * from './owlvit/image_processing_owlvit.js'
export * from './phi3_v/image_processing_phi3_v.js'
export * from './pixtral/image_processing_pixtral.js'
export * from './pvt/image_processing_pvt.js'
export * from './qwen2_vl/image_processing_qwen2_vl.js'
export * from './rt_detr/image_processing_rt_detr.js'
38 changes: 38 additions & 0 deletions src/models/pixtral/image_processing_pixtral.js
@@ -0,0 +1,38 @@
import {
    ImageProcessor,
} from "../../base/image_processors_utils.js";

export class PixtralImageProcessor extends ImageProcessor {

    /** @type {ImageProcessor['get_resize_output_image_size']} */
    get_resize_output_image_size(image, size) {
        const { longest_edge } = size;
        if (longest_edge === undefined) {
            throw new Error("size must contain 'longest_edge'");
        }

        const [srcWidth, srcHeight] = image.size;

        const ratio = Math.max(srcWidth, srcHeight) / longest_edge;

        let newWidth = srcWidth;
        let newHeight = srcHeight;
        if (ratio > 1) {
            newWidth = Math.floor(srcWidth / ratio);
            newHeight = Math.floor(srcHeight / ratio);
        }

        // @ts-expect-error TS2339
        const { patch_size, spatial_merge_size } = this.config;
        if (!spatial_merge_size) {
            throw new Error("config must contain 'spatial_merge_size'");
        }
        const real_patch_size = patch_size * spatial_merge_size;

        // Calculate number of tokens
        const num_width_tokens = Math.floor((newWidth - 1) / real_patch_size) + 1;
        const num_height_tokens = Math.floor((newHeight - 1) / real_patch_size) + 1;

        return [num_width_tokens * real_patch_size, num_height_tokens * real_patch_size];
    }
}
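
To make the arithmetic in `get_resize_output_image_size` concrete, here is a standalone sketch of the same computation as a plain function. The function name `pixtralResizeSize` and the sample values (`longest_edge` of 1024, `patch_size` of 14, `spatial_merge_size` of 2) are assumptions chosen for illustration; they are not read from any real preprocessor config.

```javascript
// Standalone restatement of the resize arithmetic above (hypothetical helper name).
function pixtralResizeSize(srcWidth, srcHeight, longestEdge, patchSize, spatialMergeSize) {
    // Scale down only when the longest side exceeds the limit.
    const ratio = Math.max(srcWidth, srcHeight) / longestEdge;
    let newWidth = srcWidth;
    let newHeight = srcHeight;
    if (ratio > 1) {
        newWidth = Math.floor(srcWidth / ratio);
        newHeight = Math.floor(srcHeight / ratio);
    }

    // Each token covers patchSize * spatialMergeSize pixels per side.
    const realPatchSize = patchSize * spatialMergeSize;

    // Ceiling division: round each side up to a whole number of tokens.
    const numWidthTokens = Math.floor((newWidth - 1) / realPatchSize) + 1;
    const numHeightTokens = Math.floor((newHeight - 1) / realPatchSize) + 1;

    return [numWidthTokens * realPatchSize, numHeightTokens * realPatchSize];
}

// Assumed sample values, for illustration only.
const large = pixtralResizeSize(2000, 1000, 1024, 14, 2);
const small = pixtralResizeSize(100, 50, 1024, 14, 2);
console.log(large); // [1036, 532]
console.log(small); // [112, 56]
```

Because each side is rounded up to a multiple of the effective patch size, the output can slightly exceed `longest_edge` (1036 versus 1024 in the first case), and images below the limit are still padded up to the next multiple.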
48 changes: 48 additions & 0 deletions src/models/pixtral/processing_pixtral.js
@@ -0,0 +1,48 @@

import { Processor } from "../../base/processing_utils.js";
import { AutoImageProcessor } from "../auto/image_processing_auto.js";
import { AutoTokenizer } from "../../tokenizers.js";

export class PixtralProcessor extends Processor {
    static tokenizer_class = AutoTokenizer
    static image_processor_class = AutoImageProcessor
    static uses_processor_config = true;

    /**
     * @typedef {import('../../utils/image.js').RawImage} RawImage
     */

    // `images` is required, `text` is optional
    async _call(/** @type {RawImage|RawImage[]} */ images, text = null, kwargs = {}) {

        const image_inputs = await this.image_processor(images, kwargs);

        if (text) {
            const [height, width] = image_inputs.pixel_values.dims.slice(-2);

            const { image_token, image_break_token, image_end_token, patch_size, spatial_merge_size } = this.config;
            const real_patch_size = patch_size * spatial_merge_size;
            const num_height_tokens = Math.floor(height / real_patch_size);
            const num_width_tokens = Math.floor(width / real_patch_size);

            text = structuredClone(text); // Avoid modifying the original text input
            if (!Array.isArray(text)) {
                text = [text];
            }
            for (let i = 0; i < text.length; ++i) {
                const width_tokens = image_token.repeat(num_width_tokens);
                const row = width_tokens + image_break_token;
                const finalRow = width_tokens + image_end_token;
                const full = row.repeat(num_height_tokens - 1) + finalRow;
                text[i] = text[i].replace(image_token, full);
            }
        }

        const text_inputs = text ? this.tokenizer(text, kwargs) : {};

        return {
            ...image_inputs,
            ...text_inputs,
        }
    }
}
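
The placeholder expansion inside `_call` can be sketched in isolation as follows. The helper name `expandImagePlaceholder` is hypothetical, and the token strings `[IMG]`, `[IMG_BREAK]` and `[IMG_END]` are assumed sample values; in the processor they come from `this.config`.

```javascript
// Hypothetical standalone version of the placeholder expansion in _call.
function expandImagePlaceholder(text, tokens, numHeightTokens, numWidthTokens) {
    const { image_token, image_break_token, image_end_token } = tokens;

    // One row of image tokens, closed by a break token (or an end token on the last row).
    const widthTokens = image_token.repeat(numWidthTokens);
    const row = widthTokens + image_break_token;
    const finalRow = widthTokens + image_end_token;
    const full = row.repeat(numHeightTokens - 1) + finalRow;

    // With a string pattern, String.prototype.replace substitutes only the first match.
    return text.replace(image_token, full);
}

// Assumed token strings, for illustration only.
const tokens = {
    image_token: '[IMG]',
    image_break_token: '[IMG_BREAK]',
    image_end_token: '[IMG_END]',
};

// A grid of 2 rows by 3 columns of image tokens.
const expanded = expandImagePlaceholder('[IMG]Describe this image.', tokens, 2, 3);
console.log(expanded);
// [IMG][IMG][IMG][IMG_BREAK][IMG][IMG][IMG][IMG_END]Describe this image.
```

Since only the first occurrence of the placeholder is replaced, a prompt containing several placeholders would have just the first one expanded in this sketch.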
3 changes: 2 additions & 1 deletion src/models/processors.js
@@ -8,8 +8,9 @@ export * from './llava/processing_llava.js';
export * from './mgp_str/processing_mgp_str.js';
export * from './moonshine/processing_moonshine.js';
export * from './owlvit/processing_owlvit.js';
- export * from './phi3_v/processing_phi3_v.js';
export * from './paligemma/processing_paligemma.js';
+ export * from './phi3_v/processing_phi3_v.js';
+ export * from './pixtral/processing_pixtral.js';
export * from './pyannote/processing_pyannote.js';
export * from './qwen2_vl/processing_qwen2_vl.js';
export * from './sam/processing_sam.js';