V4 cache handler #1511

Open

nico-martin wants to merge 13 commits into main from v4-cache-handler

Conversation

@nico-martin
Collaborator

Improved Download Progress Tracking

Problem

Transformers.js couldn't reliably track total download progress because:

  • File lists weren't known before downloads started
  • File sizes were inconsistent (compressed vs uncompressed)
  • No cache awareness before initiating downloads

Solution

New Exported Functions

  • get_files(): Determines required files before downloading
  • get_model_files() / get_tokenizer_files() / get_processor_files(): Helper functions to identify files for each component
  • get_file_metadata(): Fetches file metadata using Range requests without downloading full content
    • Returns fromCache boolean to identify cached files
    • Ensures consistent uncompressed file sizes
  • is_cached(): Checks if all files for a model are already in the cache (see the sketch below)
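
A minimal preflight sketch combining these exports. The get_files call matches the example further down; the is_cached and get_file_metadata signatures and the size field are assumptions based on the descriptions above, not confirmed API:

import {
  get_file_metadata,
  get_files,
  is_cached,
} from "@huggingface/transformers";

const model = "onnx-community/gemma-3-270m-it-ONNX";
const options = { dtype: "fp32", device: "webgpu" };

if (await is_cached(model, options)) {
  console.log("All files cached, no download needed");
} else {
  // Sum consistent, uncompressed sizes before any download starts
  const files = await get_files(model, options);
  let totalBytes = 0;
  for (const file of files) {
    // Assumed return shape: { size, fromCache }
    const meta = await get_file_metadata(model, file);
    if (!meta.fromCache) totalBytes += meta.size;
  }
  console.log(`${files.length} files, ${totalBytes} bytes to download`);
}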

Enhanced Progress Tracking

  • readResponse() with expectedSize: Falls back to metadata when the content-length header is missing
  • total_progress callback: Provides aggregate progress across all files
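
A hedged sketch of wiring total_progress into a UI. Only the callback's existence comes from this PR; the payload shape is an assumption for illustration:

import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline("text-generation", "onnx-community/gemma-3-270m-it-ONNX", {
  dtype: "fp32",
  device: "webgpu",
  // Assumed payload shape: { progress } as a 0-100 aggregate across all files
  total_progress: (e) => console.log(`Overall: ${e.progress}%`),
});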

Review

One thing I am not super confident about is the get_model_files function. I tried to test it with different model architectures, but maybe I missed some that load files not covered by that function. @xenova, could you smoke-test some models and let me know which ones fail?

The easiest way to do that is:

import {
  get_files,
  pipeline,
} from "@huggingface/transformers";

const expectedFiles = await get_files(
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
  }
);
const loadedFiles = new Set();
const pipe = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
    progress_callback: (e) => {
      if (e.file) loadedFiles.add(e.file);
    },
  }
);

console.log(
  "SAME FILES:",
  expectedFiles.sort().join(",") === Array.from(loadedFiles).sort().join(",")
);

@nico-martin requested a review from xenova on February 3, 2026, 15:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@xenova left a comment

Very exciting PR! 🙌 Just a quick review from a first scan of the PR.

Comment on lines 55 to 62
// Hub utilities for cache and file management
export { get_file_metadata } from './utils/hub.js';
export { get_files } from './utils/hub/get_files.js';
export { get_model_files } from './utils/hub/get_model_files.js';
export { get_tokenizer_files } from './utils/hub/get_tokenizer_files.js';
export { get_processor_files } from './utils/hub/get_processor_files.js';
export { is_cached } from './utils/hub/is_cached.js';

Collaborator

can we find a way to export a single object/static class that encapsulates all these methods?

e.g.,

import { ClassName } from '@huggingface/transformers'; // no idea what to call it yet haha

const x = await ClassName.get_files(...);

I feel like this could be more future-proof.

No idea what to call this wrapper class, but something along the lines of CacheRegistry, FileRegistry, HubRegistry could work? Maybe we can find a similar abstraction in the transformers/huggingface_hub library?
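
A minimal sketch of what such a wrapper could look like, assuming it simply re-exports the existing helpers (HubRegistry is a placeholder name, not settled in this thread):

// Hypothetical static façade over the individual hub helpers
import { get_file_metadata } from './utils/hub.js';
import { get_files } from './utils/hub/get_files.js';
import { is_cached } from './utils/hub/is_cached.js';

export class HubRegistry {
  static get_files = get_files;
  static get_file_metadata = get_file_metadata;
  static is_cached = is_cached;
  // ...plus get_model_files / get_tokenizer_files / get_processor_files
}

// Consumers would then write:
// const files = await HubRegistry.get_files(model_id, options);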

Collaborator Author

Good point. I think the perfect solution would be to have a different pattern for the pipeline:

const textGenerator = new Pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
  }
);

const isCached = await textGenerator.isCached();
const files = await textGenerator.getFiles();

await textGenerator.load({ progress_callback: console.log });

const output = await textGenerator.generate([
  {
    role: "system",
    content: "You are a helpful assistant",
  },
  {
    role: "user",
    content: "How are you?",
  },
], {
  max_new_tokens: 1024,
});

Then those methods would be on the pipe. But I definitely see your point: the whole feature should be encapsulated in one export. I'll think of a cleaner solution.

Comment on lines 64 to 81
if (!foundInMapping) {
    if (config.is_encoder_decoder) {
        const modelName = config.model_type;
        if (['whisper', 'vision-encoder-decoder'].includes(modelName)) {
            modelType = MODEL_TYPES.Vision2Seq;
        } else if (modelName === 'musicgen') {
            modelType = MODEL_TYPES.Musicgen;
        } else {
            modelType = MODEL_TYPES.Seq2Seq;
        }
    } else {
        if (architectures.some((arch) => arch.includes('CausalLM') || arch.includes('LMHead'))) {
            modelType = MODEL_TYPES.DecoderOnly;
        } else {
            modelType = MODEL_TYPES.EncoderOnly;
        }
    }
}
Collaborator

can we check which models are not correctly set? Ideally, we shouldn't need manual checking here.

Collaborator Author

Yes, we can check! I added logging to detect models not in MODEL_TYPE_MAPPING:

console.warn(                                                                   
  `[get_model_files] Architecture(s) not found in MODEL_TYPE_MAPPING: [${archList}] ` + 
  `for model type '${config.model_type}'. Using heuristic detection. ` +
  `Consider adding to packages/transformers/src/models/registry.js`
);                                                                              

I also improved the lookup to check model_type before falling back to heuristics. This fixed models like modnet (which has no architectures array) and briaai/RMBG-1.4, which were triggering the fallback unnecessarily.
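
A sketch of the intended lookup order, with placeholder maps standing in for the real registries (names and entries here are illustrative, not the actual implementation):

// Placeholder registries; the real ones live in the models registry
const ARCH_TO_TYPE = new Map([["LlamaForCausalLM", "DecoderOnly"]]);
const MODEL_TYPE_TO_TYPE = new Map([["modnet", "EncoderOnly"]]);

function resolveModelType(config, architectures = []) {
  // 1. Exact architecture match
  for (const arch of architectures) {
    if (ARCH_TO_TYPE.has(arch)) return ARCH_TO_TYPE.get(arch);
  }
  // 2. model_type match (covers models like modnet with no architectures array)
  if (MODEL_TYPE_TO_TYPE.has(config.model_type)) return MODEL_TYPE_TO_TYPE.get(config.model_type);
  // 3. Heuristic fallback, surfaced with a warning as above
  console.warn(`[get_model_files] No mapping for '${config.model_type}', using heuristics`);
  return architectures.some((a) => a.includes("CausalLM") || a.includes("LMHead"))
    ? "DecoderOnly"
    : "EncoderOnly";
}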

Comment on lines +131 to +133
// Note: generation_config.json is only loaded for generation models (e.g., T5ForConditionalGeneration)
// not for base models (e.g., T5Model). Since we can't determine the specific class here,
// we include it as it's loaded for most use cases.
Collaborator

if generation_config.json isn't found, do we throw an error, or gracefully ignore it? 👀

Luckily, as you said, users very rarely load the non-generation variants of these models.

Collaborator Author

Good question. I wouldn't throw an error. If there is no generation_config.json, I don't see a problem: we would expect a file, but its expected size would be 0, so the total would still be correct. The progress callback would then report the other files loading up to 100%, and then it's done. I think it's fine.
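
To illustrate (numbers are made up): with expected sizes [100, 50, 0] for three files, the total is 150 bytes; if generation_config.json 404s, the downloaded bytes still converge to 150, so the aggregate progress still reaches 100%.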

Collaborator Author

In fact, it's better to include a file if we are not sure. The only downside could be a 404 on the metadata check, but I don't think that's a problem.

@xenova changed the base branch from v4 to main on February 13, 2026, 17:03