
Conversation

@djsaunde (Collaborator) commented Nov 7, 2025

This PR patches the SFTTrainer constructor to auto-enable sample packing when applicable (i.e., when the user is not training a multi-modal model and is not passing in a custom data collator). Sample packing is a good default for SFT since it reduces the amount of zero-padding and thereby increases training throughput in tokens/s.

Follow-up to #3525; can merge after that PR is merged. I have since closed that PR; I think we should just merge this one.

Needs to be extensively tested so we don't break any existing notebooks / scripts.

Edit: This PR no longer auto-enables sample packing as it could slightly change the SFT training dynamics. Users must now pass packing=True into their SFTConfig objects.
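
For reference, a minimal sketch of how packing is enabled now, assuming TRL's SFTConfig / SFTTrainer API (output_dir is a placeholder; the rest of the trainer setup is omitted):

    from trl import SFTConfig

    # Packing is opt-in: request it explicitly in the config.
    args = SFTConfig(
        output_dir="outputs",  # placeholder path
        packing=True,          # enable sample packing for text-only SFT
    )
    # pass `args` to SFTTrainer as usual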

@djsaunde djsaunde self-assigned this Nov 7, 2025
@djsaunde djsaunde removed the request for review from danielhanchen November 7, 2025 19:03
@djsaunde djsaunde changed the title from "auto-enable sample packing" to "auto-enable SFT sample packing" Nov 7, 2025
@djsaunde djsaunde force-pushed the auto-packing branch 2 times, most recently from 072de80 to efe4424 Compare November 10, 2025 20:23
@djsaunde djsaunde force-pushed the auto-packing branch 3 times, most recently from f352814 to 9cd0b26 Compare November 19, 2025 19:14
@djsaunde djsaunde mentioned this pull request Nov 20, 2025
@djsaunde djsaunde marked this pull request as ready for review November 20, 2025 13:47
@djsaunde (Collaborator Author)

I tested all the SFT notebooks from the main README. I'd like to test a few more notebooks, including ones that we don't expect to use packing (multimodal training, RL training, ...).

@djsaunde (Collaborator Author)

Okay, I ran through all the notebooks displayed in the README. As expected, none of the non-text-only SFT notebooks used sample packing, and none of them hit any errors when switching to this branch.

@djsaunde djsaunde changed the title from "auto-enable SFT sample packing" to "SFT sample packing" Nov 21, 2025
window_size = (-1, -1) if (kv_seq_len <= sw) else (sw, sw)

use_varlen = (
seq_info is not None and past_key_value is None and window_size == (-1, -1)
Collaborator

Q: So when past_key_value is not None, i.e., the decoding phase of generation, do we always use flash_attn_func and not flash_attn_varlen? Is that what this is trying to do? Does that place an implicit assumption that, post-prefill, inputs are padded to neat shapes?

Collaborator Author

Correct! The flash attention varlen API doesn't support key/value caching, and we don't pack inputs during inference anyway, so we use the dense API. No such assumption AFAIK; not sure I follow you on that.
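
For readers following along, a rough sketch of the dispatch being discussed, assuming the upstream flash-attn API (flash_attn_func / flash_attn_varlen_func); the seq_info unpacking and the exact condition are simplified from the diff above, not the actual Unsloth code:

    from flash_attn import flash_attn_func, flash_attn_varlen_func

    def dispatch_attention(q, k, v, seq_info=None, past_key_value=None, window_size=(-1, -1)):
        """Sketch: use the varlen kernel only for packed training batches (no KV
        cache, full attention window); otherwise fall back to the dense kernel,
        e.g. during the decode phase of generation."""
        use_varlen = (
            seq_info is not None and past_key_value is None and window_size == (-1, -1)
        )
        if use_varlen:
            cu_seqlens, max_seqlen = seq_info  # assumed packing metadata
            # varlen expects (total_tokens, n_heads, head_dim) inputs
            return flash_attn_varlen_func(
                q.squeeze(0), k.squeeze(0), v.squeeze(0),
                cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
                max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
                causal=True,
            )
        # dense kernel expects (batch, seqlen, n_heads, head_dim) inputs
        return flash_attn_func(q, k, v, causal=True, window_size=window_size)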

K: Tensor,
V: Tensor,
) -> Tensor:
"""Run attention using config / context info."""
Collaborator

Can we please add a couple of lines of info on which backend is used when and why?
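
A sketch of the kind of note being requested, guessed only from the backends visible in this diff (flash-attn dense/varlen and xformers); the real selection logic may differ:

    # Backend selection (sketch, not the actual code):
    #   * flash_attn_varlen_func -- packed training batches: sequence-length
    #     metadata is present, no KV cache, and no sliding window.
    #   * flash_attn_func -- decode / generation with a KV cache, or when a
    #     sliding window is active (the varlen API has no KV-cache support).
    #   * xformers memory_efficient_attention with a block-diagonal bias --
    #     when the xformers backend handles packed inputs.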

)

if requires_grad:
K_mod = K_mod.reshape(bsz, kv_seq_len, n_heads, head_dim)
Collaborator

Where is the expansion happening? Otherwise reshaping to n_heads would not be possible, right?

Collaborator Author

Just before this, K_mod / V_mod are viewed as (bsz, kv_seq_len, n_kv_heads, 1, head_dim) and then expanded:

            K_mod = K_t.view(bsz, kv_seq_len, config.n_kv_heads, 1, head_dim)
            V_mod = V_t.view(bsz, kv_seq_len, config.n_kv_heads, 1, head_dim)
            K_mod = K_mod.expand(
                bsz, kv_seq_len, config.n_kv_heads, config.n_groups, head_dim
            )
            V_mod = V_mod.expand(
                bsz, kv_seq_len, config.n_kv_heads, config.n_groups, head_dim
            )

            if requires_grad:
                K_mod = K_mod.reshape(bsz, kv_seq_len, n_heads, head_dim)
                V_mod = V_mod.reshape(bsz, kv_seq_len, n_heads, head_dim)
            else:
                Q_mod = Q_t.view(
                    bsz, q_len, config.n_kv_heads, config.n_groups, head_dim
                )
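
For clarity, a tiny standalone check (toy shapes) showing that the view → expand → reshape pattern above is equivalent to repeating each KV head n_groups times with torch.repeat_interleave:

    import torch

    bsz, kv_seq_len, n_kv_heads, n_groups, head_dim = 2, 8, 4, 3, 16  # toy sizes
    n_heads = n_kv_heads * n_groups

    K = torch.randn(bsz, kv_seq_len, n_kv_heads, head_dim)

    # view -> expand -> reshape, as in the snippet above
    K_mod = K.view(bsz, kv_seq_len, n_kv_heads, 1, head_dim)
    K_mod = K_mod.expand(bsz, kv_seq_len, n_kv_heads, n_groups, head_dim)
    K_mod = K_mod.reshape(bsz, kv_seq_len, n_heads, head_dim)

    # equivalent: duplicate each KV head for every query head in its group
    K_rep = K.repeat_interleave(n_groups, dim=2)

    assert torch.equal(K_mod, K_rep)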

attn_bias, XFORMERS_BLOCK_DIAG_CLS
)

if config.n_groups != 1 and not requires_grad and has_block:
Collaborator

NIT: I feel like we should condense these into a single check, even at the cost of repeating the single out = xformers_attn(...) call.

Collaborator Author

I combined them but kept the single xformers_attn call; hopefully I understood you correctly.
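
Context for readers: XFORMERS_BLOCK_DIAG_CLS in the hunk above refers to xformers' block-diagonal attention bias, which is what keeps packed samples from attending across their boundaries. A hedged sketch of the standard xformers usage (toy shapes, not the Unsloth wrapper itself):

    import torch
    from xformers.ops import memory_efficient_attention
    from xformers.ops.fmha.attn_bias import BlockDiagonalCausalMask

    n_heads, head_dim = 8, 64
    seqlens = [5, 3, 7]              # three packed samples in one row
    total = sum(seqlens)

    # packed layout: batch dim of 1, samples concatenated along the sequence axis
    q = torch.randn(1, total, n_heads, head_dim, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # causal within each sample, no attention across sample boundaries
    bias = BlockDiagonalCausalMask.from_seqlens(seqlens)

    out = memory_efficient_attention(q, k, v, attn_bias=bias)  # (1, total, n_heads, head_dim)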

@danielhanchen danielhanchen merged commit 50325e0 into unslothai:main Dec 10, 2025
1 check passed