I am looking into how RoPE positions are handled. I noticed that the models support a `positions` field (torchtitan/torchtitan/models/qwen3/model.py, lines 57–63 at commit 5732118):
```python
def forward(
    self,
    x: torch.Tensor,
    freqs_cis: torch.Tensor,
    attention_masks: AttentionMasksType | None,
    positions: torch.Tensor | None = None,
):
```
However, when examining the dataloader and dataset implementation, it appears that `positions` is always `None`.
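To make the concern concrete, here is a minimal sketch of what I assume happens when `positions` is `None` (I have not confirmed this is exactly how the model fills in the default, but falling back to consecutive indices over the whole sequence is the typical behavior):

```python
import torch

# Assumed example: a packed sequence of length 8 holding two documents
# of length 4 each.
seq_len = 8

# Presumed fallback when `positions` is None: consecutive indices over
# the entire packed sequence.
positions = torch.arange(seq_len)
# The second document's first token then gets position 4, not 0.
```

Under that assumption, every document after the first starts at a nonzero position index.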
If document packing is used, this could lead to misaligned RoPE embeddings across documents: since position indices continue from the previous document in the packed sequence, the first token of a packed document does not receive position 0, so the RoPE embedding applied to it does not correspond to the start of that document.
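What I would have expected instead is something like the following sketch, where positions restart at 0 for each packed document (`packed_positions` and the `document_lengths` input are hypothetical, just to illustrate the idea; a real implementation might derive boundaries from EOS tokens or a document-id mask):

```python
import torch

def packed_positions(document_lengths: list[int]) -> torch.Tensor:
    # Hypothetical helper: build position ids for a packed sequence so
    # that each document's positions restart at 0.
    return torch.cat([torch.arange(n) for n in document_lengths])

# Two documents of lengths 3 and 4 packed into one sequence of length 7:
# each document's first token gets position 0 again.
pos = packed_positions([3, 4])
```

Passing such a tensor through the `positions` argument would keep RoPE aligned with document boundaries.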
Is this expected behavior, or should `positions` be reset per document when packing is enabled?