
RoPE positions are never set #2559

@francesco-bertolotti

Description


I am looking into how RoPE positions are handled. I noticed that the models support a positions field:

```python
def forward(
    self,
    x: torch.Tensor,
    freqs_cis: torch.Tensor,
    attention_masks: AttentionMasksType | None,
    positions: torch.Tensor | None = None,
):
```

However, when examining the dataloader and dataset implementation, it appears that positions is always None.

If document packing is used, this could lead to misaligned RoPE embeddings across documents. Specifically, the first token of a packed document might not receive the first positional embedding, because the position index continues from the previous document in the packed sequence. As a result, the RoPE embedding applied to the first token of a document may not correspond to position 0 of that document.

Is this expected behavior, or should positions be reset per document when packing is enabled?
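For illustration, here is a minimal sketch of how per-document positions could be derived when packing, so that each document's RoPE indices restart at 0. The helper name `packed_positions` and the `doc_lens` argument are hypothetical, not names from this codebase; it assumes the dataloader knows the per-document token counts in packing order:

```python
import torch

def packed_positions(doc_lens: list[int]) -> torch.Tensor:
    # Build a positions tensor for one packed sequence where each
    # document's positions restart at 0 instead of continuing from
    # the previous document. `doc_lens` holds the token count of each
    # document in packing order.
    return torch.cat([torch.arange(n) for n in doc_lens])

# Packing three documents of lengths 3, 2, and 4:
print(packed_positions([3, 2, 4]))
# tensor([0, 1, 2, 0, 1, 0, 1, 2, 3])
```

With positions built this way, the first token of every packed document would be rotated with the position-0 frequencies, instead of inheriting a position offset from the preceding documents.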
