RFDETR CrowdHuman fine tuning

Hi!
I'm trying to fine-tune an RFDETRSmall model on the CrowdHuman dataset, which contains many overlapping boxes and a high density of detections per image.
The dataset is fairly large (15000 training images, ~3000 val, ~3000 test), and I'm trying to find the best hyperparameters for fine-tuning the model.
This is my current, fairly simple setup (running on an RTX 3090):

```
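from rfdetr import RFDETRSmall

# dataset_dir and output_dir are defined elsewhere in the script (not shown)

# model, initialized from a local checkpoint
model = RFDETRSmall(
    pretrain_weights="checkpoints/rf-detr-small-topdown-hearty-armadillo-5.pth"
)

# fine-tuning run with early stopping and W&B logging
model.train(
    dataset_dir=dataset_dir,
    epochs=30,
    batch_size=8,
    grad_accum_steps=4,
    lr=5e-5,
    patience=10,
    early_stopping=True,
    output_dir=output_dir,
    wandb=True,
)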
```
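
As a quick sanity check on this setup, the arithmetic below compares the configured batch settings with the `[130/375]` iteration counter in the log. It is only a sketch: the variable names are illustrative, and how the trainer actually counts iterations per epoch is an assumption that has not been checked against the RF-DETR source.

```python
import math

# Values taken from the post
batch_size = 8
grad_accum_steps = 4
num_train_images = 15000
logged_iters_per_epoch = 375  # from "[130/375]" in the training log

# Images contributing to one optimizer update
effective_batch = batch_size * grad_accum_steps

# Iterations per epoch one would expect if the counter advanced once per
# effective batch (assumption) or once per micro-batch (assumption)
expected_iters_effective = math.ceil(num_train_images / effective_batch)
expected_iters_micro = math.ceil(num_train_images / batch_size)

# Images per logged iteration implied by the log, if all 15000 images are used
implied_images_per_iter = num_train_images / logged_iters_per_epoch

print(effective_batch, expected_iters_effective, expected_iters_micro,
      implied_images_per_iter)
```

Neither expected count equals the 375 iterations shown in the log, and the logged `lr` of 0.000020 also differs from `lr=5e-5`. This may mean the log comes from a run with different settings, or that a schedule or a different image count is involved; it is worth confirming before comparing runs.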

The final goal is to tell apart people who are hugging each other, for a security application. The pretrained RFDETRSmall often fails at this and detects hugging people as a single blob, so I'm trying to get better results by training on a high-density detection dataset.
The problem I'm facing is that the loss is basically flat and the cardinality error, which is my most important metric here, never goes down.
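
For context on that metric, here is a minimal pure-Python sketch of the cardinality error as it is defined in the original DETR reference code: the absolute difference between the number of queries whose top-scoring class is not the last logit index and the number of ground-truth boxes, averaged over the batch. In DETR it is a logging-only quantity with no gradient. That RF-DETR computes it the same way is an assumption here, and the function name and toy inputs are illustrative.

```python
def cardinality_error(pred_logits, num_gt_boxes):
    """Mean |predicted object count - ground-truth count| over a batch.

    pred_logits: one list per image, each a list of per-query class logits.
                 The last logit index is treated as "no object" (DETR convention).
    num_gt_boxes: number of ground-truth boxes per image.
    """
    errors = []
    for image_logits, n_gt in zip(pred_logits, num_gt_boxes):
        no_object_index = len(image_logits[0]) - 1
        n_pred = 0
        for query_logits in image_logits:
            # argmax over classes for this query
            top = max(range(len(query_logits)), key=lambda i: query_logits[i])
            if top != no_object_index:
                n_pred += 1
        errors.append(abs(n_pred - n_gt))
    return sum(errors) / len(errors)


# Toy batch: 2 images, 3 queries each, 2 logits per query
toy_logits = [
    [[2.0, -1.0], [0.1, 0.3], [1.5, 0.2]],   # 2 queries counted as objects
    [[-0.5, 0.5], [-1.0, 2.0], [0.0, 1.0]],  # 0 queries counted as objects
]
toy_error = cardinality_error(toy_logits, num_gt_boxes=[3, 5])
print(toy_error)  # 3.0
```

Under this definition, if the predicted count is the same constant for every prediction head, the error depends only on the ground-truth counts in the batch and comes out identical across heads. Whether that is what happens in this training run is unverified.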
The following is a snippet of the logs from the latest training run.

```
Epoch: [6]  [130/375]  eta: 0:02:57  lr: 0.000020  class_error: 0.00  loss: 5.8841 (6.2072)  loss_ce: 0.6394 (0.6561)  loss_bbox: 0.1946 (0.2135)  loss_giou: 0.5716 (0.6327)  loss_ce_0: 0.6761 (0.6853)  loss_bbox_0: 0.2103 (0.2254)  loss_giou_0: 0.5967 (0.6541)  loss_ce_1: 0.6477 (0.6630)  loss_bbox_1: 0.1962 (0.2170)  loss_giou_1: 0.5819 (0.6404)  loss_ce_enc: 0.6746 (0.6771)  loss_bbox_enc: 0.2334 (0.2524)  loss_giou_enc: 0.6250 (0.6903)  loss_ce_unscaled: 0.6394 (0.6561)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0389 (0.0427)  loss_giou_unscaled: 0.2858 (0.3163)  cardinality_error_unscaled: 24.5000 (27.5401)  loss_ce_0_unscaled: 0.6761 (0.6853)  loss_bbox_0_unscaled: 0.0421 (0.0451)  loss_giou_0_unscaled: 0.2983 (0.3271)  cardinality_error_0_unscaled: 24.5000 (27.5401)  loss_ce_1_unscaled: 0.6477 (0.6630)  loss_bbox_1_unscaled: 0.0392 (0.0434)  loss_giou_1_unscaled: 0.2909 (0.3202)  cardinality_error_1_unscaled: 24.5000 (27.5401)  loss_ce_enc_unscaled: 0.6746 (0.6771)  loss_bbox_enc_unscaled: 0.0467 (0.0505)  loss_giou_enc_unscaled: 0.3125 (0.3451)  cardinality_error_enc_unscaled: 24.5000 (27.5401)  time: 0.7002  data: 0.0175  max mem: 18269
2026-02-12 11:36:35

Epoch: [6]  [140/375]  eta: 0:02:47  lr: 0.000020  class_error: 0.00  loss: 5.8714 (6.1737)  loss_ce: 0.6487 (0.6544)  loss_bbox: 0.1979 (0.2141)  loss_giou: 0.5658 (0.6250)  loss_ce_0: 0.6803 (0.6841)  loss_bbox_0: 0.2095 (0.2260)  loss_giou_0: 0.5844 (0.6460)  loss_ce_1: 0.6527 (0.6615)  loss_bbox_1: 0.2055 (0.2178)  loss_giou_1: 0.5665 (0.6326)  loss_ce_enc: 0.6733 (0.6764)  loss_bbox_enc: 0.2305 (0.2533)  loss_giou_enc: 0.6226 (0.6825)  loss_ce_unscaled: 0.6487 (0.6544)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0396 (0.0428)  loss_giou_unscaled: 0.2829 (0.3125)  cardinality_error_unscaled: 22.5000 (26.9663)  loss_ce_0_unscaled: 0.6803 (0.6841)  loss_bbox_0_unscaled: 0.0419 (0.0452)  loss_giou_0_unscaled: 0.2922 (0.3230)  cardinality_error_0_unscaled: 22.5000 (26.9663)  loss_ce_1_unscaled: 0.6527 (0.6615)  loss_bbox_1_unscaled: 0.0411 (0.0436)  loss_giou_1_unscaled: 0.2833 (0.3163)  cardinality_error_1_unscaled: 22.5000 (26.9663)  loss_ce_enc_unscaled: 0.6733 (0.6764)  loss_bbox_enc_unscaled: 0.0461 (0.0507)  loss_giou_enc_unscaled: 0.3113 (0.3413)  cardinality_error_enc_unscaled: 22.5000 (26.9663)  time: 0.6730  data: 0.0167  max mem: 18269
```
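
To follow a metric over many such lines rather than reading them by eye, a small parser like the one below could extract the values. It is a sketch that assumes each `name: a (b)` pair is the DETR-style logger format (windowed median followed by the global average); the function and pattern names are hypothetical, and the sample line is shortened from the log above.

```python
import re

# "Epoch: [6]  [130/375]" header: epoch, current iteration, iterations per epoch
HEADER_RE = re.compile(r"Epoch: \[(\d+)\]\s+\[(\d+)/(\d+)\]")
# "name: 5.8841 (6.2072)" pairs; entries without a parenthesized value are skipped
PAIR_RE = re.compile(r"(\w+): (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_log_line(line):
    """Return epoch, step and metric pairs for one log line, or None if no header."""
    header = HEADER_RE.search(line)
    if header is None:
        return None
    epoch, step, total = (int(g) for g in header.groups())
    metrics = {
        name: (float(first), float(second))
        for name, first, second in PAIR_RE.findall(line)
    }
    return {"epoch": epoch, "step": step, "steps_per_epoch": total, "metrics": metrics}


sample = (
    "Epoch: [6]  [130/375]  eta: 0:02:57  lr: 0.000020  class_error: 0.00  "
    "loss: 5.8841 (6.2072)  loss_ce: 0.6394 (0.6561)  "
    "cardinality_error_unscaled: 24.5000 (27.5401)  "
    "time: 0.7002  data: 0.0175  max mem: 18269"
)
parsed = parse_log_line(sample)
print(parsed["epoch"], parsed["step"], parsed["metrics"]["loss"])
```

Collecting `parsed["metrics"]["loss"]` or `cardinality_error_unscaled` across epochs makes it easier to see whether a metric is truly flat or only drifting slowly.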

Do you have any suggestions on how to proceed?
Thanks in advance!
