
CombOOD Near-OOD results on ImageNet-1K / ImageNet-200 leaderboard appear incorrect #307

@JinMoYang

Description


First, thank you to the OpenOOD team for maintaining this benchmark — it has been invaluable for the OOD detection community.

We would like to request an inspection of possibly incorrect results on the leaderboard.
While benchmarking for research purposes, we found a significant discrepancy in the reported Near-OOD results.
We reimplemented CombOOD as an OpenOOD postprocessor following the [official codebase](https://github.com/rmagesh148/combood) and evaluated it across the leaderboard benchmarks using the v1.5 evaluation pipeline.
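For reference, our reproduction follows the usual two-step shape of an OpenOOD-style postprocessor (fit on ID data, then score test batches). The skeleton below is only an illustration of that interface: the class and method names are our own, not OpenOOD's API, and the scoring is a maximum-softmax placeholder, not the actual CombOOD score.

```python
import numpy as np

class ReproPostprocessor:
    """Illustrative skeleton of our reproduction's interface.

    The scoring here is a maximum-softmax placeholder, NOT the CombOOD
    score; the real fitting/scoring logic follows the official codebase.
    """

    def setup(self, id_feats: np.ndarray, id_labels: np.ndarray) -> None:
        # In the real reproduction, per-class statistics used by the
        # CombOOD score would be fitted here from ID training features.
        self.num_classes = int(id_labels.max()) + 1

    def postprocess(self, logits: np.ndarray):
        # Placeholder confidence: maximum softmax probability.
        shifted = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
        return probs.argmax(axis=1), probs.max(axis=1)
```

The predicted class and per-sample confidence returned by `postprocess` are what the v1.5 pipeline consumes to compute AUROC per OOD dataset.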

Summary

CIFAR-10 and CIFAR-100 results match the leaderboard within a few points (~1–4), confirming our implementation is sound.
The problem is limited to Near-OOD: the ImageNet-1K and ImageNet-200 Near-OOD numbers diverge by ~13–22 points, while Far-OOD agrees within ~1–3 points.
One possible explanation is that the leaderboard entries were computed using the v1.0 Near-OOD split rather than the v1.5 split (SSB-hard, NINCO). We note that the official CombOOD codebase defines `nearood: [species, inaturalist, openimageo, imageneto]` and does not reference SSB-hard or NINCO, which may be related (see issue #235).
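To make the suspected mismatch concrete: the two Near-OOD definitions share no datasets at all (dataset identifiers below are transcribed from the respective configs; exact spellings in the config files may differ).

```python
# Near-OOD dataset list in the official CombOOD config (v1.0-era split).
combood_nearood = ["species", "inaturalist", "openimageo", "imageneto"]

# Near-OOD dataset list in the OpenOOD v1.5 ImageNet benchmarks.
openood_v15_nearood = ["ssb_hard", "ninco"]

# The intersection is empty: a score tuned/reported on one split says
# nothing about performance on the other.
shared = set(combood_nearood) & set(openood_v15_nearood)
print(shared)  # set()
```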

Results — ImageNet-1K (ResNet50, torchvision pretrained)

| Dataset | Leaderboard AUROC (issue #235) | Our Reproduced AUROC |
| --- | --- | --- |
| SSB-hard | 92.62 | 64.94 |
| NINCO | 97.82 | 80.92 |
| Near-OOD | 95.22 | 72.93 |
| iNaturalist | 87.13 | 85.16 |
| Textures | 97.01 | 96.78 |
| OpenImage-O | 86.59 | 86.32 |
| Far-OOD | 90.24 | 89.42 |

Results — ImageNet-200 (ResNet18, 3-seed average)

| Dataset | Leaderboard AUROC (issue #235) | Our Reproduced AUROC |
| --- | --- | --- |
| SSB-hard | 93.66 | 78.09 ± 0.10 |
| NINCO | 97.81 | 86.60 ± 0.20 |
| Near-OOD | 95.74 ± 0.00 | 82.35 ± 0.10 |
| iNaturalist | 92.22 | 92.02 ± 0.50 |
| Textures | 96.18 | 95.75 ± 0.10 |
| OpenImage-O | 89.31 | 88.98 ± 0.40 |
| Far-OOD | 92.57 ± 0.00 | 92.25 ± 0.30 |

Results — CIFAR-10 (ResNet18, 3-seed average)

| Split | Leaderboard AUROC | Our Reproduced AUROC |
| --- | --- | --- |
| Near-OOD | 91.13 ± 0.00 | 90.81 ± 0.20 |
| Far-OOD | 94.65 ± 0.00 | 93.26 ± 0.20 |

Results — CIFAR-100 (ResNet18, 3-seed average)

| Split | Leaderboard AUROC | Our Reproduced AUROC |
| --- | --- | --- |
| Near-OOD | 78.77 ± 0.00 | 80.78 ± 0.10 |
| Far-OOD | 85.87 ± 0.00 | 81.74 ± 0.20 |

Observations

  1. CIFAR-10/100: Our reproduction is within ~1–4 points of the leaderboard, validating the implementation.
  2. ImageNet-1K Near-OOD: 22-point gap (95.22 vs 72.93). Far-OOD agrees within ~1 point (90.24 vs 89.42).
  3. ImageNet-200 Near-OOD: 13-point gap (95.74 vs 82.35). Far-OOD agrees within ~0.3 points (92.57 vs 92.25).
  4. The pattern is consistent: Near-OOD diverges sharply while Far-OOD matches, which points to a Near-OOD split mismatch (v1.0 vs. v1.5) rather than an implementation difference.
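As a small sanity check on how the split-level numbers above are formed: assuming the Near-OOD AUROC is the unweighted mean of the two per-dataset AUROCs (our understanding of the reporting convention), both the leaderboard and our reproduced columns are internally consistent, so the gap is not an aggregation artifact.

```python
# Per-dataset Near-OOD AUROCs for ImageNet-1K, taken from the tables above.
leaderboard = {"ssb_hard": 92.62, "ninco": 97.82}
reproduced = {"ssb_hard": 64.94, "ninco": 80.92}

# Unweighted mean across the Near-OOD datasets.
for name, scores in [("leaderboard", leaderboard), ("reproduced", reproduced)]:
    near_ood = sum(scores.values()) / len(scores)
    print(name, round(near_ood, 2))
# leaderboard 95.22
# reproduced 72.93
```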
Could the ImageNet-1K and ImageNet-200 Near-OOD entries be inspected?
We are happy to share our reproduction code and OpenOOD postprocessor implementation.
