Overview: Our workshop features two challenges: SSB and UNICORN. Both challenges are hosted on CodaLab. The challenge tracks are described below.

Semantic Shift Benchmark

SSB
The Semantic Shift Benchmark (SSB) challenge focuses on open-set recognition and generalized category discovery. The SSB benchmark can be accessed [here] or [here]. For background on the generalized category discovery problem, readers can refer to the works [here] and [here].
Track-1: Open-Set Recognition: This track evaluates a model's ability to identify open-set examples. It has a single leaderboard; only models not trained on ImageNet-22k may be submitted. Ranking is determined by the average of the FPR and AUROC scores. A baseline is provided [here].
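The official scoring script ships with the baseline; the following is a minimal sketch of how the two metrics are typically computed. It assumes each test image receives a scalar "known-ness" score (higher means more likely closed-set) and that the FPR term is FPR at 95% TPR, which is our assumption rather than a confirmed detail of the track.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate_osr(scores: np.ndarray, is_known: np.ndarray) -> dict:
    """Compute AUROC and FPR@95TPR for known-vs-unknown detection.

    scores:   scalar confidence per test image (higher = more "known").
    is_known: 1 for closed-set images, 0 for open-set images.
    """
    auroc = roc_auc_score(is_known, scores)
    fpr, tpr, _ = roc_curve(is_known, scores)
    # First operating point where TPR reaches 95% (index clipped for safety).
    idx = min(int(np.searchsorted(tpr, 0.95)), len(tpr) - 1)
    return {"AUROC": float(auroc), "FPR@95TPR": float(fpr[idx])}
```

Per the track rules, the leaderboard score averages the two metrics.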
Track-2: Generalized Category Discovery: This track evaluates a model's ability to discover and recognize novel concepts within an unlabeled dataset. It has two leaderboards: one for models self-supervised pretrained on ImageNet-1k, and one for models pretrained with any self-supervised method. Ranking is determined by the average clustering accuracy over the three fine-grained (FGVC) datasets in the SSB benchmark. We provide a baseline for GCD [here].
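Clustering accuracy in the GCD literature is conventionally computed with Hungarian matching between predicted cluster IDs and ground-truth labels. A minimal sketch of that metric follows; the variable names are ours, and the baseline code defines the official protocol (e.g., how "Old" and "New" class splits are reported).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Accuracy under the best one-to-one mapping of cluster IDs to labels."""
    n = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1  # co-occurrence of cluster p and label t
    # Hungarian algorithm finds the assignment maximizing matched samples.
    row, col = linear_sum_assignment(counts, maximize=True)
    return counts[row, col].sum() / len(y_true)
```

The track score would then be the mean of this accuracy over the three FGVC datasets.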

The UNICORN Benchmark

UNICORN
The UNICORN benchmark focuses on the robustness and safety evaluation of multi-modal large language models. The benchmark can be accessed [here].
The UNICORN challenge has a single track, which evaluates models on both the OODCV-VQA and Sketchy-VQA benchmarks; the ranking is determined by the average score over the two. For out-of-distribution (OOD) evaluation, we provide two novel VQA datasets, OODCV-VQA and Sketchy-VQA, each with one variant (OODCV-Counterfactual, with perturbed textual questions, and Sketchy-Challenging, with rarely seen sketchy objects), designed to test model performance under challenging conditions. In the OOD scenario, every question is matched with a boolean or numerical answer, and we use exact-match accuracy for evaluation. An evaluation of various types of models on this benchmark can be found [here].
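Since all answers are boolean or numerical, exact-match accuracy reduces to string comparison after normalization. A minimal sketch follows; the normalization step (lowercasing, stripping non-alphanumerics) is our assumption, not the official UNICORN scorer.

```python
import re

def normalize(s: str) -> str:
    """Lowercase and strip non-alphanumerics (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", s.strip().lower())

def exact_match_accuracy(predictions: list, answers: list) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    hits = sum(normalize(p) == normalize(a) for p, a in zip(predictions, answers))
    return hits / len(answers)

# Per the track rules, the final ranking averages the two benchmark scores:
# final_score = 0.5 * (acc_oodcv_vqa + acc_sketchy_vqa)
```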

Challenge Servers


Track Links
SSB: Open-Set Recognition [CodaLab]
SSB: Generalized Category Discovery [CodaLab]
UNICORN [CodaLab]

Important Dates

Description                           Date
Challenge starts                      June 15th, 2024
Challenge ends                        August 25th, 2024
Challenge report and code deadline    September 15th, 2024