Dear colleagues,

You are all invited to participate in our ROAD challenge @ ICCV 2021, part
of the Workshop of the same name.
The challenge is designed to assess video-level object, action and event
detection in a very challenging autonomous driving scenario.

Looking forward to your participation!

*The ROAD Challenge: Event Detection for Situation Awareness in Autonomous
Driving*

*Call for participation*

https://sites.google.com/view/roadchallangeiccv2021/challenge

*Aim of the Challenge*

The accurate detection and anticipation of actions performed by multiple
*road agents* (pedestrians, vehicles, cyclists and so on) is crucial to
enabling autonomous vehicles to make decisions in a safe, reliable way.
While the task of teaching an autonomous vehicle how to drive can be
tackled in a brute-force fashion through direct reinforcement learning, a
sensible and attractive alternative is to first provide the vehicle with
situation awareness capabilities, and then feed the resulting semantically
meaningful representations of road scenarios (in terms of agents, events
and scene configuration) to a suitable decision-making strategy. In the
longer term, this also has the advantage of allowing the reasoning
processes of road agents to be modelled in a theory-of-mind approach,
inspired by the behaviour of the human mind in similar contexts.

Accordingly, the goal of this Challenge is to bring the topic of
*situation awareness* to the forefront of autonomous driving research,
understood as the ability to create semantically useful representations of
dynamic road scenes, built around the notion of a *road event*.

*The ROAD dataset*

This concept is at the core of the new ROad event Awareness Dataset (ROAD)
for Autonomous Driving:

https://github.com/gurkirt/road-dataset



ROAD is the first benchmark of its kind: a multi-label dataset designed to
allow the community to investigate the use of semantically meaningful
representations of dynamic road scenes to facilitate situation awareness
and decision making. It contains 22 long-duration videos (ca. 8 minutes
each) annotated in terms of "road events", defined as triplets of Agent,
Action and Location labels and represented as 'tubes', i.e., series of
frame-wise bounding box detections.
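
To make the tube representation concrete, below is a minimal, purely
illustrative Python sketch of how a single road-event tube could be held
in memory. The field names and label strings are our own and do not
reflect the actual annotation format shipped with the dataset; please
refer to the repository for the real format.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RoadEventTube:
    """Illustrative container for one road event ('tube').

    Field names are hypothetical; the ROAD repository documents the
    actual annotation format.
    """
    agent: str                  # e.g. an agent label such as "Pedestrian"
    action: str                 # e.g. an action label
    location: str               # e.g. a location label
    frame_ids: List[int] = field(default_factory=list)      # consecutive frame indices
    boxes: List[List[float]] = field(default_factory=list)  # one [x1, y1, x2, y2] per frame

# A toy two-frame tube
tube = RoadEventTube(
    agent="Pedestrian",
    action="Crossing the road",
    location="In the middle of the road",
    frame_ids=[120, 121],
    boxes=[[0.41, 0.52, 0.47, 0.71], [0.42, 0.52, 0.48, 0.72]],
)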



ROAD is a large, high-quality benchmark comprising 122K labelled video
frames and 560K detection bounding boxes associated with 1.7M labels.



The above GitHub repository contains all the necessary instructions to
pre-process the 22 ROAD videos, unpack them to the correct directory
structure and run the provided baseline model.



*Tasks and Challenges*

ROAD allows one to validate detection tasks associated with any meaningful
combination of the three base labels. For this Challenge we consider three
*video-level* detection Tasks:

T1. *Agent* detection, in which the output is in the form of agent tubes
collecting the bounding boxes associated with an active road agent in
consecutive frames.

T2. *Action* detection, where the output is in the form of action tubes
formed by bounding boxes around an action of interest in each video frame.

T3. *Road event* detection, where by road event we mean a triplet (Agent,
Action, Location) as explained above, once again represented as a tube of
frame-level detections.

Each Task thus consists in regressing whole series (‘tubes’) of
temporally-linked bounding boxes associated with relevant instances,
together with their class label(s).
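
In other words, a prediction for any of the three Tasks is a set of scored
tubes. Below is a hedged, purely illustrative example of what one such
detection might look like; the actual submission format is the one
specified on the Challenge website and EvalAI pages.

# Illustrative only: one detected tube for Task 3 (road event detection).
# Keys and label strings are hypothetical, not the official format.
detection = {
    "video_id": "example_video_01",          # hypothetical video identifier
    "frame_ids": [300, 301, 302],            # frames spanned by the tube
    "boxes": [                               # one [x1, y1, x2, y2] box per frame
        [0.30, 0.40, 0.38, 0.75],
        [0.31, 0.40, 0.39, 0.75],
        [0.32, 0.41, 0.40, 0.76],
    ],
    "labels": ("Pedestrian", "Crossing the road", "In vehicle lane"),
    "score": 0.87,                           # tube-level confidence
}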

*Baseline*

As a baseline for all three detection tasks we propose a simple yet
effective 3D feature pyramid network with focal loss, an architecture we
call 3D-RetinaNet:

http://arxiv.org/abs/2102.11585

The code is publicly available on GitHub:

https://github.com/gurkirt/3D-RetinaNet
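
For readers unfamiliar with the focal loss mentioned above, here is a
minimal sketch of its standard sigmoid (binary) form (Lin et al., 2017),
written in PyTorch-style Python. It is a generic illustration of the
technique, not necessarily the exact loss code used in the 3D-RetinaNet
repository.

import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples.

    logits: raw scores; targets: float tensor of 0s and 1s, same shape.
    Generic sketch, not necessarily the implementation in the repo.
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balance weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()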

*Timeframe*

Challenge participants have 18 videos at their disposal for training and
validation. The remaining 4 videos are to be used to test the final
performance of their model. This will apply to all three Tasks.

The timeframe for the Challenge is as follows:

- Training and validation fold release: April 30, 2021

- Test fold release: July 20, 2021

- Submission of results: August 10, 2021

- Announcement of results: August 12, 2021

- Challenge event @ workshop: October 10-17, 2021

*Evaluation*

Performance on each Task is measured by video mean average precision
(video-mAP). Given the challenging nature of the data, the Intersection
over Union (IoU) detection threshold is set to 0.1, 0.2 and 0.5 (i.e., a
10%, 20% and 50% overlap between predicted and ground-truth bounding boxes
within each tube). The final score for each Task is the equally-weighted
average of the video-mAP values obtained at the three thresholds.
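
As an illustration of how a tube-level overlap and the final score can be
computed, here is a self-contained Python sketch. It uses one common
definition of spatio-temporal tube IoU from the action detection
literature; the authoritative metric is the evaluation code run on the
EvalAI servers.

# Illustrative only; the official metric is the one run on EvalAI.

def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def tube_iou(pred, gt):
    """Spatio-temporal IoU between two tubes, each a dict {frame_id: box}.

    Frames present in only one tube contribute zero overlap (a common
    convention in spatio-temporal action detection).
    """
    frames = set(pred) | set(gt)
    per_frame = [box_iou(pred[f], gt[f]) if f in pred and f in gt else 0.0
                 for f in frames]
    return sum(per_frame) / len(per_frame)

def final_score(map_at_threshold):
    """Equally-weighted average of video-mAP at IoU 0.1, 0.2 and 0.5."""
    return sum(map_at_threshold[t] for t in (0.1, 0.2, 0.5)) / 3.0

# e.g. final_score({0.1: 0.40, 0.2: 0.33, 0.5: 0.18}) is roughly 0.303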

In the first stage of the Challenge, participants will, for each Task,
submit the predictions generated on the validation fold and receive the
corresponding evaluation metric in return, to get a sense of how well
their method(s) perform. In the second stage, they will submit the
predictions generated on the test fold, which will be used for the final
ranking.

A separate ranking will be produced for each of the Tasks.

Evaluation will take place on the EvalAI platform:

https://eval.ai/web/challenges/challenge-page/1059

For each Challenge stage and each Task the maximum number of submissions is
capped at 50, with an additional constraint of 5 submissions per day.

Detailed instructions about how to download the data and submit your
predictions for evaluation at both validation and test time, for all three
Tasks, are provided on the Challenge website.