Saturday, April 11, 2009

LEEGLE - A challenge in object class recognition in sequences

LEEGLE is a challenge in object recognition and categorization initiated and organized by the Visual Cognitive Systems Laboratory at the University of Ljubljana, and intended mainly for students of computer vision. The challenge is held twice a year, at the end of each semester. It is an opportunity for students to test their knowledge of computer vision and to compete with teams from other universities.

Three months prior to the competition, the training and validation data are made available. The training data consists of a set of 3D views of various toy objects belonging to several object classes. The validation sequence simulates the trajectory of a camera through a toy world and depicts objects with large variations in scale, pose, orientation, and lighting conditions; some objects are likely to be occluded. The task is to detect, recognize, and categorize as many of the known objects as possible. For the competition, a test image sequence is released, and the developed algorithms are to be evaluated on it within 48 hours. Participating groups submit their results, and the resulting scores will be published on the web site.

The test sequence contains many objects in various poses, scales, and orientations. The goal of the challenge is to detect, recognize, and categorize as many of them as possible. For each challenge, a training set of images and a validation set with ground truth annotations are made available.

Once the test image sequence is released, the developed algorithms should be evaluated within 48 hours. For each test image, the set of bounding boxes around the detected objects, together with the corresponding class labels, should be submitted. A bounding box is defined by its upper left coordinate (X1, Y1) and its lower right coordinate (X2, Y2), where (0, 0) is the upper left corner of the image. A detection is counted as correct if its bounding box overlaps the ground truth box by more than 40%, and vice versa, and it carries a correct label. Since one object can belong to several categories, each correctly assigned class is counted as a true positive (TP) and each wrong label as a false positive (FP). An exception is the 'zombie' class, which should be avoided: it is a distractor object and should not be labeled. The ground truth bounding boxes outline only the visible part of the object. Similarly, your results should report bounding boxes of the visible part of the detected object (and not the dimensions inferred from the training set images), thus omitting the parts clipped by the image border or hidden by other objects in the scene.
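To make the matching rule more concrete, here is a minimal Python sketch of how a single detection might be checked against a ground truth annotation. It is not the official scoring script. It assumes that "more than 40% and vice versa" means the intersection must cover more than 40% of the detected box and more than 40% of the ground truth box; the names `Box`, `boxes_match`, and `score_labels` are hypothetical, and the handling of the 'zombie' distractor is left out because the rules above do not say how it is scored.

```python
from typing import NamedTuple, Set, Tuple


class Box(NamedTuple):
    """Axis-aligned box; (0, 0) is the upper left corner of the image."""
    x1: float  # upper left X
    y1: float  # upper left Y
    x2: float  # lower right X
    y2: float  # lower right Y


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b.x2 - b.x1) * max(0.0, b.y2 - b.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the region shared by both boxes (0 if they are disjoint)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def boxes_match(det: Box, gt: Box, threshold: float = 0.4) -> bool:
    """Assumed reading of the rule: the overlap must exceed the threshold
    both as a fraction of the detection and as a fraction of the ground truth."""
    det_area, gt_area = area(det), area(gt)
    if det_area == 0.0 or gt_area == 0.0:
        return False
    inter = intersection_area(det, gt)
    return inter / det_area > threshold and inter / gt_area > threshold


def score_labels(det_labels: Set[str], gt_labels: Set[str]) -> Tuple[int, int]:
    """For a matched box: each correct class is a TP, each wrong class an FP."""
    true_positives = len(det_labels & gt_labels)
    false_positives = len(det_labels - gt_labels)
    return true_positives, false_positives


if __name__ == "__main__":
    detection = Box(0, 0, 10, 10)
    ground_truth = Box(5, 0, 15, 10)
    print(boxes_match(detection, ground_truth))          # half of each box overlaps
    print(score_labels({"car", "vehicle", "dog"}, {"car", "vehicle"}))
```

Requiring the overlap in both directions means that a very large detection that merely contains a small object does not count as a match, and neither does a tiny detection inside a large object.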

An example of an annotated scene image (ground truth):

The images on the simulated path will be captured from viewpoints with a constant height and tilt, both approximately the same as in the training sequence. The illumination, backgrounds, and the configuration of the scene will vary. Objects can be augmented with occluding parts (e.g. carrying tools), but the pose will be approximately the same as in the training set.
