Tuesday, May 29, 2012


Article from http://kiwi.media.mit.edu/tether/

Authors: Matthew Blackshaw (@mblackshaw), Dávid Lakatos (@dogichow), Hiroshi Ishii, Ken Perlin

For more information please contact tether@media.mit.edu.

T(ether) is a novel spatially aware display that supports intuitive interaction with volumetric data. The display acts as a window affording users a perspective view of three-dimensional data through tracking of head position and orientation. T(ether) creates a 1:1 mapping between real and virtual coordinate space, allowing immersive exploration of the joint domain. Our system creates a shared workspace in which co-located or remote users can collaborate in both the real and virtual worlds. The system allows input through capacitive touch on the display and a motion-tracked glove. When placed behind the display, the user’s hand extends into the virtual world, enabling the user to interact with objects directly.

Above: The environment can be spatially annotated using a tablet's touch screen.

Above: T(ether) is collaborative. Multiple people can edit the same virtual environment.


T(ether) uses Vicon motion capture cameras to track the position and orientation of tablets, user heads and hands. Server-side synchronization was coded using NodeJS and tablet-side code uses Cinder. The synchronization server forwards tag locations to each of the tablets over Wi-Fi, and each tablet in turn renders the scene. Touch events on each tablet are broadcast to all other tablets through the synchronization server.
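The relay pattern described above can be sketched in a few lines. This is only an in-memory illustration in Python, not the project's NodeJS server: the `SyncHub` class, its method names, and the message fields are all hypothetical, and a real deployment would send these messages over network sockets rather than call Python callbacks.

```python
from typing import Callable, Dict

Message = dict


class SyncHub:
    """In-memory stand-in for a synchronization server (hypothetical names)."""

    def __init__(self) -> None:
        # Maps a tablet id to the callback that delivers a message to it.
        self.clients: Dict[str, Callable[[Message], None]] = {}

    def connect(self, tablet_id: str, on_message: Callable[[Message], None]) -> None:
        self.clients[tablet_id] = on_message

    def publish_pose(self, tag_id: str, position: tuple, orientation: tuple) -> None:
        # Tracked tag locations are forwarded to every connected tablet.
        msg = {"type": "pose", "tag": tag_id,
               "position": position, "orientation": orientation}
        for send in self.clients.values():
            send(msg)

    def publish_touch(self, sender_id: str, event: dict) -> None:
        # Touch events go to all tablets except the one that produced them.
        msg = {"type": "touch", "from": sender_id, "event": event}
        for tablet_id, send in self.clients.items():
            if tablet_id != sender_id:
                send(msg)
```

Connecting two tablets with list-backed inboxes (for example `hub.connect("tablet-a", inbox_a.append)`) is enough to see that a pose update reaches both of them, while a touch event from one tablet reaches only the other.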

Press Kit

via congeo

Monday, May 28, 2012


As part of the THALES research programme, the Image Processing and Multimedia Group (http://ipml.ee.duth.gr/) of the Department of Electrical and Computer Engineering at the Democritus University of Thrace will investigate new methodologies for the effective retrieval of three-dimensional (3D) objects (static objects and 3D video). Representative samples of the group's related research results can be found at http://utopia.duth.gr/~ipratika/3dor/

This programme includes one (1) funded position for a doctoral dissertation.

Work on the dissertation is to begin in the 2012-13 academic year. Candidates who hold the required qualifications and who will have completed their undergraduate or postgraduate studies by this coming September at the latest are encouraged to express their interest.

Candidates should preferably have a background in at least one of Graphics, Image Processing, Computer Vision and Pattern Recognition, as well as programming experience either in Matlab or in any other environment (e.g. Visual Studio, etc.) with programming languages such as C/C++/C#.

The selected candidates will work in a dynamic environment and will collaborate with researchers from institutions in Greece (National and Kapodistrian University of Athens, Department of Informatics; "ATHENA" Research Center; NCSR "DEMOKRITOS", Institute of Informatics and Telecommunications) as well as with researchers from institutions abroad (University of Houston, USA; Vrije Universiteit Brussel, Belgium; Utrecht University, Netherlands; Consiglio Nazionale delle Ricerche, Italy).

Interested candidates are asked to contact, as soon as possible:

Assistant Professor Ioannis Pratikakis (http://utopia.duth.gr/~ipratika/)

Department of Electrical and Computer Engineering

Democritus University of Thrace

Office 1.15, Building B

University Campus, Kimmeria, Xanthi

VIR tutorial by Oge Marques featuring Lire @ SIGIR 2012

Article from: http://www.semanticmetadata.net/2012/05/16/vir-tutorial-by-oge-marques-featuring-lire-sigir-2012/

Dr. Oge Marques, author of the book Practical Image and Video Processing Using MATLAB, is giving a tutorial on Java-based visual information retrieval at SIGIR 2012. Oge Marques is Associate Professor in the Department of Computer & Electrical Engineering and Computer Science at Florida Atlantic University. He has been teaching and doing research on image and video processing for more than twenty years, in seven different countries.

In his tutorial, he presents an overview of visual information retrieval (VIR) concepts, techniques, algorithms, and applications. Several topics are supported by examples written in Java, using Lucene (an open-source Java-based indexing and search implementation) and LIRE (Lucene Image REtrieval), an open-source Java-based library for content-based image retrieval (CBIR).

Read more & register on the SIGIR 2012 web page (as soon as it is updated).

Friday, May 25, 2012

Microsoft Released Face Tracking SDK in Kinect for Windows

Microsoft's revolutionary hardware, the Kinect, is getting a new piece of brain. Microsoft has just released the Face Tracking SDK in Kinect for Windows. It can be used for 3D face tracking. It supports most facial types and works in real time.

You can use the Face Tracking SDK in your program if you install the Kinect for Windows Developer Toolkit 1.5. You need to have a Kinect camera attached to your PC. The face tracking engine takes 4-8 ms per frame, depending on how powerful your PC is.

Take a look at the following demo which shows its facial tracking capabilities, range of supported motions, real-time tracking speed and robustness to occlusions.

Here are several things that will affect tracking accuracy, provided by Nikolai Smolynskiy.

1) Light – a face should be well lit without too many harsh shadows on it. Bright backlight or sidelight may make tracking worse.

2) Distance to the Kinect camera – the closer you are to the camera the better it will track. The tracking quality is best when you are closer than 1.5 meters (4.9 feet) to the camera. At closer range Kinect’s depth data is more precise and so the face tracking engine can compute face 3D points more accurately.

3) Occlusions – if you have thick glasses or a Lincoln-like beard, you may have issues with the face tracking. This is still an open area for improvement. Face color is NOT an issue.

The Face Tracking SDK is based on the Active Appearance Model (see the Wikipedia explanation of AAM). It also utilizes Kinect’s depth data, so it can track faces/heads in 3D. More technical details can be found in the following publications:

  • Iain Matthews and Simon Baker, "Active Appearance Models Revisited," International Journal of Computer Vision, Vol. 60, No. 2, November, 2004, pp. 135-164.
  • Zhou, M., Liang, L., J. S. & Wang, Y. "AAM based face tracking with temporal matching and face segmentation," IEEE CVPR, 2010, pp. 701-708.

To download the SDK visit here.

Wednesday, May 23, 2012


Leap represents an entirely new way to interact with your computers. It’s more accurate than a mouse, as reliable as a keyboard and more sensitive than a touchscreen. For the first time, you can control a computer in three dimensions with your natural hand and finger movements.

This isn’t a game system that roughly maps your hand movements. The Leap technology is 200 times more accurate than anything else on the market, at any price point. Just about the size of a flash drive, the Leap can distinguish your individual fingers and track your movements down to 1/100th of a millimeter.

This is like day one of the mouse. Except no one needs an instruction manual for their hands.



Tuesday, May 22, 2012

[New Paper] Dynamic two-stage image retrieval from large multimedia databases

Avi Arampatzis | Konstantinos Zagoris | Savvas A. Chatzichristofis

Information Processing & Management

Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimedia databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary media. We perform retrieval in a two-stage fashion: first rank by a secondary medium, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed.
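The two-stage idea can be illustrated with a toy sketch. This is not the authors' implementation: the term-overlap text score, the Euclidean distance on a global feature vector, the item layout (`terms`, `features`) and the function names are all assumptions made for illustration, and `k` is simply passed in as a fixed parameter.

```python
import math
from typing import Dict, List, Sequence


def text_score(query_terms: Sequence[str], doc_terms: Sequence[str]) -> int:
    # Cheap first stage (assumed): number of query terms found in the annotation.
    return len(set(query_terms) & set(doc_terms))


def visual_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Second stage (assumed): Euclidean distance between global feature vectors.
    return math.dist(a, b)


def two_stage_retrieval(query_terms: Sequence[str],
                        query_features: Sequence[float],
                        collection: List[Dict],
                        k: int) -> List[Dict]:
    # Stage 1: rank the whole collection by the secondary (textual) medium.
    by_text = sorted(collection,
                     key=lambda item: text_score(query_terms, item["terms"]),
                     reverse=True)
    # Stage 2: run the costlier visual comparison on the top-k items only.
    top_k = by_text[:k]
    return sorted(top_k,
                  key=lambda item: visual_distance(query_features, item["features"]))
```

With a collection of N items, the visual comparison runs k times instead of N times, which is where the efficiency gain in the abstract comes from. Items that look similar but fail the textual stage never reach the visual ranking.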


Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that our dynamic two-stage method can be significantly more effective and robust than similar setups with static thresholds previously proposed. In additional experiments using local feature derivatives in the visual stage instead of global, such as the emerging visual codebook approach, we find that two-stage does not work very well. We attribute the weaker performance of the visual codebook to the enhanced visual diversity produced by the textual stage, which diminishes the codebook’s advantage over global features. Furthermore, we compare dynamic two-stage retrieval to traditional score-based fusion of results retrieved visually and textually. We find that fusion is also significantly more effective than single-medium baselines. Although there is no clear winner between two-stage and fusion, the methods exhibit different robustness features; nevertheless, two-stage retrieval provides efficiency benefits over fusion.
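For comparison, score-based fusion can be sketched as follows. The abstract does not say which fusion variant was used, so this example assumes one common choice: min-max normalization of each medium's scores followed by a weighted sum. The function names and the `weight` parameter are hypothetical.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale scores to [0, 1] so the two media become comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(visual: Dict[str, float], textual: Dict[str, float],
         weight: float = 0.5) -> List[str]:
    # Weighted sum of normalized scores; an item missing from one list scores 0 there.
    v, t = min_max(visual), min_max(textual)
    fused = {item: weight * v.get(item, 0.0) + (1.0 - weight) * t.get(item, 0.0)
             for item in set(v) | set(t)}
    # Highest fused score first; ties are broken by item id for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))
```

Unlike the two-stage approach, fusion needs a visual score for every candidate item, so it does not reduce the number of CBIR operations.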



Swarmanoid, The Movie receives the AAAI-2011 Best Video Award at the San Francisco annual event!

Reach and grasp by people with tetraplegia using a neurally controlled robotic arm

Nature 485, 372–375 (17 May 2012)


Paralysis following spinal cord injury, brainstem stroke, amyotrophic lateral sclerosis and other disorders can disconnect the brain from the body, eliminating the ability to perform volitional movements. A neural interface system could restore mobility and independence for people with paralysis by translating neuronal activity directly into control signals for assistive devices. We have previously shown that people with long-standing tetraplegia can use a neural interface system to move and click a computer cursor and to control physical devices [6, 7, 8]. Able-bodied monkeys have used a neural interface system to control a robotic arm [9], but it is unknown whether people with profound upper extremity paralysis or limb loss could use cortical neuronal ensemble signals to direct useful arm actions. Here we demonstrate the ability of two people with long-standing tetraplegia to use neural interface system-based control of a robotic arm to perform three-dimensional reach and grasp movements. Participants controlled the arm and hand over a broad space without explicit training, using signals decoded from a small, local population of motor cortex (MI) neurons recorded from a 96-channel microelectrode array. One of the study participants, implanted with the sensor 5 years earlier, also used a robotic arm to drink coffee from a bottle. Although robotic reach and grasp actions were not as fast or accurate as those of an able-bodied person, our results demonstrate the feasibility for people with tetraplegia, years after injury to the central nervous system, to recreate useful multidimensional control of complex devices directly from a small sample of neural signals.


Tuesday, May 15, 2012

Document Recognition and Retrieval XX (2013)

Document Recognition and Retrieval XX (2013), http://www.cs.rit.edu/~drr2013

San Francisco, Feb. 5-7, 2013

Paper Submission Deadline: July 23, 2012 (11:59 PST)

Document Recognition and Retrieval (DRR) is one of the leading international conferences devoted to current research in document analysis, recognition and retrieval. The 20th Document Recognition and Retrieval Conference is being held as part of SPIE Electronic Imaging, from Feb. 5-7, 2013 in San Francisco, California, USA.

One keynote speaker has been confirmed, Ray Smith of Google Research.

Ray will be presenting on the development of the widely used open source Tesseract OCR engine, relating this to changes in document recognition systems since the first DRR was held in 1994.

The Conference Chairs and Program Committee invite all researchers working on document recognition and retrieval to submit original research papers. Papers are presented in oral and poster sessions at the conference, along with invited talks by leading researchers. Accepted papers will be published by the SPIE in the conference proceedings. At the conference a Best Student Paper Award will be presented.

Papers are solicited in, but not limited to, the areas below.

Document Recognition

  • Text recognition: machine-printed, handwritten documents; paper, tablet, camera, and video sources
  • Writer/style identification, verification, and adaptation
  • Graphics recognition: vectorization (e.g. for line-art, maps and technical drawings), signature, logo and graphical symbol recognition, figure, chart and graph recognition, and diagrammatic notations (e.g. music, mathematical notation)
  • Document layout analysis and understanding: document and page region segmentation, form and table recognition, and document understanding through combined modalities (e.g. speech and images)
  • Evaluation: performance metrics, and document degradation models
  • Additional topics: document image filtering, enhancement and compression, document clustering and classification, machine learning (e.g. integration and optimization of recognition modules), historical and degraded document images (e.g. fax), multilingual document recognition, and web page analysis (including wikis and blogs)

Document Retrieval

  • Indexing and Summarization: text documents (messages, blogs, etc.), imaged documents, entity tagging from OCR output, and text categorization
  • Query Languages and Modalities: Content-Based Image Retrieval (CBIR) for documents, keyword spotting, non-textual query-by-example (e.g. tables, figures, math), querying by document geometry and/or logical structure, approximate string matching algorithms for OCR output, retrieval of noisy text documents (messages, blogs, etc.), cross and multi-lingual retrieval
  • Evaluation: relevance and performance metrics, evaluation protocols, and benchmarking
  • Additional topics: relevance feedback, impact of recognition accuracy on retrieval performance, and digital libraries including systems engineering and quality assurance

Important Dates

  • 23 July, 2012: Paper submission deadline
  • Late August, 2012: Author notifications
  • 26 November, 2012: Final paper submission deadline
  • 5-7 February, 2013: Conference dates

Paper Submission

All paper submissions should be between 8 and 12 pages in length, using the SPIE LaTeX template (available from the conference web pages). For accepted papers, final submissions will also be 8 to 12 pages in the same format. Papers should clearly identify the problem addressed in the paper, identify the original contribution(s) of the paper, relate the paper to previous work, and provide experimental and/or theoretical evaluation as appropriate. Submissions should be uploaded through the conference web site (http://www.cs.rit.edu/~drr2013/submission.html).

IEEE International Symposium on Multimedia 2012 (ISM2012)

Irvine, CA, USA, December 10-12, 2012


The IEEE International Symposium on Multimedia (ISM2012) is an international forum for researchers to exchange information regarding advances in the state-of-the-art and practice of multimedia computing, as well as to identify emerging research topics and define the future of multimedia computing. The technical program of ISM2012 will consist of invited talks, paper presentations, demonstrations and panel discussions.

Please refer to the conference website for further information:



  • Jun 8th, 2012: Panel Proposal Submission
  • Jul 8th, 2012: Regular & Short Paper Submission
  • Jul 8th, 2012: Industry Paper Submission
  • Jul 22nd, 2012: Demo Proposal Submission
  • Jul 22nd, 2012: PhD Workshop Paper Submission
  • Aug 24th, 2012: Panel Notification
  • Aug 24th, 2012: Paper and Demo Notification


Authors are invited to submit Regular Papers (8-page technical paper), Short Papers (4-page technical paper), Demonstration Papers and Posters (2-page technical paper), PhD Workshop Papers (2 pages), and Workshop Proposals, as well as Industry Track Papers (8-page technical paper), which will be included in the proceedings. A main goal of this program is to present research work that exposes the academic and research communities to challenges and issues important for the industry. More information is available on the ISM2012 web page. The Conference Proceedings will be published by IEEE Computer Society Press. Distinguished quality papers presented at the conference will be selected for publication in internationally renowned journals, among them the IEEE Transactions on Multimedia.

AREAS OF INTEREST INCLUDE (but are not limited to):

*Multimedia Systems and Architectures

Architecture and applications, GPU-based architectures and systems, mobile multimedia systems and services, pervasive and interactive multimedia systems including mobile systems, pervasive gaming, and digital TV, multimedia/HD display systems, multimedia in the Cloud, software development using multimedia techniques.

*Multimedia Interfaces

Multimedia information visualization, interactive systems, multimodal interaction, including human factors, multimodal user interfaces: design, engineering, modality-abstractions, etc., multimedia tools for authoring, analyzing, editing, browsing, and navigation, novel interfaces for multimedia etc.

*Multimedia Coding, Processing, and Quality Measurement

Audio, video, image processing, and coding, coding standards, audio, video, and image compression algorithms and performance, scalable coding, multiview coding, 3D/multi-view synthesis, rendering, animation coding, noise removal techniques from multimedia, panorama, multi-resolution or superresolution algorithms, etc.

*Multimedia Content Understanding, Modeling, Management, and Retrieval

Multimedia meta-modeling techniques and operating systems, computational intelligence, vision, storage/archive systems, databases, and retrieval, multimedia/video/audio segmentation, etc.

*Multimedia Communications and Streaming

Multimedia networking and QoS, synchronization, HD video streaming, mobile audio/video streaming, wireless, scalable streaming, P2P multimedia streaming, multimedia sensor networks, internet telephony, hypermedia systems, etc.

*Multimedia Security

Multimedia security including digital watermark and encryption, copyright issues, surveillance and monitoring, face detection & recognition algorithms, human behavior analysis, multimedia forensics, etc.

*Multimedia Applications

3D multimedia: graphics, displays, sound, broadcasting, interfaces, multimedia composition and production, gaming, virtual and augmented reality, applications for mobile systems, multimedia in social network analysis: YouTube, Flickr, Twitter, Facebook, Google+, etc., elearning, etc.

Friday, May 4, 2012

sFly Quadrotors Navigate Outdoors All By Themselves

Article from IEEE Spectrum

Quadrotors are famous for being able to pull all sorts of crazy stunts, but inevitably, somewhere in the background of the amazing video footage of said crazy stunts you'll notice the baleful red glow of a Vicon motion tracking system. Now, we don't want to call this cheating or anything, but we're certainly looking forward to the day when quadrotors can do this outside of a lab, and the sFly project is helping to make this happen.

What makes the sFly project, led by ETH Zurich's Autonomous Systems Lab, different is that the sFly quadrotors don't rely on motion capture systems. They also don't rely on GPS, remote control, radio beacons, laser rangefinders, frantically waving undergrads, or anything else. The only thing that sFly has to go on is an IMU and an onboard camera (and an integrated computer), but using just those systems (and a "very efficient onboard inertial-aided visual simultaneous localization and mapping algorithm"), sFly is capable of navigating all by itself. And if you have a fleet of sFly quadrotors, you can use them to make cooperative 3D maps of the environment:

Each quadrotor is completely autonomous, but they're also equipped with two extra cameras that stream stereo imagery over GSM or Wi-Fi back to a central computer, which takes the data from several quadrotors and combines it into an overall 3D model of the environment as a whole. Then, the computer can guide each robot to an optimal surveillance site. The idea here is that you'd be able to rapidly deploy an sFly system with a swarm of autonomous quadrotors in a disaster area or somewhere else without any infrastructure (or even a GPS signal) and still be able to take advantage of some clever autonomous aerial mapping.

(Please note: the video is also available in 3D.)


ICPR12 contest on Kinect-based gesture recognition

The ICPR Kinect-based gesture recognition challenge opens on May 7 (cash prizes & more). See URL and below:


ChaLearn takes gesture recognition to the crowd with Microsoft Kinect(TM)

A competition to help improve the accuracy of gesture recognition using Microsoft Kinect(TM) motion sensor technology promises to take man-machine interfaces to a whole new level. From controlling the lights or thermostat in your home to flicking channels on the TV, all it will take is a simple wave of the hand. And the same technology may even make it possible to automatically detect more complex human behaviors, to allow surveillance systems to sound an alarm when someone is acting suspiciously, for example, or to send help whenever a bedridden patient shows signs of distress.

Through its low cost 3D depth-sensing cameras, Microsoft Kinect(TM) has already kick-started this revolution by bringing gesture recognition into the home. Humans can recognize new gestures after seeing just one example (one-shot-learning). With computers though, recognizing even well-defined gestures, such as sign language, is much more challenging and has traditionally required thousands of training examples to teach the software.
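To make the one-shot setting concrete, here is a small sketch in which each gesture is represented by exactly one recorded template and a new sample is assigned to the nearest template. It is neither the challenge baseline nor any contestant's method: the use of dynamic time warping (DTW) as the distance, the representation of a gesture as a sequence of 2D hand positions, and the names `dtw_distance` and `classify` are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Dynamic time warping: aligns two sequences that may differ in speed or length.
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b waits
                                    cost[i][j - 1],      # b advances, a waits
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(sample: Sequence[Point],
             templates: Dict[str, Sequence[Point]]) -> str:
    # One-shot: a single template per gesture label, nearest template wins.
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

A sample that follows the same path as a template but at a different speed still lands close to it, because DTW allows one sequence to pause while the other advances.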

To see what the machines are capable of, ChaLearn launched a competition hosted by Kaggle with prizes donated by Microsoft, in the hope they can give the state of the art a rapid boost. The ChaLearn team has been organizing competitions since 2003, featuring hard problems such as discovering cause-effect relationships in data. It has selected the young and dynamic startup Kaggle to host the gesture challenge because Kaggle has very rapidly established a track record for using crowdsourcing to find solutions that outperform state-of-the-art algorithms and predictive models in a wide variety of domains (from helping NASA build algorithms to map dark matter to helping insurance companies improve claims prediction). And now the first round of the gesture challenge has helped narrow the gap between machine and human performance. Over a period of four months starting in December 2011, 153 contestants making 573 entries have built software systems that are capable of learning from a single training example of a hand gesture (so-called one-shot-learning). They lowered the error rate from more than 50% for a baseline method to less than 10%.

The winner of the challenge, Alfonso Nieto Castanon, used a method he invented, which is inspired by the human vision system. He and the second and third place winners will be awarded $5000, $3000 and $2000 respectively and get an opportunity to present their results in front of an audience of experts at the CVPR 2012 conference in Rhode Island, USA, in June. A demonstration competition of gesture recognition systems using Kinect(TM) will also be held in conjunction with this event, with similar prizes donated by Microsoft.

Now, from May 7 until September 10, new competitors can enter round 2 of the challenge and get a chance to close the gap with human performance, which is under 2% error! The entrants are given a set of examples with which to apply and test their algorithms, so that they may improve them. Compared to round 1, they will benefit from a wealth of resources including the fact sheets and published papers of the participants of round 1, data annotations, and data transformations that were successful in round 1. During a four-month period they will be able to compare their system with those of other contestants, by using it to predict gestures from a feedback sample. Throughout the competition the evaluations of these are posted on a live leaderboard, so participants can monitor their performance in real time. The contestants will then have the opportunity to put their best algorithms to the final test in an evaluation phase. Here they will be given a few days to train their system on an entirely new set of gestures, after which the one with the best recognition score will be rewarded with $5000. Those coming in second and third place will receive $3000 and $2000 respectively. As in round 1, the results will be discussed at a scientific conference (ICPR 2012, Tsukuba, Japan, November 2012), where a demonstration competition will also be held, crowned with prizes in the same amounts. Microsoft will be evaluating successful participants in all challenge rounds for two potential IP agreements of $100,000 each. See official challenge rules for more details at http://gesture.chalearn.org.

The winner of the first round believes that it is possible to reach and even beat human performance. Others will also join in the race.

According to Kaggle, that is the power of the crowd: bringing together expert talent, sometimes from previously untapped quarters. And with Microsoft interested in buying the intellectual property, the hope is that the new algorithms that emerge from the contest will not only boost accuracy but also open the doors to a whole new range of applications. These range from communicating with Kinect(TM) through sign language, or even by speaking, with the algorithms interpreting what you say by reading your lips, to smart homes and the use of gestures to control surgical robots.

The challenge was initiated by the US Defense Advanced Research Projects Agency (DARPA) Deep Learning Program and is supported by the US National Science Foundation, the European Pascal2 network of excellence, Microsoft and Texas Instruments. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors and funding agencies.