Friday, May 25, 2012

Microsoft Released Face Tracking SDK in Kinect for Windows

Microsoft's revolutionary hardware, the Kinect, is getting a new piece of brain. Microsoft has just released the Face Tracking SDK in Kinect for Windows. It can be used for 3D face tracking, supports most facial types, and works in real time.

You can use the Face Tracking SDK in your program if you install the Kinect for Windows Developer Toolkit 1.5. You need to have a Kinect camera attached to your PC. The face tracking engine takes 4-8 ms per frame, depending on how powerful your PC is.

Take a look at the following demo which shows its facial tracking capabilities, range of supported motions, real-time tracking speed and robustness to occlusions.

Here are several things that will affect tracking accuracy, provided by Nikolai Smolynskiy.

1) Light – a face should be well lit without too many harsh shadows on it. Bright backlight or sidelight may make tracking worse.

2) Distance to the Kinect camera – the closer you are to the camera the better it will track. The tracking quality is best when you are closer than 1.5 meters (4.9 feet) to the camera. At closer range Kinect’s depth data is more precise and so the face tracking engine can compute face 3D points more accurately.

3) Occlusions – if you have thick glasses or a Lincoln-like beard, you may have issues with the face tracking. This is still an open area for improvement. Face color is NOT an issue.

The Face Tracking SDK is based on the Active Appearance Model (see the Wikipedia explanation of AAM). It also utilizes Kinect’s depth data, so it can track faces/heads in 3D. More technical details can be found in the following publications:

  • Iain Matthews and Simon Baker, "Active Appearance Models Revisited," International Journal of Computer Vision, Vol. 60, No. 2, November, 2004, pp. 135 - 164. pdf
  • Zhou, M., Liang, L., J. S. & Wang, Y. "AAM based face tracking with temporal matching and face segmentation," IEEE CVPR, 2010, 701-708. pdf

To download the SDK visit here.

Wednesday, May 23, 2012

Leap

Leap represents an entirely new way to interact with your computers. It’s more accurate than a mouse, as reliable as a keyboard and more sensitive than a touchscreen.  For the first time, you can control a computer in three dimensions with your natural hand and finger movements.

This isn’t a game system that roughly maps your hand movements.  The Leap technology is 200 times more accurate than anything else on the market — at any price point. Just about the size of a flash drive, the Leap can distinguish your individual fingers and track your movements down to a 1/100th of a millimeter.

This is like day one of the mouse.  Except, no one needs an instruction manual for their hands.

https://live.leapmotion.com/about/


Tuesday, May 22, 2012

[New Paper] Dynamic two-stage image retrieval from large multimedia databases

Avi Arampatzis | Konstantinos Zagoris | Savvas A. Chatzichristofis

Information Processing & Management

Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimedia databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary media. We perform retrieval in a two-stage fashion: first rank by a secondary medium, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed.


Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that our dynamic two-stage method can be significantly more effective and robust than similar setups with static thresholds previously proposed. In additional experiments using local feature derivatives in the visual stage instead of global, such as the emerging visual codebook approach, we find that two-stage does not work very well. We attribute the weaker performance of the visual codebook to the enhanced visual diversity produced by the textual stage which diminishes codebook’s advantage over global features. Furthermore, we compare dynamic two-stage retrieval to traditional score-based fusion of results retrieved visually and textually. We find that fusion is also significantly more effective than single-medium baselines. Although, there is no clear winner between two-stage and fusion, the methods exhibit different robustness features; nevertheless, two-stage retrieval provides efficiency benefits over fusion.
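The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `two_stage_retrieve`, the toy scores, and the use of a fixed `k` are all assumptions made for illustration. The paper's contribution is estimating K per query, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple


def two_stage_retrieve(
    text_scores: Dict[str, float],
    visual_score: Callable[[str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank by a secondary (text) medium, then re-rank only the top-k visually.

    Hypothetical helper: `k` is passed in as a fixed value here, whereas the
    paper estimates it per query.
    """
    # Stage 1: cheap ranking of the whole collection by the secondary medium.
    by_text = sorted(text_scores, key=lambda item: text_scores[item], reverse=True)
    candidates = by_text[:k]
    # Stage 2: the expensive visual (CBIR) score is computed only for k items.
    rescored = [(item, visual_score(item)) for item in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy scores, invented for the example.
    text_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
    visual = {"a": 0.2, "b": 0.6, "c": 0.99, "d": 0.4}
    print(two_stage_retrieve(text_scores, visual.__getitem__, k=3))
    # prints [('b', 0.6), ('d', 0.4), ('a', 0.2)]
```

Item "c" has the highest visual score but never reaches the visual stage because the text stage ranks it last. That filtering is where both the efficiency gain and the dependence on a good choice of K come from.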

http://www.sciencedirect.com/science/article/pii/S0306457312000489

Swarmanoid

Swarmanoid, The Movie receives the AAAI-2011 Best Video Award at the San Francisco annual event!

Reach and grasp by people with tetraplegia using a neurally controlled robotic arm

Nature 485, 372–375 (17 May 2012)

 

Paralysis following spinal cord injury, brainstem stroke, amyotrophic lateral sclerosis and other disorders can disconnect the brain from the body, eliminating the ability to perform volitional movements. A neural interface system could restore mobility and independence for people with paralysis by translating neuronal activity directly into control signals for assistive devices. We have previously shown that people with long-standing tetraplegia can use a neural interface system to move and click a computer cursor and to control physical devices [6, 7, 8]. Able-bodied monkeys have used a neural interface system to control a robotic arm [9], but it is unknown whether people with profound upper extremity paralysis or limb loss could use cortical neuronal ensemble signals to direct useful arm actions. Here we demonstrate the ability of two people with long-standing tetraplegia to use neural interface system-based control of a robotic arm to perform three-dimensional reach and grasp movements. Participants controlled the arm and hand over a broad space without explicit training, using signals decoded from a small, local population of motor cortex (MI) neurons recorded from a 96-channel microelectrode array. One of the study participants, implanted with the sensor 5 years earlier, also used a robotic arm to drink coffee from a bottle. Although robotic reach and grasp actions were not as fast or accurate as those of an able-bodied person, our results demonstrate the feasibility for people with tetraplegia, years after injury to the central nervous system, to recreate useful multidimensional control of complex devices directly from a small sample of neural signals.

http://www.nature.com/nature/journal/v485/n7398/full/nature11076.html

Tuesday, May 15, 2012

Document Recognition and Retrieval XX (2013)

Document Recognition and Retrieval XX (2013), http://www.cs.rit.edu/~drr2013

San Francisco, Feb. 5-7, 2013

Paper Submission Deadline: July 23, 2012 (11:59 PST)

Document Recognition and Retrieval (DRR) is one of the leading international conferences devoted to current research in document analysis, recognition and retrieval. The 20th Document Recognition and Retrieval Conference is being held as part of SPIE Electronic Imaging, from Feb. 5-7, 2013 in San Francisco, California, USA.

One keynote speaker has been confirmed, Ray Smith of Google Research.

Ray will be presenting on the development of the widely used open source Tesseract OCR engine, relating this to changes in document recognition systems since the first DRR was held in 1994.

The Conference Chairs and Program Committee invite all researchers working on document recognition and retrieval to submit original research papers. Papers are presented in oral and poster sessions at the conference, along with invited talks by leading researchers. Accepted papers will be published by the SPIE in the conference proceedings. At the conference a Best Student Paper Award will be presented.

Papers are solicited in, but not limited to, the areas below.

Document Recognition

  • Text recognition: machine-printed, handwritten documents; paper, tablet, camera, and video sources
  • Writer/style identification, verification, and adaptation
  • Graphics recognition: vectorization (e.g. for line-art, maps and technical drawings), signature, logo and graphical symbol recognition, figure, chart and graph recognition, and diagrammatic notations (e.g. music, mathematical notation)
  • Document layout analysis and understanding: document and page region segmentation, form and table recognition, and document understanding through combined modalities (e.g. speech and images)
  • Evaluation: performance metrics, and document degradation models
  • Additional topics: document image filtering, enhancement and compression, document clustering and classification, machine learning (e.g. integration and optimization of recognition modules), historical and degraded document images (e.g. fax), multilingual document recognition, and web page analysis (including wikis and blogs)

Document Retrieval

  • Indexing and Summarization: text documents (messages, blogs, etc.), imaged documents, entity tagging from OCR output, and text categorization
  • Query Languages and Modalities: Content-Based Image Retrieval (CBIR) for documents, keyword spotting, non-textual query-by-example (e.g. tables, figures, math), querying by document geometry and/or logical structure, approximate string matching algorithms for OCR output, retrieval of noisy text documents (messages, blogs, etc.), cross and multi-lingual retrieval
  • Evaluation: relevance and performance metrics, evaluation protocols, and benchmarking
  • Additional topics: relevance feedback, impact of recognition accuracy on retrieval performance, and digital libraries including systems engineering and quality assurance

Important Dates

  • 23 July, 2012: Paper submission deadline
  • Late August, 2012: Author notifications
  • 26 November, 2012: Final paper submission deadline
  • 5-7 February, 2013: Conference dates

Paper Submission

All paper submissions should be between 8 and 12 pages in length, using the SPIE LaTeX template (available from the conference web pages). For accepted papers, final submissions will also be 8-12 pages in the same format. Papers should clearly identify the problem addressed in the paper, identify the original contribution(s) of the paper, relate the paper to previous work, and provide experimental and/or theoretical evaluation as appropriate. Submissions should be uploaded through the conference web site (http://www.cs.rit.edu/~drr2013/submission.html).