AI Applications in the fields of Multimedia, Computer Vision and Robotics: August 2012

Friday, August 24, 2012

Eye Tribe

Posted by Savvas Chatzichristofis

The Eye Tribe creates software that allows users to interact with their mobile device just by looking at it. Activate the screen, scroll websites automatically as you read and play games using your eyes!

The image captured by the front-facing camera of the smartphone or tablet is analyzed using computer-vision algorithms. Eye Tribe software can determine the location of the eyes and estimate where you're looking on the screen with an accuracy good enough to know which icon you're looking at.

The version showcased in the videos uses the "EyeDock", an add-on with an inexpensive webcam and an infrared LED. These are standard components and can be integrated into a mobile device using the front-facing camera and a built-in LED.

The first device with the software is expected to hit the market in 2013.

Thursday, August 16, 2012

Photo Search by Face Positions and Facial Attributes on Touch Devices

Posted by Savvas Chatzichristofis

With the explosive growth of camera devices, people can freely take photos to capture moments of life, especially those shared with friends and family. A better way to organize the growing number of personal and group photos is therefore needed.

In this project, we propose a novel way to search for face images according to the facial attributes and face similarity of the target persons. To better match the face layout in mind, our system allows the user to graphically specify the face positions and sizes on a query "canvas," where each attribute or identity is defined as an "icon" for easier representation.

Moreover, we provide aesthetics filtering to enhance visual experience by removing candidates of poor photographic qualities. The scenario has been realized on a touch device with an intuitive user interface. With the proposed block-based indexing approach, we can achieve near real-time retrieval (0.1 second on average) in a large-scale dataset (more than 200k faces in Flickr images).

Example queries and top 5 retrieval results from our image search system.

http://www.csie.ntu.edu.tw/~winston/projects/face/

Tuesday, August 14, 2012

MediaMill - Semantic Video Search Engine

Posted by Savvas Chatzichristofis

The MediaMill semantic video search engine is bridging the gap between research and applications. It integrates the state-of-the-art techniques developed at the Intelligent Systems Lab Amsterdam of the University of Amsterdam and applies them to realistic problems in video retrieval.

The techniques employed in MediaMill originate from various disciplines such as image and video processing, computer vision, language technology, machine learning and information visualization. To ensure state-of-the-art competitiveness, MediaMill participates in the yearly TRECVID benchmark.

MediaMill has its roots in the ICES-KIS Multimedia Information Analysis project (in conjunction with TNO) and the Innovative Research Program for Image processing (IOP). It blossomed in the BSIK program MultimediaN and the EU FP-6 program VIDI-Video. MediaMill currently plays an important role in the Dutch VENI SEARCHER project, the Dutch/Flemish IM-Pact BeeldCanon project, the Dutch FESCOMMIT program, and the US IARPA SESAME project.

http://www.science.uva.nl/research/mediamill/index.php

Thursday, August 2, 2012

ImageTerrier: An extensible platform for scalable high-performance image retrieval

Posted by Savvas Chatzichristofis

ImageTerrier is a novel, easily extensible, open-source, scalable, high-performance search engine platform for content-based image retrieval applications. The platform provides a comprehensive test-bed for experimenting with bag-of-visual-words image retrieval techniques. It incorporates a state-of-the-art implementation of the single-pass indexing technique for constructing inverted indexes and is capable of producing highly compressed index data structures. ImageTerrier is written as an extension to Terrier ("Terabyte Retriever"), the open-source test-bed platform for textual information retrieval research. The ImageTerrier platform is demonstrated to successfully index and search a corpus of over 10 million images containing just under 10,000,000,000 quantised SIFT visual terms.

http://dl.acm.org/citation.cfm?id=2324844&bnc=1
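To make the bag-of-visual-words idea from the ImageTerrier post a little more concrete, here is a minimal sketch of an inverted index over quantised visual terms. It is not ImageTerrier's or Terrier's code or API: the function names, the toy corpus and the histogram-intersection scoring are all assumptions chosen for illustration, and nothing here attempts single-pass indexing or index compression.

```python
from collections import Counter, defaultdict


def build_inverted_index(images):
    """Map each visual term id to a posting list of (image_id, term_frequency)."""
    index = defaultdict(list)
    for image_id, terms in images.items():
        for term, freq in Counter(terms).items():
            index[term].append((image_id, freq))
    return index


def search(index, query_terms):
    """Rank images by histogram intersection: sum of min(query tf, image tf)."""
    scores = Counter()
    for term, q_freq in Counter(query_terms).items():
        # Only images that actually contain the term are touched.
        for image_id, d_freq in index.get(term, []):
            scores[image_id] += min(q_freq, d_freq)
    return scores.most_common()


# Hypothetical toy corpus: each image is a list of quantised visual term ids
# (in a real system these would come from clustering SIFT descriptors).
corpus = {
    "img_a": [3, 3, 7, 12],
    "img_b": [7, 7, 7, 40],
    "img_c": [1, 2, 5],
}
index = build_inverted_index(corpus)
results = search(index, [7, 7, 40])
print(results)  # [('img_b', 3), ('img_a', 1)]
```

The point of the inverted layout is that a query only visits the posting lists of its own terms, so images sharing no visual term with the query (img_c here) are never scored at all.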

MIT offers a new programming language for the visual web

Posted by Savvas Chatzichristofis

Article from http://gigaom.com/2012/08/01/mit-offers-a-new-programming-language-for-the-visual-web/

MIT released Halide, a programming language that makes it easier to process photos without resorting to slow, custom algorithms. Halide might be the software equivalent of a sewing machine for sites such as Instagram that previously had to stitch their image-processing code by hand.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory have built a new programming language called Halide that they hope will make writing image-processing software easier. The language is built specifically for working with images in constrained compute environments and would replace the custom algorithms currently written to perform those image-processing functions.

If so, that’s a good thing, given our love of visually rich web sites and our predilection for snapping and sharing mobile photos as easily as we once made voice calls. Not only will image-processing software be easier to write, but Halide might also help spare our mobile batteries by using our processors more efficiently.

The challenge of computer sight

Getting a digital camera to “see” like a human eye is not an easy task. Heck, exactly how the human eye “sees” isn’t easily understood. Fortunately for us, our brains handle all the complexities of compensating for lighting, discerning color from different wavelengths and pulling it all together into something meaningful to a human being. But for cameras on mobile phones, or for pictures sent to the web for editing, image processing is the result of many different steps, all of which take a lot of processing power.

The MIT folks explain it like this in their release on Halide:

One reason that image processing is so computationally intensive is that it generally requires a succession of discrete operations. After light strikes the sensor in a cellphone camera, the phone combs through the image data for values that indicate malfunctioning sensor pixels and corrects them. Then it correlates the readings from pixels sensitive to different colors to deduce the actual colors of image regions. Then it does some color correction, and then some contrast adjustment, to make the image colors better correspond to what the human eye sees. At this point, the phone has done so much processing that it takes another pass through the data to clean it up.
And that’s just to display the image on the phone screen.
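The "succession of discrete operations" in the quote can be pictured as a chain of stages, each making its own full pass over the data. The sketch below is a toy under loud assumptions: it works on a single row of grayscale values, and the two stages (dead-pixel repair and contrast stretching) are simplified stand-ins with hypothetical names, not what any real phone pipeline does.

```python
def fix_dead_pixels(row, dead_value=0):
    """Pass 1: replace dead pixels with the average of their horizontal neighbours."""
    fixed = list(row)
    for i, value in enumerate(row):
        if value == dead_value:
            neighbours = [row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)]
            if neighbours:
                fixed[i] = sum(neighbours) // len(neighbours)
    return fixed


def stretch_contrast(row, lo=0, hi=255):
    """Pass 2: linearly rescale so the darkest value maps to lo, the brightest to hi."""
    darkest, brightest = min(row), max(row)
    if brightest == darkest:
        return list(row)
    scale = (hi - lo) / (brightest - darkest)
    return [round(lo + (value - darkest) * scale) for value in row]


def run_pipeline(row, stages):
    """Apply each stage in order; every stage reads and rewrites the whole row."""
    for stage in stages:
        row = stage(row)
    return row


raw = [100, 0, 120, 110, 150]  # hypothetical sensor row; the 0 is a dead pixel
processed = run_pipeline(raw, [fix_dead_pixels, stretch_contrast])
print(processed)  # [0, 51, 102, 51, 255]
```

Even in this toy, every extra stage means another complete sweep through the pixels, which is why the cost adds up so quickly on many-megapixel images.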

The problem is getting bigger and the features are getting richer

The problem is that our many-megapixel cameras gather more information, which takes the relatively weak processors on a mobile phone a lot longer to turn into an image, never mind editing it for red-eye correction or balancing the light. Hence the need for fancy algorithms that can help divvy up that processing among the multiple cores present in desktops and phones. But as the bits in our photos bloat, so do those algorithms, becoming longer, more complex and device-dependent.

That’s what Halide aims to solve. Those algorithms are still useful, but instead of making the algorithm worry about how to divide up the job among the available processors, Halide splits the job in two: a scheduler that worries about where to send the data, and an algorithm that worries only about the actual processing. This means the programmer can adapt to different machines by adjusting the scheduler (after all, that’s the part that cares about how many cores are in the processor), and she can also describe new features in the scheduler and let it implement them in the algorithm.
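The split between "what to compute" and "in what order to compute it" can be sketched in a few lines of plain Python. This is not Halide syntax or its API: `blur_at`, `schedule_row_major`, `schedule_tiled` and `realize` are hypothetical names, the blur is a simple 3-wide horizontal box filter, and the sketch ignores parallelism entirely. It only shows that the same algorithm can be run under two different traversal orders without being rewritten.

```python
def blur_at(image, x, y):
    """Algorithm: defines WHAT each output pixel is (3-wide horizontal box blur)."""
    row = image[y]
    left = row[max(x - 1, 0)]               # clamp at the left edge
    right = row[min(x + 1, len(row) - 1)]   # clamp at the right edge
    return (left + row[x] + right) // 3


def schedule_row_major(width, height):
    """Schedule A: visit pixels row by row."""
    for y in range(height):
        for x in range(width):
            yield x, y


def schedule_tiled(width, height, tile=2):
    """Schedule B: visit pixels tile by tile (the kind of order that can help caches)."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    yield x, y


def realize(algorithm, image, schedule):
    """Run the algorithm over every pixel in the order the schedule dictates."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x, y in schedule(width, height):
        out[y][x] = algorithm(image, x, y)
    return out


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
result_a = realize(blur_at, image, schedule_row_major)
result_b = realize(blur_at, image, schedule_tiled)
print(result_a == result_b)  # True: the schedule changes the order, not the answer
```

Swapping `schedule_row_major` for `schedule_tiled` leaves `blur_at` untouched, which is the property the article describes: tuning for a new machine means editing the schedule, not the algorithm.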

By rewriting some common image-processing algorithms in Halide, researchers were able to make image processing two, three or even six times faster, while also making the written code about a third shorter. The MIT release notes that in one instance the Halide program was actually longer than the original, but the speedup was 70-fold.

The code, which was developed by Jonathan Ragan-Kelley, a graduate student in the Department of Electrical Engineering and Computer Science, and Andrew Adams, a CSAIL postdoc, can be found online.


The LIRE (Lucene Image REtrieval) library provides a simple way to retrieve images and photos based on their color and texture characteristics. LIRE creates a Lucene index of image features for content-based image retrieval (CBIR). Three of the available image features are taken from the MPEG-7 standard: ScalableColor, ColorLayout and EdgeHistogram. A fourth one, the Auto Color Correlogram, has been implemented based on recent research results. Furthermore, LIRE provides simple methods for searching the index and browsing results. The LIRE library and the LIRE Demo application, as well as all the source code, are available under the GNU GPL license.
