Sunday, July 31, 2011

MPEG news: a report from the 97th meeting, Torino, Italy

Original Article: by Christian Timmerer

The 97th MPEG meeting in Torino brought some interesting news, which I'd like to report here briefly. Of course, as usual, an official press release will be published within the next few weeks. In the meantime, I'd like to report on the following topics:

  • MPEG Unified Speech and Audio Coding (USAC) reached FDIS status
  • Call for Proposals: Compact Descriptors for Visual Search (CDVS)
  • Call for Proposals: Internet Video Coding (IVC)

MPEG Unified Speech and Audio Coding (USAC) reached FDIS status

ISO/IEC 23003-3, aka Unified Speech and Audio Coding (USAC), reached FDIS status and will soon be an International Standard. The FDIS itself won't be publicly available, but the Unified Speech and Audio Coding verification test report will be, in September 2011 (most likely here).

Call for Proposals: Compact Descriptors for Visual Search (CDVS)

I reported on this previously, and here comes the final CfP, including the evaluation framework.
MPEG is planning to standardize technologies that will enable efficient and interoperable design of visual search applications. In particular, we are seeking technologies for visual content matching in images or video. Visual content matching includes matching of views of objects, landmarks, and printed documents that is robust to partial occlusions as well as changes in vantage point, camera parameters, and lighting conditions.

There are a number of component technologies that are useful for visual search, including format of visual descriptors, descriptor extraction process, as well as indexing, and matching algorithms. As a minimum, the format of descriptors as well as parts of their extraction process should be defined to ensure interoperability.

It is envisioned that a standard for compact descriptors will:

  • ensure interoperability of visual search applications and databases, 
  • enable high level of performance of implementations conformant to the standard,
  • simplify design of descriptor extraction and matching for visual search applications, 
  • enable hardware support for descriptor extraction and matching in mobile devices,
  • reduce load on wireless networks carrying visual search-related information.

It is envisioned that such a standard will provide a complementary tool to the suite of existing MPEG standards, such as the MPEG-7 Visual Descriptors. To build a full visual search application, this standard may be used jointly with other existing standards, such as MPEG Query Format, HTTP, XML, JPEG, JPSec, and JPSearch.

The Call for Proposals and the Evaluation Framework are publicly available. From a research perspective, it will be interesting to see how technologies submitted in answer to the CfP compete with existing approaches and applications/services.
Call for Proposals: Internet Video Coding (IVC)

I reported on this previously; the final CfP for Internet Video Coding technologies will be available around August 5th, 2011. In the meantime, you may already have a look at the requirements, which reveal some of the interesting issues the call will address:

  • Real-time communications, video chat, video conferencing,
  • Mobile streaming, broadcast and communications,
  • Mobile devices and Internet connected embedded devices 
  • Internet broadcast streaming, downloads
  • Content sharing.

Requirements fall into the following major categories:

  • IPR requirements
  • Technical requirements
  • Implementation complexity requirements

Clearly, this work item is optimized towards the IPR requirements, but the others are not excluded. In particular:

It is anticipated that any patent declaration associated with the Baseline Profile of this standard will indicate that the patent owner is prepared to grant a free of charge license to an unrestricted number of applicants on a worldwide, non-discriminatory basis and under other reasonable terms and conditions to make, use, and sell implementations of the Baseline Profile of this standard in accordance with the ITU-T/ITU-R/ISO/IEC Common Patent Policy.

You may find further information at the MPEG Web site, specifically under the hot news section and the press release. Working documents of all MPEG standards so far can be found here. If you want to join any of these activities, the list of Ad-hoc Groups (AhGs) is available here (soon also here), including information on how to join their reflectors.


Wednesday, July 27, 2011

ImageCLEF's Wikipedia Retrieval Task Results

ImageCLEF is the cross-language image retrieval track run as part of the Cross Language Evaluation Forum (CLEF) campaign. The ImageCLEF retrieval benchmark was established in 2003 with the aim of evaluating image retrieval from multilingual document collections. Images are by their very nature language independent, but they are often accompanied by text semantically related to the image (e.g. textual captions or metadata). Images can then be retrieved using primitive features based on the pixels which form the content of an image (e.g. using a visual exemplar), abstracted features expressed through text, or a combination of both. The language used to express the associated texts or textual queries should not affect retrieval; i.e., an image with a caption written in English should be searchable in languages other than English.

ImageCLEF's Wikipedia Retrieval task provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate retrieval approaches in the context of a large and heterogeneous collection of images (similar to those encountered on the Web) that are searched for by users with diverse information needs.
In 2011, ImageCLEF's Wikipedia Retrieval used a collection of over 237,000 Wikipedia images covering diverse topics of interest. These images are associated with unstructured and noisy textual annotations in English, French, and German.

Overall, based on the best run per group, as illustrated in the following Table, we are 4th out of 11 in MAP:


Based on the best run per group, as illustrated in the following Table, we are 4th out of 11 in P@10:
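For reference, the two measures used above can be computed from a ranked result list as follows; a minimal pure-Python sketch (the function names are mine, not those of the official evaluation tool):

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved items that are relevant (P@k)."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k

def average_precision(ranking, relevant):
    """Mean of P@k taken at every rank k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

# MAP is simply average_precision averaged over all topics (queries).
```

For example, with relevant items found at ranks 1 and 3, average precision is (1/1 + 2/3) / 2 ≈ 0.83.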


JOB: postdoc in animation and graphics at Edinburgh

Postdoctoral position in Robotics and Animation. The successful candidate is expected to have (or be near completion of) a PhD in computer animation and graphics, strong mathematical skills in optimization and control theory, and good programming skills. Experience in machine learning or robotics is preferable, as is familiarity with concepts such as character animation, 3D physical simulation, and machine learning techniques. The post is funded by an EPSRC project that focuses on developing state-of-the-art motion-generation systems for robotic manipulation under uncertainty, using concepts of topology space and data-driven dimensionality reduction. The appointee will be responsible for the direct implementation of character/humanoid control in physical simulators (Open Dynamics Engine, PhysX), simulation of deformable objects including strands and cloth, and control of robots such as the Kuka LWR and Nao humanoids.

The post is available from 1st September 2011 until 28th February 2014.

Contact Taku Komura <tkomura <at> ed <dot> ac <dot> uk> for more details.

Sunday, July 24, 2011

Google acquires PittPatt

Original Article:

Computer vision startup Pittsburgh Pattern Recognition, known as PittPatt, has replaced its web page with an announcement that it has been acquired by Google. The amount is not disclosed. According to PittPatt's statement, it "will continue to tap the potential of computer vision in applications that range from simple photo organization to complex video and mobile applications." PittPatt's showcase application is face detection and recognition. PittPatt is a Carnegie Mellon University (CMU) spin-off.


The announcement:

Joining Google is the next thrilling step in a journey that began with research at Carnegie Mellon University's Robotics Institute in the 1990s and continued with the launching of Pittsburgh Pattern Recognition (PittPatt) in 2004. We've worked hard to advance the research and technology in many important ways and have seen our technology come to life in some very interesting products. At Google, computer vision technology is already at the core of many existing products (such as Image Search, YouTube, Picasa, and Goggles), so it's a natural fit to join Google and bring the benefits of our research and technology to a wider audience. We will continue to tap the potential of computer vision in applications that range from simple photo organization to complex video and mobile applications.

We look forward to joining the team at Google!

The team at Pittsburgh Pattern Recognition

Original Article:

2 computer vision PhD places at Edinburgh

Applications are invited for two fully funded PhD students to work in the School of Informatics on the following topics:

* knowledge transfer to automate learning visual models

* learning visual object categories from consumer and advertisement videos

* leveraging the structure of natural sentences to aid visual learning

Applicants must have:

* A Master's degree (preferably in Computer Science or Mathematics)

* Excellent programming skills; the projects involve programming in Matlab and C++

* Solid knowledge of Mathematics (especially algebra and statistics)

* High motivation

* Fluency in English, both written and spoken

* UK or EU nationality

* Experience in computer vision and/or machine learning is a plus (ideally a master thesis in a related field)

The School of Informatics at Edinburgh is one of the top-ranked departments of Computer Science in Europe and offers an exciting research environment. Edinburgh is a beautiful historic city with a high quality of life.

Starting date: January 2012 or later

The PhD work will be carried out under the supervision of Vittorio Ferrari. He is currently with ETH Zurich. He will move to the University of Edinburgh in December 2011 and build a new research group in Computer Vision. For an overview of his current research activities, please visit

For pre-screening, please send applications to the email address below, including:

* complete CV

* title and abstract of your master thesis

* complete grades for all exams passed during both the bachelor and master (to obtain this position you need high grades, especially in mathematics and programming disciplines)

* the name and email address of one reference (preferably your master thesis supervisor)

* if you already have research experience, please include a publication list.

Monday, July 18, 2011

Do we need an alternative to peer-reviewed journals?

Original Post

The past week has seen rather lively discussion about the scientific publishing industry and peer review. Peter Murray-Rust has produced a series of posts about his issues with the process (start here then work your way forward), Joe Pickrell described his problems with peer-reviewed journals at Genomes Unzipped, and Stuart Lyman has a letter to the editor in Nature Biotechnology (subscription required). (It's also a topic that we've considered in the past.) As Wired's Dan MacArthur put it, "it is a source of constant wonder to me that so many scientists have come to regard a system [the existing publication process] that actively inhibits the rapid, free exchange of scientific information as an indispensable component of the scientific process." So what's the problem, and what should (or can) we do about it?

Who gets to read the science?

Until the relatively recent advent of open-access publishing, readers have been expected to foot the costs of the publishing process. A year of a single journal can cost a library six-figure sums. Noninstitutional users can expect to be charged around $30 for a single article, as can academic users whose library doesn't subscribe to the Journal of Obscure Factoids. If you've spent your life in well-funded research institutes, this might not seem like an issue. But, for those at smaller schools or from less-affluent countries, this can be a substantial barrier to being able to participate in the exchange and dissemination of scientific ideas. These paywalls also stand between taxpayers and the research they've supported.

As Stuart Lyman's letter to Nature Biotech points out, the price of access has also become a problem for private-sector research. The large pharmaceutical companies that used to have well-stocked libraries have downsized or shut them down as part of their relentless cost-cutting. For small or even medium-size companies, the costs of institutional subscriptions quickly add up.

The access issue is the one that's seen the most progress, with the creation of open-access journals where the publication costs are met by the authors, not the readers (authors had been paying fees to publish in some journals anyway). The effort to make publicly funded discoveries publicly available has also been gaining ground. From 2008 onwards, recipients of NIH funding have been subject to NIH's Public Access Policy, which requires that any publications that arise from its funds appear in either open-access journals or be placed in PubMed Central within 12 months. Similar policies have been implemented by other national funding bodies and private foundations, as well as individual institutions.

Some publishers have made attempts to have Congress overturn NIH's policy. It's an understandable move; for-profit publishers fear for their bottom line, while other journals are published by scientific societies, many of which depend on subscription revenues for basic operations. But this revenue model may have been dying on its own. The bulk of scientific literature is consumed electronically rather than in hard copy. Sure, it used to be that disseminating new scientific ideas involved printing lots of copies and shipping them, but how true is that in 2011?

Paying for peer review

Part of the price of a journal goes to cover the process of peer review, which has also been the subject of criticism. It costs both time and money, and weeks or months can pass between submitting a paper and having it accepted. Reviewers have to be found, and they are expected to spend hours doing a thorough job without compensation.

Despite all this effort, there are worries that the process doesn't work any better than chance. A common criticism is that peer review is biased towards well-established research groups and the scientific status quo. Reviewers are unwilling to reject papers from big names in their fields out of fear, and they can be hostile to ideas that challenge their own, even if the supporting data are good. Unscrupulous reviewers can reject papers and then quickly publish similar work themselves.

Alternatives to the current system have been examined, but MIT didn't think much of their experiment with open peer review, and Nature's testing of these waters didn't really pan out either. Nature did find overwhelming support from authors, who felt that open peer review improved their papers; a more recent study from the Publishing Research Consortium that we reported on found similar things. These comments indicate that abandoning peer review entirely isn't a viable solution. To some extent, it has already happened with the arXiv, which is filled with all sorts of crap that makes The Daily Mail's science pages seem reputable.

Beyond arranging for peer review, journals act as gatekeepers—they screen submissions for interest or importance as well as just the veracity of the work. That, too, has provoked a response. PLoS ONE attempts to take the first part out of the equation:

Too often a journal's decision to publish a paper is dominated by what the Editor/s think is interesting and will gain greater readership—both of which are subjective judgments and lead to decisions which are frustrating and delay the publication of your work. PLoS ONE will rigorously peer-review your submissions and publish all papers that are judged to be technically sound. Judgments about the importance of any particular paper are then made after publication by the readership (who are the most qualified to determine what is of interest to them).

The problem with this (from where I'm sitting) is that the sheer volume of publications is already almost impossible to manage, making a degree of selectivity valuable. There's always going to be a place for highly selective publishing outlets for work deemed "important"—that's just human nature.

But highly selective journals feed into a final problem area: metrics, impact, and tracking. For better or worse (and I think there's a very strong case for it being worse), academic career progression and research funding are explicitly tied to where a scientist publishes their work. This is done through the use of impact factors, which we've written about extensively (and critically) in the past.

They're a very imperfect measure. Journals that publish reviews as well as research articles can increase their impact factor, and publishing retractions or corrections does so as well. We have the tools to do a better job now, thanks to the move online. There have been experiments with algorithms like PageRank, and one could easily see something that works like Facebook's "like" or Google's "+1" being used. But as a researcher's funding success and promotion remain tied to their publications, what's to stop them from gaming the system? (I envision researchers organizing teams of undergrads to +1 their bibliography.)
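For the curious, the PageRank idea mentioned above is just repeated redistribution of score over a link (here, citation) graph; a toy power-iteration sketch, not any journal metric's actual formula:

```python
def pagerank(links, iters=50, d=0.85):
    """Toy power-iteration PageRank over a citation graph.

    links maps each paper to the list of papers it cites.
    Returns a dict of scores that sums to 1.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {p: 1 / n for p in nodes}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in nodes}  # teleportation share
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                for q in nodes:  # a paper citing nothing spreads evenly
                    new[q] += d * rank[p] / n
        rank = new
    return rank
```

A paper cited by highly ranked papers ends up highly ranked itself, which is exactly the property that makes such scores more informative than raw citation counts, and, as noted above, just as gameable.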

Taking a more holistic view towards an individual's career would certainly solve this problem, and it's a solution I'm all in favor of. Until that happens though, I don't think we're going to see things change much.

Although publishing will remain critical, it's hard to escape the sense that it's increasingly trailing behind the scientific community. Twitter, FriendFeed, Mendeley, and now Google+ have become venues where serious discussion about scientific work takes place. We're already seeing friction at some conferences; not everyone is happy having their talk livetweeted, and the backchannel can be cruel to speakers at times. But social media isn't going anywhere, and neither is academic blogging.

Recognizing the legitimacy of these will be a critical challenge for academia, but it might happen organically as a younger cohort replaces the boomers currently running the show. Fixing peer review is something that shouldn't wait, though. Unfortunately, it's easier to say than to do; I don't have any ready suggestions for how.

Lightsaber + Kinect + robotic arm = JediBot

Original Post

By combining a dexterous robotic arm, the movement tracking capabilities of Microsoft’s Kinect sensor, and some clever software, students at Stanford University have created what can only be called a JediBot. The arm is equipped with a bright red foam-dampened lightsaber, but for all intents and purposes it is trained to kill the opponent: a student with a green lightsaber (shouldn’t it be blue?)

Basically, the robot arm is pre-programmed with a bunch of “attack moves” and it defends by using the Kinect to track the green lightsaber. To attack, JediBot performs a random attack move, and if it meets resistance — another lightsaber, a skull, some ribs — it recoils and performs another, seemingly random, attack. It can attack once every two to three seconds — so it isn’t exactly punishing, but presumably it would only require a little knob-tweaking to make it a truly killer robot.


To defend, the JediBot uses the Kinect sensor to pick the green lightsaber out of the background (that’s why it isn’t blue), and performs depth analysis to work out where it is in comparison to the robot’s lightsaber. If you watch the video, the tracking is remarkably fast, and it’s probably very hard to actually land a blow on the robot. A video of the JediBot is embedded below.
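The color-segmentation step described above can be sketched simply: threshold pixels in HSV space to isolate the green blade, then track the centroid of the matching pixels. A minimal pure-Python sketch (the thresholds are my guesses, not the students' actual values, and a real system would fuse this with the Kinect's depth map):

```python
import colorsys

def is_green(r, g, b):
    """True if a pixel's hue falls in a rough 'green' band (inputs 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return 0.25 < h < 0.45 and s > 0.4 and v > 0.2  # illustrative thresholds

def saber_centroid(pixels):
    """Average (x, y) of all green pixels; pixels is [(x, y, (r, g, b)), ...]."""
    hits = [(x, y) for x, y, rgb in pixels if is_green(*rgb)]
    if not hits:
        return None  # lost track of the blade this frame
    n = len(hits)
    return (sum(x for x, _ in hits) / n, sum(y for _, y in hits) / n)
```

Run per frame, the centroid gives the arm a target to parry toward; the depth analysis then tells it how far away the blade is.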

The JediBot was created during Stanford University’s three-and-a-half-week Experimental Robotics course. Other students on the course used similar robotic arms to draw, take photos, play golf, and even flip burgers (and add ketchup!) A video of these other robotic applications is embedded below the JediBot video.

After last month’s story about the robot that can debone a pork ham, and last week’s revelation that IBM Watson might soon be taking over the role of salespeople and technical support personnel, you have to wonder whether we humans will perform any physically arduous or taxing tasks in the future.

Original Post

Saturday, July 16, 2011

What Caricatures Can Teach Us About Facial Recognition

Original Article:


Wired asked four top caricaturists to sketch the writer of this story. The results are shown here and throughout the story. To read about how writer Ben Austen reacted to the images, see the end of the story.
Photo: Joshua Anderson; caricature: Court Jones

Our brains are incredibly agile machines, and it's hard to think of anything they do more efficiently than recognize faces. Just hours after birth, the eyes of newborns are drawn to facelike patterns. An adult brain knows it's seeing a face within 100 milliseconds, and it takes just over a second to realize that two different pictures of a face, even if they're lit or rotated in very different ways, belong to the same person. Neuroscientists now believe that there may be a specific region of the brain, on the fusiform gyrus of the temporal lobe, dedicated to facial recognition.

Perhaps the most vivid illustration of our gift for recognition is the magic of caricature—the fact that the sparest cartoon of a familiar face, even a single line dashed off in two seconds, can be identified by our brains in an instant. It’s often said that a good caricature looks more like a person than the person himself. As it happens, this notion, counterintuitive though it may sound, is actually supported by research. In the field of vision science, there’s even a term for this seeming paradox—the caricature effect—a phrase that hints at how our brains misperceive faces as much as perceive them.

Human faces are all built pretty much the same: two eyes above a nose that’s above a mouth, the features varying from person to person generally by mere millimeters. So what our brains look for, according to vision scientists, are the outlying features—those characteristics that deviate most from the ideal face we carry around in our heads, the running average of every visage we’ve ever seen. We code each new face we encounter not in absolute terms but in the several ways it differs markedly from the mean. In other words, to beat what vision scientists call the homogeneity problem, we accentuate what’s most important for recognition and largely ignore what isn’t. Our perception fixates on the upturned nose, rendering it more porcine, the sunken eyes or the fleshy cheeks, making them loom larger. To better identify and remember people, we turn them into caricatures.
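This deviation-coding idea has a direct numerical form: a caricature exaggerates each measurement's offset from the average face. A minimal sketch (the feature names and millimeter values are illustrative, not taken from the article):

```python
def caricature(face, mean_face, k=1.5):
    """Exaggerate each measurement's deviation from the average face.

    face and mean_face map feature names (e.g. nose length in mm) to values.
    k > 1 exaggerates, k = 1 reproduces the face, 0 < k < 1 averages it out.
    """
    return {name: mean + k * (face[name] - mean)
            for name, mean in mean_face.items()}
```

An upturned nose 4 mm above the norm becomes 6 mm above it at k = 1.5, while features already near the mean barely move, which is exactly the "more like the person than the person" effect.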

Ten years ago, the science of facial recognition—until then a somewhat esoteric backwater of artificial-intelligence research—suddenly became a matter of national security. The hazy closed-circuit images of Mohamed Atta, taped breezing through an airport checkpoint in Portland, Maine, enraged Americans and galvanized policymakers to fund research into automated recognition systems. We all imagined that within a few years, as soon as surveillance cameras had been equipped with the appropriate software, each face in a crowd would stand out like a thumbprint, its unique features and configuration offering a biometric key that could be immediately checked against any database of suspects.


Pawan Sinha, director of the Sinha Laboratory for Vision Research at MIT, thinks caricature is the key to better computer vision. For his Hirschfeld Project, to be started this year, Sinha's lab will analyze hundreds of caricatures by dozens of different artists, in order to isolate the facial proportions that are most important for recognition. This chart shows some of the myriad measurements that might prove crucial, like distance from pupil to pupil, distance from bottom lip to chin, or the area of the forehead.

But now a decade has passed, and face-recognition systems still perform miserably in real-world conditions. It’s true that in our digital photo libraries, and now on Facebook, pictures of the same person can be automatically tagged and collated with some accuracy. Indeed, in a recent test of face-recognition software sponsored by the National Institute of Standards and Technology, the best algorithms could identify faces more accurately than humans do—at least in controlled settings, in which the subjects look directly at a high-resolution camera, with no big smiles or other displays of feature-altering emotion. To crack the problem of real-time recognition, however, computers would have to recognize faces as they actually appear on video: at varying distances, in bad lighting, and in an ever-changing array of expressions and perspectives. Human eyes can easily compensate for these conditions, but our algorithms remain flummoxed.

Given current technology, the prospects for picking out future Mohamed Attas in a crowd are hardly brighter than they were on 9/11. In 2007, recognition programs tested by the German federal police couldn’t identify eight of 10 suspects. Just this February, a couple that accidentally swapped passports at the airport in Manchester, England, sailed through electronic gates that were supposed to match their faces to file photos.

All this leads science to a funny question. What if, to secure our airports and national landmarks, we need to learn more about caricature? After all, it’s the skill of the caricaturist—the uncanny ability to quickly distill faces down to their most salient features—that our computers most desperately need to acquire. Better cameras and faster computers won’t be enough. To pick terrorists out of a crowd, our bots might need to go to art school—or at least spend some time at the local amusement park.

In the 19th century, law enforcement knew that exaggerated art could catch crooks. When New York’s Boss Tweed, on the lam in Spain, was finally arrested in 1876, he was identified not with the aid of a police sketch but with a Thomas Nast caricature from Harper’s Weekly. Today, though, most police departments use automated facial-likeness generators, which tend to create a bland, average face rather than a recognizable portrait of the guilty party. Paul Wright, the president of Identi-Kit, one of the most commonly used composite systems in the US, concedes that the main value of his product is in ruling out a large fraction of the population. “Half the people might say a composite sketch looks like Rodney Dangerfield, another half like Bill Clinton. But it’s not useless. It doesn’t look like Jack Nicholson.”

Visit the annual convention of the International Society of Caricature Artists and you’ll find people who describe their face-depiction skills in far less modest terms. Take Stephen Silver, who began his career 20 years ago as a caricaturist at Sea World and is now a character designer for TV animation studios. “If they used caricatures for police composites today,” Silver says, “people would be like, ‘What is this, a joke?’ But the cops would catch the guy. If I drew a caricature, the guy would be shit out of luck.”


Daniel Almariei's caricature of author Ben Austen
Photo: Joshua Anderson; caricature: Daniel Almariei

Silver is one of 188 artists from 13 different countries who attended the most recent ISCA gathering, in Las Vegas. Over five days, and sometimes late nights, these artists draw one another’s faces over and over again, often in orgiastic clusters, the artist-subject pairings shifting repeatedly and assuming every conceivable angle. The caricatures produced are eventually displayed and voted on by the attendees, with the first-place winner awarded a Golden Nosey trophy. Silver won the prize in 2000, and it’s easy to see why. As he scans a room, he can size up faces and get the drop on each at a glance.

“I don’t care how many wrinkles there are around the eye or if there’s stubble,” he says. “Those features aren’t going to help me. You know who a person is from basic shapes.” He spies a red-haired woman across the room, takes aim at her head. “Do you see how its meat is all on the outside?” he asks. “With the features crammed into the center?” Next his sights shift to an African-American woman drawing busily at a foldout table. Her head is actually tiny, Silver points out, but the span from her bottom lip to the base of her neck is immense.

Read More at:

Wednesday, July 13, 2011

New Version of img(Rummager)


Several Bugs Fixed

1. CCDs Late Fusion bug fixed

2. Mpeg 7 Late Fusion bug fixed

3. Color Histograms problem fixed (thanks to Oge Marques)

4. AutoCorrelogram problem fixed (thanks to Oge Marques)

5. Search from local folder problem fixed



New Features

6. Visual Words Search now supports several weight and normalization methods (8 new methods - using SMART)

7. Color Visual Words (CoViWo) – Alpha version

8. New methods for custom sized codebooks

9. New method for dynamically sized codebooks

10. New “Create index files” methods

11. SURF as well as CoViWo features are now saved in binary files (for faster retrieval)

12. XML files are now more compact

13. Batch mode is even faster

14. Select the number of the results for the TrecFiles (in batch mode) (for example - don't include the entire database but only the first 1000 results)
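The SMART-style weighting mentioned in item 6 can be illustrated with the classic tf-idf scheme applied to a bag-of-visual-words histogram; a minimal sketch of one such variant (not necessarily among the exact methods img(Rummager) implements):

```python
import math

def tfidf_weights(histogram, doc_freq, n_images):
    """Weight a bag-of-visual-words histogram with log-tf * idf,
    then cosine-normalize (one of the classic SMART 'ltc'-style schemes).

    histogram: raw count per visual word for one image.
    doc_freq:  number of images in which each visual word occurs.
    """
    w = []
    for tf, df in zip(histogram, doc_freq):
        if tf > 0 and df > 0:
            w.append((1 + math.log(tf)) * math.log(n_images / df))
        else:
            w.append(0.0)
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm else w
```

Rare visual words get boosted, ubiquitous ones (occurring in every image) are zeroed out, and the normalization makes histograms of different sizes comparable.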

Please note that there is a known bug with local features (SURF), Visual Words, and Color Visual Words (CoViWo): if an image does NOT contain any points of interest, img(Rummager) crashes. The bug originates in EmguCV (the same problem probably appears when using OpenCV directly).
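Until the bug is fixed, the usual defensive pattern is to check for an empty keypoint set before the descriptor stage runs. A hedged sketch (the detect/compute stages here are illustrative stand-ins, not the EmguCV API):

```python
def safe_extract(detect, compute, image, codebook_size):
    """Wrap a local-feature pipeline so that images with no interest
    points yield an empty bag-of-words histogram instead of crashing
    the downstream code.
    """
    keypoints = detect(image)
    if not keypoints:                  # the failure case described above
        return [0] * codebook_size     # empty histogram: no visual words
    return compute(image, keypoints)   # normal path
```

A uniform image (flat color, no corners or blobs) legitimately has zero keypoints, so the guard returns a valid all-zero histogram rather than passing an empty set to the descriptor computation.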




In this paper, a new visual query expansion method for image retrieval from distributed web search engines is proposed. Its novelty lies in improving retrieval effectiveness through recursive query recommendation. Retrieving images close to the seeker's desires from web search engines remains a growing problem, as confusing image information on the Internet continuously increases. The method proposes a web image search engine, called TsoKaDo, which, given a user's query, first parses images from the three best-known search engines: Google, Bing, and Ask. It then classifies them into C automatically computed classes using content-based techniques, specifically the Color and Edge Directivity Descriptor (CEDD). Next, the most representative image of each class is compared, again using CEDD, with the results parsed from Flickr for the same keyword. Finally, the tags of the top-K Flickr images are classified based on their semantic distance and are proposed to the user as expansions of his query for better results.
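The "most representative image of each class" step can be sketched as medoid selection over the descriptor vectors of a class: pick the image whose descriptor is closest, in total, to all the others. A minimal sketch (plain Euclidean distance for brevity; CEDD matching in practice typically uses the Tanimoto coefficient):

```python
def medoid(vectors):
    """Index of the vector with minimal total distance to the rest;
    a simple way to pick a class's most representative image."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    totals = [sum(dist(v, w) for w in vectors) for v in vectors]
    return totals.index(min(totals))
```

Unlike a centroid, the medoid is always an actual member of the class, so there is a real image to compare against the Flickr results.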

Please note that this web application is an undergraduate project!


Another project from the Duth Robotics Team

The paper is currently under review. I'll post more details ASAP!

Lazaros T. Tsochatzidis, Athanasios C. Kapoutsis, Nikolaos I. Dourvas, Savvas A. Chatzichristofis, Yiannis S. Boutalis, "Query Expansion Based on Visual Image Content", 5th Panhellenic Scientific Conference for Undergraduate and Postgraduate Students in Computer Engineering, Informatics, Related Technologies and Applications, September 30 to October 1, 2011, Kastoria, Greece. Submitted for publication.


If You’re Happy and You Know It - Facebook will detect it

Does the internet know when you're smiling? That's a rhetorical question. Of course it does: Face.com, makers of a top-notch facial recognition API, recently announced it is now capable of detecting the moods and expressions of people in the photos it scans. Now, not only can the API tell who you are, it can say whether you were happy, sad, smiling, or even kissing. Face.com is the creator of the popular PhotoFinder and PhotoTagger apps on Facebook, so you may soon see that capability on the social network as well as among the 20,000 developers who use the API. In related news, Facebook (using its own software) has been automatically using facial recognition to tag photos you upload since December of last year. It has already prompted the use of such facial scanning 2.7 billion times in the past six months! Learn more about its push for automated tagging in the video below. Facial recognition has grown so sophisticated, and so cheap, that it seems it will soon leave no photo untagged and no mood unrecorded. If that idea makes you uneasy, don't worry: the social network of the future knows exactly how you feel.

Facebook quietly rolled out its in-house facial recognition (Photo Tag Suggest) in the US last December, allowing users to tag their friends and teaching the social network who was who. Soon thereafter, Facebook could automatically suggest who was in each picture, making tagging quick and easy. It's a pretty awesome feature, and as of early June it was available "in most countries." On June 30th, Facebook announced it had prompted its 750 million active users 2.7 billion times to try the automated tagging process, often with the rather ambiguous box on your homepage labeled "Photos are better with friends." Naturally, some privacy activist groups are crying foul, worrying that although Photo Tag Suggest only works on your friends, Facebook is collecting huge amounts of data on our appearance. ABC News has more:

Sunday, July 10, 2011

Multimodal Image Retrieval: ImageCLEF and beyond

Speaker: Dr. Jayashree Kalpathy-Cramer, OHSU


The images historically used for compression research (Lena, Barbara, Peppers, etc.) have outlived their useful life, and it's about time they became a part of history. They are too small, come from data sources that are too old, and are available in only 8-bit precision.

These high-resolution, high-precision images have been carefully selected to aid image compression research and algorithm evaluation. They are photographic images drawn from a wide variety of sources, each picked to stress different aspects of algorithms. Images are available in 8-bit, 16-bit and 16-bit linear variations, in both RGB and grayscale.
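As a side note on the 16-bit linear variants: assuming "linear" means linear-light sample values (an assumption worth checking against the set's documentation), the gamma-encoded version is related to it by the standard sRGB transfer curve, which a short NumPy sketch can illustrate:

```python
import numpy as np

def linear_to_srgb16(linear_u16):
    """Apply the standard sRGB transfer curve to linear-light 16-bit samples.
    Illustrative only: the test set's own 'linear' encoding may differ."""
    x = linear_u16.astype(np.float64) / 65535.0
    # sRGB: a linear toe segment below 0.0031308, a 2.4-power curve above it
    srgb = np.where(x <= 0.0031308,
                    12.92 * x,
                    1.055 * np.power(x, 1 / 2.4) - 0.055)
    return np.round(srgb * 65535.0).astype(np.uint16)
```

The curve brightens mid-tones considerably, which is why linear and gamma-encoded files of the same scene look so different when viewed naively.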


RGB 8 bit (330 MB), RGB 16 bit (787 MB), RGB 16 bit linear (626 MB)
Gray 8 bit (103 MB), Gray 16 bit (285 MB), Gray 16 bit linear (207 MB)

You are encouraged to use these images for image compression research and algorithm evaluation. Suggestions for further improvements are always welcome.

Preview: click on the images to enlarge; you can enlarge multiple images at the same time. You can also check out the results of various lossless and lossy compression algorithms on these images.

Computer generated using 3D modeling and ray-tracing.

Hasselblad H3D II-39

Leaf Aptus 65

Fuji Provia 100, film
(Not available in linear sets)

Olympus E-330


Leaf Cantare


These images are available without any prohibitive copyright restrictions.

These images are (c) their respective owners. You are granted full redistribution and publication rights on these images provided that:
1. The origin of the pictures must not be misrepresented; you must not claim that you took the original pictures. If you use, publish or redistribute them, an acknowledgment would be appreciated but is not required.
2. Altered versions must be plainly marked as such, and must not be misrepresented as being the originals.
3. No payment is required for distribution of this material, and it must be available freely under the conditions stated here. That is, selling the material is prohibited.
4. This notice may not be removed or altered from any distribution.


A lot of people contributed a lot of time and effort to making this test set possible. Thanks to everyone who shared their opinion in any of the discussions online or by email. Thanks to Axel Becker, Thomas Richter and Niels Fröhling for their extensive help in picking images, running the various tests, and so on. Thanks to Pete Fraser, Tony Story, Wayne J. Cosshall, David Coffin, Bruce Lindbloom, and for the images which make up this set.

Thursday, July 7, 2011

Google Similar Images: I think I'll let the screenshot speak for itself–Part 2

1. WOW- My girlfriend is a JEDI!!!!!



2. Do I look like HER?????


send me your Google Similar stories at

Google Similar Images: I think I'll let the screenshot speak for itself



More details about Google Visually Similar Images Soon.


The 18th International MultiMedia Modeling Conference (MMM2012)
January 4-6, 2012 - Klagenfurt University, Klagenfurt, Austria.

The International MultiMedia Modeling (MMM) Conference is a leading international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all MMM-related areas.
The conference calls for research papers reporting original investigation results and demonstrations in, but not limited to, the following areas related to multimedia modeling technologies and applications:

1. Multimedia Content Analysis
1.1 Multimedia Indexing
1.2 Multimedia Mining
1.3 Multimedia Abstraction and Summarization
1.4 Multimedia Annotation, Tagging and Recommendation
1.5 Multimodal Analysis for Retrieval Applications
1.6 Semantic Analysis of Multimedia and Contextual Data
1.7 Multimedia Fusion Methods
1.8 Media Content Browsing and Retrieval Tools

2. Multimedia Signal Processing and Communications
2.1 Media Representation and Algorithms
2.2 Audio, Image, Video Processing, Coding and Compression
2.3 Multimedia Security and Content Protection
2.4 Multimedia Standards and Related Issues
2.5 Advances in Multimedia Networking and Streaming
2.6 Multimedia Databases, Content Delivery and Transport
2.7 Wireless and Mobile Multimedia Networking

3. Multimedia Applications and Services
3.1 Multi-Camera and Multi-View Systems
3.2 Virtual Reality and Virtual Environment
3.3 Real-Time and Interactive Multimedia Applications
3.4 Mobile Multimedia Applications
3.5 Multimedia Web Applications
3.6 Interactive Multimedia Authoring and Personalization
3.7 Sensor Networks (Video Surveillance, Distributed Systems)
3.8 Emerging Trends (e-learning, e-Health, Social Media, Multimedia Collaboration, etc.)

Paper Submission Guidelines:
Papers should be 10-12 pages in length (demo papers 3 pages), conforming to the formatting instructions of the Springer Verlag LNCS series. Papers will be judged by an international program committee based on their originality, significance, correctness and clarity.
All papers should be submitted electronically in PDF format through the EasyChair submission system. The review process is single-blind, so please do not conceal the authors' identities from reviewers. For a paper to be published in the proceedings, one of the authors must register and present it at the conference.

Important Dates:
Submission of full papers: July 22, 2011
Notification of acceptance: September 19, 2011
Camera-ready papers due: October 10, 2011
Author registration: October 10, 2011
Conference: January 4-6, 2012

For more information, please visit

PROMISE Winter School 2012 - Information Retrieval meets Information Visualization

Zinal, Valais - Switzerland
23 - 27 January 2012


The aim of the PROMISE Winter School on information retrieval and information visualization is to give participants a grounding in the core topics that constitute the multidisciplinary area of multilingual information retrieval. The school is a week-long event consisting of guest lectures from invited speakers who are recognized experts in the field. It is intended for PhD students, Masters students and senior researchers, such as post-doctoral researchers, from the fields of information visualization, information retrieval and related areas.

General Chair
  • Tiziana Catarci, Sapienza, University of Rome, Italy
Program Committee
  • Maristella Agosti, University of Padua, Italy
  • Nicola Ferro, University of Padua, Italy
  • Henning Müller, University of Applied Sciences Western Switzerland, Switzerland
  • Giuseppe Santucci, Sapienza, University of Rome, Italy
Publicity Chair
  • Pamela Forner, Centre for the Evaluation of Language and Communication Technologies (CELCT), Italy
  • Hélène Mazo, Evaluations and Language resources Distribution Agency (ELDA), France
Local Organization
  • Alexandre Cotting, University of Applied Sciences Western Switzerland, Switzerland
  • Henning Müller, University of Applied Sciences Western Switzerland, Switzerland
  • For any information please contact  winter-school[at]

Wednesday, July 6, 2011

SiftGPU: A GPU Implementation of Scale Invariant Feature Transform (SIFT) - Changchang Wu

SIFT Implementation

SiftGPU is an implementation of SIFT [1] for the GPU. SiftGPU processes pixels in parallel to build Gaussian pyramids and detect DoG keypoints. Based on GPU list generation [3], SiftGPU then uses a mixed GPU/CPU method to efficiently build compact keypoint lists. Finally, keypoints are processed in parallel to obtain their orientations and descriptors.
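The Gaussian-pyramid/DoG stage that SiftGPU parallelizes can be illustrated with a small CPU sketch. This is a NumPy illustration of the underlying math only, not SiftGPU's code; the kernel truncation radius and level count are arbitrary choices:

```python
import numpy as np

def gaussian_blur(img, sigma):
    # separable Gaussian blur with a truncated kernel (pure-NumPy sketch)
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, radius, mode='edge')
    # horizontal pass, then vertical pass
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)

def dog_pyramid(img, num_levels=4, sigma0=1.6, k=2**0.5):
    """One octave of difference-of-Gaussians, as in Lowe's SIFT: blur at a
    geometric ladder of sigmas and subtract adjacent levels."""
    blurred = [gaussian_blur(img, sigma0 * k**i) for i in range(num_levels)]
    return [blurred[i + 1] - blurred[i] for i in range(num_levels - 1)]
```

Keypoints are then found as local extrema across the DoG levels; on the GPU, every pixel's blur and extremum test runs as an independent shader thread, which is exactly what makes this stage so parallel-friendly.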

SiftGPU is inspired by Andrea Vedaldi's sift++ [2] and Sudipta N. Sinha et al.'s GPU-SIFT [4]. Many parameters of sift++ (for example, the number of octaves, number of DoG levels, edge threshold, etc.) are also available in SiftGPU. The shader programs are dynamically generated according to the parameters the user specifies.

SiftGPU also includes SiftMatchGPU, a GPU exhaustive/guided SIFT matcher. It multiplies the descriptor matrices on the GPU and finds the closest feature matches there as well. Both GLSL and CUDA implementations are provided.
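The heart of such an exhaustive matcher is a single descriptor-matrix product. Here is a CPU sketch in NumPy (illustrative only, not SiftMatchGPU's code; the ratio-test threshold is an assumed default):

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Exhaustive matcher sketch: SIFT descriptors are unit-normalized, so one
    matrix multiply yields all pairwise dot products (cosine similarities).
    A match is kept only if the best candidate clearly beats the runner-up
    (Lowe's ratio test, applied here on angular distances)."""
    sims = np.clip(d1 @ d2.T, -1.0, 1.0)   # (n1, n2) similarity matrix
    order = np.argsort(-sims, axis=1)
    best, second = order[:, 0], order[:, 1]
    matches = []
    for i in range(len(d1)):
        a1 = np.arccos(sims[i, best[i]])    # angular distance of best match
        a2 = np.arccos(sims[i, second[i]])  # and of the runner-up
        if a1 < ratio * a2:
            matches.append((i, int(best[i])))
    return matches
```

On the GPU the same product is one dense matrix multiplication, which is why descriptor matching maps so naturally onto graphics hardware.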


SiftGPU requires a high-end GPU (like the nVidia 8800) that has large graphics memory and supports dynamic branching. GLSL for OpenGL is used by default, and CUDA is provided as an alternative for nVidia graphics cards. SiftGPU has not been fully tested on ATI cards, but the GLSL shaders did pass the AMD Shader Analyzer (Catalyst 8.12), so it should work.

SiftGPU uses GLEW 1.51, DevIL 1.77 (can be disabled), GLUT (viewer only), and CUDA (optional). You'll need to make sure that your system has all the dependency libraries in the corresponding versions. To update the libraries, you'll need to replace the header files in SiftGPU\Include\ and the corresponding binaries.

NOTE FOR CUDA: 1. The thread block settings are currently tuned on the nVidia 8800 GTX and may not be optimal for other GPUs. 2. The CUDA version is not compiled by default; you need to define CUDA_SIFTGPU_ENABLED for the compiler and recompile the package. VS2010 users can simply use the SiftGPU_CUDA_Enabled solution.


SiftGPU-V371 (5.0 MB; including code, manual, Windows binary and some test images). Want to cite SiftGPU?
You might be interested in the Matlab versions mex'd by Adam Chapman and by Parag K. Mital.

SimpleSIFT.cpp gives some examples of using SiftGPU and SiftMatchGPU.
Previous versions of SiftGPU can be found through this link. A complete change list can be found here.

Some minor updates since V360
  1. Converted the MSVC Solution to Visual Studio V2010 and tested CUDA4 (6/2011)
  2. Automatic switching from OpenGL to CUDA when OpenGL is not supported (1/2011)
  3. Dropped the indirect data transfer path CPU->GL->CUDA (1/2011)
  4. Dropped the CG implementation to simplify maintenance (1/2011)
Some previous changes
  6. Added device selection for Multi-threading (Check the example at MultiThreadSIFT.cpp).
  5. Used SSE to speedup the descriptor normalization step for the OpenGL implementation.
  4. Added CUDA-based SiftGPU/SiftMatchGPU implementation. See Figure below for the speed.
  3. Added OpenGL-based sift matching implementation, check example #7 in manual. (Thanks to Zach)
  2. Added function to compute descriptors for user-specified keypoints, check example #6 in manual.
  1. Improved speed by 50% compared with V293. Look here for experiment details and explanations.


Below is an evaluation of the speed of V340 on different image sizes. "-fo -1" means using the upsampled image; "-glsl" uses GLSL and "-cuda" uses CUDA. (The experiment images are all resized from this image.)

   System: nVidia 8800 GTX, 768 MB, driver 182.08, Windows XP, Intel 3 GHz P4 CPU, 3.5 GB RAM. (V311 speed)

Below is a comparison with Lowe's SIFT on box.pgm, using the comparison code from Vedaldi's SIFT.


[1]   D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, November 2004.
[2]   A. Vedaldi. sift++,
[3]   G. Ziegler, et al. GPU point list generation through histogram pyramids. Technical report, June 2006.
[4]   S. N. Sinha, J.-M. Frahm, M. Pollefeys and Y. Genc. "GPU-Based Video Feature Tracking and Matching",
        EDGE 2006, Workshop on Edge Computing Using New Commodity Architectures, Chapel Hill, May 2006.

Tuesday, July 5, 2011

4th International Conference on Agents and Artificial Intelligence

ICAART 2012 (4th International Conference on Agents and Artificial Intelligence) has an open call for papers, with a deadline of July 28, 2011. We hope you can participate in this conference by submitting a paper reflecting your current research in any of the following areas:

- Artificial Intelligence

- Agents

The conference will be sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC) and held in cooperation with the Portuguese Association for Artificial Intelligence (APPIA), the Spanish Association for Artificial Intelligence (AEPIA) and the Association for the Advancement of Artificial Intelligence (AAAI). INSTICC is member of the Foundation for Intelligent Physical Agents (FIPA), Workflow Management Coalition (WfMC) and the Object Management Group (OMG).

ICAART would like to become a major point of contact between researchers, engineers and practitioners interested in the theory and applications of these areas. Informatics applications are pervasive in many areas of Artificial Intelligence and Distributed AI, including Agents and Multi-Agent Systems. This conference intends to emphasize this connection; therefore, authors are invited to highlight the benefits of Information Technology (IT) in these areas. Ideas on how to solve problems using agents and artificial intelligence, both in R&D and in industrial applications, are welcome. Papers describing advanced prototypes, systems, tools and techniques, as well as general survey papers indicating future directions, are also encouraged.

The conference program features a number of Keynote Lectures to be delivered by distinguished world-class researchers, including those listed below.

All accepted papers (full, short and posters) will be published in the conference proceedings, under an ISBN reference, on paper and on CD-ROM support.

A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer-Verlag in a CCIS Series book.

A short list of papers will be selected for publication in a special issue of JOPHA - Journal of Physical Agents.

The proceedings will be submitted for indexing by the Thomson Reuters Conference Proceedings Citation Index (ISI), INSPEC, DBLP and Elsevier Index (EI).

Best paper awards will be distributed during the conference closing session. Please check the website for further information.

All papers presented at the conference venue will be available in the SciTePress Digital Library. SciTePress is a member of CrossRef.

We also would like to highlight the possibility to submit to the following Satellite Workshop:

- 2nd International Workshop on Semantic Interoperability (IWSI)

Workshops, Special sessions as well as tutorials dedicated to other technical/scientific topics are also envisaged: companies interested in presenting their products/methodologies or researchers interested in holding a tutorial are invited to contact the conference secretariat. Workshop chairs and Special Session chairs will benefit from logistics support and other types of support, including secretariat and financial support, to facilitate the development of a valid idea.

Please check further details at the ICAART conference website (

Should you have any questions, please don't hesitate to contact me.

ICAART 2012 will be held in conjunction with ICPRAM 2012 and ICORES 2012 in Vilamoura, Algarve, Portugal. Registration for ICAART will enable free access to the ICPRAM and ICORES conferences (as a non-speaker).

Monday, July 4, 2011

Card.io Lets You Pay on Mobile by Holding a Credit Card Up to the Phone

Card.io, a new San Francisco-based startup led by two former AdMob employees, Mike Mettler and Josh Bleecher, is introducing a revolutionary idea that could transform mobile commerce: make it easier to pay. But how the company has accomplished this is a feat that will feel more like magic to the everyday mobile user. With card.io, you simply hold your credit card up to the phone. The software then "sees" the card information using the phone's camera and the payment is processed. No typing required!

To get the technology into the hands of those who need it most, card.io is targeting iOS developers at launch, specifically those in the e-commerce, local, ticketing, travel and daily deals spaces.

Card Scanning is Very Accurate, Says Company

The company has been in existence for only nine months, six of which were spent building card.io's core technology: the computer vision and machine learning algorithms it uses to read the numbers on a credit card. Unlike several other check-scanning and business-card-scanning software programs, card.io doesn't use humans to verify the accuracy of scans - everything is programmatic. Of course, this means that the scans themselves have to be highly accurate, and Mettler says they are.

However, the company wouldn't provide exact percentages, saying only that the "vast majority" of scans should be accurate. But that's where the machine learning aspect comes into play: the more data that's processed, the better the software performs, because it learns and improves over time.


Once the credit card information is entered, the developer can then continue to process the payment using their own merchant account as usual. Although card.io will be looking for merchant partners going forward, that's not the service it's offering today. Also of note: all the data card.io handles is secured using 128-bit SSL, and the service never stores card images on the phone or on the company's servers.


Going After E-Commerce, Not Point-of-Sale


Using technology to input the credit card details into a mobile device may seem like a step backwards when positioned alongside other upcoming mobile advances like NFC, a wireless technology that lets you pay for real-world goods with just a tap.

But NFC requires a special chip in handsets, and currently few phones on the market offer this; it's still years away from mainstream adoption. Meanwhile, everyone has credit cards, and these physical plastic cards won't disappear anytime soon. Most importantly, card.io is not trying to compete with NFC or other innovations at the point of sale - it's going after the e-commerce market.

With a new software development kit (SDK) available now, the iOS (iPhone, iPad and iPod Touch) developers accepted into the company's private beta can integrate the technology into their own apps. Initially, as noted above, card.io is most interested in developers working in the e-commerce, local, ticketing, travel and daily deals spaces. It is already working with a few companies here, including MogoTix for event tickets, TaskRabbit for local services and SamaSource for donations. Card.io received $1 million in seed funding in January, led by former eBay exec Michael Dearing of Harrison Metal. Other investors include Jeff Clavier and Charles Hudson of SoftTech VC, Manu Kumar of K9 Ventures, Alok Bhanot (former VP of Risk Technology at PayPal), and Omar Hamoui (CEO/founder of AdMob).

iOS developers interested in joining the private beta have to plead their case here.

DeepShot: Computer vision app for computer-smartphone communication

This post by Computer Vision Central has been reprinted from Computer Vision Central's blog

Google Research and MIT CSAIL have developed a method for transferring an application's work state between a computer and a mobile phone using computer vision. The user takes an image of the computer screen with the mobile phone camera; the Deep Shot app on the phone then identifies the current webpage on the computer and communicates with the computer to determine the URI of the application. For many web-based applications, the URI specifies both the application and its state. Deep Shot also supports transfer in the other direction: an image of the computer display taken with the mobile phone is used to identify the computer to the phone. Deep Shot requires installation of software on both the computer and the phone. More information is available at Popular Science.
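The idea that a URI can capture an application's whole work state is easy to illustrate. The toy helpers below are hypothetical, not Deep Shot's code: they pack a work state into a URI's query string and recover it on the other device:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def save_state_as_uri(base, state):
    """Deep Shot relies on apps whose work state is captured by a URI (e.g. a
    maps page whose query string holds the location and zoom level). This toy
    helper shows the idea: serialize the state into the query string."""
    return base + "?" + urlencode(state)

def restore_state_from_uri(uri):
    # recover the state dict on the receiving device
    parsed = urlparse(uri)
    return {k: v[0] for k, v in parse_qs(parsed.query).items()}
```

Once the phone has recognized which page is on screen, shipping a string like this over the network is all that's needed to resume the session.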

Sunday, July 3, 2011


4-9 SEPTEMBER 2011, Plymouth, UK

Registration deadline: 11th July, 2011

It is a pleasure to announce the Call for Participation for the 7th International Summer School on Pattern Recognition. I write to invite you, your colleagues, and students within your department to attend this event.

In 2010, the 6th ISSPR School, held at Plymouth, was a major success with over 90 participants. The major focuses of the 2011 summer school include:

- A broad coverage of pattern recognition areas which will be taught in a tutorial style over five days by leading experts. The areas covered include statistical pattern recognition, Bayesian techniques, non-parametric and neural network approaches including Kernel methods, String matching, Evolutionary computation, Classifiers, Decision trees, Feature selection and Dimensionality reduction, Clustering, Reinforcement learning, and Markov models. For more details visit the event website.

- A number of prizes sponsored by Microsoft and Springer for best research demonstrated by participants and judged by a panel of experts. The prizes will be presented to the winners by Prof. Chris Bishop from Microsoft Research.

- Providing participants with knowledge and recommendations on how to develop and use pattern recognition tools for a broad range of applications.

10 corporate scholarships towards a discounted registration fee are available for students until 11th July 2011, so this is an excellent opportunity to register at an affordable cost. The fee includes registration, accommodation and meals at the event. Registration is online through the school website, which has further details on fees. Please note that the number of participants registering each year at the summer school is high and seats are limited, so early registration is highly recommended.