Friday, March 29, 2013

How Well Do You Know Tom Hanks? Using a Game to Learn About Face Recognition

Oge Marques, Justyn Snyder and Mathias Lux

Human face recognition abilities vastly outperform computer-vision algorithms working on comparable tasks, especially in the case of poor lighting, bad image quality, or partially hidden faces. In this paper, we describe a novel game with a purpose in which players must guess the name of a celebrity whose face appears blurred. The game combines a successful casual game paradigm with meaningful applications in both human- and computer-vision science. Preliminary user studies were conducted with 28 users and more than 7,000 game rounds. The results supported and extended preexisting knowledge and hypotheses from controlled scientific experiments, which show that humans are remarkably good at recognizing famous faces, even with a significant degree of blurring. Our results will be further incorporated into research in human vision as well as machine-learning and computer-vision algorithms for face recognition.

Wednesday, March 27, 2013

Mobile Visual Search (MVS)

Once more, one year later

An amazing presentation by (my friend) Oge Marques!

Watch the (full-length!) video here:



Mobile Visual Search (MVS) is a fascinating research field with many open challenges and opportunities which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos) using mobile devices. This talk is structured in four parts:

(i) MVS — opportunities: where I present recent and relevant numbers of the mobile computing market, particularly in the field of photography apps, social networks, and mobile search.

(ii) Basic concepts: where I explain the basic MVS pipeline and discuss the three main MVS scenarios and associated challenges.

(iii) Advanced technical details: where I explain technical aspects of feature extraction, indexing, descriptor matching, and geometric verification, discuss the state of the art in these fields, and comment on open problems and research opportunities.

(iv) Examples and applications: where I show recent and significant examples of academic research (e.g., Stanford Product Search System) and commercial apps (e.g., Google Goggles, oMoby, kooaba) in this field.
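To make the "descriptor matching" step of the MVS pipeline a bit more concrete, here is a minimal Java sketch of brute-force nearest-neighbour matching with a distance-ratio test. It is our own illustration, not code from the talk: descriptors are plain float arrays, the class name `RatioTestMatcher` and the 0.8 ratio threshold are assumptions, and real MVS systems replace the brute-force loop with an index and follow it with geometric verification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: brute-force descriptor matching with a distance-ratio test. */
public class RatioTestMatcher {

    /** Squared Euclidean distance between two equal-length descriptors. */
    static double sqDist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /**
     * Returns pairs {queryIndex, dbIndex} whose nearest neighbour is clearly closer
     * than the second-nearest one (true distance ratio below maxRatio).
     */
    static List<int[]> match(float[][] query, float[][] db, double maxRatio) {
        List<int[]> matches = new ArrayList<>();
        for (int q = 0; q < query.length; q++) {
            double best = Double.MAX_VALUE, second = Double.MAX_VALUE;
            int bestIdx = -1;
            for (int d = 0; d < db.length; d++) {
                double dist = sqDist(query[q], db[d]);
                if (dist < best) {          // new nearest neighbour; old one becomes second
                    second = best;
                    best = dist;
                    bestIdx = d;
                } else if (dist < second) { // only improves the second-nearest
                    second = dist;
                }
            }
            // Compare non-squared distances: sqrt(best) < maxRatio * sqrt(second).
            if (bestIdx >= 0 && Math.sqrt(best) < maxRatio * Math.sqrt(second)) {
                matches.add(new int[] {q, bestIdx});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        float[][] db = {{0f, 0f}, {10f, 10f}, {10f, 0f}};
        float[][] query = {{0.5f, 0f}, {5f, 5f}}; // the second query is ambiguous
        for (int[] m : match(query, db, 0.8)) {
            System.out.println("query " + m[0] + " -> db " + m[1]);
        }
    }
}
```

The ambiguous second query is equidistant from all three database descriptors, so the ratio test rejects it; only the first query is matched.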

Great stuff, Professor Marques, thanks for sharing!

Tuesday, March 26, 2013

Learning to Rank Research using Terrier

Part 1:

Part 2:

In recent years, the information retrieval (IR) field has experienced a paradigm shift in the application of machine learning techniques to achieve effective ranking models. A few years ago, we were using hill-climbing optimisation techniques such as simulated annealing to optimise the parameters in weighting models, such as BM25 or PL2, or latterly BM25F or PL2F. Instead, driven first by commercial search engines, IR is increasingly adopting a feature-based approach, where various mini-hypotheses are represented as numerical features, and learning to rank techniques are deployed to decide their importance in the final ranking formulae.

The typical approach for ranking is described in the following figure from our recently presented WSDM 2013 paper:

Phases of a retrieval system deploying learning to rank, taken from Tonellotto et al., WSDM 2013.

In particular, there are typically three phases:

  1. Top K Retrieval - a number of top-ranked documents are identified; this set is known as the sample.
  2. Feature Extraction - various features are calculated for each of the sample documents.
  3. Learned Model Application - the learned model obtained from a learning to rank technique re-ranks the sample documents to better satisfy the user.
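As a rough illustration of the three phases, the following Java sketch runs them on made-up numbers. It is not Terrier's API: the names `Doc`, `topK`, `extractFeatures` and `rerank` are hypothetical, the first-pass score merely stands in for a weighting model such as BM25, and the learned model is assumed to be linear.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical three-phase learning-to-rank pipeline (illustrative only). */
public class LtrPipeline {

    /** A candidate document: its first-pass score (e.g. BM25) and other raw signals. */
    record Doc(String id, double firstPassScore, double[] otherSignals) {}

    /** Phase 1: Top K retrieval - keep the K best documents by first-pass score. */
    static List<Doc> topK(List<Doc> docs, int k) {
        return docs.stream()
                .sorted(Comparator.comparingDouble(Doc::firstPassScore).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    /** Phase 2: feature extraction - here simply the first-pass score plus the other signals. */
    static double[] extractFeatures(Doc d) {
        double[] f = new double[1 + d.otherSignals().length];
        f[0] = d.firstPassScore();
        System.arraycopy(d.otherSignals(), 0, f, 1, d.otherSignals().length);
        return f;
    }

    /** Phase 3: apply a learned linear model (one weight per feature) and re-rank the sample. */
    static List<String> rerank(List<Doc> sample, double[] weights) {
        Map<String, Double> score = new HashMap<>();
        for (Doc d : sample) {
            double[] f = extractFeatures(d);
            double s = 0;
            for (int i = 0; i < f.length; i++) s += weights[i] * f[i];
            score.put(d.id(), s);
        }
        List<String> ids = new ArrayList<>(score.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> score.get(id)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("d1", 9.0, new double[] {0.1}),
                new Doc("d2", 8.0, new double[] {0.9}),
                new Doc("d3", 7.5, new double[] {0.8}),
                new Doc("d4", 2.0, new double[] {1.0}));
        List<Doc> sample = topK(docs, 3);   // d4 never reaches the learned model
        double[] weights = {0.1, 2.0};      // hypothetical learned weights
        System.out.println(rerank(sample, weights));
    }
}
```

The example also shows why the sample size matters: d4 has the strongest second signal, but with K = 3 it is cut in the first phase and the learned model never sees it.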

The Sample

The set of top K documents selected within the first retrieval phase is called the sample by Liu, even though the selected documents are not i.i.d. Indeed, in selecting the sample, Liu suggested that the top K documents ranked by a simple weighting model such as BM25 are not the best, but are sufficient for effective learning to rank. However, the size of the sample (i.e. the number of documents to be re-ranked by the learned model) is an important parameter: with fewer documents, the first-pass retrieval can be made more efficient by the use of dynamic pruning strategies (e.g. WAND); on the other hand, too few documents may result in insufficient relevant documents being retrieved, and hence in degraded effectiveness.

Our article "The Whens and Hows of Learning to Rank" in the Information Retrieval Journal studied the sample size parameter for many topic sets and learning to rank techniques. For the mixed information needs on the TREC ClueWeb09 collection, we found that while a sample size of 20 documents was sufficient for effective performance according to ERR@20, larger sample sizes of thousands of documents were needed for effective NDCG@20. For navigational information needs, predominantly larger sample sizes (up to 5,000 documents) were needed. Moreover, the particular document representation used to identify the sample was shown to have an impact on effectiveness: navigational queries were found to be considerably easier (requiring smaller samples) when anchor text was used, but for informational queries, the opposite was observed. In the article, we examined these issues in detail, across a number of test collections and learning to rank techniques, and also investigated the role of the evaluation measure and its rank cutoff for listwise techniques; for in-depth details and conclusions, see the IR Journal article.
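To see what ERR@20 and NDCG@20 actually reward, here is a small Java sketch of both measures over graded relevance labels, using the common gain 2^rel - 1 with a log2 discount for DCG and the cascade formulation of ERR. Two simplifications are assumptions on our side: the ideal ranking for NDCG is built from the same list rather than from all judged documents for the topic, and the maximum relevance grade is passed in explicitly.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch of NDCG@k and ERR@k over graded relevance labels (0 = not relevant). */
public class RankMetrics {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** DCG@k with gain 2^rel - 1 and discount log2(rank + 1); ranks start at 1. */
    static double dcg(int[] rels, int k) {
        double sum = 0;
        for (int i = 0; i < Math.min(k, rels.length); i++) {
            sum += (Math.pow(2, rels[i]) - 1) / log2(i + 2); // array index i is rank i + 1
        }
        return sum;
    }

    /** NDCG@k: DCG of the ranking divided by the DCG of the same labels sorted descending. */
    static double ndcg(int[] rels, int k) {
        int[] ideal = Arrays.stream(rels).boxed()
                .sorted(Comparator.reverseOrder())
                .mapToInt(Integer::intValue)
                .toArray();
        double idcg = dcg(ideal, k);
        return idcg == 0 ? 0 : dcg(rels, k) / idcg;
    }

    /** ERR@k (cascade model): sum over ranks r of (1/r) * R_r * prod_{i<r} (1 - R_i). */
    static double err(int[] rels, int k, int maxGrade) {
        double total = 0;
        double pReach = 1; // probability that the user reaches rank r
        for (int r = 1; r <= Math.min(k, rels.length); r++) {
            double stop = (Math.pow(2, rels[r - 1]) - 1) / Math.pow(2, maxGrade);
            total += pReach * stop / r;
            pReach *= 1 - stop;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] ranking = {0, 2, 1}; // graded labels of the documents at ranks 1..3
        System.out.printf("NDCG@3 = %.4f%n", ndcg(ranking, 3));
        System.out.printf("ERR@3  = %.4f%n", err(ranking, 3, 2));
    }
}
```

ERR discounts a document by how likely the user is to have stopped at an earlier, relevant one, which is one intuition for why the two measures can respond differently to the sample size.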

Read More:

Part 1:

Part 2:

Monday, March 25, 2013

QUALINET Multimedia Databases

Article from

A key to current and future developments in Quality of Experience lies in a rich and internationally recognized database of content of different sorts, and in sharing such a database with the scientific community at large. The QUALINET Database platform takes the necessary steps to make these databases accessible to all researchers (registration is free of charge):



Currently, the QUALINET database comprises 122 multimedia databases, based on literature/Internet search and input from Qualinet partner laboratories. They are mostly image (~52) or video (~68) datasets, with (~57) or without subjective quality ratings, and some include special content, e.g. 3D (~23), FV, eye tracking (~21), audio, audiovisual (~8), HDR, and other modalities.

The documentation of the QUALINET databases can be found on the corresponding Wiki page. For an overview, please consult the white paper on QUALINET databases (PDF) and please reference it as follows:

Karel Fliegel, Christian Timmerer, (eds.), “WG4 Databases White Paper v1.5: QUALINET Multimedia Database enabling QoE Evaluations and Benchmarking”, Prague/Klagenfurt, Czech Republic/Austria, Version 1.5, March 2013.

Finally, you're welcome to contribute to this effort: simply send an email briefly describing your dataset [to subscribe, send an e-mail (its content is unimportant) to, and you will receive information to confirm your subscription; once the moderator accepts it, you will be included in the mailing list].

Additionally, you may consider submitting a dataset paper to QoMEX or MMSys, which host dataset tracks; accepted dataset papers will automatically be included in the QUALINET database.

Sunday, March 24, 2013

“There’s no bad data, only bad uses of data”

Article from

In the 1960s, mainframe computers posed a significant technological challenge to common notions of privacy. That’s when the federal government started putting tax returns into those giant machines, and consumer credit bureaus began building databases containing the personal financial information of millions of Americans. Many people feared that the new computerized databanks would be put in the service of an intrusive corporate or government Big Brother.

“It really freaked people out,” says Daniel J. Weitzner, a former senior Internet policy official in the Obama administration. “The people who cared about privacy were every bit as worried as we are now.”

Along with fueling privacy concerns, of course, the mainframes helped prompt the growth and innovation that we have come to associate with the computer age. Today, many experts predict that the next wave will be driven by technologies that fly under the banner of Big Data — data including Web pages, browsing habits, sensor signals, smartphone location trails and genomic information, combined with clever software to make sense of it all.

Proponents of this new technology say it is allowing us to see and measure things as never before — much as the microscope allowed scientists to examine the mysteries of life at the cellular level. Big Data, they say, will open the door to making smarter decisions in every field from business and biology to public health and energy conservation.

“This data is a new asset,” says Alex Pentland, a computational social scientist and director of the Human Dynamics Lab at M.I.T. “You want it to be liquid and to be used.”

But the latest leaps in data collection are raising new concern about infringements on privacy — an issue so crucial that it could trump all others and upset the Big Data bandwagon. Dr. Pentland is a champion of the Big Data vision and believes the future will be a data-driven society. Yet the surveillance possibilities of the technology, he acknowledges, could leave George Orwell in the dust.

The World Economic Forum published a report late last month that offered one path — one that leans heavily on technology to protect privacy. The report grew out of a series of workshops on privacy held over the last year, sponsored by the forum and attended by government officials and privacy advocates, as well as business executives. The corporate members, more than others, shaped the final document.

The report, “Unlocking the Value of Personal Data: From Collection to Usage,” recommends a major shift in the focus of regulation toward restricting the use of data. Curbs on the use of personal data, combined with new technological options, can give individuals control of their own information, according to the report, while permitting important data assets to flow relatively freely.

“There’s no bad data, only bad uses of data,” says Craig Mundie, a senior adviser at Microsoft, who worked on the position paper.

The report contains echoes of earlier times. The Fair Credit Reporting Act, passed in 1970, was the main response to the mainframe privacy challenge. The law permitted the collection of personal financial information by the credit bureaus, but restricted its use mainly to three areas: credit, insurance and employment.

The forum report suggests a future in which all collected data would be tagged with software code that included an individual’s preferences for how his or her data is used. All uses of data would have to be registered, and there would be penalties for violators. For example, one violation might be a smartphone application that stored more data than is necessary for a registered service like a smartphone game or a restaurant finder.

The corporate members of the forum say they recognize the need to address privacy concerns if useful data is going to keep flowing. George C. Halvorson, chief executive of Kaiser Permanente, the large health care provider, extols the benefits of its growing database on nine million patients, tracking treatments and outcomes to improve care, especially in managing costly chronic and debilitating conditions like heart disease, diabetes and depression. New smartphone applications, he says, promise further gains — for example, a person with a history of depression whose movement patterns slowed sharply would get a check-in call.

“We’re on the cusp of a golden age of medical science and care delivery,” Mr. Halvorson says. “But a privacy backlash could cripple progress.”


Proximity solutions for a new way to interact with mobile devices

Thursday, March 21, 2013

Open positions for scientists and postdocs at QCRI

The Qatar Computing Research Institute (QCRI) is looking to hire 3 scientists/PostDocs over the course of the next 12 months in the Information Retrieval area. QCRI conducts world-class applied computing research, creating knowledge and supporting innovation in select areas of computing science that will have long-term relevance and lasting value for Qatar.

Job Description:

Scientists and PostDocs are expected to contribute towards the research efforts of QCRI and to develop research expertise, tackling IR research challenges. The scientist will work as part of a research team, collaborating with peer researchers and software engineers to publish high-quality papers, develop prototypes, and generate intellectual property in the form of disclosures and patent applications.


• Ph.D. in Computer Science or related field with primary focus on information retrieval

• Strong publication record in tier 1 conferences and journals

• Expertise in related areas such as machine learning and/or natural language processing is preferred

• 1-3 years of research experience past Ph.D. for scientists

• Industrial and development experience would be a plus

Typically, we hire fresh Ph.D.’s as PostDocs with the prospect of switching to full time positions within 1 year. We can offer full time positions to truly exceptional fresh Ph.D.’s.


QCRI offers competitive compensation including attractive tax-free salary and additional benefits such as furnished accommodation, annual paid leave, medical insurance, etc.

Wednesday, March 20, 2013

Large image data sets with LIRE – some new numbers

Article from

People lately asked whether LIRE can do more than linear search, and I always answered: yes, it should … but, you know, I never tried. Finally, I got around to indexing the MIR-FLICKR data set and some of my Flickr-crawled photos, and ended up with an index of 1,443,613 images. I used CEDD as the main feature and a hashing algorithm to put multiple hashes per image into Lucene, to be interpreted as words. By tuning similarity, employing a Boolean query and adding a re-rank step, I ended up with a pretty decent approximate retrieval scheme, which is much faster and does not lose too many images on the way, i.e. the method has an acceptable recall. The image below shows the numbers along with a sample query. Linear search took more than a minute, while the hashing-based approach did (nearly) the same thing in less than a second. Note that this is just a sequential, straightforward approach, so no performance optimization has been done. Also, the hashing approach has not yet been investigated in detail, i.e. there are some parameters that still need tuning … but let's say it's a step in the right direction.
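The post does not spell out which hashing algorithm is used, so the following Java sketch only illustrates the general idea under our own assumptions: random-hyperplane (locality sensitive) hashing turns each feature vector into a few "hash words", an OR over those words selects candidates (the role the Boolean Lucene query plays), and only the candidates are re-ranked by exact distance. The class and parameter names (`HashRerankSearch`, `numTables`, `bitsPerTable`) are hypothetical, everything lives in memory instead of Lucene, and plain Euclidean distance stands in for a descriptor-specific measure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical LSH sketch: hash words for candidate lookup, exact re-rank of the candidates. */
public class HashRerankSearch {
    private final double[][] hyperplanes;  // numTables * bitsPerTable random directions
    private final int numTables, bitsPerTable;
    private final Map<String, List<Integer>> index = new HashMap<>(); // hash word -> image ids
    private final List<double[]> vectors = new ArrayList<>();         // stored features for re-ranking

    HashRerankSearch(int dim, int numTables, int bitsPerTable, long seed) {
        this.numTables = numTables;
        this.bitsPerTable = bitsPerTable;
        Random rnd = new Random(seed);
        hyperplanes = new double[numTables * bitsPerTable][dim];
        for (double[] h : hyperplanes)
            for (int i = 0; i < dim; i++) h[i] = rnd.nextGaussian();
    }

    /** One word per table, e.g. "t2_011010": table id plus the sign bits of the projections. */
    List<String> hashWords(double[] v) {
        List<String> words = new ArrayList<>();
        for (int t = 0; t < numTables; t++) {
            StringBuilder sb = new StringBuilder("t").append(t).append('_');
            for (int b = 0; b < bitsPerTable; b++) {
                double[] h = hyperplanes[t * bitsPerTable + b];
                double dot = 0;
                for (int i = 0; i < v.length; i++) dot += h[i] * v[i];
                sb.append(dot >= 0 ? '1' : '0');
            }
            words.add(sb.toString());
        }
        return words;
    }

    int add(double[] v) {
        int id = vectors.size();
        vectors.add(v);
        for (String w : hashWords(v)) index.computeIfAbsent(w, k -> new ArrayList<>()).add(id);
        return id;
    }

    /** OR over the query's hash words, then exact re-ranking of the candidates only. */
    List<Integer> search(double[] q, int n) {
        Set<Integer> candidates = new HashSet<>();
        for (String w : hashWords(q)) candidates.addAll(index.getOrDefault(w, List.of()));
        List<Integer> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(id -> l2(q, vectors.get(id))));
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    static double l2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        HashRerankSearch s = new HashRerankSearch(4, 8, 6, 42L);
        Random rnd = new Random(7);
        for (int i = 0; i < 1000; i++) {
            double[] v = new double[4];
            for (int j = 0; j < 4; j++) v[j] = rnd.nextDouble();
            s.add(v);
        }
        double[] q = {0.2, 0.4, 0.6, 0.8};
        int id = s.add(q.clone()); // an exact duplicate must come back first
        System.out.println("duplicate id " + id + ", results " + s.search(q, 5));
    }
}
```

More tables raise recall (more chances to share a word with a true neighbour), while more bits per table shrink the candidate lists; these are the kind of parameters the post says still need tuning.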


Monday, March 18, 2013

Eagle-Claw Drone Grabs Objects In Mid-Flight


Now that quadrotor drones have been programmed to spy on us, play music and land on wires like birds, it’s time to get down to a little fine-tuning.

Up for the task are Justin Thomas and his colleagues at the University of Pennsylvania’s GRASP Lab. Drawing inspiration from the way birds of prey swoop down and seize fish from water, they’ve developed an unmanned aerial vehicle (UAV) with a 3D-printed talon-like gripper that’ll leave you holding onto your hat.

The team improved upon their previous quadrotor designs that only plucked objects in mid-air while the UAV was hovering. By studying an eagle’s fishing technique, the team noticed how the bird would sweep its legs and claws backwards as its talons gripped its prey. This maneuver allowed the eagle to snatch a meal in one dive-bombing swoop without slowing down.

The team was able to copy the eagle’s technique by 3D-printing a three-fingered claw and attaching it to a four-inch motorized leg. By securing the appendage below the UAV’s center of mass, the drone could grasp stationary objects while flying by.

Obviously, the only thing left to do is for GRASP Lab to add a screaming eagle sound effect to the drone.

Article from

Original Article:

Friday, March 15, 2013

OpenVIDIA : Parallel GPU Computer Vision

[Images: CUDA Vision Workbench; Feature Tracking Program]


OpenVIDIA projects implement computer vision algorithms running on graphics hardware, such as single or multiple graphics processing units (GPUs), using OpenGL, Cg and CUDA-C. Some samples will soon support the OpenCL and DirectCompute APIs as well.

An active project within OpenVIDIA, CVWB (CUDA Vision Workbench), includes a back-end DLL and a front-end Windows application that run common image processing routines in a framework convenient for interactive experimentation. Additional OpenVIDIA projects for Stereo Vision, Optical Flow and Feature Tracking algorithms are detailed below.

OpenVIDIA projects utilize the computational power of the GPU to provide real-time computer vision and imaging much faster than the CPU is capable of, while offloading the CPU so that it can carry out concurrent tasks, and also using less power.

This project was founded at the Eyetap Personal Imaging Lab (ePi Lab) at the Electrical and Computer Engineering Group at the University of Toronto. It has been expanded to include contributions from many sources in academia and industry.

Thursday, March 14, 2013

Microsoft Kinect Learns to Read Hand Gestures, Minority Report-Style Interface Now Possible

Article from:

Not only is the Microsoft Research Cambridge team finally releasing their 3D modeling API Kinect Fusion, they’re bringing you gesture control—with mouse clicks and multi-touch, pinch-to-zoom interactions.

Current Kinect sensors can track joints in your body, but not small gestures, in particular hand motions. But that's about to change. Using machine learning, Kinect can now recognize open and closed hands. In the video below, Jamie Shotton, the man behind Kinect’s skeletal modeling, shows how users can use their hands to navigate a map or draw using a painting program.

As developers explore this new Kinect API capability (and as Microsoft refines the level of gesture recognition beyond open and closed hands), the possibilities here look pretty exciting. We're hoping, of course, to finally throw away our mice and use computers like Tom Cruise in Minority Report. The API will be released in the next Windows SDK.

And if you’re waiting to be able to build your own custom Kinect avatar with your hands without writing a program, you may not have to wait that long. The Microsoft Beijing research team demoed their prototype Body Avatar, which lets you do just that, whether you want to be a five-legged dragon or a one-legged goose. We'll be posting that video soon. Stay tuned.

Wednesday, March 13, 2013

ACM MM Open Source Software Competition

The ACM Multimedia Open-Source Software Competition celebrates the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. This year will be the sixth year in running the competition as part of the ACM Multimedia program.

To qualify, software must be provided with source code and licensed in such a manner that it can be used free of charge in academic and research settings. For the competition, the software will be built from the sources. All source code, license, installation instructions and other documentation must be available on a public web page. Dependencies on non-open source third-party software are discouraged (with the exception of operating systems and commonly found commercial packages available free of charge). To encourage more diverse participation, previous years’ non-winning entries are welcome to re-submit for the 2013 competition. Student-led efforts are particularly encouraged.

Authors are highly encouraged to prepare as much documentation as possible, including examples of how the provided software might be used, download statistics or other public usage information, etc. Entries will be peer-reviewed to select entries for inclusion in the conference program as well as an overall winning entry, to be recognized formally at ACM Multimedia 2013. The criteria for judging all submissions include broad applicability and potential impact, novelty, technical depth, demo suitability, and other miscellaneous factors (e.g., maturity, popularity, student-led, no dependence on closed source, etc.).

Authors of the winning entry, and possibly additional selected entries, will be invited to demonstrate their software as part of the conference program. In addition, accepted overview papers will be included in the conference proceedings.

Important Dates

Open Source Software Submission Deadline:
May 13, 2013

Notification of Acceptance:
June 21, 2013

Camera-ready Submission Deadline:
July 30, 2013

Open Source Software Competition Guidelines

Authors interested in submitting an entry to the ACM Multimedia Open Source Software competition should make their software contribution available by providing a public URL for download and prepare a package containing the following information to be submitted via the online submission system under the Open Source Software Competition track:

• Title of submission

• Names and affiliations of authors (indicate students). In case of a large distributed project, include the full list of contributors, if possible, and indicate the main contact (e.g., project owner or maintainer).

• A paper of at most 4 pages including an overview of the open-source software package, its description, applications, intended audience, main features, etc. Submitters are also encouraged to tag their paper with up to two areas from the main track (see Areas for a list of the areas).

• A link to a compressed zip file that contains the software.

• The license of the open source software.

• The permanent link for the open source software (e.g., Sourceforge, Google Code, SVN).

The overview paper should be prepared using the same guidelines as for Short Papers. Authors will be required to sign copyright to the ACM. The sentence “SUBMITTED to ACM MULTIMEDIA 2013 OPEN SOURCE SOFTWARE COMPETITION” written in capital letters must appear in the first page, after the authors’ names.

The compressed zip archive file with all source code, documentation, build/install instructions, and licenses must be placed in a web-accessible location. The public URL for the project page where the software, documentation, and open-source license can be found must be included in the overview paper. Comprehensive and clear build/install instructions will be a crucial component of any submission. The evaluation committee will make a reasonable effort to build the software for the top contributions, but if they are unable to make the software run, it will be excluded from the competition.

Link to the Online Submission System (to appear soon)


For any questions regarding the competition, please email the Open Source Competition Chairs:

Ioannis (Yiannis) Patras (Queen Mary University, UK) i.patras -at-
Andrea Vedaldi (Oxford University, UK) vedaldi -at-

Updates on LIRE (SVN rev 39)

Article from
Author: Mathias Lux

LIRE is not a sleeping beauty, so there’s something going on in the SVN. I recently checked in updates to Lucene (now 4.2) and Commons Math (now 3.1.1). Also, I removed some deprecated code still left over from Lucene 3.x.

Most notable addition, however, is the Extractor / Indexor class pair. They are command line applications that extract global image features from images, put them into an intermediate data file and then, with the help of Indexor, write them to an index. All images are referenced relative to the intermediate data file, so this approach can be used to preprocess a whole lot of images from different computers on a network file system. Extractor also takes a file list of images as input (one image per line) and can therefore easily be run in parallel. Just split your global file list into n smaller, non-overlapping ones and run n Extractor instances. As the extraction part is the slow one, this should allow for a significant speed-up if used in parallel.
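Splitting the list is easy to script. Here is a minimal Java sketch, assuming a UTF-8 list file with one image path per line; the class name `SplitFileList` and the output names `list_0.txt`, `list_1.txt`, ... are our own hypothetical convention, and the printed command lines only reuse the placeholders from the Extractor syntax in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: split one image list into n non-overlapping lists, one per Extractor run. */
public class SplitFileList {

    /** Round-robin assignment: line i goes to chunk i % n, so no line appears twice. */
    static List<List<String>> split(List<String> lines, int n) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < n; i++) chunks.add(new ArrayList<>());
        for (int i = 0; i < lines.size(); i++) chunks.get(i % n).add(lines.get(i));
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Path in = Path.of(args[0]);                 // e.g. list.txt, one image per line
        int n = Integer.parseInt(args[1]);          // number of parallel Extractor instances
        List<String> images = Files.readAllLines(in).stream() // reads UTF-8
                .filter(line -> !line.isBlank())
                .collect(Collectors.toList());
        List<List<String>> chunks = split(images, n);
        for (int i = 0; i < n; i++) {
            Path out = in.resolveSibling("list_" + i + ".txt");
            Files.write(out, chunks.get(i));
            // Each instance needs its own intermediate data file.
            System.out.println("Extractor -i " + out + " -o <outfile_" + i + "> -c <configfile>");
        }
    }
}
```

Round-robin assignment keeps the chunks roughly equal in size even when the input list is sorted by folder, so no single Extractor instance gets stuck with all the large images of one directory.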

Extractor is run with

$> Extractor -i <infile> -o <outfile> -c <configfile>

  • <infile> gives the images, one per line. Use “dir /s /b *.jpg > list.txt” to create a compatible list on Windows.

  • <outfile> gives the location and name of the intermediate data file. Note: it has to be in a folder that is a parent of all the images!

  • <configfile> gives the list of features as a Java Properties file. The supported features are listed at the end of this post. The properties file looks like:
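As an illustration of how a feature list in Java Properties format can be read and checked with `java.util.Properties`, here is a short sketch. The key names `feature.1`, `feature.2`, ... and the class name `FeatureConfig` are our assumptions, not necessarily the format Extractor expects; the set of supported classes is copied from the list at the end of this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: read feature class names from Properties text and validate them. */
public class FeatureConfig {

    static final String P = "net.semanticmetadata.lire.imageanalysis.";
    static final Set<String> SUPPORTED = Set.of(
            P + "CEDD", P + "FCTH", P + "OpponentHistogram", P + "JointHistogram",
            P + "AutoColorCorrelogram", P + "ColorLayout", P + "EdgeHistogram", P + "Gabor",
            P + "JCD", P + "JpegCoefficientHistogram", P + "ScalableColor",
            P + "SimpleColorHistogram", P + "Tamura");

    /** All property values, ordered by key (string order: "feature.10" sorts before "feature.2"). */
    static List<String> featureClasses(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> byKey = new TreeMap<>();
        for (String key : p.stringPropertyNames()) byKey.put(key, p.getProperty(key).trim());
        return new ArrayList<>(byKey.values());
    }

    /** Class names that are not in the supported list (e.g. typos). */
    static List<String> unsupported(List<String> classes) {
        List<String> bad = new ArrayList<>();
        for (String c : classes) if (!SUPPORTED.contains(c)) bad.add(c);
        return bad;
    }

    public static void main(String[] args) {
        String config = "feature.1=" + P + "CEDD\n"
                + "feature.2=" + P + "FCTH\n"
                + "feature.3=" + P + "Tamrua\n"; // deliberate typo
        List<String> classes = featureClasses(config);
        System.out.println("Features: " + classes);
        System.out.println("Unsupported: " + unsupported(classes));
    }
}
```

Checking the names up front catches a typo before a long extraction run rather than somewhere in the middle of it.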

Indexor is run with

Indexor -i <input-file> -l <index-directory>

  • <input-file> is the output file of Extractor, the intermediate data file.

  • <index-directory> is the directory of the index to which the images will be added (appended, not overwritten).

Features supported by Extractor:

  • net.semanticmetadata.lire.imageanalysis.CEDD

  • net.semanticmetadata.lire.imageanalysis.FCTH

  • net.semanticmetadata.lire.imageanalysis.OpponentHistogram

  • net.semanticmetadata.lire.imageanalysis.JointHistogram

  • net.semanticmetadata.lire.imageanalysis.AutoColorCorrelogram

  • net.semanticmetadata.lire.imageanalysis.ColorLayout

  • net.semanticmetadata.lire.imageanalysis.EdgeHistogram

  • net.semanticmetadata.lire.imageanalysis.Gabor

  • net.semanticmetadata.lire.imageanalysis.JCD

  • net.semanticmetadata.lire.imageanalysis.JpegCoefficientHistogram

  • net.semanticmetadata.lire.imageanalysis.ScalableColor

  • net.semanticmetadata.lire.imageanalysis.SimpleColorHistogram

  • net.semanticmetadata.lire.imageanalysis.Tamura

Monday, March 11, 2013

M.I.T. Computer Program Reveals Invisible Motion in Video

A team of scientists at the Massachusetts Institute of Technology has developed a computer program that reveals colors and motions in video that are otherwise invisible to the naked eye.

Read More

Sunday, March 10, 2013

Visual Information Retrieval using Java and LIRE

Synthesis Lectures on Information Concepts, Retrieval, and Services
January 2013, 112 pages, (doi:10.2200/S00468ED1V01Y201301ICR025)
Mathias Lux (Alpen Adria Universität Klagenfurt, AT)
Oge Marques (Florida Atlantic University, USA)

Abstract. Visual information retrieval (VIR) is an active and vibrant research area, which attempts to provide means for organizing, indexing, annotating, and retrieving visual information (images and videos) from large, unstructured repositories.

The goal of VIR is to retrieve matches ranked by their relevance to a given query, which is often expressed as an example image and/or a series of keywords. During its early years (1995-2000), the research efforts were dominated by content-based approaches contributed primarily by the image and video processing community. During the past decade, it was widely recognized that the challenges imposed by the lack of coincidence between an image’s visual contents and its semantic interpretation, also known as semantic gap, required a clever use of textual metadata (in addition to information extracted from the image’s pixel contents) to make image and video retrieval solutions efficient and effective. The need to bridge (or at least narrow) the semantic gap has been one of the driving forces behind current VIR research. Additionally, other related research problems and market opportunities have started to emerge, offering a broad range of exciting problems for computer scientists and engineers to work on.

In this introductory book, we focus on a subset of VIR problems where the media consists of images, and the indexing and retrieval methods are based on the pixel contents of those images — an approach known as content-based image retrieval (CBIR). We present an implementation-oriented overview of CBIR concepts, techniques, algorithms, and figures of merit. Most chapters are supported by examples written in Java, using Lucene (an open-source Java-based indexing and search implementation) and LIRE (Lucene Image REtrieval), an open-source Java-based library for CBIR.
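For readers new to CBIR, here is a minimal linear-search sketch in plain Java, without Lucene or LIRE: each image is described by a coarse 64-bin RGB histogram computed from its pixels, and the database is ranked by L1 distance to the query. The class name `HistogramCbir`, the bin count and the distance are assumptions chosen for brevity; real descriptors such as CEDD are considerably more elaborate.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical linear-search CBIR sketch: coarse RGB histograms compared with L1 distance. */
public class HistogramCbir {

    static final int BINS = 4; // bins per channel, so 4 * 4 * 4 = 64 histogram bins

    /** Normalised colour histogram, so that images of different sizes are comparable. */
    static double[] histogram(BufferedImage img) {
        double[] h = new double[BINS * BINS * BINS];
        int w = img.getWidth(), ht = img.getHeight();
        for (int y = 0; y < ht; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = ((rgb >> 16) & 0xFF) * BINS / 256;
                int g = ((rgb >> 8) & 0xFF) * BINS / 256;
                int b = (rgb & 0xFF) * BINS / 256;
                h[(r * BINS + g) * BINS + b]++;
            }
        }
        for (int i = 0; i < h.length; i++) h[i] /= (double) w * ht;
        return h;
    }

    static double l1(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += Math.abs(a[i] - b[i]);
        return s;
    }

    /** Indices of the database images, most similar to the query first (linear scan). */
    static List<Integer> rank(BufferedImage query, List<BufferedImage> db) {
        double[] q = histogram(query);
        List<double[]> hists = new ArrayList<>();
        for (BufferedImage img : db) hists.add(histogram(img));
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < db.size(); i++) order.add(i);
        order.sort(Comparator.comparingDouble(idx -> l1(q, hists.get(idx))));
        return order;
    }

    /** Helper for the example: an image filled with a single colour. */
    static BufferedImage solid(int w, int h, int rgb) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) img.setRGB(x, y, rgb);
        return img;
    }

    public static void main(String[] args) {
        List<BufferedImage> db = List.of(
                solid(8, 8, 0xFF0000), solid(8, 8, 0x00FF00), solid(16, 16, 0xEE1111));
        System.out.println(rank(solid(4, 4, 0xFF0000), db)); // green image ranks last
    }
}
```

The two reddish images fall into the same coarse bin and are indistinguishable to this feature, a small-scale reminder of how pixel statistics alone relate to the semantic gap discussed above.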

Saturday, March 9, 2013

Automatic Summarization and Annotation of Videos with Lack of Metadata Information

My latest accepted article - Expert Systems with Applications (In Press)

Authors: Dim P. Papadopoulos, Vicky S. Kalogeiton, Savvas A. Chatzichristofis, Nikos Papamarkos


The advances in computer and network infrastructure, together with the fast evolution of multimedia data, have resulted in growing attention to digital video. The scientific community has increased the amount of research into new technologies, with a view to improving digital video utilization: its archiving, indexing, accessibility, acquisition, storage, and even its processing and usability. All these aspects of video utilization require the extraction of all the important information in a video, especially when metadata is lacking. The main goal of this paper is the construction of a system that automatically generates and provides all the essential information of a video, in both visual and textual form. Using the visual or the textual information, a user can, on the one hand, locate a specific video and, on the other hand, rapidly grasp the basic points and the main concept of a video without having to watch all of it. The visual information of the system comes from a video summarization method, while the textual information derives from a keyword-based video annotation approach. The video annotation technique is based on the key frames that constitute the video abstract; therefore, the first part of the system consists of the new video summarization method. In the proposed video abstraction technique, each frame of the video is first described by the Compact Composite Descriptors (CCDs) and a visual word histogram. The proposed approach then uses the Self-Growing and Self-Organized Neural Gas (SGONG) network to classify the frames into clusters. Extracting a representative key frame from every cluster generates the video abstract. The most significant advantage of the video summarization approach is its ability to dynamically determine the appropriate number of final clusters.
Subsequently, a new video annotation method is applied to the generated video summary, leading to the automatic generation of keywords capable of describing the semantic content of the given video. This approach is based on the recently proposed N-closest Photos Model (NCP). Experimental results on several videos are presented not only to evaluate the proposed system but also to indicate its effectiveness.
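The cluster-then-pick-a-representative idea behind the summarization step can be sketched as follows. This Java sketch plainly substitutes k-means for SGONG, so unlike the paper's method it needs the number of clusters k up front; frame descriptors are plain vectors standing in for CCDs and visual word histograms, and taking the frame closest to each centroid as the key frame is our assumption. The class name `KeyFrameSummary` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical key-frame selection sketch; k-means stands in for the paper's SGONG network. */
public class KeyFrameSummary {

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    /** Returns key-frame indices in temporal order; requires 1 <= k <= frames.length. */
    static List<Integer> keyFrames(double[][] frames, int k, int iterations) {
        int dim = frames[0].length;
        double[][] centroids = new double[k][];
        // Deterministic initialisation: spread the initial centroids evenly over the timeline.
        for (int c = 0; c < k; c++) centroids[c] = frames[c * frames.length / k].clone();
        int[] assign = new int[frames.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: every frame goes to its nearest centroid.
            for (int f = 0; f < frames.length; f++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (sqDist(frames[f], centroids[c]) < sqDist(frames[f], centroids[best])) best = c;
                assign[f] = best;
            }
            // Update step: centroid = mean of its frames (empty clusters keep their old centroid).
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int f = 0; f < frames.length; f++) {
                count[assign[f]]++;
                for (int d = 0; d < dim; d++) sum[assign[f]][d] += frames[f][d];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int d = 0; d < dim; d++) centroids[c][d] = sum[c][d] / count[c];
        }
        // Representative of each non-empty cluster: the frame closest to its centroid.
        TreeSet<Integer> keys = new TreeSet<>();
        for (int c = 0; c < k; c++) {
            int best = -1;
            for (int f = 0; f < frames.length; f++)
                if (assign[f] == c
                        && (best < 0 || sqDist(frames[f], centroids[c]) < sqDist(frames[best], centroids[c])))
                    best = f;
            if (best >= 0) keys.add(best);
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        // Three "shots" of four frames each, with 2-D toy descriptors.
        double[][] frames = {
            {0, 0}, {0.2, 0.1}, {0.1, 0.2}, {0.3, 0.3},
            {10, 10}, {10.2, 9.9}, {9.9, 10.1}, {10.1, 10},
            {0, 10}, {0.1, 10.1}, {0.2, 9.8}, {0.1, 10}};
        System.out.println("key frames: " + keyFrames(frames, 3, 10));
    }
}
```

In this toy input one key frame is picked from each of the three shots; determining k automatically, as SGONG does, is exactly the part this substitute leaves out.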