Wednesday, December 15, 2010

Wishing You A Merry Christmas..

from the Non-Relevant information retrieval unit and the Image Processing and Retrieval Trends blog

Two papers accepted at ECIR 2011 (one oral, one poster)

1. Fusion vs Two-Stage for Multimodal Retrieval

Abstract. We compare two methods for retrieval from multimodal collections. The first is a score-based fusion of results retrieved visually and textually. The second is a two-stage method that visually re-ranks the top-K results retrieved textually. We discuss their underlying hypotheses and practical limitations, and conduct a comparative evaluation on a standardized snapshot of Wikipedia. Both methods are found to be significantly more effective than single-modality baselines, with no clear winner but with different robustness features. Nevertheless, two-stage retrieval provides efficiency benefits over fusion.
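As a rough illustration of the fusion side (a generic CombSUM-style sketch, not necessarily the paper's exact method; the function names, the min-max normalization, and the weight `alpha` are all illustrative):

```python
def fuse(visual_scores, text_scores, alpha=0.5):
    """Linear score fusion of two modalities (CombSUM-style).

    Each input maps document id -> raw retrieval score; scores are
    min-max normalized per modality before the weighted sum.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on constant scores
        return {d: (s - lo) / span for d, s in scores.items()}

    v, t = normalize(visual_scores), normalize(text_scores)
    docs = set(v) | set(t)  # a document may appear in only one result list
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * t.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

In practice the mixing weight would be tuned on held-out queries; equal weights are used here only for concreteness.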

2. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases

Abstract. Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed.
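To make the pipeline concrete, here is a minimal sketch of the two-stage idea with a fixed threshold K; the paper's contribution, estimating K dynamically per query, is not reproduced here, and the scoring functions and toy data are illustrative:

```python
def two_stage(query, collection, text_score, visual_dist, k):
    """Stage 1: rank the whole collection by the secondary (text) modality.
    Stage 2: run the expensive CBIR distance only on the top-k items."""
    ranked = sorted(collection, key=lambda d: text_score(query, d), reverse=True)
    head, tail = ranked[:k], ranked[k:]
    head = sorted(head, key=lambda d: visual_dist(query, d))  # ascending distance
    return head + tail  # items beyond rank k keep their text-based order
```

Efficiency comes from the second stage touching only k items instead of the whole collection; effectiveness comes from CBIR operating on a textually filtered, 'better' subset.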

Friday, December 10, 2010

Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Read the original article

iCanDraw is the first application that uses sketch recognition to assist the user in learning how to draw. Although most of the algorithms and techniques in this paper are not new, there is a major contribution in opening a new field of application for sketch recognition: the authors show that sketch recognition can be of great use in this kind of application. The results from two iterations of the application show that such an application is feasible, and although many more studies are needed to prove it is an effective teaching tool, the end-to-end system is now available to begin such studies. Another important result of the paper is the set of design principles, derived from the user study, for applications that assist drawing using free sketch.
For the implementation, the user interface is remarkably well executed. After a first iteration and a thorough analysis of it, many mistakes and weaknesses were detected and corrected, so the final version of the interface is very user oriented and can provide a much more effective teaching experience. Each face template goes through a face recognizer to extract its most prominent features, and some hand corrections are then applied to arrive at a template of the ideal face sketch. Recognition is therefore mostly template-matching oriented. Some gesture recognition is also used as part of the interface, for actions such as erasing or undoing.



The work presented opens a very interesting field of application for sketch recognition. In the sketch recognition class, a project on how to draw an eye is one of the possible descendants of this project. I think one of the major challenges in this field is determining the appropriate amount and quality of the feedback given to the user. If the user is forced to draw too close to the template, the experience can be frustrating; if the matching is too loose, the improvement in drawing might be poor. A solution might be to offer several difficulty levels in different lessons.

Thursday, December 9, 2010

My Defense Online

Watch live streaming video from prostreams at
THURSDAY, DECEMBER 9, 2010 - 10:00 (Greece local time)

Monday, December 6, 2010

1st ACM International Conference on *Multimedia Retrieval* - Call for Multimedia Retrieval Demonstrators

The First ACM International Conference on Multimedia Retrieval (ICMR) brings together the long-standing experience of the former ACM CIVR and ACM MIR conferences. It is the ideal forum to present and encounter the most recent developments and applications in the area of multimedia content retrieval. Originally set up to illuminate the state of the art in image and video retrieval, ICMR aims to become the world reference event in this exciting field of research, where researchers and practitioners can exchange knowledge and ideas.

ICMR 2011 is accepting proposals for technical demonstrators that will be showcased during the conference. The session will include demonstrations of the latest innovations by research and engineering groups in industry, academia and government. ICMR 2011 is seeking original, high-quality submissions addressing innovative research in the broad field of multimedia retrieval. Demonstrations can be related to any of the topics defined by ICMR, as shown in the call for papers. The technical demonstration showcase will run concurrently with the regular ICMR sessions in the poster area.

Papers must be formatted according to the ACM conference style, must not exceed 2 pages in 9-point font, and must be submitted as PDF files. When submitting, please make sure that the names and affiliations of the authors are included in the document. Either the Microsoft Word or the LaTeX format is accepted. The paper templates can be downloaded directly from the ACM website at

All demo submissions will be peer-reviewed to ensure maximum quality, and accepted demo papers will be included in the conference proceedings. The best demo will receive an award, announced during the Social Event.

April 17-20 2011, Trento, Italy


TOP-SURF is an image descriptor that combines interest points with visual words, resulting in a high performance yet compact descriptor that is designed with a wide range of content-based image retrieval applications in mind. TOP-SURF offers the flexibility to vary descriptor size and supports very fast image matching. In addition to the source code for the visual word extraction and comparisons, we also provide a high level API and very large pre-computed codebooks targeting web image content for both research and teaching purposes.

Bart Thomee, Erwin M. Bakker and Michael S. Lew

The TOP-SURF descriptor is completely open source, although the libraries it depends on use different licenses. As the original SURF descriptor is closed source, we used the open source alternative called OpenSURF, which is released under the GNU GPL version 3 license. OpenSURF itself depends on OpenCV, which is released under the BSD license. Furthermore, we used FLANN for approximate nearest neighbor matching, which is also released under the BSD license. To represent images we used CxImage, which is released under the zlib license. Our own code is licensed under the GNU GPL version 3 license, and also under the Creative Commons Attribution version 3 license. The latter license simply asks that you give us credit whenever you use our library. All the aforementioned open source licenses are compatible with each other.


Figure 1. Visualizing the visual words.


Figure 2. Comparing the descriptors of several images using cosine normalized difference, which ranges between 0 (identical) and 1 (completely different). The first image is the original image, the second is the original image but significantly changed in saturation, the third image is the original image but framed with black borders and the fourth image is a completely different one. Using a dictionary of 10,000 words, the distance between the first and second images is 0.42, the distance between the first and the third is 0.64 and the distance between the first and the fourth is 0.98. We have noticed that a (seemingly high) threshold of around 0.80 appears to be able to separate the near-duplicates from the non-duplicates, although this value requires more validation.
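For reference, the cosine normalized difference used above can be computed from two visual-word histograms as follows (a generic sketch of the measure; TOP-SURF's own implementation and term weighting may differ):

```python
import math

def cosine_distance(hist_a, hist_b):
    """Cosine normalized difference between two visual-word histograms:
    0 means identical direction, 1 means no visual words in common."""
    dot = sum(w * hist_b.get(word, 0.0) for word, w in hist_a.items())
    norm_a = math.sqrt(sum(w * w for w in hist_a.values()))
    norm_b = math.sqrt(sum(w * w for w in hist_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an empty histogram matches nothing
    return 1.0 - dot / (norm_a * norm_b)
```

Under the threshold mentioned above, two images would be flagged as near-duplicates when this distance falls below roughly 0.80.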

Saturday, December 4, 2010


6-8 July 2011, Corfu Island, Greece

The 2011 17th International Conference on Digital Signal Processing (DSP2011), the longest-running conference in the area of DSP, organized in cooperation with the IEEE and EURASIP, will be held on July 6-8, 2011 on the island of Corfu, Greece.

DSP2011 addresses the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals by means of digital devices or techniques. The term "signal" includes audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and other signals.

The program will include presentations of novel research theories, applications and results in lecture, poster and plenary sessions. A significant number of Special Sessions, organised by internationally recognised experts in the area, will also be held.

Topics of interest include, but are not limited to:

- Adaptive Signal Processing

- Array Signal Processing

- Audio / Speech / Music Processing & Coding

- Biomedical Signal and Image Processing

- Digital and Multirate Signal Processing

- Digital Watermarking and Data Hiding

- Geophysical / Radar / Sonar Signal Processing

- Image and Multidimensional Signal Processing

- Image/Video Content Analysis

- Image/Video Indexing, Search and Retrieval

- Image/Video Processing Techniques

- Image/Video Compression & Coding Standards

- Information Forensics and Security

- Multidimensional Filters and Transforms

- Multiresolution Signal Processing

- Nonlinear Signals and Systems

- Real-Time Signal/Image/Video Processing

- Signal and System Modelling

- Signal Processing for Smart Sensor & Systems

- Signal Processing for Telecommunications

- Social Signal Processing and Affective Computing

- Statistical Signal Processing

- Theory and Applications of Transforms

- Time-Frequency Analysis and Representation

- VLSI Architectures & Implementations

Prospective authors are invited to submit the full camera-ready paper electronically. The paper must not exceed six (6) pages, including figures, tables and references, and should be written in English. Faxed submissions are not acceptable.

Papers should follow the double-column IEEE format. Authors should indicate one or more of the above listed categories that best describe the topic of the paper, as well as their preference (if any) regarding lecture or poster sessions. Lecture and poster sessions will be treated equally in the review process, and the program committee will make every effort to satisfy these preferences. Submitted papers will be reviewed by at least two referees, and all accepted papers will be published in the Conference Proceedings (CD-ROM) and will be available in the IEEE Xplore digital library. In addition to the technical program, a social program will be offered to the participants and their companions.

It will provide an opportunity to meet colleagues and friends against a backdrop of outstanding natural beauty and rich cultural heritage in one of the best known international tourist destinations.

Wednesday, December 1, 2010

Image Processing in Windows Presentation Foundation (.NET Framework 3.0)

Many image processing researchers who develop on .NET/C# are still stuck with GDI+. That is, they still use unsafe blocks and access the image information through pointers. This has changed since the introduction of the Windows Presentation Foundation (WPF) in .NET Framework 3.0 and beyond. In this blog post, we discuss how to open and process an image using the WPF tools. Let's assume that image.png is a 32 bits/pixel color image. To access the image pixels:

PngBitmapDecoder myImage = new PngBitmapDecoder(new Uri("image.png", UriKind.Relative), BitmapCreateOptions.DelayCreation, BitmapCacheOption.OnLoad);
byte[] myImageBytes = new byte[myImage.Frames[0].PixelWidth * 4 * myImage.Frames[0].PixelHeight];
myImage.Frames[0].CopyPixels(myImageBytes, myImage.Frames[0].PixelWidth * 4, 0);

In the first line, an image object is created using the PngBitmapDecoder class. The myImage.Frames collection holds the image information. In this example a single image is opened, so the collection contains one frame, which is accessed as myImage.Frames[0].

Then a byte array is created that will hold the image pixel information. The CopyPixels function of Frames[0] is used to get the pixels of the opened image. In this particular example, because the image format is Bgra32, the array size is 4 * Width * Height bytes. In general, the ordering of bytes in the bitmap corresponds to the ordering of the letters in the format name, so in Bgra32 the order is Blue, Green, Red and Alpha. The image format is accessed via myImage.Frames[0].Format and the palette (if the image uses one) via myImage.Frames[0].Palette.

To manipulate the image pixels in order to create a greyscale image:

int Width = myImage.Frames[0].PixelWidth;
int Height = myImage.Frames[0].PixelHeight;
for (int x = 0; x < Width; x++)
{
    for (int y = 0; y < Height; y++)
    {
        // Bgra32: byte offsets 0..3 within each pixel are B, G, R, A
        int r = myImageBytes[4 * x + y * (4 * Width) + 2];
        int g = myImageBytes[4 * x + y * (4 * Width) + 1];
        int b = myImageBytes[4 * x + y * (4 * Width) + 0];
        int greyvalue = (int)(0.3 * r + 0.59 * g + 0.11 * b);
        myImageBytes[4 * x + y * (4 * Width) + 2] = (byte)greyvalue;
        myImageBytes[4 * x + y * (4 * Width) + 1] = (byte)greyvalue;
        myImageBytes[4 * x + y * (4 * Width) + 0] = (byte)greyvalue;
    }
}

Finally, to create a new image object from the byte array and save it:

BitmapSource myNewImage = BitmapSource.Create(Width, Height, 96, 96, PixelFormats.Bgra32, null, myImageBytes, 4 * Width);
BmpBitmapEncoder enc = new BmpBitmapEncoder();
enc.Frames.Add(BitmapFrame.Create(myNewImage));
using (FileStream fs = new FileStream("newimage.bmp", FileMode.Create))
{
    enc.Save(fs);
}

A BitmapSource object is created with the corresponding Width, Height and PixelFormat. Then a BMP encoder object is created to save the image in the BMP format. A frame is added and the encoder saves the image to the corresponding stream.

These programming tools for image manipulation are only a subset of those available in WPF. One example is the WriteableBitmap class, which is not an immutable object and is therefore suitable for dynamic images.

Dr Konstantinos Zagoris received the Diploma in Electrical and Computer Engineering in 2003 from the Democritus University of Thrace, Greece, and his PhD from the same university in 2010. His research interests include document image retrieval, color image processing and analysis, document analysis, pattern recognition, databases and operating systems. He is a member of the Technical Chamber of Greece.