Wednesday, December 26, 2012

Season's Greetings to everyone!

Sunday, December 16, 2012

LuminAR bulb lights path to augmented reality

Are we moving closer to a computer age where the "touchscreen" is in the room all around us, as the counter, the desktop, the wall, our new digital work areas? Are we moving into a new form factor called Anywhere? Do we understand how locked up we are in on-screen prisons, and that options will come? The drive for options is strong at the MIT Media Lab, where the Fluid Interfaces Group has been working on AR options such as the "Augmented Product Counter" and the "LuminAR." The latter is a bulb that makes any surface a touchscreen: swap the bulb of an ordinary desk lamp for the MIT group's "bulb" and it projects interactive images onto the surface below. The LuminAR bulb is small enough to fit a standard light fixture.


The LuminAR team, Natan Linder, Pattie Maes and Rony Kubat, describe their work as redefining the traditional incandescent bulb and desk lamp as a new category of "robotic, digital information devices," one of the new looks in AR interfaces. The LuminAR lamp system resembles a conventional desk lamp, but its arm is a robotic arm with four degrees of freedom, terminating in a lampshade with an Edison socket. Each DOF has a motor, positional and torque sensors, and motor control and power circuitry. The arm is designed to interface with the LuminAR bulb. The "bulb," which fits into a standard lightbulb socket, combines a Pico-projector, camera, and wireless computer, and can make any surface interactive. The team uses the special spelling "LuminAR" to suggest its place among the group's other Augmented Reality initiatives.


Saturday, December 15, 2012


The site for the upcoming book "Visual Information Retrieval Using Java and LIRE" is on-line

Improving SURF Image Matching Using Supervised Learning

(Suggested Article)

Hatem Mousselly-Sergieh [LIRIS], Elod Egyed-Zsigmond [LIRIS], Mario Döller [FH Kufstein Tirol - University of Applied Sciences], David Coquil [University of Passau], Jean-Marie Pinon [LIRIS], Harald Kosch [University of Passau]

In: The 8th International Conference on Signal Image and Internet Systems (SITIS 2012), Naples, Italy.


Keypoints-based image matching algorithms have proven very successful in recent years. However, their execution time makes them unsuitable for online applications. Indeed, identifying similar keypoints requires comparing a large number of high-dimensional descriptor vectors. Previous work has shown that matching can still be performed accurately when only a few highly significant keypoints are considered. In this paper, we investigate reducing the number of generated SURF features to speed up image matching while maintaining the matching recall at a high level. We propose a machine learning approach that uses a binary classifier to identify keypoints that are useful for the matching process. Furthermore, we compare the proposed approach to another method for keypoint pruning based on saliency maps. The two approaches are evaluated using ground truth datasets. The evaluation shows that the proposed classification-based approach outperforms the saliency-based one in terms of the trade-off between matching recall and the percentage of pruned keypoints. Additionally, the evaluation demonstrates the ability of the proposed approach to effectively reduce the matching runtime.
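The core idea is easy to sketch: run every detected keypoint through a trained binary classifier and keep only those predicted to be useful for matching. A minimal Java illustration of that pruning step (the paper's actual features, classifier, and threshold are not specified here; the usefulness scores below are hypothetical stand-ins for classifier output):

```java
import java.util.ArrayList;
import java.util.List;

public class KeypointPruning {
    // A detected keypoint: its descriptor plus a classifier score in [0, 1].
    // SURF descriptors are 64-dimensional.
    record Keypoint(float[] descriptor, double usefulness) {}

    // Keep only keypoints the (hypothetical) classifier predicts as useful.
    static List<Keypoint> prune(List<Keypoint> keypoints, double threshold) {
        List<Keypoint> kept = new ArrayList<>();
        for (Keypoint k : keypoints) {
            if (k.usefulness() >= threshold) kept.add(k);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Keypoint> kps = List.of(
            new Keypoint(new float[64], 0.9),
            new Keypoint(new float[64], 0.2),
            new Keypoint(new float[64], 0.7));
        System.out.println(prune(kps, 0.5).size()); // prints 2
    }
}
```

With fewer descriptor vectors surviving the pruning step, the pairwise comparison during matching has correspondingly less work to do.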

Thursday, December 13, 2012

News on LIRE performance


In the course of finishing the book, I reviewed several aspects of the LIRE code and came across some bugs, including one in the Jensen-Shannon divergence. This dissimilarity measure had never been used actively for any feature, as it never worked out in retrieval evaluations the way it was meant to. After two hours of staring at the code the realization finally came: in Java, the conditional (ternary) operator "x ? y : z" has lower precedence than almost every other operator, including '+'. Hence,

System.out.print(true ? 1 : 0 + 1);   // prints '1', because it parses as true ? 1 : (0 + 1)

System.out.print((true ? 1 : 0) + 1); // prints '2'

With this problem identified I was finally able to fix the Jensen-Shannon divergence implementation and arrived at new retrieval evaluation results on the SIMPLIcity data set:


Note that the color histogram in the first row now performs similarly to the “good” descriptors in terms of precision at ten and error rate. Also note that a new feature crept in: Joint Histogram, a histogram combining pixel rank and RGB-64 color.
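For reference, the measure at the heart of the bug is itself small. A self-contained Java sketch of the Jensen-Shannon divergence between two normalized histograms (an illustration, not the actual LIRE code):

```java
public class JensenShannon {
    // Jensen-Shannon divergence between two normalized histograms p and q.
    // Symmetric, and bounded above by ln(2).
    static double jsd(double[] p, double[] q) {
        double sum = 0;
        for (int i = 0; i < p.length; i++) {
            double m = (p[i] + q[i]) / 2;   // midpoint distribution
            sum += kl(p[i], m) + kl(q[i], m);
        }
        return sum / 2;
    }

    // Pointwise Kullback-Leibler term; contributes 0 for empty bins.
    // Note the parentheses: without them, a trailing "+ ..." would bind
    // into the else-branch of the ternary -- exactly the bug described above.
    static double kl(double x, double m) {
        return (x > 0 && m > 0) ? x * Math.log(x / m) : 0;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5, 0.0};
        double[] q = {0.0, 0.5, 0.5};
        System.out.println(jsd(p, p)); // identical histograms: prints 0.0
        System.out.println(jsd(p, q)); // positive, here ln(2)/2 ≈ 0.3466
    }
}
```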

All the new stuff can be found in SVN and in the nightly builds (starting tomorrow ;-))

Google’s decision to block explicit images is a huge win for Bing &


Google has modified its popular image search to block many explicit pictures, a move that could be a big win for competing search engines.

While you used to be able to turn SafeSearch off to easily find questionable material, Google now only lets you “filter explicit images” or “report offensive images.” As you can see in the image above, a search for the word “porn” brings up some questionable material but nothing explicit.

Users on Reddit first noticed the changes this morning, and several were quick to label the move as “censorship.” VentureBeat can confirm that common searches in the U.S. and U.K. have blocked steamy images from showing up in image results and that SafeSearch is on permanently.

A Google spokesperson gave us and other outlets the following statement about the changes:

We are not censoring any adult content, and want to show users exactly what they are looking for — but we aim not to show sexually explicit results unless a user is specifically searching for them. We use algorithms to select the most relevant results for a given query. If you’re looking for adult content, you can find it without having to change the default setting — you just may need to be more explicit in your query if your search terms are potentially ambiguous. The image search settings now work the same way as in web search.

Essentially, Google’s decision makes it much harder to find porn using Google. This is a big win for competing search engines, especially Microsoft’s Bing and ICM Registry’s .xxx search site. If Google doesn’t want the traffic, the underdogs certainly will take it.

Microsoft’s Bing, the No. 2 search engine on the web, still offers a robust image search, and we can confirm that it works perfectly well for looking at all kinds of explicit images. (Which is sort of funny considering how Microsoft has serious problems with nudity and pornography being hosted on its servers.) ICM Registry’s .xxx search site is another winner. While it does not offer a full-fledged image search, it does offer a safe browsing experience when you are looking for adult material. Plus, you know exactly what you’ll find when looking for video or images on it. As we’ve written before, it only crawls online pages with the .xxx domain, and it claims to be “safer” than using other search engines to find porn because all sites found through it are scanned daily by McAfee.

“We are still digesting exactly what this will mean in real-world search queries for the porn-searching consumer, but this seems to continue a trend we have seen in recent months by the major search engines towards adult content,” ICM Registry CEO Stuart Lawley told us via email. “Google’s decision only serves to reinforce the purpose and usefulness of what ICM Registry has been building: a destination for those adult consumers looking for high quality content.”


World's most anatomically correct musculoskeletal robot is presented in Japan


Most human-like robots don't even attempt biological accuracy, because replicating every muscle in the body isn't necessary for a functional humanoid. Even biomimetic robots based on animals don't attempt to replicate every anatomical detail of the animals they imitate, because that would needlessly complicate things. That said, there is much to be learned from how muscle groups move and interact with the skeleton, which is why a team at Tokyo University's JSK Lab has developed what could be considered the world's most anatomically correct robot to date.

Researchers there have been developing increasingly complex musculoskeletal robots for more than a decade. Their first robot, Kenta, was built in 2001, followed by Kotaro in 2005, Kojiro in 2007, and Kenzoh (an upper-body only robot) in 2010. Their latest robot, Kenshiro, was presented at the annual Humanoids conference this month.

It models the average 12-year-old Japanese boy, standing 158 cm (5 feet, 2 inches) tall and weighing 50 kg (110 pounds). According to Yuto Nakanishi, the project leader, keeping the robot's weight down was a difficult balancing act. Nonetheless, the team managed to create muscles which reproduce nearly the same joint torque as real muscles, and that are roughly five times more powerful than Kojiro's.

Muscle and bone

Its artificial muscles – which are a bit like pulleys – replicate 160 major muscles: each leg has 25, each shoulder has 6, the torso has 76, and the neck has 22. Most of these muscles are redundant to Kenshiro's actual degrees of freedom (64), which is why other humanoids don't bother with them. By way of comparison, mechanical robots like Samsung's Roboray typically have just six servos per leg, and often don't contain any in the torso/spine (the human body actually contains around 650 muscles).


A detailed look at Kenshiro's knee joint, which contains artificial ligaments and a floating patella

Equally important to the muscles is Kenshiro's bone structure. Unlike its predecessors, Kenshiro has a skeleton made of aluminum, which is less likely than plastic to break under stress. Also, its knee joints contain artificial ligaments and a patella to better imitate the real thing. These are just some of the details considered in its construction, which far surpasses the work done on the upper-torso Eccerobot cyclops, whose creators claimed it to be the world's most anatomically accurate robot a few years ago.

As you'll see in the following video, programming all of those muscles to work in tandem is proving a difficult task – a bit like playing QWOP multiplied by about a hundred. The robot is able to perform relatively simple tasks, like bending its arms and legs, but more complex actions such as walking remain primitive. However, the team has made significant strides over the years, and with Kenshiro they continue to push the limits of musculoskeletal robots further.



OpenArch Adds A “Digital Layer” To The Average Room

Creating a workable Minority Report-like screen isn’t very hard, but what about an entire room or building that responds to touch, voice, and movement? Now that’s hard. That, however, is the goal of OpenArch, a project by designer Ion Cuervas-Mons that uses projectors, motion sensors, and light to create interactive spaces.

“This project started 3 years ago when I had the opportunity to buy a small apartment in the north of Spain, in the Basque Country. I decided to start my own research in the small apartment. I am architect and I was really interested on integrating physical and digital layers,” said Cuervas-Mons. “Our objective was to create a Domestic Operating System (D.OS) integrating physical and digital realities.”

The project as seen here is about 40% done and there is still more to do. Cuervas-Mons sees a deep connection between how space defines digital interaction and vice-versa. The goal, in the end, is to create a digital component that can live in any space and enliven it with digital information, feedback, and sensors.

He’s not just stopping at projectors and some computing power. His goal is the creation of truly smart environments.

“I think we need smart homes: first because of energy efficiency, visualization of consumptions on real time will help us not to waste energy. If we introduce physical objects into the interaction with digital information everything will be easier and simpler. They are going to be the center of the future smart cities,” he said.

Cuervas-Mons also runs a design consultancy called Think Big Factory, where he brings the things he’s learned in the OpenArch project to market. The project itself uses off-the-shelf components like Kinect sensors and projectors.

The group will launch a Kickstarter project in January to commercialize the product and make it available to experimenters. How this technology will eventually work in “real life” is anyone’s guess, but it looks like the collective of technologists, architects, and designers is definitely making some waves in the smart home space.


Openarch || FILM from Openarch on Vimeo.


Tuesday, December 4, 2012

Master Theses and SW-Internship @ Sensory Experience Lab

SELab is offering a number of Master theses and an SW internship. An overview of the different offers is given below.

Interested students should contact SELab for additional information via selab [at] itec [dot] uni-klu [dot] ac [dot] at

The Sensory Experience Lab (SELab) comprises a small team of experts working in the field of Quality of Multimedia Experience (QoMEx) with a focus on Sensory Experience. That is, traditional multimedia content is annotated with so-called sensory effects that are rendered on special devices such as ambient lights, fans, vibration devices, scent emitters, water sprayers, etc.

The sensory effects are represented as Sensory Effects Metadata (SEM) which are standardized within Part 3 of MPEG-V entitled “Information technology — Media context and control – Part 3: Sensory information”. Further details about MPEG-V and Sensory Information can be found in our Standardization section.

Our software and services are publicly available here and the interested reader is referred to our publications. The media section provides some videos of SELab.

The aim of the research within the SELab is to enhance the user experience, resulting in a unique, worthwhile sensory experience stimulating potentially all human senses (e.g., olfaction, mechanoreception, thermoreception), going beyond the traditional ones (i.e., hearing and vision).

The SELab is guided by an advisory board comprising well-recognized experts in the field of QoE from both industry and academia.

In terms of funding, the SELab acknowledges the following institutions and projects: Alpen-Adria-Universität Klagenfurt, ICT FP7 IP ALICANTE, COST IC1003 QUALINET, and ICT FP7 IP SocialSensor.

Reconstructing the World's Museums

By Jianxiong Xiao and Yasutaka Furukawa

Proceedings of the 12th European Conference on Computer Vision (ECCV2012)

ECCV 2012 Best Student Paper Award


Photorealistic maps are a useful navigational guide for large indoor environments, such as museums and businesses. However, it is impossible to acquire photographs covering a large indoor environment from aerial viewpoints. This paper presents a 3D reconstruction and visualization system to automatically produce clean and well-regularized texture-mapped 3D models for large indoor scenes, from ground-level photographs and 3D laser points. The key component is a new algorithm called "Inverse CSG" for reconstructing a scene in a Constructive Solid Geometry (CSG) representation consisting of volumetric primitives, which imposes powerful regularization constraints to exploit structural regularities. We also propose several techniques to adjust the 3D model to make it suitable for rendering the 3D maps from aerial viewpoints. The visualization system enables users to easily browse a large scale indoor environment from a bird's-eye view, locate specific room interiors, fly into a place of interest, view immersive ground-level panorama views, and zoom out again, all with seamless 3D transitions. We demonstrate our system on various museums, including the Metropolitan Museum of Art in New York City -- one of the largest art galleries in the world.


Jianxiong Xiao and Yasutaka Furukawa
Reconstructing the World's Museums
Proceedings of the 12th European Conference on Computer Vision (ECCV2012)
Oral Presentation

This work was done when Jianxiong Xiao interned at Google under the supervision of Yasutaka Furukawa.

Lire 0.9.3_alpha – first alpha release for Lucene 4.0


I just committed my code to the SVN and created a download for Lire 0.9.3_alpha. This version features support for Lucene 4.0, whose API changed quite a bit. I did not have the time to test the Lucene 3.6 version against the new one, so I actually don’t know which one is faster. I hope the new one, but I fear the old one ;)

This is a pre-release for Lire for Lucene 4.0

Global features (like CEDD, FCTH, ColorLayout, AutoColorCorrelogram and the like) have been tested and are considered working. Filters like the ReRankFilter and the LSAFilter also work. The image shows a search for 10 images with ColorLayout and the results of re-ranking the result list with (i) CEDD and (ii) LSA. Visual words (local features), metric indexes and hashing have not been touched yet, besides making them compile, so I strongly recommend not using them. However, due to a new weighting approach I assume that the visual word implementation based on Lucene 4.0 will — as soon as it is done — be much better in terms of retrieval performance.


Thursday, November 29, 2012

How Google Plans to Find the UnGoogleable

Author: Tom Simonite

The company wants to improve its mobile search services by automatically delivering information you wouldn’t think to search for online.

For three days last month, at eight randomly chosen times a day, my phone buzzed and Google asked me: “What did you want to know recently?” The answers I provided were part of an experiment involving me and about 150 other people. It was designed to help the world’s biggest search company understand how it can deliver information to users that they’d never have thought to search for online.
Billions of Google searches are made every day—for all kinds of things—but we still look elsewhere for certain types of information, and the company wants to know what those things are.

“Maybe [these users are] asking a friend, or they have to look up a manual to put together their Ikea furniture,” says Jon Wiley, lead user experience designer for Google search. Wiley helped lead the research exercise, known as the Daily Information Needs Study.

If Google is to achieve its stated mission to “organize the world’s information and make it universally accessible,” says Wiley, it must find out about those hidden needs and learn how to serve them. And he says experience sampling—bugging people to share what they want to know right now, whether they took action on it or not—is the best way to do it. “Doing that on a mobile device is a relatively new technology, and it’s getting us better information that we really haven’t had in the past,” he says.
Wiley isn’t ready to share results from the study just yet, but this participant found plenty of examples of relatively small pieces of information that I’d never turn to Google for. For example, how long the line currently is in a local grocery store. Some offline activities, such as reading a novel, or cooking a meal, generated questions that I hadn’t turned to Google to answer—mainly due to the inconvenience of having to grab a computer or phone in order to sift through results.

Wiley’s research may take Google in new directions. “One of the patterns that stands out is the multitude of devices that people have in their lives,” he says. Just as mobile devices made it possible for Google to discover unmet needs for information through the study, they could also be used to meet those needs in the future.

Contextual information provided by mobile devices—via GPS chips and other sensors—can provide clues about a person and his situation, allowing Google to guess what that person wants. “We’ve often said the perfect search engine will provide you with exactly what you need to know at exactly the right moment, potentially without you having to ask for it,” says Wiley.

Google is already taking the first steps in this direction. Google Now offers unsolicited directions, weather forecasts, flight updates, and other information when it thinks you need them (see “Google’s Answer to Siri Thinks Ahead”). Google Glass—eyeglass frames with an integrated display (see “You Will Want Google’s Goggles”)—could also provide an opportunity to preëmptively answer questions or provide useful information. “It’s the pinnacle of this hands-free experience, an entirely new class of device,” Wiley says of Google Glass, and he expects his research to help shape this experience.

Google may be heading toward a new kind of search, one that is very different from the service it started with, says Jonas Michel, a researcher working on similar ideas at the University of Texas at Austin. “In the future you might want to search very new information from the physical environment,” Michel says. “Your information needs are very localized to that place and event and moment.”

Finding the data needed to answer future queries will involve more than just crawling the Web. Google Now already combines location data with real-time feeds, for example, from U.S. public transit authorities, allowing a user to walk up to a bus stop and pull out his phone to find arrival times already provided.

Michel is one of several researchers working on an alternative solution—a search engine for mobile devices dubbed Gander, which communicates directly with local sensors. A pilot being installed on the University of Texas campus will, starting early next year, allow students to find out wait times at different cafés and restaurants, or find the nearest person working on the same assignment.

Back at Google, Wiley is more focused on finding further evidence that many informational needs still go unGoogled. The work may ultimately provide the company with a deeper understanding of the value of different kinds of data. “We’re going to continue doing this,” he says. “Seeing how things change over time gives us a lot of information about what’s important.”


Wednesday, November 28, 2012

Autonomous Flying Robots: Davide Scaramuzza at TEDxZurich

This talk is about autonomous, vision-controlled micro flying robots. Micro flying robots are vehicles less than 1 meter in size that weigh less than 1 kg. Potential applications of these robots are search and rescue, inspection, environment monitoring, etc. Additionally, they can complement human intervention in environments that no human can access (such as searching for survivors in a damaged building after an earthquake), thus reducing the risk for human rescuers. In all these applications, current flying robots are still tele-operated by expert professionals.

Indeed, in order to be truly autonomous, current flying robots rely on GPS or motion-capture systems. Unfortunately, GPS does not work indoors, while motion-capture systems require prior modification of the environment where the robots are supposed to operate, which is not possible in environments that are still to be explored. Therefore, my idea consists of using just cameras onboard the robot. Cameras do for a robot what eyes do for a human. They allow it to perceive the environment and safely navigate within it without bumping into obstacles. Additionally, they allow it to build a map of the environment which can be used to plan the intervention of human rescuers. This talk presents our progress towards this endeavor, open challenges, and future applications.
Davide Scaramuzza (born 1980 in Italy) is Professor of Robotics at the Artificial Intelligence Lab of the University of Zurich, where he leads the Robotics and Perception Group, and Adjunct Faculty of the Master in Robotics, Systems and Control at ETH Zurich. He received his PhD in 2008 in Robotics and Computer Vision at ETH Zurich. He was a postdoc at both ETH Zurich and the University of Pennsylvania, where he worked on autonomous navigation of micro aerial vehicles. From 2009 to 2012, he led the European project "sFly", which focused on autonomous navigation of micro helicopters in GPS-denied environments using vision as the main sensor modality. For his research, he was awarded the Robotdalen Scientific Award (2009) and the European Young Researcher Award (2012), sponsored by the IEEE and the European Commission. He is coauthor of the 2nd edition of the book "Introduction to Autonomous Mobile Robots" (MIT Press). He is also the author of the first open-source Omnidirectional Camera Calibration Toolbox for MATLAB, which, besides thousands of downloads worldwide, is currently in use at NASA, Philips, Bosch, and Daimler. His research interests are field and service robotics, intelligent vehicles, and computer vision. Specifically, he investigates the use of cameras as the main sensors for robot navigation, mapping, exploration, reasoning, and interpretation. His interests encompass both ground and flying vehicles.
In the spirit of ideas worth spreading, TEDx is a program of local, self-organized events that bring people together to share a TED-like experience. At a TEDx event, TEDTalks video and live speakers combine to spark deep discussion and connection in a small group. These local, self-organized events are branded TEDx, where x = independently organized TED event. The TED Conference provides general guidance for the TEDx program, but individual TEDx events are self-organized.* (*Subject to certain rules and regulations)

Saturday, November 24, 2012

Microsoft’s Google Glass rival tech tips AR for live events


Microsoft is working on its own Google Glass alternative, a wearable computer which can overlay real-time data onto a user’s view of the world around them. The research, outed in a patent application published today for “Event Augmentation with Real-Time Information” (No. 20120293548), centers on a special set of digital eyewear with one or both lenses capable of injecting computer graphics and text into the user’s line of sight, such as to label players in a sports game, flag up interesting statistics, or even identify objects and offer contextually-relevant information about them.

The digital glasses would track the direction in which the wearer was looking, and adjust its on-screen graphics accordingly; Microsoft also envisages a system whereby eye-tracking is used to select areas of focus within the scene. Information shown could follow a preprogrammed script – Microsoft uses the example of an opera, where background detail about the various scenes and arias could be shown in order – or on an ad-hoc basis, according to contextual cues from the surrounding environment.

Actually opting into that data could be based on social network checkins, Microsoft suggests, or by the headset simply using GPS and other positioning sensors to track the wearer’s location. The hardware itself could be entirely self-contained, within glasses, as per what we’ve seen of Google’s Project Glass, or it could split off the display section from a separate “processing unit” in a pocket or worn on the wrist, with either a wired or wireless connection between the two.

In Microsoft’s cutaway diagram – a top-down perspective of one half of the AR eyewear – there’s an integrated microphone (910) and a front-facing camera for video and stills (913), while video is shown to the wearer via a light guide (912). That (along with a number of lenses) works with standard eyeglass lenses (916 and 918), whether prescription or otherwise, while the opacity filter (914) helps improve light guide contrast by blocking out some of the ambient light. The picture itself is projected from a microdisplay (920) through a collimating lens (922). There are also various sensors and outputs, potentially including speakers (930), inertial sensors (932) and a temperature monitor (938).

Microsoft is keeping its options open when it comes to display types, and as well as generic liquid crystal on silicon (LCOS) and LCD there’s the suggestion that the wearable could use Qualcomm’s Mirasol or a Microvision PicoP laser projector. An eye-tracker (934) could be used to spot pupil movement, either using IR projection, an internally-facing camera, or another method.

Whereas Google has focused on the idea of Glass as a “wearable smartphone” that saves users from pulling out their phone to check social networks, get navigation directions, and shoot photos and video, Microsoft’s interpretation of augmented reality takes a slightly different approach in building around live events. One possibility we could envisage is that the glasses might be provided by an entertainment venue, such as a sports ground or theater, just as movie theaters loan 3D glasses for the duration of a film.

That would reduce the need for users to actually buy the (likely expensive) glasses themselves, and – since they’d only be required to last the duration of the show or game – the battery demands would be considerably less than a full day. Of course, a patent application alone doesn’t mean Microsoft is intending a commercial release, but given the company’s apparently increasing focus on entertainment (such as the rumored Xbox set-top box) it doesn’t seem too great a stretch.

Suggested paper: “Conjunctive ranking function using geographic distance and image distance for geotagged image retrieval”

Nowadays, an enormous number of photographic images are uploaded to the Internet by casual users. In this study, we consider the concept of embedding geographical identification of locations as geotags in images. We attempt to retrieve images having certain similarities (or identical objects) from a geotagged image dataset. We define images containing identical objects as orthologous images. Using content-based image retrieval (CBIR), we propose a ranking function, the orthologous identity function (OIF), to estimate the degree to which two images contain similarities in the form of identical objects; the OIF is a similarity rating function that uses both the geographic distance and the image distance of photographs. Further, we evaluate the OIF as a ranking function by calculating the mean reciprocal rank (MRR) on our experimental dataset. The results reveal that the OIF can improve the efficiency of retrieving orthologous images compared to using geographic distance or image distance alone.

Published in:

GeoMM '12: Proceedings of the ACM Multimedia 2012 Workshop on Geotagging and Its Applications in Multimedia
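As a rough illustration of how such a conjunctive ranking function might combine the two distances, here is a hypothetical scoring function in Java; the paper's actual OIF definition and weighting are not reproduced here:

```java
public class OifSketch {
    // Hypothetical conjunctive score: smaller geographic distance and smaller
    // image (descriptor) distance both push the score toward 1. The weights
    // alpha and beta, and this functional form, are illustrative assumptions.
    static double score(double geoDistKm, double imageDist, double alpha, double beta) {
        return 1.0 / (1.0 + alpha * geoDistKm + beta * imageDist);
    }

    public static void main(String[] args) {
        // A photo taken nearby with similar content should outrank
        // one taken far away with the same visual similarity.
        double near = score(0.1, 0.2, 1.0, 1.0);
        double far  = score(50.0, 0.2, 1.0, 1.0);
        System.out.println(near > far); // prints true
    }
}
```

Any monotone combination of the two distances would show the same qualitative behavior: candidates that are both geographically close and visually similar rank highest, which is what lets the conjunctive function beat either distance used alone.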

Playing Catch and Juggling with a Humanoid Robot

Entertainment robots in theme park environments typically do not allow for physical interaction and contact with guests. However, catching and throwing back objects is one form of physical engagement that still maintains a safe distance between the robot and participants. Using a theme park type animatronic humanoid robot, we developed a test bed for a throwing and catching game scenario. We use an external camera system (ASUS Xtion PRO LIVE) to locate balls and a Kalman filter to predict ball destination and timing. The robot's hand and joint-space are calibrated to the vision coordinate system using a least-squares technique, such that the hand can be positioned to the predicted location. Successful catches are thrown back two and a half meters forward to the participant, and missed catches are detected to trigger suitable animations that indicate failure. Human to robot partner juggling (three ball cascade pattern, one hand for each partner) is also achieved by speeding up the catching/throwing cycle. We tested the throwing/catching system on six participants (one child and five adults, including one elderly), and the juggling system on three skilled jugglers.
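The ball prediction rests on a simple ballistic motion model. A minimal Java sketch of that model, i.e. the deterministic core of the filter's predict phase (the actual system also tracks state uncertainty and fuses camera measurements; all numbers here are illustrative):

```java
public class BallPredictor {
    // Predict vertical position and velocity of a ball in free flight after
    // time dt, using constant gravitational acceleration. This is only the
    // motion model; a full Kalman filter would also propagate covariance
    // and correct the prediction with each camera observation.
    static double[] predict(double pos, double vel, double dt) {
        double g = -9.81;                                // gravity, m/s^2
        double newPos = pos + vel * dt + 0.5 * g * dt * dt;
        double newVel = vel + g * dt;
        return new double[]{newPos, newVel};
    }

    public static void main(String[] args) {
        // Ball tossed upward at 5 m/s from 1.5 m; state after 0.2 s.
        double[] s = predict(1.5, 5.0, 0.2);
        System.out.printf("pos=%.3f m, vel=%.3f m/s%n", s[0], s[1]);
        // pos ≈ 2.304 m, vel ≈ 3.038 m/s
    }
}
```

Iterating this step forward in time until the predicted position reaches the hand's catching plane gives both the destination and the timing the robot needs to position its hand.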

Wednesday, November 21, 2012

ACM/IEEE Joint Conference on Digital Libraries 2013

The ACM/IEEE Joint Conference on Digital Libraries (JCDL 2013) is a major international forum focusing on digital libraries and associated technical, practical, organizational, and social issues. JCDL encompasses the many meanings of the term digital libraries, including (but not limited to) new forms of information institutions and organizations; operational information systems with all manner of digital content; new means of selecting, collecting, organizing, distributing, and accessing digital content; theoretical models of information media, including document genres and electronic publishing; and theory and practice of use of managed content in science and education.
JCDL 2013 will be held in Indianapolis, Indiana (USA), 23-25 July 2013. The program is organized by an international committee of scholars and leaders in the digital libraries field, and attendance is expected to include several hundred researchers, practitioners, managers, and students.
* Full paper submissions due: 28 January 2013
* Short Papers, Panels, Posters, Demonstrations, Workshops, Tutorials due: 4 February 2013
* Doctoral Consortium submissions due: 15 April 2013
* Notification of acceptance for Workshops and Tutorials: 15 March 2013
* Notification for Papers, Panels, Posters, Demonstrations, Workshops, Tutorials: 29 March 2013
* Notification of acceptance for Doctoral Consortium: 6 May 2013
* Conference: 22-26 July 2013
** Tutorials and Doctoral Consortium: 22 July 2013
** Main conference: 23-25 July 2013
** Workshops: 25-26 July 2013
The intended community for this conference includes those interested in all aspects of digital libraries such as infrastructure; institutions; metadata; content; services; digital preservation; system design; scientific data management; workflows; implementation; interface design; human-computer interaction; performance evaluation; usability evaluation; collection development; intellectual property; privacy; electronic publishing; document genres; multimedia; social, institutional, and policy issues; user communities; and associated theoretical topics. JCDL welcomes submissions in these areas.
Submissions that resonate with the JCDL 2013 theme of Digital Libraries at the Crossroads are particularly welcome; however, while reviews will consider the relevance of proposals to digital libraries generally, they will not give extra weight to theme-related proposals over proposals that speak to other aspects of digital libraries. The conference sessions, workshops and tutorials will cover all aspects of digital libraries.
Participation is sought from all parts of the world and from the full range of established and emerging disciplines and professions including computer science, information science, web science, data science, librarianship, data management, archival science and practice, museum studies and practice, information technology, medicine, social sciences, education and humanities. Representatives from academe, government, industry, and others are invited to participate.
JCDL 2013 invites submissions of papers and proposals for posters, demonstrations, tutorials, and workshops that will make the conference an exciting and creative event to attend. As always, the conference welcomes contributions from all the fields that intersect to enable digital libraries. Topics include, but are not limited to:
* Collaborative and participatory information environments
* Cyberinfrastructure architectures, applications, and deployments
* Data mining/extraction of structure from networked information
* Digital library and Web Science curriculum development
* Distributed information systems
* Extracting semantics, entities, and patterns from large collections
* Evaluation of online information environments
* Impact and evaluation of digital libraries and information in education
* Information and knowledge systems
* Information policy and copyright law
* Information visualization
* Interfaces to information for novices and experts
* Linked data and its applications
* Personal digital information management
* Retrieval and browsing
* Scientific data curation, citation and scholarly publication
* Social media, architecture, and applications
* Social networks, virtual organizations and networked information
* Social-technical perspectives of digital information
* Studies of human factors in networked information
* Theoretical models of information interaction and organization
* User behavior and modeling
* Visualization of large-scale information environments
* Web archiving and preservation
Paper authors may choose between two formats: Full papers and short papers. Both formats will be included in the proceedings and will be presented at the conference. Full papers typically will be presented in 20 minutes with 10 minutes for questions and discussion. Short papers typically will be presented in 10 minutes with 5 minutes for questions and discussion. Both formats will be rigorously peer reviewed. Complete papers are required -- abstracts and incomplete papers will not be reviewed.
Full papers report on mature work, or efforts that have reached an important milestone. Short papers will highlight efforts that might be in an early stage, but are important for the community to be made aware of. Short papers can also present theories or systems that can be described concisely in the limited space.
Full papers must not exceed 10 pages. Short papers are limited to at most 4 pages. All papers must be original contributions. The material must therefore not have been previously published or be under review for publication elsewhere. All contributions must be written in English and must follow the ACM formatting guidelines (templates available for authoring in LaTex2e and Microsoft Word). Papers are to be submitted via the conference's EasyChair submission page:
All accepted papers will be published by ACM as conference proceedings and electronic versions will be included in both the ACM and IEEE digital libraries.
Posters permit presentation of late-breaking results in an informal, interactive manner. Poster proposals should consist of a title, extended abstract, and contact information for the authors, and should not exceed 2 pages. Proposals must follow the conference's formatting guidelines and are to be submitted via the conference's EasyChair submission page: Accepted posters will be displayed at the conference and may include additional materials, space permitting. Abstracts of posters will appear in the proceedings.
Demonstrations showcase innovative digital libraries technology and applications, allowing you to share your work directly with your colleagues in a high-visibility setting. Demonstration proposals should consist of a title, extended abstract, and contact information for the authors and should not exceed 2 pages. All contributions must be written in English and must follow the ACM guidelines (templates available for authoring in LaTex2e and Microsoft Word), and are to be submitted via the conference's EasyChair submission page:  Abstracts of demonstrations will appear in the proceedings.
Panels and invited briefings will complement the other portions of the program with lively discussions of controversial and cutting-edge issues that are not addressed by other program elements. Invited briefing panels will be developed by the Panel co-chairs, David Bainbridge and George Buchanan, and will be designed to address a topic of particular interest to those building digital libraries -- they can be thought of as mini-tutorials. Panel ideas may be stimulated or developed in part from synergistic paper proposals (with consensus of involved paper proposal submitters).
This year, stand-alone formal proposals for panels will also be accepted; however, please keep in mind that panel sessions are few, and so relatively few panel proposals will be accepted. Panel proposals should include a panel title; identify all panel participants (maximum 5); include a short abstract as well as an uploaded extended abstract in PDF (not to exceed 2 pages) describing the panel topic, how the panel will be organized, and the unique perspective each speaker brings to the topic; and provide explicit confirmation that each speaker has indicated a willingness to participate in the session if the proposal is accepted. For more information about potential panel proposals, please contact the Panel co-chairs named above.
Tutorials provide an opportunity to offer in-depth education on a topic or solution relevant to research or practice in digital libraries. They should address a single topic in detail over either a half-day or a full day. They are not intended to be venues for commercial product training.
Experts who are interested in engaging members of the community who may not be familiar with a relevant set of technologies or concepts should plan their tutorials to cover the topic or solution to a level that attendees will have sufficient knowledge to follow and further pursue the material beyond the tutorial. Leaders of tutorial sessions will be expected to take an active role in publicizing and recruiting attendees for their sessions.
Tutorial proposals should include: a tutorial title; an abstract (1-2 paragraphs, to be used in conference programs); a description or topical outline of tutorial (1-2 paragraphs, to be used for evaluation); duration (half- or full-day); expected number of participants; target audience, including level of experience (introductory, intermediate, advanced); learning objectives; a brief biographical sketch of the presenter(s); and contact information for the presenter(s).
Tutorial proposals are to be submitted in electronic form via the conference's EasyChair submission page:
Workshops are intended to draw together communities of interest -- both those in established communities and those interested in discussion and exploration of a new or emerging issue. They can range in format from formal, perhaps centering on presentation of refereed papers, to informal, perhaps centering on extended round-table discussions among the selected participants.
Submissions should include: a workshop title and short description; a statement of objectives for the workshop; a topical outline for the workshop; identification of the expected audience and expected number of attendees; a description of the planned format and duration (half-day, full-day, or one and a half day); information about how the attendees will be identified, notified of the workshop, and, if necessary, selected from among applicants; as well as contact and biographical information about the organizers. Finally, if a workshop or closely related workshop has been held previously, information about the earlier sessions should be provided -- dates, locations, outcomes, attendance, etc.
Workshop proposals are to be submitted in electronic form via the conference's EasyChair submission page:
The Doctoral Consortium is a workshop for Ph.D. students from all over the world who are in the early phases of their dissertation work. Ideally, students should have written or be close to completing a thesis proposal, and be far enough away from finishing the thesis that they can make good use of feedback received during the consortium.
Students interested in participating in the Doctoral Consortium should submit an extended abstract describing their digital library research. Submissions relating to any aspect of digital library research, development, and evaluation are welcomed, including: technical advances, usage and impact studies, policy analyses, social and institutional implications, theoretical contributions, interaction and design advances, and innovative applications in the sciences, humanities, and education. See for a more extensive description of the goals of the Doctoral Consortium and for complete proposal requirements.
Doctoral consortium proposals are to be submitted via the conference's EasyChair submission page:
All contributions must be submitted in electronic form via the JCDL 2013 submission Web page, following ACM guidelines and using the ACM template. Please submit all papers in PDF format.

Don’t Photoshop it…MATLAB it!

Article from

I'd like to welcome back guest blogger Brett Shoelson for the continuation of his series of posts on implementing image special effects in MATLAB. Brett, a contributor for the File Exchange Pick of the Week blog, has been doing image processing with MATLAB for almost 20 years now.


imadjust as an Image Enhancement Tool

In my previous post in this guest series, I introduced my image adjustment GUI and used it to enhance colors in modified versions of images of a mandrill and of two zebras. For both of those images, I operated on all colorplanes uniformly; i.e., whatever I did to the red plane, I also did to green and blue. The calling syntax for imadjust is as follows:

imgOut = imadjust(imgIn,[low_in; high_in],[low_out; high_out],gamma);

The default inputs are:

imgOut = imadjust(imgIn,[0; 1],[0; 1],1);

Different input parameters will produce different effects. In fact, imadjust should often be the starting point for simply correcting illumination issues with an image:

URL = '';
img = imrotate(imread(URL),-90);
enhanced = imadjust(img,[0.00; 0.35],[0.00; 1.00], 1.00);
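For readers who want to see exactly what that call computes, the per-pixel mapping behind imadjust can be paraphrased in a few lines of Python (my paraphrase of the documented behavior, for normalized intensities; MATLAB users should of course just call imadjust):

```python
def imadjust_px(val, low_in=0.0, high_in=1.0, low_out=0.0, high_out=1.0, gamma=1.0):
    """Map one normalized intensity value the way imadjust does."""
    t = (val - low_in) / (high_in - low_in)
    t = min(max(t, 0.0), 1.0)   # values outside [low_in, high_in] saturate
    return low_out + (high_out - low_out) * t ** gamma

# Stretching [0, 0.35] up to [0, 1] brightens a dark image:
print(imadjust_px(0.35, 0.00, 0.35, 0.00, 1.00, 1.00))  # 1.0
# Swapping low_out and high_out reverses (inverts) the intensities:
print(imadjust_px(0.25, 0, 1, 1, 0))  # 0.75
```

The second call previews the reversal trick used below: with low_out > high_out, bright pixels map dark and vice versa.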

You may recall that when I modified the image of two zebras in my previous post, I not only increased low_in, but I also reversed (and tweaked) the values for low_out and high_out:

imgEnhanced = imadjust(imgEnhanced,[0.30; 0.85],[0.90; 0.00], 0.90);

In reversing those input values, I effectively reversed the image. In fact, for a grayscale image, calling

imgOut = imadjust(imgIn,[0; 1],[1; 0],1); % Note the reversal of low_out and high_out

is equivalent to calling imgOut = imcomplement(imgIn):

img = imread('cameraman.tif');
img1 = imadjust(img,[0.00; 1.00],[1.00; 0.00], 1.00);
img2 = imcomplement(img);
assert(isequal(img1,img2)) % No error is thrown!
figure;subplot(1,2,1);imshow(img);xlabel('Original image courtesy MIT');

Now recognize that ImadjustGUI calls imadjust behind the scenes, using the standard syntax. If you read the documentation for imadjust carefully, you will learn that the parameter inputs low_in, high_in, low_out, high_out, and gamma need not be scalars. In fact, if those parameters are specified appropriately as 1-by-3 vectors, then imadjust operates separately on the red, green, and blue colorplanes:

newmap = imadjust(map,[low_in; high_in],[low_out; high_out],gamma)

% ...transforms the colormap associated with an indexed image.
% If low_in, high_in, low_out, high_out, and gamma are scalars, then the
% same mapping applies to red, green, and blue components.
% Unique mappings for each color component are possible when low_in and
% high_in are both 1-by-3 vectors, low_out and high_out are both 1-by-3 vectors,
% or gamma is a 1-by-3 vector.

That works for adjusting colormaps; it also works for adjusting images. As a result, you can readily reverse individual colorplanes of an input RGB image and, in doing so, create some cool effects!

Andy Warhol Meets an Elephant

Andy Warhol famously created iconic images of Marilyn Monroe and other celebrities, casting them in startling, unexpected colors, and sometimes tiling them to create memorable effects. We can easily produce similar effects by reversing and saturating individual colorplanes of RGB images. (I wrote ImadjustGUI to facilitate, interactively, those plane-by-plane intensity adjustments.)

Reading and Pre-Processing the Elephant

First, of course, we read and display the elephant:

URL = '';
img = imread(URL);

He's a wrinkly old fellow (below left). I'd like to bring out those wrinkles by enhancing contrast in the image. There are a few ways to do that, but I learned about my favorite way by reading through the "Gray-Scale Morphology" section of DIPUM, 2nd Ed. Specifically, the authors of this (most excellent) book indicated (on page 529) that one could combine tophat and bottomhat filters to enhance contrast. (I built the appropriate combination of those filters behind the "Contrast Enhancement" button of MorphTool.) So, using MorphTool-generated code:

SE = strel('Disk',18);
imgEnhanced = imsubtract(imadd(img,imtophat(img,SE)),imbothat(img,SE));
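To see why this combination enhances contrast, here is the same trick reduced to a 1-D signal in plain Python, with a flat moving window standing in for the disk structuring element (my simplification, not MorphTool's code; real image code would also clamp the result to the valid intensity range):

```python
def erode(sig, w):
    """Moving minimum (grayscale erosion with a flat window of width w)."""
    h = w // 2
    return [min(sig[max(0, i - h):i + h + 1]) for i in range(len(sig))]

def dilate(sig, w):
    """Moving maximum (grayscale dilation with a flat window of width w)."""
    h = w // 2
    return [max(sig[max(0, i - h):i + h + 1]) for i in range(len(sig))]

def contrast_enhance(sig, w=3):
    opening = dilate(erode(sig, w), w)          # removes bright details
    closing = erode(dilate(sig, w), w)          # removes dark details
    tophat = [s - o for s, o in zip(sig, opening)]   # the bright details
    bothat = [c - s for s, c in zip(sig, closing)]   # the dark details
    # sig + tophat - bothat: push bright details up, dark details down
    return [s + t - b for s, t, b in zip(sig, tophat, bothat)]

# A small bright bump gets doubled, widening the local contrast:
print(contrast_enhance([0, 0, 5, 0, 0]))  # [0, 0, 10, 0, 0]
```

Small bright features are amplified and small dark features are deepened, which is exactly what brings out the elephant's wrinkles.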

Now, operating with imadjust plane by plane, reversing the red and blue planes, and modifying the gamma mapping, I can easily find my way to several interesting effects. For instance:

imgEnhanced1 = imadjust(imgEnhanced,[0.00 0.00 0.00; 1.00 0.38 0.40],[1.00 0.00 0.70; 0.20 1.00 0.40], [4.90 4.00 1.70]);
imgEnhanced2 = imadjust(imgEnhanced,[0.13 0.00 0.30; 0.75 1.00 1.00],[0.00 1.00 0.50; 1.00 0.00 0.27], [5.90 0.80 4.10]);

So, two more of those interesting effects, and then we can compose the four-elephants image above:

imgEnhanced3 = imadjust(img,[0.20 0.00 0.09; 0.83 1.00 0.52],[0.00 0.00 1.00; 1.00 1.00 0.00], [1.10 2.70 1.00]);
imgEnhanced4 = imadjust(img,[0.20 0.00 0.00; 0.70 1.00 1.00],[1.00 0.90 0.00; 0.00 0.90 1.00], [1.30 1.00 1.00]);

I also wanted to flip two of those enhanced images. fliplr makes it easy to flip a 2-dimensional matrix, but it doesn't work on RGB images. So I flipped them plane by plane, and concatenated (cat) the flipped planes in the third (z-) dimension into new RGB images:

r = fliplr(imgEnhanced2(:,:,1));
g = fliplr(imgEnhanced2(:,:,2));
b = fliplr(imgEnhanced2(:,:,3));
imgEnhanced2 = cat(3,r,g,b);

CompositeImage = [imgEnhanced1 imgEnhanced2; imgEnhanced3 imgEnhanced4]; % (Images 2 and 4 are flipped plane-by-plane.)
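Incidentally, the flip itself is nothing more than reversing each row of each colorplane. A toy Python sketch of the flip-then-cat(3,...) recombination, using tiny 2x2 planes as stand-ins for the real image data:

```python
def fliplr_plane(plane):
    """Reverse each row of a 2-D plane (what fliplr does)."""
    return [row[::-1] for row in plane]

def flip_rgb(r, g, b):
    # Equivalent of cat(3, fliplr(r), fliplr(g), fliplr(b))
    return [fliplr_plane(p) for p in (r, g, b)]

r = [[1, 2], [3, 4]]
g = [[5, 6], [7, 8]]
b = [[9, 10], [11, 12]]
print(flip_rgb(r, g, b)[0])  # [[2, 1], [4, 3]]
```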

Next Up: Put Me In the Zoo!

All images except "cameraman" copyright Brett Shoelson; used with permission.

Get the MATLAB code

Article from

Call for Papers: WIAMIS 2013: The 14th International Workshop on Image and Audio Analysis for Multimedia Interactive Services

Topics of interest include, but are not limited to:

– Multimedia content analysis and understanding
– Content-based browsing, indexing and retrieval of images, video and audio
– Advanced descriptors and similarity metrics for multimedia
– Audio and music analysis, and machine listening
– Audio-driven multimedia content analysis
– 2D/3D feature extraction
– Motion analysis and tracking
– Multi-modal analysis for event recognition
– Human activity/action/gesture recognition
– Video/audio-based human behavior analysis
– Emotion-based content classification and organization
– Segmentation and reconstruction of objects in 2D/3D image sequences
– 3D data processing and visualization
– Content summarization and personalization strategies
– Semantic web and social networks
– Advanced interfaces for content analysis and relevance feedback
– Content-based copy detection
– Analysis and tools for content adaptation
– Analysis for coding efficiency and increased error resilience
– Multimedia analysis hardware and middleware
– End-to-end quality of service support
– Multimedia analysis for new and emerging applications
– Advanced multimedia applications

Important dates:

- Proposal for Special Sessions: 4th January 2013
- Notification of Special Sessions Acceptance: 11th January 2013
- Paper Submission: 8th March 2013
- Notification of Papers Acceptance: 3rd May 2013
- Camera-ready Papers: 24th May 2013

See for more information.

Sunday, October 21, 2012

True story!!

We regret to inform you that your paper has not been accepted

as a PhD student:

as a post-doc:

as a professor:

by Nikolaj

We are pleased to inform you that your paper has been accepted

As a PhD student:

As a post doc:

As a professor:

by Nikolaj & Jilles

Saturday, October 20, 2012

Magic Finger


We present Magic Finger, a small device worn on the fingertip, which supports always-available input. Magic Finger inverts the typical relationship between the finger and an interactive surface: with Magic Finger, we instrument the user’s finger itself, rather than the surface it is touching. Magic Finger senses touch through an optical mouse sensor, enabling any surface to act as a touch screen. Magic Finger also senses texture through a micro RGB camera, allowing contextual actions to be carried out based on the particular surface being touched. A technical evaluation shows that Magic Finger can accurately sense 32 textures with an accuracy of 98.9%. We explore the interaction design space enabled by Magic Finger, and implement a number of novel interaction techniques that leverage its unique capabilities.


Xing-Dong Yang, Tovi Grossman, Daniel Wigdor & George Fitzmaurice. (2012).
Magic Finger: Always-Available Input through Finger Instrumentation
UIST 2012 Conference Proceedings:
ACM Symposium on User Interface Software & Technology.
pp. 147 – 15

1,000,000,000,000 Frames/Second Photography - Ramesh Raskar

Monday, October 8, 2012

PhD and Postdoc Positions: Push the frontiers in vision-based MAV navigation in GPS-denied environments!

(Great Opportunity)

Applicants should have a passion for control, image processing, mathematics, machine learning, and abstract thinking. Apart from your curiosity, your communication skills, and your ability to work in teams, you should meet the following requirements:

• Applicants for the PhD position: MSc or equivalent in Electrical, Mechanical Engineering, Computer Science, Physics or closely related field
• Excellent academic track record
• Very good English skills, written and spoken
• Control background
• State-of-the-art computer vision know-how
• Excellent C++ skills with several years of coding experience
• Ability to solve difficult vision problems and to push forward the development of the core control and image algorithms with your ideas
• Skill to analyze and improve algorithms
Experience in mobile robotics (with topics such as 2D/3D SLAM or tracking) and multi-robot control is a plus. Familiarity with libraries such as Qt, OpenCV, and OpenGL, and with development tools including Matlab, Python, CMake, and Git, would also be an advantage.
The candidate is expected to also participate in supervision of bachelor's and master's projects, and the general activities of the Lab.

- Evaluation of the candidates starts immediately but will continue until the position is filled

Please send a single PDF including, in this order, a short letter of motivation (half a page) and your CV (including a publication list and a list of at least 3 references). For candidates holding a Master's degree, please include your transcripts (BSc and MSc).
Send the above PDF to Prof. Dr. Davide Scaramuzza <scaramuzza (dot) applications (at) gmail (dot) com> quoting [PhD Application] or [Postdoc Application] in the subject.
Optionally, send in a copy of undergrad project reports, semester papers or anything else that shows your ability for scientific work and writing.
For questions, please contact Davide Scaramuzza using the same email address as for applications <scaramuzza (dot) applications (at) gmail (dot) com>

We offer an exciting research opportunity at the forefront of one of the most dynamic engineering fields. You will have both one of the steepest personal learning experiences in your life as well as the opportunity to make an impact in the consumer-electronics market and in the computer vision and robotics communities.
You will be working in a very international team of highly motivated and skilled people. You will grow into a network of international robotics and computer vision professionals.
A Software Engineer or Postdoc position is a regular job with social benefits in Switzerland. You will get a very competitive salary and access to excellent facilities in one of the world's leading technical Robotics Labs. Zurich is regularly ranked among the top cities in the world for quality of life.

- The Ailab comprises about 30 people (PhD students, postdocs, and technicians) of more than 10 different nationalities, and has electronic and machine workshops, 3D printers, and a motion capture system
- Info about the Ailab can be found at
- Information about our current and past computer vision and robotics projects can be found at
- The Ailab is located in the Department of Informatics of the University of Zurich
- Information about the University of Zurich can be found at


The Information and Software Engineering Group at the Institute of Software Technology and Interactive Systems at the Vienna University of Technology announces the availability of a 2-year post-doctoral position.
The position is associated to the MUCKE project, funded within the CHIST-ERA funding scheme of FP7, which fosters highly innovative and multidisciplinary collaborative projects in information and communication sciences and technologies. The project started in October 2012. Its main objective is to devise new and reliable knowledge extraction models designed for multilingual and multimodal data shared on social networks. More information on MUCKE:
The main task of the post-doc is to investigate and propose new models for user and result-list credibility. The aim is to provide the user of a multimedia search system with a list of results from reliable sources, together with an estimation of how reliable each source is, as well as how reliable the entire set of results is. Such reliability is to be understood from two orthogonal perspectives: First, the topical relatedness of the results (i.e. relevance). Second, the likelihood that the result, although topically relevant, is also accurate.
To approach this task, the post-doc must have strong mathematical background (preferably including statistics), as well as a fundamental understanding of Information Retrieval. Applicants with experience in NLP and/or Computer Logic will be favored.
The position is available from November 2012 and is open until filled.
For further information, please send an expression of interest to, attaching a CV.
== About our Group ==
The working group Information and Software Engineering is part of the Institute of Software Technology and Interactive Systems.
It covers the design and development of software and information systems in research and education.
The other groups cover E-Commerce, Business Informatics, and Interactive Media Systems.
Theoretical and engineering approaches to problem solving carry equal weight in our research and educational work, leading to "real world problem solving".
The Group is situated in downtown Vienna, in an environment which brings together all aspects of student and academic life. The post-doc will be part of the Information Management & Processing lab, which consists of researchers in text and music information retrieval, as well as in multimedia processing. The working language is English. A more detailed description of the IMP lab is available at

Special Issue on Animal and Insect Behavior Understanding in Image Sequences Springer - EURASIP Journal on Image and Video Processing

Deadline for submissions: January 15th, 2013


This special issue aims at reporting on the most recent approaches and tools for the identification, interpretation, and description of animal and insect behaviour in image sequences. It focuses especially on the interactions between (i) computer vision theories and methods, (ii) artificial intelligence techniques for the high-level analysis of animal and insect behaviours, and (iii) multimedia semantics methods for indexing and retrieval of animal and insect behaviour detected in images and videos.

With the widespread use of video imaging devices, the study of behaviour by exploiting visual data has become very popular. The visual information gathered from image sequences is extremely useful for understanding the behaviour of the different objects in the scene, as well as how they interact with each other or with the surrounding environment. However, whilst a large number of video analysis techniques have been developed specifically for investigating events and behaviours in human-centered applications, very little attention has been paid to the understanding of other living organisms, such as animals and insects, although huge amounts of video data are continuously recorded; e.g. the EcoGrid project and the wide range of nest cams continuously monitor, respectively, underwater reefs and bird nests (there also exist variants focusing on wolves, badgers, foxes, etc.). Moreover, the few existing approaches deal only with controlled environments (e.g. labs, cages, etc.), and as such they cannot be used in real-life applications.

The automated analysis of video data in real-life environments poses several challenges for computer vision researchers because of the uncontrolled scene conditions and the nature of the targets to be analysed, whose 3D motion tends to be erratic, with sudden direction and speed variations, and whose appearance and non-rigid shape can undergo quick changes. Computer vision tools able to analyse those complex environments are of great interest to biologists in their striving towards analysing the natural environment, promoting its preservation, and understanding the behaviour and interactions of the living organisms (insects, animals, etc.) that are part of it.

We invite authors to contribute high-quality papers that will stimulate the research community on the use of image and video analysis methods in real-life environments for animal and insect behaviour monitoring and understanding.

Potential topics include, but are not limited to:

- Living organism detection, tracking, classification and recognition in image sequences
- Animals and Insects dynamic shape analysis
- Visual surveillance and Event Detection in Ecological Applications
- Stereo Vision and Structure from motion of living organisms
- Event and Activity Recognition in Ecological Videostreams
- Animal and insect behaviour analysis and articulated models
- Animal and insect motion and trajectory analysis
- High-level behaviour recognition and understanding
- Semantic Region Identification in animal and insect populated scenarios
- Categorization and Natural Scene Understanding
- Natural Scene and Object-Scene Interaction Understanding
- Ontologies and semantic annotation of animal and insect motion in video content

Submission Instructions

Submissions to the special issue must include new, unpublished, original research. Papers must be original and must not have been published or submitted elsewhere. All papers must be written in English. The submissions will be reviewed in a double-blind procedure by at least three reviewers. The papers must contain no information identifying the author(s) or their organization(s).

Before submission authors should carefully read over the Instructions for Authors, which are located at

Prospective authors should submit an electronic copy of their complete manuscript through the SpringerOpen submission system at according to the submission schedule.

They should choose the correct Special Issue in the 'sections' box upon submitting. In addition, they should specify the manuscript as a submission to the 'Special Issue on Animal and Insect Behaviour Understanding in Image Sequences' in the cover letter. All submissions will undergo initial screening by the Editors for fit to the theme of the Special Issue and prospects for successfully negotiating the review process.

Important Dates

15 Jan 2013: Manuscript submission due
15 April 2013: Acceptance/Revision notification
30 May 2013: Revised manuscript due
30 July 2013: Final acceptance notification
15 August 2013: Final manuscript due
September 2013: Tentative publication

Friday, September 14, 2012


*** Free for a very limited time! Download now. ***



visolu2 will help you to quickly locate any photo on your device. It is the only app you need for organizing, viewing and finding your photos fast.

View hundreds or even thousands of your photos simultaneously without losing the overview. Find your images in various viewing modes, mapped onto a sphere or in a perspective view. Navigate visually through your image set.

visolu2 is very intuitive, offering easy drag and zoom controls.
* Attractive, easy-to-use interface
* Unique color sorted view of all your photos
* View in album map or columned list
* View your photos on a plane, a perspective view or even on a sphere
* Photos with similarities are grouped together
* Search for photos visually similar to a sample photo
* Find out where a photo was taken
* Full-featured photo viewer (zoom, pan & swipe navigation)
* Slideshow (dissolve and wipe transition)
* Video player
* Email photos from right inside the app
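
The "color sorted view" above can be approximated by ordering thumbnails by their average hue. The sketch below is purely illustrative (it is not visolu2's actual algorithm, and the pixel data is made up); it uses the standard-library `colorsys` module:

```python
import colorsys

def average_hue(pixels):
    """Mean hue (0..1) of an image given as a list of (r, g, b) tuples in 0-255."""
    hues = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0] for r, g, b in pixels]
    return sum(hues) / len(hues)

def color_sort(images):
    """Order image names by average hue, as a color-sorted grid might."""
    return sorted(images, key=lambda name: average_hue(images[name]))

# Tiny stand-in "images": a few representative pixels each.
photos = {
    "red_flower": [(220, 40, 30), (200, 50, 20)],
    "green_leaf": [(30, 180, 60), (40, 200, 70)],
    "blue_sky":   [(40, 90, 230), (60, 110, 240)],
}
print(color_sort(photos))
```

A real implementation would sample many pixels per thumbnail and wrap hue circularly, but the idea of mapping each photo to a single sort key is the same.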

pixolution Web Site

visolu2 will be freely available until Sunday.

Thursday, September 13, 2012


iSearch - A multi-modal search engine

The user searches and retrieves multimedia content using text, images, 3D objects, and their combinations as queries. The search engine is available at the following URL:

The user can log in to the system (with a user account) or not. In this experiment, no log-in is required.

1.1. Text-based search vs. Image-based search
In order to implement this scenario, you can use the sample queries that are available at the following link:
Here we search for a specific media item (e.g. a 2D image of a flower). Suppose the user takes a picture of a flower; although he/she does not know its name, he/she would like additional information about it (plus additional photos, 3D objects if any, videos, etc.). The first keyword that comes to mind is “flower” (since the specific name of the flower is not known). Type the keyword (“flower”) into the edit box of the GUI and press the “Search” button. The engine returns results from the database.
Alternative queries:
- Search by Image: Press the button. Drag and drop or upload the query image (one of the samples provided in “Sample_Queries_1” folder). Press the search button. The engine returns results with visual similarity to the given image.
Compare the results of the text-based query to those returned using an image query. The comparison should be done according to which result list is more satisfactory.
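
Image-based search of this kind typically maps each image to a feature vector and ranks the database by distance to the query vector. A minimal nearest-neighbour sketch (the vectors and file names are hypothetical; iSearch's actual features are not specified here):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_by_image(query_vec, index, k=2):
    """Return the k items whose feature vectors are closest to the query's."""
    return sorted(index, key=lambda name: euclidean(query_vec, index[name]))[:k]

# Toy index: image name -> 2-D feature vector (illustrative only).
index = {
    "rose.jpg":  [0.9, 0.1],
    "tulip.jpg": [0.7, 0.3],
    "car.jpg":   [0.1, 0.9],
}
print(search_by_image([0.85, 0.15], index))
```

Real systems use high-dimensional descriptors and approximate nearest-neighbour indexes, but the ranking-by-distance principle is the same.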

1.2. Multimodal search
In order to implement this scenario, please download the sample queries that are available at the following link:
In this case, more than one query is combined.
Search scenarios:
- Search by Image: Press the button. Drag and drop or upload the query image (car_4.jpg) from the “Sample_queries_2” folder. Press the search button. The engine returns results with visual similarity to the given image.
- Search by 3D object: Press the button. Drag and drop or upload the query 3D object (car_3d_1.dae) from the “Sample_queries_2” folder. Press the search button. The engine returns results with visual similarity to the given 3D object. A screenshot of the 3D object is also available if the user has no 3D viewer.
- 3D object + image: first enter as query a 3D object. Then enter as query an image. Press the search button. The engine returns results with visual similarity to both the given 3D object and the image.
Compare the results retrieved with multimodal queries to those returned by mono-modal queries (using one medium at a time).
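
One common way to combine modalities, e.g. a 3D object plus an image query, is late fusion: score each item per modality, then rank by a weighted sum. The sketch below is a generic illustration under assumed weights and scores; it is not iSearch's documented fusion method:

```python
def fuse_scores(score_lists, weights):
    """Late fusion: weighted sum of per-modality similarity scores per item.
    score_lists maps modality -> {item: similarity in [0, 1]}."""
    fused = {}
    for modality, scores in score_lists.items():
        w = weights[modality]
        for item, s in scores.items():
            fused[item] = fused.get(item, 0.0) + w * s
    # Rank items by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical per-modality similarities for a car query.
image_scores = {"car_a": 0.9, "car_b": 0.4, "truck": 0.2}
shape_scores = {"car_a": 0.5, "car_b": 0.8, "bike": 0.6}
ranking = fuse_scores({"image": image_scores, "3d": shape_scores},
                      {"image": 0.6, "3d": 0.4})
print(ranking)
```

Items that score well in both modalities (here "car_a") rise to the top, which is exactly the behaviour the multimodal scenario asks you to compare against mono-modal results.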

1.3. Free search tasks
Download the “Sample_queries_3” folder from the following link:
Make similar tests with mono-modal and multimodal queries from the Sample_queries_3 folder (text, image, 3D object and their combinations). Evaluate whether the retrieved results are relevant to the given queries.

2.1. Results Presentation
After pressing the “Search” button, a number of relevant items are retrieved. If you left-click on one of the results, a pop-up box appears. It includes:
- Image of the object
- 3D representation of the object (if any)
- Sound of the object (if any)
- Video of the object (if any)
The results are sorted by relevance to the query.

3.1. Refine Search
Two options for refinement of search results are provided:
- The “Find Similar” option: The user selects one of the retrieved results and presses the link “Find Similar”. A new search is initiated using as query the selected object.
- The “Relevance Feedback” option: The user marks one or more relevant results by pressing the “star” button (bottom right corner of the object’s visualisation box). After selection of relevant results, the user presses the “Search” button, i.e. the “>” button.
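
A classic way to implement such relevance feedback is a Rocchio-style query update: move the query vector toward the centroid of the items the user starred. This is a generic sketch (the engine's actual feedback algorithm is not specified in this walkthrough):

```python
def rocchio(query, relevant, alpha=1.0, beta=0.75):
    """Shift the query vector toward the centroid of user-marked relevant items."""
    dim = len(query)
    centroid = [sum(vec[i] for vec in relevant) / len(relevant) for i in range(dim)]
    return [alpha * query[i] + beta * centroid[i] for i in range(dim)]

# Toy 3-D feature space: original query plus two items the user starred.
q = [1.0, 0.0, 0.0]
marked = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_q = rocchio(q, marked)
print(new_q)
```

The updated query is then re-submitted (the “>” button in the interface), so the next result list is biased toward what the user marked as relevant.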

Monday, September 10, 2012

RESEARCHER/MANAGER POSITIONS in the FP7 research projects 3DTVS and IMPART

at the Artificial Intelligence and Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki.

  • PhD candidates or researchers holding a master's degree or a diploma in Electrical Engineering / Informatics / Informatics Engineering / Mathematics / Physics, or an equivalent qualification
  • Postdoctoral researchers
  • A manager for the administration of the 3DTVS project
  • Programmers / computer systems administrators

A profile of the laboratory and related information can be found at . The positions are funded by the 3DTVS and IMPART projects, which are financed by the European Union. The general research topics are a) analysis, indexing and retrieval of 3DTV video, and b) digital image and video processing/analysis and computer vision with application to video postproduction. An indicative list of possible research subjects is:

  • Digital video analysis
  • Computer vision (activity and emotion recognition)
  • 3D image analysis
  • 3D video retrieval (3DTV content analysis, indexing and retrieval)

The exact research subject of each new researcher will be chosen to match his/her previous experience, so as to achieve maximum productivity. The duration of employment may be extended to three years or more. Candidates with proven research experience in one of the following fields will be strongly preferred: digital image processing, computer vision, graphics, human-computer interfaces, signal processing, together with a very good command of English, C/C++ programming skills, and a particular interest in academic research. The manager position additionally requires technical knowledge of the aforementioned subjects, knowledge of and interest in project management (task scheduling, monitoring of execution and resource consumption, deliverable writing, quarterly and annual financial/technical reports, consortium agreement, intellectual property management), and a very good command of English. The programmer/administrator positions additionally require technical knowledge of the aforementioned subjects and knowledge of/interest in computer systems administration (e.g. SQL Server, Windows Server & Active Directory, networks & administration, hardware selection/repair, web development, Linux, Apache, MySQL, C/C++, Visual Studio, .NET). Preference will be given to candidates holding a science/engineering degree and possibly a postgraduate degree in the fields mentioned above.

The deadline for the above positions is 15 October 2012.

Candidates must be citizens of the European Union, and should send their CVs and reference letters by fax or, preferably, by e-mail to:

Professor Ioannis Pitas

Department of Informatics

Aristotle University of Thessaloniki

e-mail: pitas<at>aiia<dot>csd<dot>auth<dot>gr