Wednesday, April 30, 2014

Google's Autonomous Cars Are Smarter Than Ever at 700,000 Miles

Google has just posted an update on its self-driving car program, which we've been watching closely for the past several years. The cars have surpassed 700,000 autonomous accident-free miles (around 1.13 million kilometers), and they're learning how to safely navigate through the complex urban jungle of city streets. Soon enough, they'll be better at it than we are. Much better.

From Google's announcement: "We've improved our software so it can detect hundreds of distinct objects simultaneously—pedestrians, buses, a stop sign held up by a crossing guard, or a cyclist making gestures that indicate a possible turn. A self-driving vehicle can pay attention to all of these things in a way that a human physically can't—and it never gets tired or distracted."

This is why we're so excited about a future full of autonomous cars. Yes, driving sucks, especially in traffic, and we'd all love to just take a nap while our cars autonomously take us wherever we want to go. But the most important fact is that humans are just terrible at driving. We get tired and distracted, but that's just scratching the surface. We're bad at dealing with unexpected situations, our reaction times are abysmally slow, and most of us have zero experience with active accident avoidance beyond stomping on the brakes and swerving wildly, which sometimes only makes things worse.

An autonomous car, on the other hand, is capable of ingesting massive amounts of data in a very short amount of time, exploring multiple scenarios, and perhaps even running simulations before it makes a decision designed to be as safe as possible. And that decision might (eventually) be one that only the most skilled human driver would be comfortable with, because the car will know how to safely drive itself up to (but not beyond) its own physical limitations. This is a concept that Stanford University was exploring before most of that team moved over to Google's car program along with Sebastian Thrun.

Now, I may be making something out of nothing here, but if we compare the car in the image that Google provided with its latest blog post with an earlier Google car from 2012 (or even the Google car in the video), you'll notice that there's an extra piece of hardware mounted directly underneath the Velodyne LIDAR sensor: a polygonal black box (see close-up, right). I have no idea what's in that box, but were I to wildly speculate, my guess would be some sort of camera system with a 360-degree field of view.

The Velodyne LIDAR is great at detecting obstacles, but what Google is working on now is teaching their cars to understand what's going on in their environment, and for that, you need vision. The cars always had cameras in the front to look for road signs and traffic lights, but detecting something like a cyclist making a hand signal as they blow past you from behind seems like it would require fairly robust vision hardware along with some fast and powerful image analysis software. 

Or, it could be radar. Or more lasers. We're not sure, except to say that it's new(ish), and that vision is presumably becoming more important for Google as they ask their cars to deal with more complex situations with more variables.


Wednesday, April 23, 2014

Project Naptha

Meme generation might never be the same again. Project Naptha is a browser extension that lets users select, copy, edit, and translate text from any image, so long as the text is rotated less than about 30 degrees. The extension is built on the Stroke Width Transform, an algorithm Microsoft Research developed for detecting text in natural scenes, and can optionally fall back on Google's open-source OCR engine Tesseract when necessary. To reconstruct images after their text has been edited, Project Naptha uses a technique called "inpainting": according to the website, an algorithm fills in the space previously occupied by text with colors from the surrounding area. Right now, the program is only compatible with Google Chrome, but a Firefox version may be released in a few weeks.
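The "colors from the surrounding area" idea can be sketched in a few lines. This is only a toy illustration of the principle, not Project Naptha's actual algorithm: each pixel covered by text (the mask) is filled with the average of its already-known neighbours, sweeping until the hole is closed.

```python
def inpaint(image, mask):
    """Toy inpainting: image is a 2D list of grayscale values,
    mask is a 2D list where True marks pixels once covered by text."""
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]
    unknown = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    while unknown:
        progressed = False
        for (y, x) in sorted(unknown):
            # Borrow color only from neighbours whose value is already known.
            neighbours = [img[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in unknown]
            if neighbours:
                img[y][x] = sum(neighbours) / len(neighbours)
                unknown.discard((y, x))
                progressed = True
        if not progressed:  # entirely masked image: nothing to borrow from
            break
    return img

# A flat gray image with one "text" pixel is restored to flat gray:
flat = [[100.0] * 5 for _ in range(5)]
mask = [[False] * 5 for _ in range(5)]
mask[2][2] = True
restored = inpaint(flat, mask)
```

Real inpainting algorithms (such as the Telea method available in OpenCV as `cv2.inpaint`) propagate structure as well as color, but the filling-from-the-boundary idea is the same.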

Sunday, April 6, 2014

SIMPLE Descriptors

Abstract: In this paper we propose and evaluate a new technique that localizes the description ability of several global descriptors. We employ the SURF detector to define salient image patches of blob-like textures, and use the MPEG-7 Scalable Color (SC), Color Layout (CL), and Edge Histogram (EH) descriptors, as well as the Color and Edge Directivity Descriptor (CEDD), to produce the final local feature vectors, named SIMPLE-SC, SIMPLE-CL, SIMPLE-EH and SIMPLE-CEDD (or "LoCATe") respectively. In order to test the new descriptors in the most straightforward fashion, we use the Bag-Of-Visual-Words framework for indexing and retrieval. Experiments conducted on two benchmark databases with varying codebook sizes revealed an astonishing boost in the retrieval performance of the proposed descriptors, compared both to their original global forms and to other state-of-the-art local and global descriptors.

A set of local image descriptors specifically designed for image retrieval tasks

Image retrieval was first approached with algorithms that extracted an image's visual properties in a global manner, mirroring the human instinct of evaluating an image's content as a whole. Experiments with retrieval systems, especially on cluttered images and images where objects appear partially occluded, showed that correctly ranked results owe more to the salient regions of an image than to its overall depiction. Representing an image by its points of interest therefore proved the more robust solution. SIMPLE descriptors emphasize and incorporate the characteristics that allow a more abstract but retrieval-friendly description of an image's salient patches.
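The core idea, computing a global-style descriptor only inside detector-defined patches, can be illustrated with a toy sketch. Here the keypoint centres are assumed to be given (as a SURF detector would supply them), and a plain intensity histogram stands in for the MPEG-7 descriptors; the function is illustrative, not the authors' implementation.

```python
def simple_descriptors(image, keypoints, patch=8, bins=4):
    """image: 2D list of grayscale values in [0, 255];
    keypoints: list of (row, col) patch centres, e.g. from a SURF detector.
    Returns one normalised histogram per salient patch."""
    h, w = len(image), len(image[0])
    half = patch // 2
    vectors = []
    for (r, c) in keypoints:
        hist = [0] * bins
        # Restrict the "global" descriptor to the local patch around the keypoint.
        for y in range(max(0, r - half), min(h, r + half)):
            for x in range(max(0, c - half), min(w, c + half)):
                hist[min(bins - 1, int(image[y][x]) * bins // 256)] += 1
        total = sum(hist) or 1
        vectors.append([v / total for v in hist])
    return vectors

# A uniformly dark image yields a single patch vector concentrated in bin 0:
dark = [[0] * 16 for _ in range(16)]
descs = simple_descriptors(dark, [(8, 8)])
```

Swapping the toy histogram for SC, CL, EH or CEDD computed on the same patch gives exactly the SIMPLE family of descriptors described above.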

Experiments were conducted on two well-known benchmark databases. Initially, experiments were performed on the UKBench database, which consists of 10200 images separated into 2550 groups of four images each. Each group contains images of a single object captured under different viewpoints and lighting conditions. The first image of every object serves as a query image; to evaluate our approach, the first 250 query images were selected, and the search was executed over all 10200 images. Since each ground truth includes only four images, the P@4 measure was used to evaluate the early ranking positions.
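P@4 here is ordinary precision at rank four: the fraction of the first four retrieved images that belong to the query's ground-truth group. A minimal sketch (the function name is ours, not the paper's):

```python
def precision_at_k(ranked, relevant, k=4):
    """Fraction of the top-k retrieved items that are in the relevant set."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

# A UKBench-style query whose group is {1, 2, 3, 4}; the system returned
# two group members among the first four results:
score = precision_at_k([1, 7, 2, 9, 3], {1, 2, 3, 4})  # 2 of top 4 -> 0.5
```

Averaging this score over the 250 queries gives the figures reported in the paper's tables.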

Subsequently, experiments were performed on the UCID database, which consists of 1338 images on a variety of topics, including natural scenes and man-made objects, both indoors and outdoors. All the UCID images were subjected to manual relevance assessments against 262 selected query images.

In the tables that illustrate the results, wherever the BOVW model is employed, only the best result achieved by each descriptor at every codebook size is presented. In other words, for each local feature and each codebook size, the experiment was repeated for all 8 weighting schemes, but only the best result is listed. Next to each result, the weighting scheme that achieved it is noted, using the SMART (System for the Mechanical Analysis and Retrieval of Text) notation.
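The BOVW step behind these tables can be sketched as follows: each local SIMPLE vector is assigned to its nearest codeword, the counts form a histogram, and a SMART-style weighting is applied before matching. Plain tf-idf is used here as a stand-in for one of the eight schemes; the function names are illustrative, not from the authors' code.

```python
import math

def bovw_histogram(features, codebook):
    """Assign each local feature vector to its nearest codeword
    (squared Euclidean distance) and count occurrences."""
    hist = [0] * len(codebook)
    for f in features:
        nearest = min(range(len(codebook)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(f, codebook[i])))
        hist[nearest] += 1
    return hist

def tf_idf_weight(hist, doc_freq, n_docs):
    """One SMART-style weighting: raw term frequency times log inverse
    document frequency, per visual word."""
    return [tf * math.log(n_docs / df) if df else 0.0
            for tf, df in zip(hist, doc_freq)]

# Three local features quantized against a 2-word codebook:
hist = bovw_histogram([[0, 0], [1, 1], [0.1, 0]], [[0, 0], [1, 1]])
weighted = tf_idf_weight(hist, doc_freq=[1, 2], n_docs=2)
```

A visual word that appears in every image gets zero idf weight, which is why the choice of weighting scheme changes the ranking enough to be worth reporting per result.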


Source Code and more details are available at

Details regarding these descriptors can be found in the following paper (in other words, if you use these descriptors in your scientific work, we kindly ask you to cite it):

C. Iakovidou, N. Anagnostopoulos, Y. Boutalis, A. Ch. Kapoutsis and S. A. Chatzichristofis, "Searching Images with MPEG-7 (& MPEG-7 Like) Powered Localized Descriptors: The SIMPLE Answer to Effective Content Based Image Retrieval", 12th International Content Based Multimedia Indexing Workshop (CBMI 2014), June 18-20, 2014, Klagenfurt, Austria (accepted for publication).
