Tuesday, November 10, 2015

Google Just Open Sourced TensorFlow, Its Artificial Intelligence Engine

TECH PUNDIT TIM O’Reilly had just tried the new Google Photos app, and he was amazed by the depth of its artificial intelligence.

O’Reilly was standing a few feet from Google CEO and co-founder Larry Page this past May, at a small cocktail reception for the press at the annual Google I/O conference, the centerpiece of the company’s year. Google had unveiled its personal photos app earlier in the day, and O’Reilly marveled that if he typed something like “gravestone” into the search box, the app could find a photo of his uncle’s grave, taken so long ago.

The app uses an increasingly powerful form of artificial intelligence called deep learning. By analyzing thousands of photos of gravestones, this AI technology can learn to identify a gravestone it has never seen before. The same goes for cats and dogs, trees and clouds, flowers and food.

The Google Photos search engine isn’t perfect. But its accuracy is enormously impressive—so impressive that O’Reilly couldn’t understand why Google didn’t sell access to its AI engine via the Internet, cloud-computing style, letting others drive their apps with the same machine learning. That could be Google’s real money-maker, he said. After all, Google also uses this AI engine to recognize spoken words, translate from one language to another, improve Internet search results, and more. The rest of the world could turn this tech towards so many other tasks, from ad targeting to computer security.

Well, this morning, Google took O’Reilly’s idea further than even he expected. It’s not selling access to its deep learning engine. It’s open sourcing that engine, freely sharing the underlying code with the world at large. This software is called TensorFlow, and in literally giving the technology away, Google believes it can accelerate the evolution of AI. Through open source, outsiders can help improve on Google’s technology and, yes, return these improvements back to Google.

Read More

IBM brings Watson's cognitive computing to sports

IBM took cognitive computing into the sports world today with a trio of partnerships under which cognitive applications powered by Watson will help prevent concussions, change the nature of training in golf and transform fans' game-day experiences. The partnerships with Triax Technologies, Spare5 and 113 Industries will use the power of cognitive computing in different ways.

"Cognitive is a new form of computing that represents a seismic shift in technology," Lauri Saft, vice president, IBM Watson Ecosystem, said in a statement today. "We've moved beyond systems that are programmed — the technologies most of us use today — to systems that understand, reason and learn. These latest partnerships exemplify the entrepreneurial nature of our Watson ecosystem. Like so many other industries, sports is awash in data, and cognitive computing allows IBM's partners like Triax Technologies, 113 Industries and Spare5 to apply deeper insights to all of that information to improve athlete performance and redefine the fan experience."

Reducing concussions

Triax Technologies develops and manufactures products to ensure the health and safety of athletes. Its new Triax Smart Impact Monitor (SIM) is a wearable sensor that can be embedded in headbands or skullcaps to track the force and frequency of head impacts. The company says SIM gives parents, coaches and athletic trainers the tools to improve player safety and refine technique in real time. Using a Watson language service, the device can factor in more diverse data sources to analyze sentiment and infer cognitive and social characteristics.

It's in the hole ...

Watson's deep learning, natural language and vision capabilities are powering Watson Golf Pro from Spare5. The cognitive app is a personal caddy that amateur players can consult while at the driving range or on the course. It's been trained with a corpus of knowledge from contracted golf professionals on mechanics and drills. By "seeing" a golfer's swing, the app can provide feedback for improving that swing.

Keeping the fan engaged (and spending)

113 Industries is bringing Watson to hockey. It's working with the Pittsburgh Penguins to transform the fan game-day experience with 113 Industries' "Pi" service, embedded with Watson natural language and cognitive capabilities. This allows the Penguins to analyze large volumes of fan-based data to develop specialized offers and services for fans at the CONSOL Energy Center, ranging from concessions to merchandise and pre-/post-game entertainment.

Tuesday, November 3, 2015

Artificial Intelligence Outperforms Human Data Scientists

Artificial intelligence may be poised to ease the shortage of data scientists who build models that explain and predict patterns in the ocean of “Big Data” representing today’s world. An MIT startup’s computer software has proved capable of building better predictive models than the majority of human researchers it competed against in several recent data science contests. 

Until now, well-paid data scientists have relied on their human intuition to create and test computer models that can explain and predict patterns in data. But MIT’s “Data Science Machine” software represents a fully automated process capable of building such predictive computer models by identifying relevant features in raw data. Such a tool could make human data scientists even more effective by allowing them to build and test such predictive models in far less time. But it might also help more individuals and companies harness the power of Big Data without the aid of trained data scientists.

“I think the biggest potential is for increasing the pool of people who are capable of doing data science,” Max Kanter, a data scientist at MIT’s Computer Science and AI Lab and co-creator of the Data Science Machine software, told IEEE Spectrum. “If you look at the growth in demand for people with data science abilities, it’s far outpacing the number of people who have those skills.”

The Data Science Machine can automatically create accurate predictive models based on raw datasets within two to 12 hours; a team of human data scientists may require months. A paper on the Data Science Machine will be presented this week at the IEEE International Conference on Data Science and Advanced Analytics being held in Paris from 19–21 Oct.

Trained data scientists, who draw salaries above $100,000 on average, remain a coveted but scarce resource for companies as diverse as Facebook and Walmart. In 2011, the McKinsey Global Institute estimated that the United States alone might face a shortage of 140,000 to 190,000 people with the analytical skills necessary for data science. A 2012 issue of the Harvard Business Review declared data scientist the sexiest job of the 21st century.

The reason for such high demand for data scientists comes from Big Data’s revolutionary promise of tapping into vast collections of data—whether it’s the online behavior of social media users, the movements of financial markets worth trillions of dollars, or the billions of celestial objects spotted by telescopes—to explain and predict patterns in the huge datasets. Such models could help companies predict the future behavior of individual customers or aid astronomers in automatically identifying an object in the starry nighttime sky.

But how do you transform a sea of raw data into information that can help businesses or researchers identify and predict patterns? Human data scientists usually have to spend weeks or months working on their predictive computer algorithms. First, they sift through the raw data to identify key variables that could help predict the behavior of related observations over time. Then they must continuously test and refine those variables in a series of computer models that often use machine learning techniques.

Such a time-consuming part of the data scientists’ job description inspired Kanter, an MIT grad student at the time, and Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and AI Lab who acted as Kanter’s master’s thesis advisor, to try creating a computer program that could automate the biggest bottlenecks in data science.

Previous computer software programs aimed at solving such data science problems have tended to be one dimensional, focusing on problems particular to specific industries or fields. But Kanter and Veeramachaneni wanted their Data Science Machine software to be capable of tackling any general data science problem. Veeramachaneni in particular drew on his experience of seeing similar connections among the many industry data science problems he had worked on during his time at MIT. 

Read More

Hilary Mason: Use data science and machine intelligence to build a better future

An algorithm can creatively reimagine the Mona Lisa.

Now what?

In the opening keynote of the Grace Hopper Women in Computing Conference 2015 in Houston, Texas, Fast Forward Labs CEO Hilary Mason talked about the burgeoning world of data science and machine intelligence, and several of the considerations for how they will affect the future.

But first, in a subtle nod to the #ILookLikeAnEngineer movement, Mason introduced herself like this: "I'm a computer scientist, a data scientist, a software engineer, I'm also a CEO and I look like all of those things."

And then she dove into machine intelligence.

"Machines are starting to do things that we might have thought were more in the creative domain of humans," she said, showing several computer-generated takes on the classic Da Vinci painting. Or, she also pointed out some of her favorite data-based apps that have already changed the ways that users function, like Google Maps, Foursquare, or Dark Sky.

Mason outlined reasons why data science and machine learning are having a moment: we have the computing power, we know what to do with data when we have it, and we're getting access to more and more of it.

Looking at her own history with data, Mason described a moment she and a co-worker had while she was working at Bitly as chief scientist. They were making changes to a Hadoop cluster they had. In order to test a job, they decided to find out what the cutest animal on the internet was.

"We had just used hours of compute time and a petabyte of data to answer the most frivolous question," she said. That ability, though, to "play" with data is important. Mason also referenced a Kickstarter for a LED light up "disco dog" suite — it's a smart phone-controlled vest for your dog.

"When you start to see the ridiculous things occurring, you know something interesting is happening because that means the technology is something we all can use," Mason said.

But in building new things, even silly things, it's important to remember unintended and unforeseen consequences. For example, in 1999, Sony was building and selling a toy robotic dog called Aibo. Recently, though, the company stopped supporting it, so if someone happened to still be using their robodog and it malfunctioned, there was no reviving it. And that was actually more common than one would think, leading to funerals held by bereaved owners for those longtime robotic pets.

Read More

Friday, October 30, 2015

Capturing a Human Figure Through a Wall using RF Signals


Brillo brings the simplicity and speed of software development to hardware for IoT with an embedded OS, core services, developer kit, and developer console.

Wednesday, October 14, 2015

The future of flying robots | Vijay Kumar | TEDxPenn

Tuesday, October 13, 2015

Turning Phones into 3D Scanners

Thursday, October 8, 2015

Computers can recognise a complication of diabetes that can lead to blindness

ARTIFICIAL intelligence (AI) can sometimes be put to rather whimsical uses. In 2012 Google announced that one of its computers, after watching thousands of hours of YouTube videos, had trained itself to identify cats. Earlier this year a secretive AI firm called DeepMind, bought by Google in 2014, reported in Nature that it had managed to train a computer to play a series of classic video games, often better than a human could, using nothing more than the games’ on-screen graphics.

But the point of such diversions is to illustrate that, thanks to a newish approach going by the name of "deep learning", computers increasingly possess the pattern-recognition skills—identifying faces, interpreting pictures, listening to speech and the like—that were long thought to be the preserve of humans. Researchers, from startups to giant corporations, are now planning to put AI to work to solve more serious problems.

One such organisation is the California HealthCare Foundation (CHCF). The disease in the charity’s sights is diabetic retinopathy, one of the many long-term complications of diabetes. It is caused by damage to the tiny blood vessels that supply the retina. Untreated, it can lead to total loss of vision. Around 80% of diabetics will develop retinal damage after a decade; in rich countries it is one of the leading causes of blindness in the young and middle-aged. Much of the damage can be prevented with laser treatment, drugs or surgery if caught early, but there are few symptoms at first. The best bet is therefore to offer frequent check-ups to diabetics, with trained doctors examining their retinas for subtle but worrying changes.

But diabetes is common and doctors are busy. Inspired by recent advances in AI, the CHCF began wondering if computers might be able to do the job of examining retinas cheaply and more quickly.

Being medics, rather than AI researchers, the CHCF turned for help to a website called Kaggle, which organises competitions for statisticians and data scientists. (It was founded by Anthony Goldbloom, who once worked as an intern at The Economist.) The CHCF uploaded a trove of thousands of images of retinas, both diseased and healthy, stumped up the cash for a $100,000 prize, and let Kaggle’s members—who range from graduate students to teams working for AI companies—get to grips with the problem.

Read More:

Wednesday, October 7, 2015

Text of ISO/IEC CD 15938-14 Reference software, conformance and usage guidelines for compact descriptors for visual search

This part of the MPEG-7 standard provides the reference software, specifies the conformance testing, and gives usage guidelines for ISO/IEC 15938-13: Compact descriptors for visual search (CDVS). CDVS specifies an image description tool designed to enable efficient and interoperable visual search applications, allowing visual content matching in images. Visual content matching includes matching of views of objects, landmarks, and printed documents, while being robust to partial occlusions as well as changes in viewpoint, camera parameters, and lighting conditions. This document is a Committee Draft (CD) text for ballot consideration and comment for ISO/IEC 15938-14: Reference software, conformance and usage guidelines for compact descriptors for visual search.


Objects2action: Classifying and localizing actions without any video example

The ICCV 2015 paper Objects2action: Classifying and localizing actions without any video example by Mihir Jain, Jan van Gemert, Thomas Mensink and Cees Snoek is now available. The goal of this paper is to recognize actions in video without the need for examples. Unlike traditional zero-shot approaches, the authors do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. The key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Their semantic embedding has three main characteristics to accommodate the specifics of actions. First, they propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, they incorporate the automated selection of the most responsive objects per action. And finally, they demonstrate how to extend their zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of the approach.
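The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the 2-D embedding vectors, the object probabilities, the function names and the `top_t` parameter are all hypothetical, and cosine similarity is assumed as the affinity measure. The score of an action is a probability-weighted sum of its affinities to the objects detected in the video, restricted to the objects most responsive to that action. The sum is a convex combination when all objects are kept and the probabilities sum to 1; this sketch does not renormalise after selection.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_action(object_probs, object_vecs, action_vecs, top_t=2):
    """Return (best_action, scores) for one unseen video.

    object_probs: {object: probability}, the video's object encoding
                  (assumed to sum to 1).
    object_vecs / action_vecs: {label: word-embedding vector}.
    top_t: number of most responsive objects kept per action.
    """
    scores = {}
    for action, a_vec in action_vecs.items():
        # Affinity between this action and every object, in embedding space.
        affinity = {obj: cosine(a_vec, object_vecs[obj]) for obj in object_probs}
        # Keep only the objects that respond most strongly to this action.
        keep = sorted(affinity, key=affinity.get, reverse=True)[:top_t]
        scores[action] = sum(object_probs[o] * affinity[o] for o in keep)
    return max(scores, key=scores.get), scores

# Hypothetical toy embeddings and object probabilities, for illustration only.
object_vecs = {"horse": [1.0, 0.1], "saddle": [0.9, 0.3], "ball": [0.1, 1.0]}
action_vecs = {"riding": [1.0, 0.2], "kicking": [0.2, 1.0]}
object_probs = {"horse": 0.6, "saddle": 0.3, "ball": 0.1}

best, scores = classify_action(object_probs, object_vecs, action_vecs)
print(best)  # riding
```

No video of "riding" is needed at any point: the label is reached only through object scores and word-embedding similarity.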

Article from

A search runtime analysis of LIRE on 500k images

Article from

Run time for search in LIRE heavily depends on the method used for indexing and search. There are two main ways to store data, two corresponding strategies for linear search, and of course approximate indexing. The two storing strategies are (i) to store the actual feature vector in a Lucene text field and (ii) to use the Lucene DocValues data format. While the former allows for easy access, more flexibility and compression, the latter is much faster when accessing raw byte[] data. Linear search then needs to open each and every document and compare the query vector to the one stored in the document. For linear search in Lucene text fields, caching boosts performance, so the byte[] data of the feature vectors is read once from the index and stored in memory. For the DocValues data storage format, access is fast enough to allow for linear search. With approximate indexing, a query string is used on the inverted index and only the first k best matching candidates are used to find the n << k actual results by linear search. So first a text search is done, then a linear search on far fewer images is performed [1]. In our tests we used k=500 and n=10.
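The two-stage idea can be sketched in a few lines of plain Python. This is not LIRE or Lucene code: the pivot-based "terms", the dictionary used as an inverted index, the vote-based candidate ranking and all function names are assumptions made for illustration. Only the overall shape follows the description above, with a cheap term lookup selecting at most k candidates and a linear search re-ranking them to the final n.

```python
import random

def l2(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def linear_search(query, vectors, n):
    """Exact search: compare the query with every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return ranked[:n]

def nearest_pivots(vec, pivots, m):
    """IDs of the m pivots closest to vec; they play the role of text terms."""
    return sorted(range(len(pivots)), key=lambda p: l2(vec, pivots[p]))[:m]

def build_inverted_index(vectors, pivots, m):
    """Map each pivot ID to the set of documents that list it as a term."""
    index = {}
    for doc_id, vec in enumerate(vectors):
        for p in nearest_pivots(vec, pivots, m):
            index.setdefault(p, set()).add(doc_id)
    return index

def approximate_search(query, vectors, pivots, index, m, k, n):
    """Term lookup picks at most k candidates; linear search re-ranks them."""
    votes = {}
    for p in nearest_pivots(query, pivots, m):
        for doc_id in index.get(p, ()):
            votes[doc_id] = votes.get(doc_id, 0) + 1
    # Candidates sharing the most terms with the query come first.
    candidates = sorted(votes, key=lambda d: (-votes[d], d))[:k]
    candidates.sort(key=lambda d: l2(query, vectors[d]))
    return candidates[:n]

def recall(approx, exact):
    """Fraction of the exact results that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(2000)]
pivots = random.sample(vectors, 32)
index = build_inverted_index(vectors, pivots, m=4)

query = vectors[0]
exact = linear_search(query, vectors, n=10)
approx = approximate_search(query, vectors, pivots, index, m=4, k=500, n=10)
print(recall(approx, exact))
```

The `recall` helper measures how many of the exact top n results survive the candidate filtering step, which is the cost paid for scanning only k vectors instead of all of them.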

Tests on 499,207 images have shown that at this scale approximate search already outperforms linear search. The following numbers are search times in ms. Note that the average value per search differs with the number of test runs because of the context of the runs, i.e. the state of the Java VM, OS processes, file systems, etc. But the trend can be seen.


(*) Start-up latency when filling the cache was 6.098 seconds

(**) Recall with 10 results on ten runs was 0.76; on 100 runs recall was 0.72

In conclusion, with nearly 500,000 images the DocValues approach might be the best choice, as approximate indexing loses around 25% of the results while not boosting runtime performance that much. Further optimizations would be, for instance, query bundling or index splitting in combination with multithreading.

[1] Gennaro, Claudio, et al. “An approach to content-based image retrieval based on the Lucene search engine library.” Research and Advanced Technology for Digital Libraries. Springer Berlin Heidelberg, 2010. 55-66.

What, Where and How? Introducing pose manifolds for industrial object manipulation

In this paper we propose a novel method for object grasping that aims to unify robot vision techniques for efficiently accomplishing the demanding task of autonomous object manipulation. Through ontological concepts, we establish three mutually complementary processes that lead to an integrated grasping system able to answer conjunctive queries such as “What”, “Where” and “How”. For each query, the appropriate module provides the necessary output based on ontological formalities. The “What” is handled by a state-of-the-art object recognition framework. A novel 6 DoF object pose estimation technique, which entails a bunch-based architecture and a manifold modeling method, answers the “Where”. Last, “How” is addressed by an ontology-based semantic categorization enabling the sufficient mapping between visual stimuli and motor commands.


SIMPLE Descriptors

SIMPLE [Searching Images with Mpeg-7 (& Mpeg-7 like) Powered Localized dEscriptors] began as a collection of four descriptors [Simple-SCD, Simple-CLD, Simple-EHD and Simple-CEDD (or LoCATe)]. The main idea behind SIMPLE is to utilize global descriptors as local ones. To do this, the SURF detector is employed to define regions of interest on an image, and instead of using the SURF descriptor, one of the MPEG-7 SCD, the MPEG-7 CLD, the MPEG-7 EHD and the CEDD descriptors is utilized to extract the features of those image patches. Finally, the Bag-Of-Visual-Words framework is used to test the performance of those descriptors in CBIR tasks. More recently, SIMPLE was extended from a collection of descriptors to a scheme (a combination of a detector and a global descriptor). Tests have been carried out with other detectors [the SIFT detector and two random image patch generators; the random generator produced the best results and is the preferred choice], and the performance of the scheme with more global descriptors is currently being tested.
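The scheme described above (a detector that proposes patches, a global descriptor computed on each patch, then a bag of visual words) can be sketched as follows. This is an illustrative toy, not the released SIMPLE implementation: the grey-level histogram merely stands in for the MPEG-7 and CEDD descriptors, the codebook is a random sample of descriptors rather than a clustered one, and all function names are hypothetical.

```python
import random

def random_patches(width, height, count, size, rng):
    """Random patch generator: (x, y, size) squares that fit in the image."""
    return [(rng.randrange(width - size + 1),
             rng.randrange(height - size + 1),
             size)
            for _ in range(count)]

def patch_histogram(image, patch, bins=4):
    """Stand-in 'global' descriptor applied to one patch: a normalised
    grey-level histogram (NOT an MPEG-7 descriptor or CEDD)."""
    x, y, size = patch
    hist = [0] * bins
    for row in image[y:y + size]:
        for value in row[x:x + size]:
            hist[min(value * bins // 256, bins - 1)] += 1
    total = size * size
    return [h / total for h in hist]

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor."""
    dists = [sum((a - b) ** 2 for a, b in zip(descriptor, word))
             for word in codebook]
    return dists.index(min(dists))

def bovw_histogram(descriptors, codebook):
    """Bag of visual words: count how many patches map to each codeword."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    return counts

rng = random.Random(0)
# Synthetic 64x64 greyscale image with values in 0..255.
image = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
patches = random_patches(64, 64, count=50, size=16, rng=rng)
descriptors = [patch_histogram(image, p) for p in patches]
# Shortcut: sample codewords; a real codebook would be learned by clustering.
codebook = rng.sample(descriptors, 8)
print(bovw_histogram(descriptors, codebook))
```

Swapping the detector or the descriptor only means replacing `random_patches` or `patch_histogram`; the bag-of-visual-words step stays the same, which is what makes the detector and descriptor combination a scheme rather than a single descriptor.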

Searching Images with MPEG-7 (& MPEG-7 Like) Powered Localized dEscriptors (SIMPLE)
A set of local image descriptors specifically designed for image retrieval tasks

Image retrieval problems were first tackled with algorithms that tried to extract the visual properties of a depiction in a global manner, following the human instinct of evaluating an image’s content. Experiments with retrieval systems, especially on verbose images and images where objects appear with partial occlusions, showed that ranking improves when the salient regions of an image are described rather than the overall depiction. Thus, a representation of the image by its points of interest proved to be a more robust solution. SIMPLE descriptors emphasize and incorporate the characteristics that allow a more abstract but retrieval-friendly description of the image’s salient patches.

Experiments were conducted on two well-known benchmarking databases. Initial experiments were performed using the UKBench database, which consists of 10200 images separated into 2550 groups of four images each. Each group includes images of a single object captured from different viewpoints and lighting conditions. The first image of every object is used as a query image. In order to evaluate our approach, the first 250 query images were selected. The searching procedure was executed throughout the 10200 images. Since each ground truth includes only four images, precision at four (P@4) was used to evaluate the early positions.
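For readers unfamiliar with P@4, a minimal sketch of the measure follows. The image IDs and rankings are made up for illustration and the function names are hypothetical; only the definition (relevant items among the top k results, divided by k) is standard.

```python
def precision_at_k(ranked_ids, relevant_ids, k=4):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for item in top_k if item in relevant_ids) / k

def mean_precision_at_k(results, k=4):
    """Average P@k over a list of (ranking, ground_truth) pairs."""
    return sum(precision_at_k(r, rel, k) for r, rel in results) / len(results)

# Hypothetical rankings for two queries, each with a four-image ground truth.
results = [
    (["a1", "a2", "b7", "a3", "a4"], {"a1", "a2", "a3", "a4"}),  # 3 of top 4
    (["c1", "c2", "c3", "c4", "d9"], {"c1", "c2", "c3", "c4"}),  # 4 of top 4
]
print(mean_precision_at_k(results))  # 0.875
```

With four relevant images per query, a perfect system scores 1.0, and each wrong image in the first four positions costs 0.25 for that query.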

Subsequently, experiments were performed using the UCID database. This database consists of 1338 images on a variety of topics including natural scenes and man-made objects, both indoors and outdoors. All the UCID images were subjected to manual relevance assessments against 262 selected images.

In the tables that illustrate the results, wherever the BOVW model is employed, only the best result achieved by each descriptor with every codebook size is presented. In other words, for each local feature and for each codebook size, the experiment was repeated for all 8 weighting schemes but only the best result is listed in the tables. Next to the result, the weighting scheme for which the result was achieved is noted (using the System for the Mechanical Analysis and Retrieval of Text – SMART notation).

Experimental results of all 16 SIMPLE descriptors on the UKBench and the UCID datasets. MAP results in bold mark performances that surpass the baseline performance. Grey-shaded results mark the highest performance achieved per detector.

Read more and download the open source implementation of the SIMPLE descriptors (C#, Java and MATLAB)

Tuesday, October 6, 2015


While not widely understood, machine learning has been easily accessible since Google Prediction API was released in 2011. With many applications in a wide variety of fields, this tutorial by Alex Casalboni on the Cloud Academy blog is a useful place to start learning how to build a machine learning model using Google Prediction API.

The API offers a RESTful interface as a means to train a machine learning model, and is considered a “black box” due to the restricted access users have to internal configuration. This leaves users with only the “classification” vs “regression” configuration, or the application of a PMML (Predictive Model Markup Language) file with weighting parameters for categorical models.

The tutorial begins with some brief definitions and then shows how to upload your dataset to Google Cloud Storage, as required by Google Prediction API. Since the API does not provide a user-friendly Web interface, the tutorial switches to Python scripts, using an API call to obtain the modelDescription field, which contains a confusionMatrix structure that shows how the model behaves.

Google then splits the dataset into two smaller sets: one to train the model, and the second to evaluate it. Users are then shown how to generate new predictions via an API call that returns two values: the classified activity and the reliability measure for each class.
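To make the train/evaluate split and the confusion matrix concrete, here is a small generic sketch in plain Python. It does not call Google Prediction API and does not reproduce its response format: the 80/20 split ratio, the nested-dictionary matrix layout, the function names and the sample labels are all assumptions chosen for illustration.

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into train and evaluation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(actual, predicted):
    """Nested dict: matrix[actual_label][predicted_label] = count."""
    matrix = {}
    for a, p in zip(actual, predicted):
        row = matrix.setdefault(a, {})
        row[p] = row.get(p, 0) + 1
    return matrix

def accuracy(matrix):
    """Share of evaluation samples on the diagonal of the matrix."""
    correct = sum(row.get(label, 0) for label, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

train, evaluation = split_dataset(list(range(100)))
print(len(train), len(evaluation))  # 80 20

# Hypothetical labels on a five-sample evaluation set.
actual = ["walking", "walking", "sitting", "sitting", "laying"]
predicted = ["walking", "sitting", "sitting", "sitting", "laying"]
matrix = confusion_matrix(actual, predicted)
print(matrix["walking"])  # {'walking': 1, 'sitting': 1}
print(accuracy(matrix))   # 0.8
```

Reading a row of the matrix shows which activities a given class is mistaken for, which is more informative than the single accuracy figure.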

The open dataset used here was built by UCI and is used to train a multi-class model for HAR (Human Activity Recognition). Collected from smartphone accelerometers and gyroscopes and then manually labelled, each record is assigned one of six activities (walking, sitting, walking up stairs, lying down, etc.). A model trained as described in the tutorial can associate sensor data with different activities, as would be needed in activity tracking devices or healthcare monitoring.

Article from

Monday, September 21, 2015

Morph an image to resemble a painting in the style of the great masters

A group of researchers at the University of Tübingen, Germany, has developed an algorithm that can morph an image to resemble a painting in the style of the great masters. Such algorithms, known technically as “deep learning” algorithms, are already in use by companies such as Google for image recognition and other applications.

“The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images,” the researchers wrote in their paper. “Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality.”

A photograph of apartments by a river in Tübingen, Germany, was processed to be stylistically similar to various paintings, including J.M.W. Turner’s “The Wreck of a Transport Ship,” Van Gogh’s “The Starry Night,” and Edvard Munch’s “The Scream.”

(h/t: epochtimes)





20-22 April 2016, Talca, Chile

The International Conference on Pattern Recognition Systems (ICPRS) is a development from the successful Chilean Conference on Pattern Recognition that reached its 5th edition in 2014. It is organised by the Chilean Association of Pattern Recognition, ACHiRP (affiliated to the IAPR) and is sponsored by the Vision and Imaging Professional Network of the (UK) Institution of Engineering and Technology (IET), which will publish its proceedings (of accepted papers in English where at least one author registers and presents the work at the conference). Papers deemed to be of the required standard and presented at the conference will be indexed by INSPEC and, through it, IEEE Xplore. All paper submissions will be submitted via Conftool to be peer-reviewed by an international panel of experts. Authors of excellent papers will be invited to submit extended versions for consideration in IET research journals (Computer Vision, Image Processing, Biometrics). For more information see or contact

Call for Papers
Interested authors are invited to submit papers describing novel and previously unpublished results on topics including, but not limited to:
Artificial Intelligence Techniques in Pattern Recognition
Bioinformatics
Clustering
Computer Vision
Data Mining
Document Processing and Recognition
Face Recognition
Fuzzy and Hybrid Techniques in PR
High Performance Computing for Pattern Recognition
Image Processing and Analysis
Kernel Machines
Mathematical Morphology
Mathematical Theory of Pattern Recognition
Natural Language Processing and Recognition
Neural Networks for Pattern Recognition
Pattern Recognition Principles
Real Applications of Pattern Recognition
Remote Sensing
Shape and Texture Analysis
Signal Processing and Analysis
Statistical Pattern Recognition
Syntactical and Structural Pattern Recognition
Voice and Speech Recognition

Key Dates
Paper Submission Deadline: 13th January 2016 (camera ready, max. 6 pages)
Notification of acceptance: 21st February 2016
Camera-ready papers: 5th March 2016

Organised by the Chilean Association of Pattern Recognition (affiliated to the IAPR)
Co-sponsored by the IET's Vision and Imaging Professional Network

Wednesday, August 19, 2015

The software Stephen Hawking uses to talk to the world is now free

Professor Stephen William Hawking, CH, CBE, FRS, FRSA

For almost 20 years, Intel has been building technology to help Stephen Hawking communicate with the world -- and now the company is making the same software the world-renowned physicist uses to write books, give speeches and talk available to everybody. For free.

It's called the Assistive Context-Aware Toolkit (ACAT), and it's the very same software Intel baked Swiftkey into for Hawking early last year. Releasing it as open source software was always the plan, giving engineers, developers and researchers a groundwork they can use to create technology that improves the lives of patients with motor neuron disease and other conditions that make using typical computer interfaces impossible.

Right now ACAT uses webcam-based face recognition for user control, but Intel says developers can augment it with custom inputs. As is, it still works pretty well: I installed it on a Windows tablet for a quick test run and was able to type simple words by flexing my face muscles in the same manner as Professor Hawking -- patiently waiting for the ACAT system to highlight the menu, letter or predictive text word I wanted before moving my cheek. The system can also open documents, browse the web and give users surprisingly precise cursor control.

The base software is available for free on GitHub, and Intel is hosting a separate site with documentation, videos on features and compatible sensors and a detailed manual to help users get started. If you're having trouble, you can even contact the project's lead directly (his email is published on the ACAT website) for help. All in all, the project's public release is a great step toward achieving Professor Hawking's dream of making connected wheelchair and assistive computer technology available to every person that needs it.

Check out the project's official GitHub page or Intel's project page at the source link below.

Saturday, July 25, 2015

4 PhD places in vision and robotics in Edinburgh

Edinburgh and Heriot-Watt universities have 4 fully funded 4-year PhD industrial studentships with Schlumberger (oil and gas), RSSB (Rail Safety Board), Costain (Engineering solutions), and UoE (unmanned surface vehicles).

More details are below.

Key Features and Benefits
* Fully funded studentship covering Home/EU tuition fees and stipend (£14,057 for 2015/16).
* Access to our world class infrastructure, enhanced through £6.1m EPSRC capital grant ROBOTARIUM.
* Students benefit from supervision by academic experts from both institutions and graduate with a joint PhD from University of Edinburgh and Heriot-Watt University.
* Excellent training opportunities, including masters level courses in year one, supplemented by training in commercial awareness, social challenges and innovation.
* Enterprise funds available to support development of early commercialisation prototypes.
* Starting from: September 2015

Entry and Language Requirements
* Applicants should have, or expect to obtain, a first-class degree in Engineering, Computer Science, or related subjects.
* Non-native English speakers need to provide evidence of a SELT (Secure English Language Test) at CEFR (Common European Framework of Reference) Level B2 taken within 2 years of the date of application. The minimum requirement is IELTS 6.5 or equivalent, with no individual component below 5.5 in a single sitting. A degree from an English speaking university may also be accepted in some circumstances, but we do not accept TOEFL certificates.

Industrial partners

Schlumberger is the leading supplier of technology, project management, and information solutions for oil and gas companies around the world. Through their well site operations and in their research and engineering facilities, they are working to develop products, services and solutions that optimize customer performance in a safe and environmentally sound manner.
As automation of drilling processes is developed, operation will be split between completely automated tasks and tasks that are carried out by humans. The project will look at how teams comprising human and robotic actors will collaborate to achieve complex and uncertain tasks in drilling operations. Particular areas of interest include delivery/execution monitoring of collaborative plans; developing/maintaining trust between human and automated parts of the system; multi-modal interfaces for communication and coordination; dynamically changing activities in response to unexpected events/changes in priorities; and reliable state/event detection and communication mechanisms that prioritise significant events and support effective human decision-making.
To find out more please contact: Professor David Lane (

RSSB is a not-for-profit organisation whose purpose is to help members to continuously
improve the level of safety in the rail industry, to drive out unnecessary cost and to
improve business performance. ERTMS (the European Railway Traffic Management System)
and ATO (Automatic Train Operation) are changing the task of driving a train. This is
occurring at a time when automation of transport systems (e.g. automated passenger
pods at Heathrow airport, the Google Car, automated mining trucks etc.) is becoming
increasingly common through the convergence of low cost, high performance sensors,
communications and computing systems and the development of advanced code libraries
for extracting information from sensor data. With these factors in mind, it can be
expected that the way a train driver operates will be influenced by these developments
in order to achieve safer, more efficient and more frequent train services.
To find out more please contact: Professor Ruth Aylett (

Costain is recognised as one of the UK's leading engineering solutions providers, delivering integrated consulting, project delivery and operations and maintenance services to major blue-chip customers in targeted market sectors. Many repetitive industrial tasks require significant cognitive load, which results in operator fatigue and in turn can become dangerous. The development of robotic sensing technology and compliant feedback technology will allow semi-autonomous robotics systems to improve this type of workflow. This project aims to explore methods in which a robotic system with shared autonomy can contribute to the operation of a kinesthetic tool (such as a piece of machinery) and in doing so reduce the cognitive load and fatigue of the human operator. As this is an EPSRC iCASE (industrial CASE) studentship, over the course of the four years, the student will be required to spend at least 3 months at the sponsor's premises. This project is only valid for UK students due to the nature of the funding.
To find out more please contact: Professor Sethu Vijayakumar (

Intention-aware Motion Planning. Project only valid for UK students due to the nature of the funding. The goal of this industry-sponsored project is to research and extend previous techniques to give a new approach to categorising motion and inferring intent to support robust maritime autonomy decisions in Unmanned Surface Vehicles. Maritime systems have to manage high levels of data sparsity and inhomogeneity to reason effectively in terms of the grammar of motion adopted by different objects. Elements of topology-based trajectory classification for inferring motion semantics and categorisation, distributed tracking & planning with reactive models, Bayesian reasoning and learning algorithms will be combined and extended for noisy data sampled on large spatiotemporal scales to give high-confidence inference of intent to inform autonomous decisions.

To find out more please contact: Dr Subramanian Ramamoorthy (

Tuesday, July 14, 2015

BabyMaker: Funny Baby Face out of Parents' Pictures - Make a Photo Collage!

By Luxand, Inc.

Join the crowd and start making babies – you only need two photos to begin! More than 30 million babies made by using the technology – enough to populate a small town. Featured on the Graham Norton Show by Jennifer Lopez, and reviewed by Globo TV in Brazil, the technology is super popular and a great deal of fun.
Have a crush on someone? Want to see what a baby would look like if you were a couple? Snap pictures of you two, and that baby will be smiling at you in nine seconds instead of nine months!
Based on Luxand biometric identification technologies, BabyMaker applies complex science to deliver hours of fun. Instead of blending the two faces together, the innovative technology identifies facial features in the two source pictures, creates their mathematical representations, and applies powerful calculations to create a model describing a new face that looks like a younger version of the two “parents”. Based on that mathematical model, BabyMaker renders a new face and makes a perfect photo collage showing you two and your baby.
Like that cutie superstar? Superstars like BabyMaker! Have hours of fun by making babies online with whoever you want! Just snap a selfie and pick that other parent, and you’ll see a baby of you two in an instant. You need nothing but a picture of your face to get started!
Strive for perfection? For best results, make sure to use two frontal pictures taken in good lighting conditions. You can use a good selfie, yet the higher the quality of the source you submit, the more convincing the result you will get.
Since lighting conditions may vary, BabyMaker may have a hard time detecting the face. If that happens, try using a photo taken in better lighting conditions. In addition, please help us achieve great results by manually selecting your baby’s skin tone as Light, Medium, Dark or Asian.
Want your baby to laugh? Just submit pictures of the two parents smiling, and you’ll see a happy face! Want a serious-looking child? Place a lemon in front of you, look straight at the camera, and we can almost guarantee that serious look.
Save your baby’s face to a photo album or share it with friends by sending a text message or email or posting to Facebook, Twitter, Google+ and Whatsapp.
Still not convinced? Try a different pair of photos of you two, and you’ll get a slightly different baby.
Finally, we’re not fortune-tellers, and neither is BabyMaker. Use just for fun, and have fun!

3 postdoctoral researchers and 8 PhD candidates in Computer Vision and Deep Learning

Faculty of Science – Informatics Institute
Publication date 18 June 2015
Education level University
Salary indication €2,125 to €4,551 gross per month
Closing date 31 August 2015
Hours 38 hours per week
Vacancy number 15-233

The  Faculty of Science holds a leading position internationally in its fields of research and participates in a large number of cooperative programs with universities, research institutes and businesses. The faculty has a student body of around 4,000 and 1,500 members of staff, spread over eight research institutes and a number of faculty-wide support services. A considerable part of the research is made possible by external funding from Dutch and international organizations and the private sector. The Faculty of Science offers thirteen Bachelor's degree programs and eighteen Master’s degree programs in the fields of the exact sciences, computer science and information studies, and life and earth sciences.

Since September 2010, the whole faculty has been housed in a brand new building at the Science Park in Amsterdam. The arrival of the faculty has made the Science Park one of the largest centers of academic research in the Netherlands.

The Informatics Institute is one of the large research institutes within the faculty, with a focus on complex information systems divided into two broad themes: 'Computational Systems' and 'Intelligent Systems.' The institute has a prominent international standing and is active in a dynamic scientific area, with a strong innovative character and an extensive portfolio of externally funded projects.

Project description

This summer Qualcomm, the world-leader in mobile chip-design, and the University of Amsterdam, a world-leading computer science department, have started a joint research lab in Amsterdam, the Netherlands, as a great opportunity to join the best of academic and industrial research. Leading the lab are profs. Max Welling (machine learning), Arnold Smeulders (computer vision analysis), and Cees Snoek (image categorization).

The lab will pursue world-class research on the following eleven topics:

Project 1 CS: Spatiotemporal representations for action recognition. Automatically recognize actions in video, preferably which action appears when and where as captured by a mobile phone, learned both with and without example videos.

Project 2 CS: Fine-grained object recognition. Automatically recognize fine-grained categories with interactive accuracy by using very deep convolutional representations computed from automatically segmented objects and automatically selected features.

Project 3 CS: Personal event detection and recounting. Automatically detect events in a set of videos with interactive accuracy for the purpose of personal video retrieval and summarization. We strive for a generic representation that covers detection, segmentation, and recounting simultaneously, learned from few examples.

Project 4 CS: Counting. The goal of this project is to accurately count the number of arbitrary objects in an image and video independent of their apparent size, their partial presence, and other practical distractors. Use cases include the Internet of Things and robotics.

Project 5 AS: One shot visual instance search. Often when searching for something, a user will have available just 1 or very few images of the instance of search with varying degrees of background knowledge.

Project 6 AS: Robust Mobile Tracking. In an experimental view of tracking, the objective is to track the target’s position over time given a starting box in frame 1 or alternatively its typed category especially for long-term, robust tracking.

Project 7 AS: The story of this. Often when telling a story one is not interested in what happens in general in the video, but what happens to this instance (a person, a car to pursue, a boat participating in a race). The goal is to infer what the target encounters and describe the events that occur to it.

Project 8 AS:  Statistical machine translation. The objective of this work package is to automatically generate grammatical descriptions of images that represent the meaning of a single image, based on the annotations resulting from the above projects.

Project 9 MW: Distributed deep learning. Future applications of deep learning will run on mobile devices and use data from distributed sources. In this project we will develop new efficient distributed deep learning algorithms to improve the efficiency of learning and to exploit distributed data sources.

Project 10 MW: Automated Hyper-parameter Optimization. Deep neural networks have a very large number of hyper-parameters. In this project we develop new methods to automatically and efficiently determine these hyper-parameters from data for deep neural networks.

Project 11 MW: Privacy Preserving Deep Learning. Training deep neural networks from distributed data sources must take privacy considerations into account. In this project we will develop new distributed and privacy preserving learning algorithms for deep neural networks.

PhD candidates
  • Master degree in Artificial Intelligence, Computer Science, Physics or related field;
  • excellent programming skills (the project is in Matlab, Python and C/C++);
  • solid mathematics foundations, especially statistics and linear algebra;
  • highly motivated;
  • fluent in English, both written and spoken;
  • proven experience with computer vision and/or machine learning is a big plus.
Postdoctoral researchers
  • PhD degree in computer vision and/or machine learning;
  • excellent publication record in top-tier international conferences and journals;
  • strong programming skills (the project is in Matlab, Python and C/C++);
  • motivated and capable of coordinating and supervising research.
Further information

Informal inquiries on the positions can be sent by email to:


Starting date: before Fall 2015.

The appointment for the PhD candidates will be on a temporary basis for a period of 4 years (initial appointment will be for a period of 18 months and after satisfactory evaluation it can be extended for a total duration of 4 years) and should lead to a dissertation (PhD thesis). An educational plan will be drafted that includes attendance of courses and (international) meetings. The PhD student is also expected to assist in teaching of undergraduates.

Based on a full-time appointment (38 hours per week) the gross monthly salary will range from €2,125 in the first year to €2,717 in the last year. There are also secondary benefits, such as 8% holiday allowance per year and the end of year allowance of 8.3%. The Collective Labour Agreement (CAO) for Dutch Universities is applicable.

The appointment of the postdoctoral research fellows will be full-time (38 hours a week) for two years (initial employment is 12 months and, after a positive evaluation, the appointment will be extended by a further 12 months). The gross monthly salary will be in accordance with the University regulations for academic personnel, and will range from €2,476 up to a maximum of €4,551 (scale 10/11) based on a full-time appointment, depending on qualifications, expertise and the number of years of professional experience. The Collective Labour Agreement for Dutch Universities is applicable. There are also secondary benefits, such as 8% holiday allowance per year and the end of year allowance of 8.3%.

Some of the things we have to offer:

  • competitive pay and good benefits;
  • top-50 University worldwide;
  • interactive, open-minded and a very international city;
  • excellent computing facilities.

English is the working language in the Informatics Institute. As in Amsterdam almost everybody speaks and understands English, candidates need not be afraid of the language barrier.

Job application

Applications may only be submitted by sending your application to To process your application immediately, please quote the vacancy number 15-233 and the position and the project you are applying for in the subject line. Applications must include a motivation letter explaining why you are the right candidate, a curriculum vitae (max 2 pages), a copy of your Master’s thesis or PhD thesis (when available), a complete record of Bachelor and Master courses (including grades), a list of projects you have worked on (with brief descriptions of your contributions, max 2 pages) and the names and contact addresses of two academic references. Also indicate your top three projects you would like to work on and why. All these should be grouped in one PDF attachment.

Wednesday, July 8, 2015

Flipkart rolls out image search for mobile shopping

India’s online marketplace Flipkart has started rolling out image search on its mobile app to improve the shopping experience.

Instead of typing keywords, users can upload photos of fashion items and find similar products in terms of color, pattern or style inside the Flipkart merchandise database.

Users browsing Flipkart's catalogue can find visually similar products with a single tap. The app becomes a virtual "shop assistant" that shows products of the same color or design when users see something they like.

Machine intelligence pioneer ViSenze is providing the technology for visual search and image recognition.

"We're very excited to partner with Flipkart and offer their users an enhanced shopping experience powered by our visual search," said Oliver Tan, CEO and Co-Founder of ViSenze.

The company originated as an R&D spin-off from the National University of Singapore and develops highly advanced visual search algorithms, combining state-of-the-art deep learning with the latest computer vision technology to solve search and recognition problems faced by businesses in the visual web space.

It provides its visual technology APIs through a Software-as-a-Service offering to online retailers, content owners, brands and advertisers, app developers and digital publishers, enabling their platforms to recognize products for retrieval purposes or instant purchases.

Other companies using the service include Internet retailers and marketplaces like Caratlane, Zalora, Reebonz, and Rakuten Taiwan, as well as patent search engines like PatSnap.

Original Article:

Tuesday, June 30, 2015

IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation

Christian Forster, Luca Carlone, Frank Dellaert, Davide Scaramuzza, "IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation", Robotics: Science and Systems (RSS), Rome, 2015.
Supplementary Material:
Recent results in monocular visual-inertial navigation (VIN) have shown that optimization-based approaches outperform filtering methods in terms of accuracy due to their capability to relinearize past states. However, the improvement comes at the cost of increased computational complexity. In this paper, we address this issue by preintegrating inertial measurements between selected keyframes. The preintegration allows us to accurately summarize hundreds of inertial measurements into a single relative motion constraint. Our first contribution is a preintegration theory that properly addresses the manifold structure of the rotation group and carefully deals with uncertainty propagation. The measurements are integrated in a local frame, which eliminates the need to repeat the integration when the linearization point changes while leaving the opportunity for belated bias corrections. The second contribution is to show that the preintegrated IMU model can be seamlessly integrated in a visual-inertial pipeline under the unifying framework of factor graphs. This enables the use of a structureless model for visual measurements, further accelerating the computation. The third contribution is an extensive evaluation of our monocular VIN pipeline: experimental results confirm that our system is very fast and demonstrates superior accuracy with respect to competitive state-of-the-art filtering and optimization algorithms, including off-the-shelf systems such as Google Tango.
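The core idea of summarizing many inertial measurements into one relative motion constraint can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it accumulates a relative rotation, velocity change and position change in the local frame of the first keyframe, and it leaves out the noise propagation and bias-correction Jacobians that the paper treats carefully. The function names and the assumption of a fixed sample interval `dt` are choices made for this example.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def so3_exp(phi):
    """Rodrigues' formula: map a rotation vector to a rotation matrix."""
    x, y, z = phi
    theta = math.sqrt(x * x + y * y + z * z)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    if theta < 1e-10:
        a, b = 1.0, 0.5  # small-angle limits of the two coefficients
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    k2 = mat_mul(k, k)
    return [[(1.0 if i == j else 0.0) + a * k[i][j] + b * k2[i][j]
             for j in range(3)] for i in range(3)]

def preintegrate(gyro, accel, dt,
                 gyro_bias=(0.0, 0.0, 0.0), accel_bias=(0.0, 0.0, 0.0)):
    """Summarize IMU samples between two keyframes as (dR, dv, dp).

    All quantities are expressed in the frame of the first keyframe,
    so gravity and the global pose do not enter the accumulation.
    """
    d_rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    d_vel = [0.0, 0.0, 0.0]
    d_pos = [0.0, 0.0, 0.0]
    for w, a in zip(gyro, accel):
        w_c = [w[i] - gyro_bias[i] for i in range(3)]
        a_c = [a[i] - accel_bias[i] for i in range(3)]
        a_local = mat_vec(d_rot, a_c)  # acceleration in the first keyframe's frame
        for i in range(3):
            d_pos[i] += d_vel[i] * dt + 0.5 * a_local[i] * dt * dt
            d_vel[i] += a_local[i] * dt
        d_rot = mat_mul(d_rot, so3_exp([wi * dt for wi in w_c]))
    return d_rot, d_vel, d_pos

# Example: 100 samples at 100 Hz, turning about z while accelerating along x.
samples = 100
d_rot, d_vel, d_pos = preintegrate(
    gyro=[(0.0, 0.0, 0.1)] * samples,
    accel=[(1.0, 0.0, 0.0)] * samples,
    dt=0.01,
)
```

Because the three accumulated quantities depend only on the raw measurements and the bias estimate, an optimizer can change the keyframe pose estimates without repeating this loop, which is the saving the abstract describes.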

Monday, June 29, 2015


picsbuffet is a visual image browsing system to visually explore and search millions of images from stock photo agencies and the like. Similar to map services like Google Maps, users may navigate through multiple image layers by zooming and dragging. Zooming in (or out) shows more (or less) similar images from lower (or higher) levels. Dragging the view shows related images from the same level. Layers are organized as an image pyramid which is built using image sorting and clustering techniques. Easy image navigation is achieved because the placement of the images in the pyramid is based on an improved fused similarity calculation using visual and semantic image information. picsbuffet also supports searches. After starting an image search the user is automatically directed to a region with suitable results. Additional interesting regions on the map are shown on a heatmap.

picsbuffet 0.9 is the first publicly available version, using over 1 million images from fotolia. Currently only the Chrome and Opera browsers are supported. Future versions will support more images and other browsers as well. picsbuffet was developed by Radek Mackowiak, Nico Hezel and Prof. Dr. Kai Uwe Barthel at HTW Berlin (University of Applied Sciences).

picsbuffet could be used with other kinds of images such as product photos and the like.

For further information about picsbuffet please contact Kai Barthel:



Wednesday, June 24, 2015

UAV survey and 3D reconstruction of the Schiefe Turm von Bad Frankenhausen with the AscTec Falcon 8

An Unsupervised Approach for Comparing Styles of Illustrations

Takahiko Furuya, Shigeru Kuriyama and Ryutarou Ohbuchi

In creating web pages, books, or presentation slides, consistent use of tasteful visual style(s) is quite important. In this paper, we consider the problem of style-based comparison and retrieval of illustrations. In their pioneering work, Garces et al. [2] proposed an algorithm for comparing illustrative style. The algorithm uses supervised learning that relied on stylistic labels present in a training dataset. In reality, obtaining such labels is quite difficult. In this paper, we propose an unsupervised approach to achieve accurate and efficient stylistic comparison among illustrations. The proposed algorithm combines heterogeneous local visual features extracted densely. These features are aggregated into a feature vector per illustration before being treated with distance metric learning based on unsupervised dimension reduction for saliency and compactness. Experimental evaluation of the proposed method by using multiple benchmark datasets indicates that the proposed method outperforms existing approaches.


Beyond Vanilla Visual Retrieval

The presentation from professor Jiri Matas @ CBMI 2015

The talk will start with a brief overview of the state of the art in visual retrieval of specific objects. The core steps of the standard pipeline will be introduced and recent developments improving both precision and recall as well as the memory footprint will be reviewed. Going off the beaten track, I will present a visual retrieval method applicable in conditions when the query and reference images differ significantly in one or more properties like illumination (day, night), the sensor (visible, infrared), viewpoint, appearance (winter, summer), time of acquisition (historical, current) or the medium (clear, hazy, smoky). In the final part, I will argue that in image-based retrieval it might often be more interesting to look for the most *dissimilar* images of the same scene rather than the most similar ones as conventionally done, as especially in large datasets these are just near duplicates. As an example of such a problem formulation, a method efficiently searching for images with the largest scale difference will be presented. A final demo will, for instance, show that the method finds surprisingly fine details on landmarks, even those that are hardly noticeable to a human.


The presentation from professor Jiri Matas is available here.

Tuesday, June 23, 2015

NOPTILUS Final Experiment

The main objective of this experiment was to evaluate the performance of the extended and enhanced version of the NOPTILUS system in a large-scale, open-sea experiment. In the open sea, especially in oceans, the navigation procedure faces several non-trivial problems, such as strong currents, limited communication and severe weather conditions. Additionally, this is the first experiment in which we incorporate different sensor modalities: half of the squad was equipped with single-beam DVLs while the other half employed multi-beam sensors.
    In order to tackle the open-sea challenges, the new version of the NOPTILUS system incorporates an advanced motion control module that is capable of compensating for strong currents, disturbances and turbulence.

    Moreover, the final version of the NOPTILUS system utilizes an improved version of the generic plug-n-play web-system, which allows the operation of larger squads. The developed tool is now capable of splitting the operation procedure into distinct, non-overlapping timestamps. Based on the size of the squad, the web-system automatically schedules the transmission of the navigation instructions so as both to meet the available bandwidth requirements and to avoid possible congestion issues.
    To the best of our knowledge this is the first time that a heterogeneous squad of AUVs has been capable of fully autonomous navigation in an unknown open-sea area, in order to map the underwater surface of the benthic environment and simultaneously to track the movements of a moving target, in a cooperative fashion.
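The post does not say how the web-system assigns transmission times, so the following is only a sketch of one simple way to give each vehicle its own non-overlapping window. The fixed slot length, the round-robin ordering and the vehicle names are all assumptions made for illustration.

```python
def schedule_slots(vehicle_ids, slot_seconds, start=0.0):
    """Assign each vehicle one transmission window per cycle.

    Windows are laid out back to back, so no two vehicles transmit
    at the same time and the cycle length grows with the squad size.
    """
    schedule = {}
    for index, vehicle in enumerate(vehicle_ids):
        begin = start + index * slot_seconds
        schedule[vehicle] = (begin, begin + slot_seconds)
    return schedule

# Hypothetical squad of three vehicles with 2-second windows.
plan = schedule_slots(["auv-1", "auv-2", "auv-3"], slot_seconds=2.0)
cycle_length = max(end for _, end in plan.values())
```

With this layout the time between two instructions to the same vehicle is the squad size times the slot length, which is one way a scheduler could trade update rate against the available bandwidth.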

Read More

MPEG CDVS Awareness Event

24 June 2015, Marriott Hotel Warsaw

Event Description: Recent advances in computer vision techniques have made large-scale visual search applications a reality, but have also highlighted industry's need for a standardized solution ensuring interoperability of devices and applications. MPEG, well known for its multimedia coding and representation standards, responded to this need by developing a new standard in this space, Compact Descriptors for Visual Search (MPEG-7 CDVS). The CDVS standard specifies high-performance, low-complexity, compact descriptors from still images, enabling deployment in low-power handheld devices, transmission over congested networks, and the interoperable design of large-scale visual search applications. The purpose of this event is to present CDVS and demonstrate its deployment in a range of applications, from mobile visual search to video content management in broadcasting. The event is targeted to a wide audience and will be of particular interest to developers of visual search applications, multimedia device and sensor manufacturers, multimedia content creators and broadcasters.

Date & Venue: Wednesday 24th June 2015, 14:00 – 18:00 (during 112th MPEG Meeting) Marriott Hotel, Aleje Jerozolimskie 65/79 - 00697 Warsaw, Poland

Registration: The event is open to the public and free of charge. To register (for logistical purposes only), please send an email to


Thursday, June 11, 2015

Machine Vision Algorithm Chooses the Most Creative Paintings in History

Picking the most creative paintings is a network problem akin to finding super spreaders of disease. That’s allowed a machine to pick out the most creative paintings in history.

Creativity is one of humanity’s uniquely defining qualities. Numerous thinkers have explored the qualities that creativity must have, and most pick out two important factors: whatever the process of creativity produces, it must be novel and it must be influential.

The history of art is filled with good examples in the form of paintings that are unlike any that have appeared before and that have hugely influenced those that follow: Leonardo’s 1469 Madonna and child with a pomegranate, Goya’s 1780 Christ crucified, Monet’s 1865 Haystacks at Chailly at sunrise, and so on. Other paintings are more derivative, showing many similarities with those that have gone before, and so are thought of as less creative.

The job of distinguishing the most creative from the others falls to art historians. And it is no easy task. It requires, at the very least, an encyclopedic knowledge of the history of art. The historian must then spot novel features and be able to recognize similar features in future paintings to determine their influence.

Those are tricky tasks for a human and until recently, it would have been unimaginable that a computer could take them on. But today that changes thanks to the work of Ahmed Elgammal and Babak Saleh at Rutgers University in New Jersey, who say they have a machine that can do just this.

They’ve put it to work on a database of some 62,000 pictures of fine art paintings to determine those that are the most creative in history. The results provide a new way to explore the history of art and the role that creativity has played in it.

Read More

Thursday, June 4, 2015

Baidu caught gaming recent supercomputer performance test

Chinese search engine giant Baidu recently made headlines when its supercomputer reportedly beat out challengers from both Google and Microsoft on the ImageNet image recognition test. However, the company has had to back down from those claims and issue an apology after details emerged suggesting that its success resulted from a scheme to cheat the testing system. As such, Baidu's accomplishment has been stricken from the books and the company has been banned from ImageNet challenges for a full year.

The issue began in mid-May when Baidu claimed to have scored a record low 4.58% error rate on the test. This exam looks at how well computing clusters can identify objects and locations within photographs -- basically the technology behind Google Photos' auto-tagging feature -- except on large-scale file sets. Microsoft and Google, on the other hand, scored 4.94 and 4.8 percent error rates, respectively. That's actually a bit better than the 5 percent average trained humans can achieve and a huge deal for the industry.

However, on Tuesday, researchers who actually administered the ImageNet test called shenanigans on Baidu for setting up a series of dummy accounts to brute force a successful test run. The test rules state specifically that contestants are allowed to submit only two sets of test results each week. Baidu apparently set up 30 accounts and spammed the service with 200 requests in six months, 40 of which came over a single five-day period in March. Doing so potentially allowed Baidu engineers to artificially increase the recognition rate by "tuning" their software to the existing test data sets.
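To put those figures in proportion, here is a quick back-of-the-envelope calculation using only the numbers quoted above. Treating six months as 26 weeks is an approximation made for this example.

```python
WEEKLY_LIMIT = 2           # submissions allowed per week under the test rules
WEEKS_IN_SIX_MONTHS = 26   # assumption: six months is roughly 26 weeks

def allowed_submissions(weeks, per_week=WEEKLY_LIMIT):
    """Maximum submissions a single contestant could make within the rules."""
    return weeks * per_week

allowed = allowed_submissions(WEEKS_IN_SIX_MONTHS)
reported = 200                          # requests attributed to Baidu
excess_factor = reported / allowed      # how many times over the limit
burst_per_day = 40 / 5                  # the five-day burst in March

print(f"allowed: {allowed}, reported: {reported}, "
      f"about {excess_factor:.1f}x the limit; burst: {burst_per_day:.0f} per day")
```

Under that approximation, one contestant playing by the rules could have made 52 submissions in the period, so 200 requests is close to four times the allowance.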

"This is pretty bad, and it is exactly why there is a held-out test set for the competition that is hosted on a separate server with limited access," Matthew Zeiler, CEO of AI software company Clarifai, told the Wall Street Journal. "If you know the test set, then you can tweak your parameters of your model however you want to optimize the test set."
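Zeiler's point can be illustrated with a toy simulation. The sketch below is not a reconstruction of anything Baidu did: it scores a "model" that guesses labels at random, with no real skill, against a fixed test set, and compares the best score from two submissions with the best score from two hundred. The test-set size, the binary labels and the function name are all assumptions chosen for the example.

```python
import random

def submission_scores(num_submissions, test_size=200, seed=0):
    """Accuracy of repeated random-guess submissions on one fixed test set."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(test_size)]
    scores = []
    for _ in range(num_submissions):
        guesses = [rng.randint(0, 1) for _ in range(test_size)]
        correct = sum(g == y for g, y in zip(guesses, labels))
        scores.append(correct / test_size)
    return scores

scores = submission_scores(200)
best_of_two = max(scores[:2])       # what the rules allow in one week
best_of_two_hundred = max(scores)   # what many extra submissions allow
print(best_of_two, best_of_two_hundred)
```

Every submission here has a true accuracy of 50 percent, yet the best of many attempts can only match or exceed the best of a few. Reporting that maximum overstates how well the model would do on unseen data, which is why limiting submissions matters.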

In response, Baidu has issued a formal apology for its actions. That is, if you think "apology" is a good description for calling the incident a "mistake" and refusing to provide any additional details or explanation as to why it happened.

Read More

Wednesday, June 3, 2015

Fail: Computerized Clinical Decision Support Systems for Medical Imaging

Computerized systems that help physicians make clinical decisions fail two-thirds of the time, according to a study published today in the Journal of the American Medical Association (JAMA). With the use of such systems expanding—and becoming mandatory in some settings—developers must work quickly to fix the programs and their algorithms, the authors said. The two-year study, which is the largest of its kind, involved over 3,300 physicians. 

Computerized clinical decision support (CDS) systems make recommendations to physicians about next steps in treatment or diagnostics for patients. The physician enters information about the patient and the ailment, and based on a database of criteria, algorithms come up with a score for how appropriate certain next clinical steps would be. These databases of “appropriateness criteria” have been developed by national medical specialty societies and are used across various CDS systems. They aim to reduce overuse of care that can be costly and harmful to patients.

But according to the JAMA study, the leading CDS systems don’t work most of the time. The study tracked more than 117,000 orders input by physicians for advanced diagnostic imaging procedures such as magnetic resonance imaging (MRI) and computed tomography (CT). For two-thirds of those orders, the computer program could not come up with any feedback. “Basically it says, ‘I don’t have a guideline for you. I can’t help you,’” says Peter Hussey, a senior policy researcher at RAND Corporation and the lead author of the study. “When that happens two-thirds of the time...the physicians start to get more negative about it.”
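The failure mode described here, an order for which the system has no matching guideline, can be sketched as a simple table lookup. This is an illustration only: the table entries, the scores and the function names are invented placeholders, not real clinical criteria or the logic of any actual CDS product.

```python
from typing import Optional

# Hypothetical appropriateness table: (indication, procedure) -> score.
# The keys and scores are placeholders with no clinical meaning.
CRITERIA = {
    ("indication A", "MRI"): 7,
    ("indication A", "CT"): 3,
    ("indication B", "CT"): 8,
}

def rate_order(indication: str, procedure: str) -> Optional[int]:
    """Return an appropriateness score, or None when no guideline matches."""
    return CRITERIA.get((indication, procedure))

def feedback_rate(orders) -> float:
    """Fraction of orders for which the system can give any feedback."""
    rated = sum(1 for order in orders if rate_order(*order) is not None)
    return rated / len(orders)

orders = [
    ("indication A", "MRI"),   # covered by the table
    ("indication C", "MRI"),   # no guideline: the system stays silent
    ("indication D", "CT"),    # no guideline: the system stays silent
]
print(feedback_rate(orders))
```

In this toy example only one order in three receives a score, mirroring the situation the study reports, where the program could offer no feedback for two-thirds of orders.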

That’s a problem, because these computerized decision makers will soon be mandated by the U.S. federal government. The Protecting Access to Medicare Act of 2014 says that, starting in 2017, CDS systems must be allowed to weigh in on whether advanced diagnostic imaging should be ordered for Medicare patients. CDS systems are already used in the private sector as well, but not widely, Hussey says.   

The systems’ problems are likely caused by lackluster databases and algorithms that fall short, says Hussey. “There are lots of different kinds of patients with different problems, and the criteria just haven’t been created for some of those. In other cases, it’s likely that the criteria were out there but the CDS tools couldn’t find them,” he explains. “These seem like solvable problems, but we need to get working on this pretty quickly because this is going to be mandatory in a couple of years.”


Highly Custom Robot

While some DRC teams received fancy ATLAS robots from DARPA and other teams decided to adapt existing platforms (HUBO and HRP-2, for example) to compete in the Finals, some groups set out to build completely new robots. One of these is Team WALK-MAN from the Italian Institute of Technology (IIT), whose most recent robotic creations include HyQ and COMAN. Before departing to the DRC Finals site in Pomona, Calif., Nikos Tsagarakis, a senior researcher at IIT and WALK-MAN Project Coordinator, spoke with us about his team’s highly customized robot, its main capabilities, and how it compares to ATLAS.

To design and build WALK-MAN, did you get inspiration from other robots? Which ones?

WALK-MAN was developed as part of the European Commission-funded Project WALK-MAN, and the goal was creating a completely original and new body design, so it is different from any other existing robot we developed so far at IIT. Apart from following our traditional approach to soft robot design by adding joint elasticity to the robot’s joints, WALK-MAN’s hardware is 100 percent new. Its main features include the use of custom designed high-power motor drives able to deliver several kilowatts of peak power at a single joint. We also optimized the design of its body to reduce the inertia and mass and improve the dynamic performance of the robot. A rich sensory system gives us the state of the robot in terms of loads (joint torque sensing) and thermal sensing/fatigue for both the actuators and the electronics. In terms of control, WALK-MAN drives can be controlled in different modes including position, torque, and impedance at rates up to 5 kHz.

How does WALK-MAN compare to ATLAS?

The two robots differ in their actuation system (WALK-MAN is an electrical motor driven robot while ATLAS is a hydraulic system) but are very similar in certain dimensions like height (1.85 m) and shoulder distance (0.8 m). WALK-MAN is lighter (120 kg with backpack) than ATLAS (around 180 kg). In terms of capabilities, WALK-MAN joint performance is very close to that of ATLAS joints. Leg joints can produce torques up to 320 Nm and reach velocities of 11 to 12 radians per second at torques as high as 250 Nm. WALK-MAN arms have more extensive range and can generate torques up to 140 Nm at the shoulder level. We also expect it to be a more efficient robot than ATLAS, able to operate for more prolonged periods without recharging.
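As a back-of-envelope check on the figures quoted above (not a calculation from the interview itself), the mechanical power at a rotary joint is torque times angular velocity, and the period of a control loop is the inverse of its rate. The function names below are made up for illustration.

```python
def mechanical_power_watts(torque_nm: float, velocity_rad_s: float) -> float:
    """Mechanical power at a rotary joint: P = torque * angular velocity."""
    return torque_nm * velocity_rad_s

def control_period_us(rate_hz: float) -> float:
    """Time between control updates, in microseconds."""
    return 1e6 / rate_hz

# Leg joint: 250 Nm sustained at 11 to 12 rad/s, as quoted in the interview.
low = mechanical_power_watts(250, 11)
high = mechanical_power_watts(250, 12)

# Joint drives controlled at rates up to 5 kHz.
period = control_period_us(5000)

print(f"Leg joint power: {low / 1000:.2f} to {high / 1000:.2f} kW")
print(f"Control period at 5 kHz: {period:.0f} microseconds")
```

The result, roughly 2.75 to 3 kW at a single leg joint, is consistent with the "several kilowatts of peak power" Tsagarakis mentions. It describes mechanical output only and says nothing about electrical input or efficiency.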



Facebook uses deep learning as a way of recognizing images on its social network


FACEBOOK IS OPENING a new artificial intelligence lab in Paris after building a dedicated AI team that spans its offices in New York and Silicon Valley.

The New York University professor who oversees the company’s AI work, Yann LeCun, was born and educated in Paris. LeCun tells WIRED that he and the company are interested in tapping the research talent available in Europe. Alongside London, he says, Paris was an obvious choice for a new lab. “We plan to work openly with and invest in the AI research community in France, the EU, and beyond,” he wrote in a blog post announcing the move.

LeCun is one of the researchers at the heart of an AI movement known as deep learning. Since the 1980s, he and a small group of other researchers have worked to build networks of computer hardware that approximate the networks of neurons in the brain. In recent years, the likes of Facebook, Google, and Microsoft have embraced these “neural nets” as a way of handling everything from voice and image recognition to language translation.

Another researcher who bootstrapped this movement, University of Toronto professor Geoff Hinton, is now at Google. Like Facebook, Google is investing heavily in this rapidly evolving technology, and the two companies are competing for a rather small talent pool. After acquiring a deep learning startup called DeepMind, based in the UK and founded by an English researcher named Demis Hassabis, Google already operates a European AI lab of sorts.

Chris Nicholson, founder of the San Francisco-based AI startup Skymind, points out that many of the key figures behind deep learning are European, including not only LeCun, Hinton, and Hassabis, but also University of Montreal professor Yoshua Bengio (though he was educated in Canada). “All of them are now employed by North American organizations,” Nicholson says. “There are a lot of investment gaps in European venture capital, which means that Europe has a lot of ideas and people that either come to America or never make an impact on the mainstream.”

Today, Facebook uses deep learning as a way of recognizing images on its social network, and it’s exploring the technology as a means of personalizing your Facebook News Feed so that you’re more likely to enjoy what you see. The next big step, LeCun says, is natural language processing, which aims to give machines the power to understand not just individual words but entire sentences and paragraphs.


From Captions to Visual Concepts and Back

Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, Lawrence Zitnick, and Geoffrey Zweig
June 2015


This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.
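The abstract does not spell out how multiple instance learning is set up, so the sketch below assumes one common formulation, noisy-OR, purely for illustration. Each image is treated as a "bag" of regions. A word detector scores every region, and the image-level probability that the word applies is the probability that at least one region shows it. The function names and the example probabilities are hypothetical.

```python
import math

def noisy_or(region_probs) -> float:
    """Probability that at least one region shows the word: 1 - prod(1 - p_j)."""
    return 1.0 - math.prod(1.0 - p for p in region_probs)

def bag_nll(region_probs, word_in_caption: bool, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the image-level label under noisy-OR."""
    p = min(max(noisy_or(region_probs), eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if word_in_caption else -math.log(1.0 - p)

# Made-up per-region probabilities for one word in one image.
regions = [0.1, 0.2, 0.7]
print(f"Image-level probability: {noisy_or(regions):.3f}")
print(f"Loss if the word is in the caption: {bag_nll(regions, True):.3f}")
print(f"Loss if the word is absent: {bag_nll(regions, False):.3f}")
```

The appeal of this setup for captions is that training needs only image-level supervision (does the word occur in the caption or not), with no bounding boxes telling the detector where the word's referent appears.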


Friday, May 22, 2015

MPEG’s Compact Descriptors for Visual Search (CDVS)

Recently, I found these new documents regarding MPEG-7, the Multimedia Content Description Interface.

Past decades have seen an exponential growth in usage of digital media. Early solutions to the management of these massive amounts of digital media fell short of expectations, stimulating intensive research in areas such as Content Based Image Retrieval (CBIR) and, most recently, Visual Search (VS) and Mobile Visual Search (MVS).

The field of Visual Search has been researched for more than a decade, leading to recent deployments in the marketplace. Because many companies are coming up with proprietary solutions to address the VS challenges, resulting in a fragmented technological landscape and a plethora of non-interoperable systems, MPEG is introducing a new worldwide standard for VS and MVS technology.

MPEG’s Compact Descriptors for Visual Search (CDVS) aims to standardize technologies, in order to enable an interoperable, efficient and cross-platform solution for internet-scale visual search applications and services.

The forthcoming CDVS standard is particularly important because it will ensure interoperability of visual search applications and databases, enable a high level of performance in implementations that conform to the standard, and simplify the design of descriptor extraction and matching for visual search applications. It will also enable low-complexity, low-memory hardware support for descriptor extraction and matching in mobile devices and significantly reduce the load on wireless networks carrying visual search-related information. All this will stimulate the creation of an ecosystem benefiting consumers, manufacturers, content and service providers alike.