Saturday, May 29, 2010

Automatic Linguistic Indexing of Pictures (ALIP) By Artificial Neural Network Approach

Article from Codeproject.Com

While coding an AI application, I heard the mellow strains of a childlike singer coming from the neighbours upstairs, who played the song over and over. The lyrics were at times barely audible, but I managed to make out several characteristic phrases and looked them up in a well-known web search engine (I like it, since it puts some of my CodeProject articles on the first one or two pages of the search results). The only significant phrase from the song that I submitted to the engine was (to prevent undue advertisement), say, "фиолетовая паста" (violet paste). I expected scores of make-up advertisements, but on the contrary, among the cosmetics industry spam on the first page of results, a single link pointed to a music web forum with exactly that phrase from the lyrics. One more click and a second search gave me the band's lyrics for the song and guitar tabs, and led me to YouTube, where I was soon listening to that marvelous music clip.

It is astounding how a person with permanent internet access can, within a few seconds of hearing some music, be presented with the lyrics, information about the band, and a video clip to listen to. The process is known as searching by media data content. Current web search engines use textual information to return results; imagine being able to submit an audio, video, or image sample as a search query the same way you submit a textual request. It would be as if the computer had listened to the music itself and then presented you with the same information.

The concept known as Connected Visual Computing (CVC) is actively pursued by Intel. CVC concerns the processing of media data. For example, when some object (an ant, say) appears in the field of view of your mobile phone camera, the phone analyzes its image and shows its identification on the screen, say Camponotus herculeanus. Or when you see a sign in the street in an unknown language, you can view it through your mobile camera, and the same sign will be displayed at the same location but in your native language (augmented reality (AR), 2D/3D overlays). The search by audio content described above is another example. The market promises to expand immensely and will keep its audience consuming modern hardware and software for a very long time.

Here I'd like to present the general idea of how a computer may be used to describe an image by analyzing its pixel content, known as Automatic Linguistic Indexing of Pictures (ALIP). The approach is general: it always involves extracting some descriptive features from the data and using some rules to attribute the content to a category.

If you're interested in immediate applications, you may contact System7, the firm supporting the content-based image recognition (CBIR) part of the project.

Background

A basic understanding of AI approaches, e.g. neural networks, support vector machines, and nearest neighbour classifiers, is assumed, along with image description and transform methods such as wavelets, edge extraction, image statistics, and histograms. C++/C# experience is also useful, as this article shows how to invoke C++ DLL methods from within a C# application.

Using the application

In my ALIP experiment I decided to annotate simple natural image categories. There are 5 ANN classifiers in the project, corresponding to:

  • Pictures that might contain animals
  • Pictures that might contain flowers
  • Pictures that might contain landscapes
  • Pictures that might contain sunsets
  • Other pictures that do not fall into the above categories, or simply an unknown image type

You need to use an unknown category along with the categories you'd like to classify into. Otherwise, the AI classifier would label every image you give it as one of, e.g., animals, flowers, landscapes, or sunsets. In the real world there are other types of images that do not fall into any of these categories, so you would need to meddle with the AI classification thresholds, which is rather cumbersome and awkward. With an additional unknown-category AI classifier, the result of the image identification is either one of the known image categories or simply an unknown image type that the computer cannot identify using its petty knowledge.
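
The role of the unknown category can be illustrated with a minimal sketch. It assumes each of the five classifiers returns a score between 0 and 1 for an image; the function name `annotate`, the dictionary layout, and the scores are hypothetical and are not taken from the alip.exe sources. The annotation is simply the category whose classifier responds most strongly, so no per-category threshold has to be tuned.

```python
# Hypothetical sketch: choose an annotation from five one-vs-rest classifier scores.
CATEGORIES = ("animals", "flowers", "landscapes", "sunsets", "unknown")


def annotate(scores):
    """Return the category whose classifier gave the highest score.

    `scores` maps each category name to that classifier's output in [0, 1].
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return max(CATEGORIES, key=lambda c: scores[c])


# Made-up scores for two images (illustration only).
sunset_like = {"animals": 0.10, "flowers": 0.05, "landscapes": 0.40,
               "sunsets": 0.85, "unknown": 0.20}
car_photo = {"animals": 0.30, "flowers": 0.15, "landscapes": 0.25,
             "sunsets": 0.10, "unknown": 0.90}

print(annotate(sunset_like))
print(annotate(car_photo))
```

Without the "unknown" entry, the second image in this made-up example would have been labelled as animals, simply because that score is the largest of the remaining four.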

I adore image databases; they contain shots from all over the world that are really nice to look at. I've got about 20000 images for designers, bought from a DVD shop. I took image samples from the animals, flowers, landscapes, and sunsets image types and added all the other image categories that do not belong to those 4 to form the unknown image type.

Now, the usage of the program is simple enough. Just run alip.exe and it will load all the necessary AI classifier files (in case of an error you will get a message box and will not be able to use the program). Then click the [...] button and select a directory that contains some *.jpg files. You may use the ones supplied with this demo in the pics directory. All the files found will be added to the list box; just click one to view it in the right panel and see the proposed category in the top left panel. In theory, it should be able to annotate the image as presented below.

Methodology

Due to competing interests between the organizations I formerly worked for and my current one, I am not able to describe the methodology and feature extraction methods in minute detail. I will rather present the general trend and the categories of features used for describing images. Searching the internet for the corresponding feature computations will reveal all the necessary papers with the particular formulae.

There are some demos available online, e.g. ALIPr. They use hidden Markov models (HMMs) and wavelet features extracted from the images. You may try the pictures from this article with their method, or vice versa, their pictures with my application, and compare the annotation results.

The AI approach is general and assumes some reduction of the original data dimensionality using feature extraction, a PCA transform, or both. All that is needed is to collect some data, extract the features, and train the AI classifiers. If you understand my face detection articles, you will be able to repeat the experiment:

After you have converted your raw image data to features, just train some AI classifiers to discriminate the desired positive category from the negative ones.
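
As a rough illustration of the positive-versus-negative training step, the sketch below trains a single sigmoid neuron by gradient descent. This is a deliberately tiny stand-in for the ANN classifiers of the project, not the network actually used; the two-element feature vectors (imagined here as mean red and mean blue intensity) and all function names are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, epochs=500, lr=0.5, seed=0):
    """Train one sigmoid neuron with stochastic gradient descent.

    `samples` is a list of feature vectors, `labels` holds 1 for the
    positive category and 0 for the negative ones.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = out - y  # cross-entropy gradient w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return the neuron output in (0, 1); above 0.5 means 'positive'."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Made-up features: [mean red, mean blue], both scaled to [0, 1].
positives = [[0.90, 0.20], [0.80, 0.30], [0.85, 0.10]]  # e.g. sunsets
negatives = [[0.20, 0.80], [0.30, 0.70], [0.10, 0.90]]  # everything else
samples = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

weights, bias = train_neuron(samples, labels)
print(predict(weights, bias, [0.90, 0.15]))  # reddish image
print(predict(weights, bias, [0.15, 0.85]))  # bluish image
```

One such classifier would be trained per category, each time treating that category's samples as positive and the samples of all the other categories as negative.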

ALIP features

Generally, they are divided into:

  • Color features
  • Texture features
  • Shape features

Color features are simply the original raw image data, histograms of the image channels, and the image profile. Texture features are the well-known edge extraction methods, wavelet transforms, and image statistics (e.g. 1st order: mean, std, skew; 2nd order: contrast, correlation, entropy...). Shape features try to estimate the shapes of the objects found in the images. Just have a look at the wiki article for CBIR.
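
Two of the simpler feature families mentioned above, a channel histogram and the 1st order statistics, can be sketched as follows. This is only an illustration under the assumption that one image channel is available as a flat list of 8-bit values; the function names and the bin count are hypothetical and do not reflect the feature set of the actual application.

```python
import math


def histogram(pixels, bins=8, max_value=256):
    """Normalized histogram of one channel.

    `pixels` is a flat list of integers in [0, max_value).
    """
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]


def first_order_stats(pixels):
    """Return (mean, std, skew) of the pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0:
        return mean, 0.0, 0.0  # a flat channel has no skew
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
    return mean, std, skew


# A made-up 8-pixel channel, mostly bright with a few dark pixels.
channel = [0, 0, 64, 128, 255, 255, 255, 255]

hist = histogram(channel, bins=4)
mean, std, skew = first_order_stats(channel)

# The feature vector is just the concatenation of the individual features.
features = hist + [mean, std, skew]
print(features)
```

Concatenating such per-channel features gives a much shorter vector than the raw pixel data, which is the kind of input the classifiers are then trained on.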

Typically, the original RGB color space of the image is transformed to an alternative space such as YCbCr, HSV, HSI, CIEXYZ, etc. Alternative spaces might give better discrimination of the data, but you need to experiment with them anyway.
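
For a single pixel, two of these transforms can be sketched as below. The YCbCr coefficients are the full-range BT.601 ones used by JPEG/JFIF, which is an assumption about the variant (other YCbCr definitions use different ranges), and the HSV conversion relies on the `colorsys` module from the Python standard library. The wrapper function names are made up for the example.

```python
import colorsys


def rgb_to_ycbcr(r, g, b):
    """Full-range (JPEG/JFIF) BT.601 conversion of 8-bit RGB to YCbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to HSV with all three outputs in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


# An orange, sunset-like pixel (made-up value).
pixel = (255, 128, 0)
print(rgb_to_ycbcr(*pixel))
print(rgb_to_hsv(*pixel))
```

In spaces like these the brightness is separated from the color information (Y from Cb/Cr, V from H/S), so features computed on the color components depend less on how bright the shot is.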
