A simple image recognition program written in Java using JOONE, a neural network library.
Sunday, May 30, 2010
Saturday, May 29, 2010
Article from Codeproject.Com
While I was coding an AI application, I kept hearing the mellow strains of a childish songstress from the neighbours upstairs, who played the song repeatedly. The verses were sometimes barely audible, but I managed to make out a few characteristic phrases and looked them up with a popular web search engine (I like it, since it puts some of my CodeProject articles on the first one or two pages of its results). To avoid undue advertisement, let's say the only significant phrase from the song I submitted was "фиолетовая паста" ("violet paste"). I expected it to return scores of make-up advertisements, but on the contrary, a single link on the first page of results, among the cosmetics-industry spam, pointed to a music web forum containing exactly that phrase from the lyrics. One more click of the mouse and a second search gave me the band, the song's lyrics and guitar tabs, and sent me to YouTube, where I listened to that marvelous music clip.
It is astounding that a person with permanent internet access can, within seconds of hearing a piece of music, be presented with its lyrics, information about the band, and a video clip. This process is described as searching on media content. Current web search uses textual information to return results; now imagine being able to submit an audio, video or image sample as a query, the same way you submit text. The computer, having merely "listened" to the music, would present you with the same information.
Intel is actively pursuing a concept known as Connected Visual Computing (CVC). CVC concerns media data processing: for example, when some object (an ant, say) appears in the field of view of your mobile phone's camera, the phone analyzes the image and displays its identification on screen, telling you it is, say, Camponotus herculeanus. Or when you see a street sign in an unknown language, you may view it through your phone's camera and it will overlay the same sign, in the same location, in your native language (augmented reality (AR), 2D/3D overlays). The audio-content search presented above is another example. The market promise is immense: such applications will keep audiences consuming modern hardware and software for a very long time.
Here I'd like to present the general idea of how a computer may describe an image by analyzing its pixel content, an approach known as Automatic Linguistic Indexing of Pictures (ALIP). The approach is general: extract descriptive features from the data, then apply rules to attribute the content to some category.
If you're interested in immediate applications, you may contact System7, the firm supporting the content-based image retrieval (CBIR) part of the project.
You will need a basic understanding of AI approaches (e.g. neural networks, support vector machines, nearest-neighbour classifiers) and of image description and transform methods such as wavelets, edge extraction, image statistics and histograms. C++/C# experience also helps, as in this article you will find how to invoke C++ DLL methods from within a C# application.
Using the application
In my ALIP experiment I decided to annotate simple natural image categories. There are 5 ANN classifiers in the project, corresponding to:
- Pictures that might contain animals
- Pictures that might contain flowers
- Pictures that might contain landscapes
- Pictures that might contain sunsets
- Other pictures that do not contain the above categories, or simply unknown image types
You need to use the unknown category along with the ones you'd like to classify. Otherwise the AI classifier would identify every image you give it as, e.g., animals, flowers, landscapes or sunsets. But in the real world there are other types of images that fall into none of these categories, so without an unknown class you would need to meddle with classification thresholds, which is cumbersome and awkward. With an additional unknown-category classifier, each image is identified either as one of the known categories or simply as an unknown image type the computer cannot identify with its limited knowledge.
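The rejection scheme just described can be sketched as follows. This is a minimal illustration, not the actual project's code; the class and method names are hypothetical, and the scores stand in for the outputs of the five trained classifiers:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AlipSketch {
    // Pick the category whose classifier gives the highest score.
    // Because "unknown" is trained as a category of its own, no manual
    // rejection threshold is needed: it simply wins when it scores best.
    static String classify(Map<String, Double> scores) {
        String best = "unknown";
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("animals", 0.12);
        scores.put("flowers", 0.05);
        scores.put("landscapes", 0.31);
        scores.put("sunsets", 0.08);
        scores.put("unknown", 0.44);
        System.out.println(classify(scores)); // prints "unknown"
    }
}
```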
I adore image databases; they contain shots from all over the world that are really nice to look at. I have about 20,000 images for designers, bought on DVD. I took samples from the animals, flowers, landscapes and sunsets image types, and added images from all the other categories to form the unknown image type.
The usage of the program is simple enough. Just run alip.exe and it will load all the necessary AI classifier files (in case of error you will get a message box and will not be able to proceed). Then click the [...] button and select a directory that contains some *.jpg files; you may use the ones supplied with this demo under the pics directory. All the files found will be added to the list box; just click them to view each in the right panel and see the proposed category in the top left panel. In theory it should be able to annotate the image as presented below.
Due to competing interests with my former organizations and my current employer, I will not be able to describe the methodology and feature extraction methods in minute detail. I will rather present the general trend and the categories of features used for image description; searching the internet for the corresponding feature computations will reveal all the necessary papers with the particular formulae.
There are some demos available online, e.g. ALIPR. They use hidden Markov models (HMMs) and wavelet features from the images. You may try the pictures from this article with their methods, or vice versa, my application with their pictures, and compare the annotation results.
As the AI approach is general and assumes some reduction of the original data dimensionality, using feature extraction, a PCA transform, or both, all that is needed is to collect some data, extract the features and train the AI classifiers. If you understand my face detection articles you will be able to repeat the experiment:
- Face Detection C++ Library with Skin and Motion Analysis
- Ultra Rapid Object Detection in Computer Vision Applications with Haar-like Wavelet Features
After you have converted your raw image data to features, just train some AI classifiers to discriminate the desired positive category from the negative ones.
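As a toy illustration of that training step, here is a minimal perceptron. Real projects would use JOONE networks, SVMs or similar; this sketch only shows the shape of "features in, positive/negative label out":

```java
public class Perceptron {
    double[] w;
    double b;

    Perceptron(int dim) { w = new double[dim]; }

    // Linear decision: 1 = positive category, 0 = negative.
    int predict(double[] x) {
        double s = b;
        for (int i = 0; i < x.length; i++) s += w[i] * x[i];
        return s >= 0 ? 1 : 0;
    }

    // Classic perceptron update rule over several epochs.
    void train(double[][] xs, int[] ys, int epochs, double lr) {
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < xs.length; i++) {
                int err = ys[i] - predict(xs[i]);
                for (int j = 0; j < xs[i].length; j++) w[j] += lr * err * xs[i][j];
                b += lr * err;
            }
        }
    }

    public static void main(String[] args) {
        double[][] xs = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
        int[] ys = { 0, 0, 0, 1 }; // logical AND: linearly separable
        Perceptron p = new Perceptron(2);
        p.train(xs, ys, 20, 0.1);
        System.out.println(p.predict(new double[] {1, 1})); // prints 1
    }
}
```

In the ALIP setting, each `xs[i]` would be a feature vector extracted from an image, and one such classifier would be trained per category.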
Generally, the features are divided into:
- Color features
- Texture features
- Shape features
Color features are simply the original raw image data, histograms of the image channels, and image profiles. Texture features come from the known edge extraction methods, wavelet transforms, and image statistics (e.g. 1st order: mean, std, skew; 2nd order: contrast, correlation, entropy, ...). Shape features try to estimate the shapes of objects found in the images. Have a look at the Wikipedia article on CBIR.
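The first-order statistics mentioned above are easy to compute directly from a gray-level array; here is a small, self-contained sketch (the sample values are made up for illustration):

```java
public class FirstOrderStats {
    // Returns { mean, standard deviation, skewness } of the pixel values.
    static double[] stats(double[] pixels) {
        int n = pixels.length;
        double mean = 0;
        for (double p : pixels) mean += p;
        mean /= n;
        double m2 = 0, m3 = 0; // 2nd and 3rd central moments
        for (double p : pixels) {
            double d = p - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= n;
        m3 /= n;
        double std = Math.sqrt(m2);
        double skew = (std > 0) ? m3 / (std * std * std) : 0;
        return new double[] { mean, std, skew };
    }

    public static void main(String[] args) {
        // A mostly dark patch with one bright outlier gives positive skew.
        double[] patch = { 10, 12, 11, 13, 200 };
        double[] s = stats(patch);
        System.out.printf("mean=%.1f std=%.1f skew=%.2f%n", s[0], s[1], s[2]);
    }
}
```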
Typically the original RGB color space is transformed to alternative spaces such as YCbCr, HSV, HSI or CIEXYZ, as these may discriminate the data better; you need to experiment with them in any case.
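As a concrete example of such a transform, the standard ITU-R BT.601 RGB-to-YCbCr conversion (the one commonly used in JPEG) is just a fixed linear map:

```java
public class ColorSpace {
    // ITU-R BT.601 full-range RGB -> YCbCr, inputs in [0, 255].
    static double[] rgbToYCbCr(int r, int g, int b) {
        double y  =        0.299    * r + 0.587    * g + 0.114    * b;
        double cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b;
        double cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b;
        return new double[] { y, cb, cr };
    }

    public static void main(String[] args) {
        double[] ycc = rgbToYCbCr(255, 0, 0); // pure red
        System.out.printf("Y=%.1f Cb=%.1f Cr=%.1f%n", ycc[0], ycc[1], ycc[2]);
    }
}
```

A neutral gray maps to Cb = Cr = 128, which is why chroma channels often separate colored objects from gray backgrounds better than raw RGB does.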
Thursday, May 27, 2010
Following on from a New Scientist article that was written a few days ago, I ended up on the website of Taeg Sang Cho -- a graduate student at MIT. He's been working on a bunch of advanced imaging algorithms -- with gifts and grants from big names like Microsoft, Adobe and Google.
His recent work -- three research papers -- is all about content-aware manipulation of photos. I'm struggling to pick one because they're all awesome, so I'll just give you the highlights:
- A probabilistic jigsaw puzzle solver -- this is the technology featured in the New Scientist article, so there are lots of dumbed-down details if you don't want to read the paper itself. In essence, it does exactly what a human does: it matches edges, but quickly and very accurately. Similar technology could be used in photo manipulation (and may indeed already be used by Adobe's Content-Aware Fill) -- the biggest give-away when you manipulate images is the edges. This technology could magic away those edges!
- A content-aware image prior -- this is a funky way of saying 'image restoration', and I wouldn't be surprised if this is a sneak-peek at the technology you'll see in Photoshop CS6! Look at the sample photo -- the results speak for themselves.
- Motion blur removal with orthogonal parabolic exposures -- (phew, just typing that gave me a bit of a hard-on) -- in layman's terms, this is blur removal by taking two photos from slightly different viewpoints and then... performing some magic. Again, look at the sample images for some fantastic proof. I wouldn't expect to see moving lenses in still cameras any time soon though...
New York invaded by 8-bit creatures!
PIXELS is Patrick Jean's latest short film, shot on location in New York.
Written and directed by: Patrick Jean
Director of Photography: Matias Boucard
SFX by Patrick Jean and guests
Produced by One More Production
Over the past year or so, Microsoft’s robotics group has been working quietly, very quietly. That’s because, among other things, they were busy planning a significant strategy shift.
Microsoft is upping the ante on its robotics ambitions by announcing today that its Robotics Developer Studio, or RDS, a big package of programming and simulation tools, is now available to anyone for free.
The Microsoft RDS supports a number of hardware platforms, including the Lego Mindstorms NXT, iRobot Create and Parallax Boe-Bot, and it provides a physics-based simulation environment to allow you to test your designs.
(Please note: the download is almost 500 MB.)
Sunday, May 16, 2010
I have a philosophical question and await your answers.
How do you define the term "similar"?
When are two images considered "similar images"?
According to Google, "similar" is defined as:
- marked by correspondence or resemblance; "similar food at similar prices"; "problems similar to mine"; "they wore similar coats"
- alike(p): having the same or similar characteristics; "all politicians are alike"; "they looked utterly alike"; "friends are generally alike in background and taste"
- like: resembling or similar; having the same or some of the same characteristics; often used in combination; "suits of like design"; "a limited circle of like minds"; "members of the cat family have like dispositions"; "as like as two peas in a pod"; "doglike devotion"; "a dreamlike quality"
- (of words) expressing closely related meanings
- exchangeable: capable of replacing or changing places with something else; permitting mutual substitution without loss of function or suitability; "interchangeable electric outlets" "interchangeable parts"
- In linear algebra, two n-by-n matrices A and B are called similar if B = P^-1 A P for some invertible n-by-n matrix P
- Having traits or characteristics in common; alike, comparable; of triangles, etc., having corresponding angles equal and corresponding line ...
- similarly - in like or similar manner; "He was similarly affected"; "some people have little power to do good, and have likewise little strength to resist ...
- similarly - in a like style or manner; Used to link similar items
- Figures that have the same shape but not necessarily the same size. Similar polygons have corresponding angles congruent and corresponding sides in proportion. Congruent is a special case of similar where the ratio of the corresponding sides is 1-1.
- Denotes meaning similarity between words that cannot always be used instead of each other, for instance because they only share a part of their ...
- Selects like-color pixels throughout the image.
- When data are described as "similar" this indicates the difference is likely due to chance as opposed to a real difference between two populations, even if the data points are different. ...
- Two objects such that the distance between any two points of one object is a particular constant times the distance between the corresponding ...
Send me your opinion at savvash<at>gmail<dot>com and I will post it here.
Saturday, May 15, 2010
This year, TPAMI is celebrating its 30th anniversary. To mark this milestone, the IEEE Computer Society’s Publishing Services Department asked journal volunteers to submit their All-Time Favorite Top 10 list and explain their reasons for choosing the papers. Free, limited-time access is available to all of the papers on the list.
1. Principal Warps: Thin-Plate Splines and the Decomposition of Deformations by F.L. Bookstein
Citation: F.L. Bookstein, "Principal Warps: Thin-Plate Splines and the Decomposition of Deformations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989, doi:10.1109/34.24792
I'd like to say I learned about Thin-Plate Splines straight from the papers by Duchon or Meinguet, but I didn't. In fact, I found out about them from this excellent paper by Fred Bookstein. I remember very well punching in the coefficients of the numerical example in that paper into Matlab and realizing how helpful this approach would be to my work on shape matching.
2. The Design and Use of Steerable Filters by W.T. Freeman and E.H. Adelson
Citation: W.T. Freeman, E.H. Adelson, "The Design and Use of Steerable Filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891-906, Sept. 1991, doi:10.1109/34.93808
This is the first TPAMI paper I ever read, and it is also the reason I chose to make computer vision my career. I was hooked from their very first example of steered first derivatives of Gaussians. I subsequently devoted several years of my life to studying low level feature extraction, including a pilgrimage to the Mecca of image filtering in Linköping, Sweden.
3. In Defense of the Eight-Point Algorithm by Richard I. Hartley
Citation: Richard I. Hartley, "In Defense of the Eight-Point Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 6, pp. 580-593, June 1997, doi:10.1109/34.601246
"It's the normalization, stupid."
4. Normalized Cuts and Image Segmentation by Jianbo Shi and Jitendra Malik
Citation: Jianbo Shi, Jitendra Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000, doi:10.1109/34.868688
This was one of the first TPAMI papers whose formation I witnessed from start to finish, since Jianbo was my officemate. We all knew they had a hit on their hands with this one. We also knew that with the publication of this paper, our honeymoon phase with spectral clustering was over, and the nitty gritty phase was about to begin.
5. Compact Representations of Videos Through Dominant and Multiple Motion Estimation by Harpreet S. Sawhney and Serge Ayer
Citation: Harpreet S. Sawhney, Serge Ayer, "Compact Representations of Videos Through Dominant and Multiple Motion Estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 814-830, August, 1996, doi:10.1109/34.531801
Who could forget this paper's dynamic mosaics made from footage of Arnold riding a Harley in Terminator 2? The things they were doing with optical flow at the Sarnoff Research Center in the mid-90s were indistinguishable from magic.
6. A Computational Approach to Edge Detection by John Canny
Citation: John Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, Nov. 1986, doi:10.1109/TPAMI.1986.4767851
Before the SVD mania of the 90s, and long before the boosting craze of the 00s, a handful of towering contributions in the areas of edge detection, optical flow and regularization theory were developed on the foundations of variational calculus. The Canny Edge Detector, developed in the early 80s, was one such contribution. 25 years later it is required learning in virtually every beginning course in computer vision. Not bad for a Master's Thesis!
7. Fast Approximate Energy Minimization via Graph Cuts by Yuri Boykov, Olga Veksler, and Ramin Zabih
Citation: Yuri Boykov, Olga Veksler, Ramin Zabih, "Fast Approximate Energy Minimization via Graph Cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222-1239, November, 2001, doi:10.1109/34.969114
For a few years there it seemed that every problem I was working on could be written down with a cost function that Yuri, Olga and Ramin's code could solve for me quickly and accurately.
8. Joint Induction of Shape Features and Tree Classifiers by Yali Amit, Donald Geman, and Kenneth Wilder
Citation: Yali Amit, Donald Geman, Kenneth Wilder, "Joint Induction of Shape Features and Tree Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 11, pp. 1300-1305, Nov. 1997, doi:10.1109/34.632990
Quantized tags, approximate geometric arrangements and randomized trees. There were no SIFT or HoG features back then, and the binary handwritten digits were a far cry from the sheep and motorbikes of PASCAL and MSRC, but the essential constellation based recognition approach proposed by this paper was brilliant and ahead of its time.
9. Learning to Detect Objects in Images via a Sparse, Part-Based Representation by Shivani Agarwal, Aatif Awan, and Dan Roth
Citation: Shivani Agarwal, Aatif Awan, Dan Roth, "Learning to Detect Objects in Images via a Sparse, Part-Based Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1475-1490, Nov. 2004, doi:10.1109/TPAMI.2004.108
We still don't know what a "part" is, but that philosophical sticking point didn't stop this paper from making a big impact in object category detection.
10. An Eigendecomposition Approach to Weighted Graph Matching Problems by S. Umeyama
Citation: S. Umeyama, "An Eigendecomposition Approach to Weighted Graph Matching Problems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 5, pp. 695-703, Sept. 1988, doi:10.1109/34.6778
Spectral graph matching is the lesser known sibling of spectral clustering, but it is nonetheless filled with interesting theoretical nuggets, many of which I encountered for the first time in this paper. I fondly remember this as the paper that prompted me to check out a copy of Papadimitriou and Steiglitz to find out about this so-called "Hungarian Algorithm."
Tuesday, May 11, 2010
My collaborators (Vicky and Dim) are working on a new video summarization project based on multimodal data and fuzzy classifiers. The proposed technique automatically generates summaries from online videos (YouTube). Each frame may participate in one or more of the generated classes. The application, once more, will be open source.
Here is a screenshot. More details as well as the paper will be added soon.
October 24-28, 2010
Submission of Papers: June 15, 2010
The International Conference on Signal Processing (ICSP), sponsored by the IEEE Beijing Section, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied signal processing. ICSP 2010 will bring together leading engineers and scientists in signal processing from around the world. Research frontiers in fields ranging from traditional signal processing applications to evolving multimedia and video technologies are regularly advanced by results first reported in ICSP technical sessions.
Topics include, but are not limited to:
A. Digital Signal Processing (DSP)
B. Spectrum Estimation & Modeling
C. TF Spectrum Analysis & Wavelet
D. Higher Order Spectral Analysis
E. Adaptive Filtering & SP
F. Array Signal Processing
G. Hardware Implementation for Signal Processing
H. Speech and Audio Coding
I. Speech Synthesis & Recognition
J. Image Processing & Understanding
K. PDE for Image Processing
L. Video Compression & Streaming
M. Computer Vision & VR
N. Multimedia & Human-Computer Interaction
O. Statistical Learning & Pattern Recognition
P. AI & Neural Networks
Q. Communication Signal Processing
R. SP for Internet and Wireless Communications
S. Biometrics & Authentication
T. SP for Bio-medical & Cognitive Science
U. SP for Bio-informatics
V. Signal Processing for Security
W. Radar Signal Processing
X. Sonar Signal Processing and Localization
Y. SP for Sensor Networks
Z. Applications & Others
With the support of numerous reviewers and authors, ICSP has been held for 20 years. At this session, as a celebration of ICSP, we will hold celebration events and present awards, including an Outstanding Paper Award and an Outstanding Student Paper Award. For details, please visit http://icsp10.bjtu.edu.cn .
The proceedings, with IEEE and Library of Congress catalog numbers, will be published prior to the conference in both hardcopy and CD-ROM form and distributed to all registered participants at the conference. The proceedings will be indexed by EI.
Prospective authors are invited to submit full-length, four-page, double-column papers, including figures and references, to the ICSP Technical Committee by June 15, 2010 at http://icsp10.bjtu.edu.cn. For questions about paper submission, please contact the technical program secretaries, Ms. TANG Xiaofang and Dr. AN Gaoyun at email@example.com and firstname.lastname@example.org .
For more information, please visit the ICSP 2010 web site at:
Monday, May 10, 2010
Videos presenting sequences of unwrapped omnidirectional images taken from the COLD database can be downloaded here, here and here.
Download, installation and usage instructions for both Linux and Windows can be found below. If you have any questions, experience problems with the software, or have spotted a bug, please contact Andrzej Pronobis.
Download and Installation
The application is known to compile in both Linux and Windows and depends on the OpenCV library. The source code can be downloaded either as a tar.gz file (for Linux users) or zip file (for Windows users):
• Tar/gzip file (443.46 kB)
• Zip file (446.30 kB)
Binaries for both operating systems are also available:
• Linux binary (449.10 kB)
• Windows binary (584.34 kB)
CMake is used as a build system for the sources. Windows users can install MinGW to get a C++ compiler. To build from the sources, use either the 'build.sh' or 'build.bat' script.
- Estimating the center of the image using the thresholding-based method. A 5-pixel boundary at the top and bottom of the input image is excluded:
unwrap -tc 30 -eb 0 5 0 5 -fi filtered.jpg 40 230 input.jpg
- Estimating the center of the image using the edge-based method. Two "debug" images are created:
unwrap -ec 230 240 330 240 20 20 -fi filtered.jpg -ti test.jpg 40 230 input.jpg
- Unwrapping using fixed center and no interpolation or scaling:
unwrap -uw output.jpg -fc 330 240 40 230 input.jpg
- Unwrapping using automatic center point detection and bicubic interpolation:
unwrap -uw output.jpg -ec 230 240 330 240 20 20 -bc 40 230 input.jpg
- As above, but with vertical resolution of the output increased 2x:
unwrap -uw out.jpg -ec 230 240 330 240 20 20 -bc -sy 2 40 230 in.jpg
Wednesday, May 5, 2010
1. New Shape Descriptor (CSPD)
2. SpCD bug fixed
3. MPEG-7 Descriptors Fusion
- Download empirical (historical) files from the web
- Using Borda Count
- Using Linear Sum
4. MPEG-7 and CCD Descriptors Fusion
- Download empirical (historical) files from the web
- Using Borda Count
- Using Linear Sum
5. From now on we are using a compact version of the BTDH for indexing and retrieval
6. New descriptor (B-CEDD). During the search process, an image query is entered and the system returns images with similar content. Initially, the similarity/distance between the query and each image in the database is calculated with the B-CEDD descriptor; only if the distance is smaller than a predefined threshold is the comparison of their full CEDDs performed.
7. Now you can save the retrieval results in trec_eval format
8. Indexing now works with *.bmp, *.jpg and *.png files
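The two-stage search described in item 6 above can be sketched as a coarse-to-fine filter. This is an illustrative sketch, not the library's actual code: the descriptors are stand-in `double[]` vectors, the distance is plain L1 rather than the Tanimoto coefficient CEDD actually uses, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageSearch {
    // Stand-in distance between two descriptor vectors (L1 here).
    static double dist(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    // Stage 1: compare the query's cheap coarse descriptor (B-CEDD-like)
    // against every database entry; only entries under the threshold are
    // returned as candidates for the expensive full-descriptor comparison.
    static List<Integer> search(double[] queryCoarse, double[][] dbCoarse,
                                double threshold) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < dbCoarse.length; i++) {
            if (dist(queryCoarse, dbCoarse[i]) < threshold) {
                candidates.add(i); // stage 2 (full CEDD) runs on these only
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        double[][] db = { {0.1, 0.2}, {0.9, 0.8}, {0.15, 0.25} };
        System.out.println(search(new double[] {0.1, 0.2}, db, 0.2));
        // images 0 and 2 pass the coarse filter; image 1 is pruned early
    }
}
```

The benefit is that the expensive comparison runs only on the small candidate set that survives the cheap pre-filter, which is the point of introducing B-CEDD.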