AI Applications in the fields of Multimedia, Computer Vision and Robotics: Konstantinos Zagoris PhD Thesis

My colleague Konstantinos Zagoris presented his PhD Thesis.

ABSTRACT
In recent years, the volume of multimedia data without any indexing information has grown significantly, largely because images are so easy to create with scanners and digital cameras. To exploit these quantities of images satisfactorily, effective techniques for browsing, storing and retrieving them are needed. The present PhD thesis introduces five methods that improve content-based image retrieval systems.

The first method is a new color clustering technique based on a combination of a neural network and a fuzzy classifier. Initially, the colors are reduced using the Kohonen Self Organized Feature Map (KSOFM). Each initial color is then classified into one of the KSOFM output classes. In the final stage, the KSOFM results initialize the Gustafson-Kessel Fuzzy Classifier (GKFC). The clustering results produced by the GKFC form the color palette of the final image.
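As a rough illustration of the first stage only, the sketch below trains a small one-dimensional Kohonen map on RGB colors and uses its nodes as a reduced palette. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the learning-rate and neighbourhood schedules and the default parameters are all hypothetical, and the Gustafson-Kessel refinement stage is omitted.

```python
import random


def nearest(nodes, color):
    """Index of the node closest to `color` (squared Euclidean distance in RGB)."""
    return min(
        range(len(nodes)),
        key=lambda i: sum((nodes[i][c] - color[c]) ** 2 for c in range(3)),
    )


def train_som_palette(pixels, n_colors=4, epochs=20, seed=0):
    """Train a 1-D Kohonen map; its nodes become the reduced palette.

    Assumption: linear decay of learning rate and neighbourhood radius,
    chosen for illustration only.
    """
    rng = random.Random(seed)
    # Initialise the nodes from randomly chosen input colors.
    nodes = [list(rng.choice(pixels)) for _ in range(n_colors)]
    total_steps = epochs * len(pixels)
    step = 0
    for _ in range(epochs):
        order = list(pixels)
        rng.shuffle(order)
        for color in order:
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)                 # learning rate decays to 0
            radius = (n_colors / 2.0) * (1.0 - frac)  # neighbourhood shrinks
            winner = nearest(nodes, color)
            for i, node in enumerate(nodes):
                if abs(i - winner) <= radius:
                    # Pull the node towards the presented color.
                    for c in range(3):
                        node[c] += rate * (color[c] - node[c])
            step += 1
    return [tuple(round(v) for v in node) for node in nodes]


def quantize(pixels, palette):
    """Replace every pixel with its nearest palette color."""
    return [palette[nearest(palette, p)] for p in pixels]
```

In the method described above, the map's output would then initialize the fuzzy classifier; here `quantize` simply assigns each pixel to its nearest node.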
The experimental results show that the technique retains the image's dominant colors. It can also merge areas of the image that have similar colors, producing uniform color areas. From this point of view, the proposed technique can be used for color segmentation.
The second method introduces a relevance feedback technique based on four MPEG-7-like descriptors. A user searching for a subset of images sometimes does not have a clear and accurate picture of them: he/she has a general notion of the image sought but not its exact visual depiction. Sometimes there is also no appropriate query image to use for retrieval. The system must therefore provide a mechanism to fine-tune the retrieval results.
Initially, the one-dimensional descriptor of the query image is transformed into a three-dimensional vector based on the inner features of the descriptor. This vector stores the user's search history and is initialized from the original query descriptor. When the user selects a relevant image from the retrieval results, each bin of the selected image's descriptor updates the corresponding value of the three-dimensional vector. The final descriptor used to query the image database is formed from the values of the three-dimensional vector, and the new results are presented to the user. The proposed relevance feedback technique improves the original retrieval results, is simple to implement and has a low computational cost.
The third method detects and extracts homogeneous text in document images regardless of font type and size. It uses connected components analysis to detect the objects, Document Structure Elements (DSEs) to construct a descriptor, and Support Vector Machines (SVMs) to tag the appropriate objects as text. First, connected components analysis detects and extracts the object blocks that reside inside the image. From every such block a descriptor is extracted, constructed from a set of document structure elements. The length of the descriptor can be reduced from the initial 510 DSEs to any number using an algorithm called Feature Standard Deviation Analysis of Structure Elements (FSDASE). Finally, the SVM uses the descriptors to classify each block as text or non-text, and the text blocks are either extracted from the original image or located on it.
The proposed technique adapts to the peculiarities of each document image database, since the features adjust to it. It also allows the text localization speed to be increased or decreased by changing the length of the block descriptor.
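The block detection step of the third method can be illustrated with a simple connected components pass over a binary image. This is a minimal sketch, assuming 8-connectivity and a list-of-lists image in which 1 marks a foreground pixel; the function name is hypothetical, and the DSE descriptor and SVM classification stages are not shown.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected blobs.

    `image` is a list of rows; a value of 1 marks a foreground pixel.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box would correspond to one candidate block from which a descriptor is then computed.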
The fourth technique addresses the document retrieval problem using a word matching procedure. It performs word matching directly on the document images, bypassing OCR and using word images as queries. The entire system consists of an Offline and an Online procedure.
In the Offline procedure, which is transparent to the user, the document images are analyzed and the results are stored in a database. This procedure consists of three main stages. First, the document images pass through a preprocessing stage consisting of a Median filter, which deals with noise (e.g. in historical or badly maintained documents), and the Otsu binarization method. The word segmentation stage follows; its primary goal is to detect the word limits, which is accomplished using the Connected Components Labeling and Filtering method. A set of features capable of capturing the word shape while discarding detailed differences due to noise or font is then used for the word matching process. These features are: Width to Height Ratio, Word Area Density, Center of Gravity, Vertical Projection, Top-Bottom Shape Projections, Upper Grid Features and Down Grid Features. Finally, these features form a 93-dimensional vector, the word descriptor, which is stored in a database.
In the Online procedure, the user enters a query word and the system creates an image of it with a font height equal to the average height of all the word boxes obtained in the Offline operation. The system then calculates the descriptor of the query word image. Finally, using the Minkowski L1 distance, the system presents the documents containing the words whose descriptors are closest to the query descriptor. The experimental results show that the proposed system performs better than a commercial OCR package.
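A few of the word features listed above, together with the L1 matching step, can be sketched as follows. This is an illustrative sketch rather than the 93-dimensional descriptor of the thesis: it assumes binary word images (1 = ink) that have already been resized to a common size and contain at least one ink pixel, it computes only four of the listed features, and all function names are hypothetical.

```python
def word_features(word):
    """Compute a small shape descriptor from a binary word image (1 = ink).

    Assumptions: all word images share the same size, and each contains
    at least one ink pixel.
    """
    height, width = len(word), len(word[0])
    ink = sum(sum(row) for row in word)
    # Center of gravity, normalised by the image size along each axis.
    cx = sum(x * v for row in word for x, v in enumerate(row)) / (ink * width)
    cy = sum(y * sum(row) for y, row in enumerate(word)) / (ink * height)
    # Vertical projection: fraction of ink pixels in each column.
    projection = [
        sum(word[y][x] for y in range(height)) / height for x in range(width)
    ]
    # Width to height ratio, word area density, center of gravity, projection.
    return [width / height, ink / (width * height), cx, cy] + projection


def l1_distance(a, b):
    """Minkowski L1 (city-block) distance between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_words(query, database):
    """Return database keys ordered by L1 distance to the query descriptor."""
    return sorted(database, key=lambda key: l1_distance(query, database[key]))
```

With descriptors stored per word image, `rank_words` returns the closest matches first, which is the ordering the Online procedure presents to the user.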
The last method involves an MPEG-like compact shape descriptor that combines conventional contour and region shape features and has wide applicability, from arbitrary shapes to document retrieval through word spotting. It is called the Compact Shape Portrayal Descriptor, and its computation can easily be parallelized because each feature can be calculated separately. The features are the Width to Height Ratio, the Vertical-Horizontal Projections and the Top-Bottom Shape Projections, which together form a 41-dimensional descriptor.
To compress the descriptor even further, the values of the feature vector are quantized to a three-bit binary representation for each element of the descriptor, so the storage requirement is 3 x 41 = 123 bits. The values of the descriptor are concentrated within small ranges, so they must be quantized non-linearly in order to minimize the overall number of bits. Moreover, the features are unrelated to one another, so each must have its own quantization values. MPEG-7 also quantizes its compact descriptors. The quantization is performed by the Gustafson-Kessel Fuzzy Classifier (GKFC), which produces eight clusters, each defined by a center and a positive-definite matrix adapted to the topological structure of the data inside the cluster. The output of the GKFC thus maps the descriptor values from the real interval [0,1] to the integer range [0,7], or in binary [000,111].
In addition to the descriptor, a relevance feedback technique that employs it is provided, in order to measure how well the descriptor performs in that setting. It is based on Support Vector Machines (SVMs). When the system presents the initial retrieval results, the user can tag one or more images as correctly or wrongly retrieved. The system uses this information by grouping the descriptors of those word images (together with the original query descriptor) as training data for the SVMs. All the word images are then presented to the user ranked by the normalized SVM decision function.
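The three-bit quantization of the descriptor described above can be sketched as follows. This is a simplified illustration under stated assumptions: the eight cluster centers per feature are taken as given (the values used in any example are hypothetical), and each value is assigned to the nearest center by plain absolute distance, which stands in for the Gustafson-Kessel clustering used in the thesis. The function names are also hypothetical.

```python
def quantize_value(value, centers):
    """Return the index (0-7) of the cluster center nearest to `value`.

    Assumption: plain absolute distance to eight precomputed centers.
    """
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def pack_descriptor(values, centers_per_feature):
    """Quantize each value to 3 bits and pack all codes into one integer."""
    packed = 0
    for value, centers in zip(values, centers_per_feature):
        packed = (packed << 3) | quantize_value(value, centers)
    return packed


def unpack_descriptor(packed, length):
    """Recover the list of 3-bit codes from a packed descriptor."""
    codes = []
    for _ in range(length):
        codes.append(packed & 0b111)
        packed >>= 3
    return codes[::-1]
```

For a 41-element descriptor the packed integer needs at most 3 x 41 = 123 bits. Passing a separate list of centers for every feature reflects the point that each feature has its own, non-linearly spaced quantization values.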
The main advantages of the Compact Shape Portrayal Descriptor are its very small size (only 123 bits), its low computational cost and its general applicability, without compromising retrieval accuracy.
In summary, the present thesis offers solutions to real problems of content-based image retrieval systems, such as image segmentation, text localization, relevance feedback and shape/word descriptors. All the proposed methods can be combined to create a fast and modern MPEG-7 compatible content-based image retrieval system.

Download the Thesis (In Greek)

Congratulations Konstantinos

Thursday, July 2, 2009
