Thursday, July 2, 2009

Konstantinos Zagoris PhD Thesis

My colleague Konstantinos Zagoris presented his PhD Thesis.

ABSTRACT
In recent years, the amount of multimedia data without any indexing information has grown significantly, driven by how easy it has become to create such images with scanners and digital cameras. To exploit these large image collections satisfactorily, effective techniques to browse, store and retrieve them are needed. The present PhD Thesis introduces five methods that improve content-based image retrieval systems.

The first method is a new color clustering technique based on a combination of a neural network and a fuzzy classifier. Initially, the number of colors is reduced with a Kohonen Self-Organizing Feature Map (KSOFM). Each initial color is then classified into one of the KSOFM output classes. In the final stage, the KSOFM results initialize the Gustafson-Kessel Fuzzy Classifier (GKFC). The final clusters obtained by the GKFC form the color palette of the output image.
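The KSOFM stage can be illustrated with a small sketch. It assumes a one-dimensional chain of neurons, a linearly decaying learning rate and a Gaussian neighborhood, which are common textbook choices rather than details given in the abstract. The function names are hypothetical, and the Gustafson-Kessel refinement that follows in the thesis is not shown.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ksofm(pixels, n_colors=8, epochs=10, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map whose neuron weights act as a reduced palette."""
    rng = random.Random(seed)
    # Start from randomly chosen pixel colors.
    weights = [list(rng.choice(pixels)) for _ in range(n_colors)]
    radius0 = n_colors / 2.0
    total = epochs * len(pixels)
    step = 0
    for _ in range(epochs):
        order = list(pixels)
        rng.shuffle(order)
        for p in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
            # Best-matching unit: the neuron closest to this pixel's color.
            bmu = min(range(n_colors), key=lambda i: sq_dist(weights[i], p))
            for i in range(n_colors):
                # Gaussian neighborhood measured along the neuron chain.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for c in range(3):
                    weights[i][c] += lr * h * (p[c] - weights[i][c])
            step += 1
    return weights


def assign_colors(pixels, palette):
    """Classify every pixel color to the index of its nearest palette entry."""
    return [min(range(len(palette)), key=lambda i: sq_dist(palette[i], p))
            for p in pixels]
```

Each pixel's label indexes the reduced palette; in the described method, these KSOFM results would then serve as the starting point for the fuzzy classifier.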
The experimental results show that the technique retains the image's dominant colors. It can also merge image areas with similar colors into uniform color regions, so from this point of view it can also be used for color segmentation.
The second method introduces a relevance feedback technique based on four MPEG-7-like descriptors. A user searching for a subset of images often does not have a clear and accurate picture of them: he/she has a general notion of the image in question but not its exact visual depiction. Moreover, an appropriate query image is sometimes not available. The system must therefore provide a mechanism to fine-tune the retrieval results.
First, the one-dimensional descriptor of the initial query image is transformed into a three-dimensional vector, based on the internal features of the descriptor. This vector stores the user's search history and is initialized from the original query descriptor. When the user selects a relevant image from the retrieval results, each bin of that image's descriptor updates the corresponding value of the three-dimensional vector. The final descriptor used to query the image database is then formed from the values of the three-dimensional vector, and the new results are presented to the user. The proposed relevance feedback technique improves on the original retrieval results, is simple to implement and has low computational cost.
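The abstract does not spell out the update rule, so the following is only a hypothetical sketch of the general idea. The search history is kept as a bins × levels table of votes, standing in for the thesis's three-dimensional vector. It is seeded with the query descriptor, updated bin by bin by each image the user marks as relevant, and collapsed back into a query descriptor by a vote-weighted average. It assumes descriptor bins that are already quantized to small integers, as in compact MPEG-7-like descriptors, and all names are illustrative.

```python
def init_history(query, n_levels=8):
    """Seed a bins x levels vote table with the (quantized) query descriptor."""
    history = [[0] * n_levels for _ in query]
    for b, level in enumerate(query):
        history[b][level] += 1
    return history


def add_relevant(history, descriptor):
    """Each bin of a relevant image votes for its own quantized level."""
    for b, level in enumerate(descriptor):
        history[b][level] += 1


def history_to_query(history):
    """Collapse the votes into a new query: vote-weighted mean level per bin."""
    query = []
    for votes in history:
        total = sum(votes)
        query.append(sum(level * v for level, v in enumerate(votes)) / total)
    return query


h = init_history([0, 7, 3])
add_relevant(h, [2, 7, 3])
print(history_to_query(h))  # [1.0, 7.0, 3.0]
```

Bins on which the query and the selected images agree stay put, while bins where they disagree drift toward the user's choices, which is the behavior the paragraph describes.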
The third method detects and extracts homogeneous text in document images regardless of font type and size. It uses connected components analysis to detect the objects, Document Structure Elements (DSEs) to construct a descriptor, and Support Vector Machines (SVMs) to tag the appropriate objects as text. First, connected components analysis detects and extracts the object blocks that lie inside the image. From every such block a descriptor is extracted, built from a set of Document Structure Elements. The length of the descriptor can be reduced from the 510 initial DSEs to any number using an algorithm called Feature Standard Deviation Analysis of Structure Elements (FSDASE). Finally, an SVM uses the descriptors to classify each block as text or non-text, and the text blocks are either extracted from the original image or located on it.
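Connected components analysis, the first step of the text localization method, can be sketched as a breadth-first flood fill over a binary image (1 = ink) that returns one bounding box per component. This is a generic textbook version with 8-connectivity by default; the block filtering used in the thesis is not shown.

```python
from collections import deque


def connected_components(img, connectivity=8):
    """Label foreground (1) pixels; return one bounding box per component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    if connectivity == 8:
        nbrs = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Flood-fill a new component starting from this pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy, dx in nbrs:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append({"top": top, "left": left, "bottom": bottom,
                          "right": right, "area": area})
    return boxes
```

Each returned box is a candidate block from which a descriptor would then be extracted.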
The proposed technique adapts to the peculiarities of each document image database, since the features are adjusted to it. It also makes it possible to increase or decrease text localization speed by changing the length of the block descriptor.
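The count of 510 DSEs is consistent with DSEs being the 3×3 binary pixel patterns (2^9 = 512 patterns, minus the all-white and all-black ones). Assuming that interpretation, a block descriptor can be sketched as a normalized histogram of the patterns found in the block. The `select_features` helper is a hypothetical stand-in for FSDASE that simply keeps the k DSEs whose values vary most across training blocks; the actual FSDASE criterion may differ, but the sketch shows how a shorter descriptor trades detail for speed.

```python
import statistics


def dse_histogram(block):
    """Normalized histogram of 3x3 binary patterns (codes 1..510) in a block.

    Assumes DSEs are the 3x3 pixel patterns; codes 0 and 511 are skipped.
    """
    h, w = len(block), len(block[0])
    hist = [0] * 510  # index = code - 1
    for y in range(h - 2):
        for x in range(w - 2):
            code = 0
            for dy in range(3):
                for dx in range(3):
                    code = (code << 1) | (1 if block[y + dy][x + dx] else 0)
            if 0 < code < 511:
                hist[code - 1] += 1
    total = sum(hist)
    if total == 0:
        return [0.0] * 510
    return [v / total for v in hist]


def select_features(train_descriptors, k):
    """Hypothetical FSDASE stand-in: keep the k highest-std DSE indices."""
    n = len(train_descriptors[0])
    stds = [statistics.pstdev([d[i] for d in train_descriptors]) for i in range(n)]
    keep = sorted(range(n), key=lambda i: stds[i], reverse=True)[:k]
    return sorted(keep)


def reduce_descriptor(descriptor, keep):
    """Project a full descriptor onto the selected DSE indices."""
    return [descriptor[i] for i in keep]
```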
The fourth technique addresses the document retrieval problem with a word matching procedure. Word matching is performed directly on the document images, bypassing OCR and using word images as queries. The system consists of an Offline and an Online procedure.
In the Offline procedure, which is transparent to the user, the document images are analyzed and the results are stored in a database. This procedure consists of three main stages. First, the document images pass through a preprocessing stage consisting of a Median filter, which copes with noise (e.g. in historical or badly maintained documents), and the Otsu binarization method. The word segmentation stage follows; its primary goal is to detect word boundaries, which is accomplished with the Connected Components Labeling and Filtering method. Finally, a set of features that capture the word shape while discarding detailed differences due to noise or font is used for word matching. These features are: Width to Height Ratio, Word Area Density, Center of Gravity, Vertical Projection, Top-Bottom Shape Projections, Upper Grid Features and Down Grid Features. Together they form a 93-dimensional vector, the word descriptor, which is stored in the database.
In the Online procedure, the user enters a query word and the system renders it as an image with font height equal to the average height of all word boxes obtained during the Offline procedure. The system then calculates the descriptor of the query word image. Finally, using the Minkowski L1 distance, the system presents the documents containing the words whose descriptors are closest to the query descriptor. The experimental results show that the proposed system performs better than a commercial OCR package.
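Two well-defined steps of the word spotting pipeline, Otsu's global threshold and Minkowski L1 ranking of descriptors, can be sketched in a few lines. The image is assumed to be an 8-bit grayscale grid with dark ink, the index a hypothetical list of (document id, word descriptor) pairs, and the median filter and feature extraction between the two steps are omitted.

```python
def otsu_threshold(gray):
    """Otsu's method: the threshold t maximizing between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with value > t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Mark dark pixels (value <= Otsu threshold) as ink = 1."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_documents(query_desc, word_index):
    """Order documents by their closest word descriptor (Minkowski L1)."""
    ranked = sorted(word_index, key=lambda item: l1_distance(query_desc, item[1]))
    seen, docs = set(), []
    for doc_id, _ in ranked:
        if doc_id not in seen:   # each document listed once, at its best word
            seen.add(doc_id)
            docs.append(doc_id)
    return docs
```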
The last method involves an MPEG-7-like compact shape descriptor that contains conventional contour and region shape features and is widely applicable, from arbitrary shapes to document retrieval through word spotting. It is called the Compact Shape Portrayal Descriptor, and its computation can easily be parallelized because each feature is calculated separately. Its features are the Width to Height Ratio, Vertical-Horizontal Projections and Top-Bottom Shape Projections, which together form a 41-dimensional descriptor.
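A sketch of how such shape features could be computed from a binary word or shape image (1 = foreground). The split of the 41 dimensions is not stated in the abstract; the code assumes 1 ratio value plus four profiles resampled to 10 bins each (1 + 4 × 10 = 41), and the per-profile normalizations are illustrative choices rather than the thesis's definitions.

```python
def resample(profile, n=10):
    """Average a 1-D profile into n equal-width segments (works for short profiles)."""
    length = len(profile)
    out = []
    for k in range(n):
        start = k * length // n
        end = max(start + 1, (k + 1) * length // n)
        seg = profile[start:end]
        out.append(sum(seg) / len(seg))
    return out


def shape_descriptor(img, n=10):
    """Hypothetical 41-value shape vector: ratio + four resampled profiles."""
    h, w = len(img), len(img[0])
    cols = [[img[y][x] for y in range(h)] for x in range(w)]
    vertical = [sum(c) / h for c in cols]        # ink per column
    horizontal = [sum(row) / w for row in img]   # ink per row
    # Gap from the top / bottom edge to the first ink pixel in each column.
    top = [(c.index(1) if 1 in c else h) / h for c in cols]
    bottom = [(c[::-1].index(1) if 1 in c else h) / h for c in cols]
    return ([w / h] + resample(vertical, n) + resample(horizontal, n)
            + resample(top, n) + resample(bottom, n))
```

Because each profile is computed independently of the others, the five parts could be evaluated in parallel, which matches the parallelization remark above.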
To compress the descriptor further, each element of the feature vector is quantized to a three-bit binary representation, so the storage requirement is 3 × 41 = 123 bits. The descriptor values are concentrated within small ranges, so they must be quantized non-linearly to minimize the overall number of bits. Moreover, the features are unrelated to each other, so each needs its own quantization values. Finally, MPEG-7 also quantizes its compact descriptors. The quantization is performed by the Gustafson-Kessel Fuzzy Classifier (GKFC), which produces eight clusters, each defined by a center and a positive-definite matrix adapted to the topological structure of the data inside the cluster. The GKFC output thus maps descriptor values from the real interval [0,1] to the integers [0,7], i.e. to the binary codes [000,111].
In addition to the descriptor, a Relevance Feedback technique based on Support Vector Machines (SVMs) is provided, which employs the above descriptor in order to evaluate how well it performs in that setting. When the system presents the initial retrieval results, the user can tag one or more images as wrongly or rightly retrieved. The system uses this information by taking the descriptors of those word images (including the original query descriptor) as training data for the SVMs. All word images are then presented to the user ranked by the normalized SVM decision function.
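The non-linear quantization can be sketched per feature as one-dimensional clustering into eight levels. For scalar data with the usual unit cluster volume, the Gustafson-Kessel norm matrix reduces to the constant 1, so plain fuzzy c-means in 1-D is equivalent and is what the sketch uses; the initialization, fuzzifier m = 2 and iteration count are assumptions. It also shows the 3 × 41 = 123-bit packing.

```python
def fuzzy_cmeans_1d(values, c=8, m=2.0, iters=100, eps=1e-9):
    """1-D fuzzy c-means; returns the c cluster centers in ascending order."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    for _ in range(iters):
        # Membership of each value in each cluster (standard FCM update).
        memberships = []
        for x in values:
            d = [abs(x - v) + eps for v in centers]
            memberships.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                                for i in range(c)])
        # Centers become membership-weighted means.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(c)
        ]
    return sorted(centers)


def quantize(value, centers):
    """Map a value to the index (0..7) of its nearest center: a 3-bit code."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def pack_codes(codes, bits=3):
    """Concatenate fixed-width codes into one integer bit string."""
    word = 0
    for q in codes:
        word = (word << bits) | q
    return word
```

In use, one set of centers would be trained per feature on a training collection, and a full descriptor of 41 codes packs into at most 123 bits.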
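The SVM-based feedback loop can be sketched with a linear SVM trained by Pegasos-style sub-gradient descent, because the abstract does not name the kernel or solver. Tagged images become training labels (+1 for the query and rightly retrieved images, -1 for wrongly retrieved ones), a constant 1 is appended to each descriptor to act as the bias, and the min-max normalization of the decision values is an assumption.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (labels +1 / -1)."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature acts as the bias
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed SVM score of one descriptor."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def rerank(w, descriptors):
    """Image indices ordered by min-max normalized SVM score, best first."""
    scores = [decision(w, d) for d in descriptors]
    lo, hi = min(scores), max(scores)
    norm = [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
    return sorted(range(len(descriptors)), key=lambda i: norm[i], reverse=True)
```

A kernel SVM would follow the same loop of tagging, retraining and reranking; the linear version only keeps the sketch dependency-free.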
The main advantages of the Compact Shape Portrayal Descriptor are its very small size (only 123 bits), its low computational cost and its general applicability without compromising retrieval accuracy.
In summary, the present thesis offers solutions to real problems of content-based image retrieval systems, such as image segmentation, text localization, relevance feedback algorithms and shape/word descriptors. All the proposed methods can be combined to create a fast, modern, MPEG-7-compatible content-based image retrieval system.

Download the Thesis (In Greek)

Congratulations, Konstantinos!
