This paper presents an approach to unsupervised, content-based classification and lookup of multi-dimensional objects such as pictures, audio, and video. The approach could accelerate online searches of such data sets, as well as the real-time biometric recognition and classification that rely on them. It treats classification and lookup as a generic pattern-matching problem. In contrast to the traditional correlation-oriented method, it uses Principal Component Analysis (PCA) to create a representation that serves as an index into a particular data set. The result is a lower-dimensional identity for each object, which can be looked up in less time, provided that matching always considers only the first n components. The PCA output is then classified through dendrogram clustering, which yields a structure similar to a hash table: the framework provides a hash-function-like mechanism to speed up lookups and recognition. The article introduces the problem of classifying non-textual data and its inherent limitations. It then describes the advantage of using PCA as an alternative to pattern matching and walks through the methodology for developing online management and lookup based on dendrogram clustering.
Author: Gartheeban Ganeshapillai
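The pipeline outlined in the abstract (project each object onto its first n principal components, cluster the projections hierarchically, then use the clusters as hash-like buckets) can be illustrated with a minimal sketch. This is not the paper's implementation. The function names `fit_pca`, `project`, `build_index`, and `lookup` are hypothetical, and the choices of average linkage, Euclidean distance, and nearest-centroid bucket selection are assumptions made only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def fit_pca(data, n_components):
    """Return (mean, components): the top principal axes of `data` (one object per row)."""
    mean = data.mean(axis=0)
    # SVD of the centered data; rows of vt are principal axes, ordered by variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Map one object (or a matrix of objects) to its first n PCA coefficients."""
    return (x - mean) @ components.T


def build_index(data, n_components, n_buckets):
    """Build a hash-table-like index: PCA codes grouped by cutting a dendrogram."""
    mean, components = fit_pca(data, n_components)
    codes = project(data, mean, components)
    # Assumption: average-linkage agglomerative clustering on Euclidean distance.
    tree = linkage(codes, method="average")
    # Cut the dendrogram so that at most n_buckets clusters remain.
    labels = fcluster(tree, t=n_buckets, criterion="maxclust")
    centroids = {b: codes[labels == b].mean(axis=0) for b in np.unique(labels)}
    return {"mean": mean, "components": components, "codes": codes,
            "labels": labels, "centroids": centroids}


def lookup(index, query):
    """Return (row of the best match, bucket id) for a query object."""
    code = project(query, index["mean"], index["components"])
    centroids = index["centroids"]
    # "Hash" step (assumption): choose the bucket whose centroid is nearest.
    bucket = min(centroids, key=lambda b: np.linalg.norm(code - centroids[b]))
    # Compare the query only against the members of that bucket.
    members = np.flatnonzero(index["labels"] == bucket)
    dists = np.linalg.norm(index["codes"][members] - code, axis=1)
    return int(members[np.argmin(dists)]), int(bucket)
```

A call such as `build_index(data, n_components=2, n_buckets=3)` followed by `lookup(index, query)` compares the query only with the objects in a single bucket, and only on n coefficients, which is where the speedup described in the abstract would come from. The trade-off in this sketch is that a query near a bucket boundary may be routed to the wrong bucket and miss its true nearest neighbour.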