Article from http://www.semanticmetadata.net/
LIRE is not a sleeping beauty, so there’s something going on in the SVN. I recently checked in updates on Lucene (now 4.2) and Commons Math (now 3.1.1). I also removed some deprecated code left over from Lucene 3.x.
The most notable addition, however, is the Extractor / Indexor class pair. They are command line applications that extract global image features from images, put them into an intermediate data file and then, with the help of Indexor, write them to an index. All images are referenced relative to the intermediate data file, so this approach can be used to preprocess a large number of images from different computers on a network file system. Extractor also takes a file list of images as input (one image per line) and can therefore easily be run in parallel: just split your global file list into n smaller, non-overlapping lists and run n Extractor instances. As extraction is the slow part, this should allow for a significant speed-up.
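The splitting step can be sketched in a few lines of Python. This is only an illustration and not part of LIRE: the function name `split_file_list`, the `list_part` output prefix and the UTF-8 encoding are assumptions made for the example. It distributes the entries round-robin, so the resulting lists do not overlap and differ in length by at most one entry.

```python
from pathlib import Path


def split_file_list(list_file, n, out_prefix="list_part"):
    """Split a one-image-per-line file list into n non-overlapping lists.

    Hypothetical helper for illustration; the part files are written next to
    the original list as <out_prefix>_0.txt, <out_prefix>_1.txt, ...
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    source = Path(list_file)
    # Assumption: the list is UTF-8 encoded; adjust if your shell writes
    # the list in a different code page.
    with source.open(encoding="utf-8") as f:
        # Skip blank lines so that every entry names an image.
        entries = [line.strip() for line in f if line.strip()]
    parts = []
    for i in range(n):
        part = source.with_name(f"{out_prefix}_{i}.txt")
        # Round-robin: entry k goes to part k % n, so no entry appears twice.
        part.write_text("".join(e + "\n" for e in entries[i::n]), encoding="utf-8")
        parts.append(part)
    return parts
```

Calling `split_file_list("list.txt", 4)` would produce four smaller lists, each of which can then be handed to its own Extractor instance, presumably each with its own intermediate data file.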
Extractor is run with
$> Extractor -i <infile> -o <outfile> -c <configfile>
- <infile> gives the images, one per line. Use “dir /s /b *.jpg > list.txt” to create a compatible list on Windows.
- <outfile> gives the location and name of the intermediate data file. Note: it has to be in a folder that is a parent of all images!
- <configfile> gives the list of features as a Java Properties file. The supported features are listed below the post. The properties file looks like:
Indexor is run with
$> Indexor -i <input-file> -l <index-directory>
- <input-file> is the output file of Extractor, the intermediate data file.
- <index-directory> is the directory of the index to which the images will be added (appended, not overwritten).
Features supported by Extractor: