The CoPhIR (Content-based Photo Image Retrieval) Test-Collection was developed to run meaningful scalability tests of the similarity search infrastructure of the SAPIR project (SAPIR: Search In Audio Visual Content Using Peer-to-peer IR).
We are extracting metadata from the Flickr archive, using the EGEE European GRID, through the DILIGENT project.
For each image, the standard MPEG7 image features have been extracted. Each entry of the test-bed contains:
- A link to the corresponding entry on the Flickr Web site
- The photo image thumbnail
- An XML structure with the user information from the corresponding Flickr entry: title, location, GPS, tags, comments, etc.
- An XML structure with 5 extracted standard MPEG7 image features:
  - Scalable Colour
  - Colour Structure
  - Colour Layout
  - Edge Histogram
  - Homogeneous Texture
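
As a rough illustration of the entry layout listed above, the following Python sketch models one test-bed entry and runs a brute-force nearest-neighbour lookup on a single feature. It is a minimal sketch under stated assumptions, not the CoPhIR schema or the SAPIR search method: the class and field names (`Entry`, `flickr_url`, `thumbnail_path`, `user_metadata`, `features`), the feature keys, the example URLs, and the use of a plain L1 distance are all hypothetical choices made for illustration, and the actual XML structures and feature dimensions are not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical keys for the five MPEG7 features named in the list above.
FEATURE_NAMES = (
    "ScalableColour",
    "ColourStructure",
    "ColourLayout",
    "EdgeHistogram",
    "HomogeneousTexture",
)


@dataclass
class Entry:
    """One test-bed entry (illustrative layout, not the real schema)."""
    flickr_url: str                      # link to the Flickr entry
    thumbnail_path: str                  # location of the photo thumbnail
    user_metadata: Dict[str, str] = field(default_factory=dict)   # title, tags, ...
    features: Dict[str, List[float]] = field(default_factory=dict)  # feature vectors


def l1_distance(a: Entry, b: Entry, feature: str) -> float:
    """Sum of absolute differences on one feature (an assumed metric)."""
    va, vb = a.features[feature], b.features[feature]
    if len(va) != len(vb):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(va, vb))


def nearest(query: Entry, collection: List[Entry], feature: str) -> Entry:
    """Brute-force scan: return the entry closest to the query."""
    return min(collection, key=lambda e: l1_distance(query, e, feature))


# Toy data with made-up vectors, only to show the calls.
query = Entry("https://example.org/q", "q.jpg",
              features={"EdgeHistogram": [1.0, 2.0, 3.0]})
first = Entry("https://example.org/1", "1.jpg",
              features={"EdgeHistogram": [1.0, 4.0, 3.0]})
second = Entry("https://example.org/2", "2.jpg",
               features={"EdgeHistogram": [9.0, 9.0, 9.0]})
best = nearest(query, [first, second], "EdgeHistogram")
```

A linear scan like this costs one distance computation per entry, which is exactly the kind of cost that scalable similarity search techniques aim to avoid on a large collection.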
The data collected so far represents the world's largest multimedia metadata collection to be made available for research on scalable similarity search techniques.