I saw this job posting from EyeEm, a photo sharing app / service, in which they express their wish/plan to build a search engine that can ‘identify and understand beautiful photographs’. That got me thinking about how I would approach building a system like that.
Here is how I would start:
1. Define what you are looking for
EyeEm already has a search engine based on tags and geo-location. So I assume, they want to prevent low quality pictures to appear in the results and add missing tags to pictures, based on the image’s content. One could also group similar looking pictures or rank those pictures lower which “don’t contain their tags”. For instance for the Brandenburger Torthere are a lot of similar looking pictures and even some that don’t contain the gate at all.
But for which concepts should one train the algo-rithms? Modern image retrieval systems are trained for hundreds of concepts, but I don’t think it is wise to start with that many. Even the most sophisticated, fine tuned systems have high error rates for most of the concepts as can be seen in this year’s results of the Large Scale Visual Recognition Challenge.
For instance the team from EUVision / University of Amsterdam, placed 6 in the classification challenge, only selected 16 categories for their consumer app Impala. For a consumer application I think their tags are a good choice:
- Cats (sorry, no dogs)
- Party life
- Sunsets and sunrises
But of course EyeEm has the luxury of looking at their log files to find out what their users are actually searching for.
And on a comparable task of classifying pictures into 15 scene categories a team from MIT under Antonio Torralba showed that even with established algorithms one can achieve nearly 90% accuracy [Xiao10]. So I think it’s a good idea to start with a limited number of standard and EyeEm specific concepts, which allows for usable recognition accuracy even with less sophisticated approaches.
But what about identifying beautiful photographs? I think in image retrieval there is no other concept which is more desirable and challenging to master. What does beautiful actually mean? What features make a picture beautiful? How do you quantify these features? Is beautiful even a sensibly concept for image retrieval? Might it be more useful trying to predict which pictures will be `liked` or `hearted` a lot? These questions have to be answered before one can even start experimenting. I think for now it is wise to start with just filtering out low quality pictures and to try to predict what factors make a picture popular.