Image triage is a common task in digital photography. Determining which photos are worth processing for sharing with friends and family and which should be deleted to make room for new ones can be a challenge, especially on a device with a small screen such as a mobile phone or camera. In this work we explore the importance of local structure changes (e.g., human pose, appearance, and object orientation) to the photographic triage task. We perform a user study in which subjects are asked to mark the regions of image pairs most useful in making triage decisions. From this data, we train a model of image saliency in the context of other images, which we call cosaliency. This allows us to create collection-aware crops that augment the information provided by existing thumbnailing techniques for the image triage task.
In the illustration below, a pair of thumbnail images is shown (two leftmost images). On very small screens, such as those on the back of a camera or mobile phone, it is difficult or impossible to see the differences between the images. However, using a computational model of cosaliency (third from left), we can automatically detect areas of visual difference that are salient to human attention. Further processing produces the highlight map (fourth from left), allowing us to intelligently crop the image to the area of interest (last two images). Now the difference is visible: in the rightmost image, the woman is pointing at something.
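The final cropping step can be illustrated with a short sketch. Assuming the cosaliency model has already produced a per-pixel map in [0, 1], one simple way to derive a crop is to threshold the map into a binary highlight mask and take the padded bounding box of the highlighted pixels. The function name, threshold, and padding below are illustrative choices, not the method described in this work:

```python
import numpy as np

def crop_to_cosalient_region(image, cosaliency, threshold=0.5, pad=8):
    """Crop an image to the bounding box of its thresholded cosaliency map.

    `image` is an (H, W, C) array; `cosaliency` is an (H, W) map in [0, 1].
    The threshold and padding values here are illustrative, not tuned.
    """
    mask = cosaliency >= threshold          # binary highlight map
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                        # nothing salient: keep the full frame
        return image
    y0 = max(ys.min() - pad, 0)
    y1 = min(ys.max() + pad + 1, image.shape[0])
    x0 = max(xs.min() - pad, 0)
    x1 = min(xs.max() + pad + 1, image.shape[1])
    return image[y0:y1, x0:x1]
```

To keep a pair of crops comparable, the same crop window would be applied to both images of the pair, so the viewer is looking at the same region in each.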