Knowing where people look is useful in many image and video applications. However, traditional gaze tracking hardware is expensive and requires local study participants, so acquiring gaze location data from a large number of participants is difficult. We have developed a crowdsourced method for acquiring gaze direction data from a virtually unlimited number of participants, using a robust self-reporting mechanism. Our system collects temporally sparse but spatially dense points-of-attention for arbitrary visual content, and we obtain results similar to those of traditional gaze tracking.
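One way to turn such sparse self-reports into a dense attention map is to splat a Gaussian around each reported point and normalize the result. The following is a minimal illustrative sketch, not the system described above; the function name, grid representation, and `sigma` parameter are all assumptions chosen for clarity:

```python
import math

def attention_map(points, width, height, sigma=15.0):
    """Aggregate sparse self-reported points-of-attention into a dense map.

    Hypothetical helper: each report (px, py) contributes an isotropic
    Gaussian; the accumulated map is normalized to peak at 1.0.
    """
    m = [[0.0] * width for _ in range(height)]
    for (px, py) in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                m[y][x] += math.exp(-d2 / (2.0 * sigma * sigma))
    # Normalize so the strongest location has value 1.0.
    peak = max(max(row) for row in m)
    if peak > 0:
        m = [[v / peak for v in row] for row in m]
    return m
```

With many participants reporting on the same frame, the Gaussians overlap and the map converges toward the kind of fixation density that hardware eye trackers produce.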

We also propose a novel method for video saliency estimation, inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validated our method on two gaze-tracked video datasets, where it outperforms state-of-the-art video saliency algorithms.
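The conditioning step can be sketched as a per-frame update that blends the previous frame's saliency with current-frame evidence, scored only at the candidate locations. This is an illustrative simplification, not the actual model; the blending weight `alpha`, the `observation_score` callable, and the dictionary representation are assumptions:

```python
def update_saliency(prev_map, candidates, observation_score, alpha=0.7):
    """Predict a frame's saliency conditioned on the previous frame's map.

    Hypothetical sketch: prev_map maps location -> saliency from frame t-1,
    candidates is the restricted set of locations considered at frame t, and
    observation_score(loc) returns the current frame's evidence for loc.
    """
    scores = {}
    for loc in candidates:
        # Temporal continuity: carry over prior saliency, blended with
        # per-frame evidence at this candidate location.
        prior = prev_map.get(loc, 0.0)
        scores[loc] = alpha * prior + (1.0 - alpha) * observation_score(loc)
    # Normalize so the candidate scores form a distribution.
    total = sum(scores.values()) or 1.0
    return {loc: s / total for loc, s in scores.items()}
```

Restricting the update to a small candidate set is what yields the speed gain: the per-frame cost scales with the number of candidates rather than with the full pixel grid.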