Physically Unconstrained
Gaze Estimation in the Wild

Petr Kellnhofer*, Adrià Recasens*, Simon Stent, Wojciech Matusik, Antonio Torralba

Massachusetts Institute of Technology
Toyota Research Institute

Understanding where people are looking is an informative social cue. In this work, we present Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. Our dataset consists of 238 subjects in indoor and outdoor environments with labelled 3D gaze across a wide range of head poses and distances. It is the largest publicly available dataset of its kind in both number of subjects and variety of conditions, made possible by a simple and efficient collection method. Our proposed 3D gaze model extends existing models to include temporal information and to directly output an estimate of gaze uncertainty. We demonstrate the benefits of our model via an ablation study, and show its generalization performance via a cross-dataset evaluation against other recent gaze benchmark datasets. We furthermore propose a simple self-supervised approach to improve cross-dataset domain adaptation. Finally, we demonstrate an application of our model for estimating customer attention in a supermarket setting.
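The model's direct uncertainty output can be trained with quantile regression: predicting, say, the 10% and 90% quantiles of the gaze angle yields an error interval rather than a single point estimate. The sketch below shows the standard quantile ("pinball") loss for one scalar prediction; it is illustrative only (the function name and values are ours), not the authors' training code.

```python
def pinball_loss(pred, target, quantile):
    # Quantile ("pinball") loss for a single scalar prediction.
    # Under-predictions are weighted by `quantile` and over-predictions
    # by (1 - quantile), so a head trained at quantile 0.9 learns an
    # upper bound on the gaze angle and one at 0.1 a lower bound.
    diff = target - pred
    return max(quantile * diff, (quantile - 1.0) * diff)

# Hypothetical example: two quantile heads bracket the gaze estimate,
# and the interval width serves as the uncertainty estimate.
lower, upper = 0.10, 0.25  # predicted 10%/90% quantiles (radians)
uncertainty = upper - lower
```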

Download our paper

Please cite the following paper if you use our data, models or code:
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
P. Kellnhofer*, A. Recasens*, S. Stent, W. Matusik and A. Torralba
IEEE International Conference on Computer Vision (ICCV), 2019

PDF Bibtex



Toyota Research Institute provided funds to assist the authors with their research but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.

We acknowledge NVIDIA Corporation for hardware donations.

Related work
Eye Tracking for Everyone
Aditya Khosla*, Kyle Krafka*, Petr Kellnhofer, Harini Kannan, Suchendra Bhandarkar, Wojciech Matusik, Antonio Torralba
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 26th June - 1st July 2016.
PDF   Project Page

Eye Tracking for Everyone (aka iTracker) consists of a dataset of 1450 people captured using iPhones (called GazeCapture) and a DNN model for gaze prediction (called iTracker). Unlike Gaze360, the GazeCapture dataset is specific to hand-held devices, mostly indoor environments and front-facing camera views, and it only features 2D gaze annotations. Its advantages are a larger number of subjects (1450 vs. 238) and a higher average face image resolution. Where Gaze360 aims to be an all-purpose gaze estimator for any conditions, GazeCapture may still be useful for specialized applications.