Abstract
Gaze estimation requires balancing accuracy and efficiency for real-world deployment. We introduce LightGazeNet, a lightweight Graph Neural Network (GNN) framework that integrates multi-modal inputs—facial features, eye cues, 3D eye centers, head pose, and calibration data—within a compact graph-based architecture. Using multi-head attention for context-aware fusion, LightGazeNet achieves competitive or superior accuracy with significantly fewer parameters and strong cross-dataset generalization.
What’s new
- Graph modeling of heterogeneous gaze cues (appearance + geometry) for explicit relational reasoning.
- Multi-head attention GNN to adaptively weight modalities and improve interpretability.
- Lightweight design for practical deployment on resource-constrained devices.
Method
LightGazeNet builds a fully connected graph of modality nodes (left/right eye, face, head rotation, left/right 3D eye position), projects all modalities into a shared embedding, and performs two-layer multi-head attention graph reasoning.
- Feature encoding & projection: lightweight MobileNetV3-Small for images; linear layers for geometric inputs.
- Graph construction: 6-node fully connected graph where edges represent inter-modal dependencies.
- GNN reasoning: multi-head attention updates node features; flattened graph embedding feeds regression head.
- Calibration: subject-specific embedding for efficient personalization (few-shot calibration).
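The graph reasoning step above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the random matrices stand in for learned projections, and the 32-D shared embedding and 4 attention heads are assumed values, not the actual hyperparameters.

```python
import numpy as np

def multi_head_graph_attention(nodes, num_heads=4, rng=None):
    """One round of multi-head self-attention over a fully connected
    graph of modality nodes (nodes: [num_nodes, dim])."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = nodes.shape
    assert d % num_heads == 0
    dh = d // num_heads
    # Random projections stand in for learned query/key/value weights.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (nodes @ Wq).reshape(n, num_heads, dh)
    k = (nodes @ Wk).reshape(n, num_heads, dh)
    v = (nodes @ Wv).reshape(n, num_heads, dh)
    # Scaled dot-product scores over all node pairs (fully connected graph).
    scores = np.einsum('ihd,jhd->hij', q, k) / np.sqrt(dh)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    # Aggregate neighbor values per head, then merge heads.
    return np.einsum('hij,jhd->ihd', attn, v).reshape(n, d)

# Six modality nodes (eyes, face, head rotation, 3D eye positions)
# projected into a shared 32-D embedding.
nodes = np.random.default_rng(1).standard_normal((6, 32))
h = multi_head_graph_attention(nodes)  # layer 1
h = multi_head_graph_attention(h)      # layer 2
graph_embedding = h.reshape(-1)        # flattened, fed to the regression head
print(graph_embedding.shape)           # (192,)
```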
Inputs
- Face image (normalized crop)
- Left eye crop, Right eye crop
- Head rotation (9D representation)
- 3D eye center positions (left/right)
- Calibration embedding (optional)
Output
Regresses pitch and yaw; optionally reconstructs the 3D gaze direction via a spherical-to-Cartesian transform.
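The spherical-to-Cartesian step is a short closed-form conversion. The sign and axis conventions below (camera looking along −z) are a common choice in gaze estimation, assumed here since the source does not fix them:

```python
import numpy as np

def gaze_to_vector(pitch, yaw):
    """Convert pitch/yaw angles (radians) to a unit 3D gaze direction.
    Axis convention (assumed): x right, y up, camera axis along -z."""
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])

# pitch = yaw = 0 means looking straight at the camera (-z direction).
v = gaze_to_vector(0.0, 0.0)
print(v)
```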
Results
LightGazeNet is designed for strong accuracy–efficiency trade-offs and robust generalization. Below are key headline numbers from the paper.
| Dataset | Metric | Result |
|---|---|---|
| MPIIFaceGaze | Mean angular error (leave-one-subject-out) | 3.06° |
| EyeDiap | Mean angular error (standard evaluation protocol) | 2.91° |
| GazeCapture | Overall distance error across devices (phone + tablet) | 1.69 cm |

Calibration further improves performance on MPIIFaceGaze (see below).
Calibration (MPIIFaceGaze)
| Calibration samples (k) | Angular error (°) | Improvement |
|---|---|---|
| Uncalibrated | 3.39 | — |
| 1 | 3.28 | 3.24% |
| 9 | 3.15 | 7.08% |
| 16 | 3.06 | 9.73% |
| 32 | 2.99 | 11.80% |
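The improvement column follows directly from the table: each entry is the relative error reduction versus the uncalibrated baseline of 3.39°.

```python
baseline = 3.39  # uncalibrated angular error (degrees)
errors = {1: 3.28, 9: 3.15, 16: 3.06, 32: 2.99}

for k, err in errors.items():
    improvement = 100 * (baseline - err) / baseline
    print(f"k={k:2d}: {err:.2f} deg  ({improvement:.2f}% better)")
```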
Citation
@InProceedings{Patel_2026_WACV,
author = {Patel, Heena and Chowdhury, Anirban and Choksy, Pooja Jigar and Pachade, Samiksha Pradeep and Puar, Ajinkya},
title = {LightGazeNet: A Lightweight GNN-based Architecture for Gaze Estimation},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2026},
pages = {3710-3719}
}
Contact
Questions, collaborations, or requests:
Email: eyelignai@akesoeyecare.com
Affiliation: Akeso Eyecare, Beijing, China