Abstract

Gaze estimation requires balancing accuracy and efficiency for real-world deployment. We introduce LightGazeNet, a lightweight Graph Neural Network (GNN) framework that integrates multi-modal inputs (facial features, eye cues, 3D eye centers, head pose, and calibration data) within a compact graph-based architecture. Using multi-head attention for context-aware fusion, LightGazeNet achieves competitive or superior accuracy with significantly fewer parameters and strong cross-dataset generalization.

What’s new

  • Graph modeling of heterogeneous gaze cues (appearance + geometry) for explicit relational reasoning.
  • Multi-head attention GNN to adaptively weight modalities and improve interpretability.
  • Lightweight design for practical deployment on resource-constrained devices.

Method

LightGazeNet builds a fully connected graph of modality nodes (left/right eye, face, head rotation, left/right 3D eye position), projects all modalities into a shared embedding, and performs two-layer multi-head attention graph reasoning.

  1. Feature encoding & projection: lightweight MobileNetV3-Small for images; linear layers for geometric inputs.
  2. Graph construction: 6-node fully connected graph where edges represent inter-modal dependencies.
  3. GNN reasoning: multi-head attention updates node features; flattened graph embedding feeds regression head.
  4. Calibration: subject-specific embedding for efficient personalization (few-shot calibration).
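The two-layer multi-head attention reasoning over the 6-node fully connected graph (steps 2–3 above) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the embedding size, number of heads, and random weights are all placeholders.

```python
import numpy as np

def multi_head_attention_layer(H, Wq, Wk, Wv, num_heads=4):
    """One multi-head attention layer over a fully connected graph.

    H: (N, d) node features (N = 6 modality nodes).
    Wq, Wk, Wv: (d, d) projection matrices, split column-wise across heads.
    """
    N, d = H.shape
    dh = d // num_heads
    out = np.zeros_like(H)
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        Q, K, V = H @ Wq[:, sl], H @ Wk[:, sl], H @ Wv[:, sl]
        scores = Q @ K.T / np.sqrt(dh)              # (N, N): attention on every edge
        A = np.exp(scores - scores.max(axis=1, keepdims=True))
        A /= A.sum(axis=1, keepdims=True)           # row-wise softmax over neighbors
        out[:, sl] = A @ V                          # attention-weighted aggregation
    return out

rng = np.random.default_rng(0)
d = 32                                              # shared embedding size (illustrative)
H = rng.standard_normal((6, d))                     # 6 modality nodes after projection
for _ in range(2):                                  # two reasoning layers
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    H = multi_head_attention_layer(H, Wq, Wk, Wv)
graph_embedding = H.reshape(-1)                     # flattened (6 * d,) vector fed to the regression head
```

Because every node attends to every other node, the attention weights give a per-edge measure of how much each modality contributes, which is what the interpretability claim rests on.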

Inputs

  • Face image (normalized crop)
  • Left eye crop, Right eye crop
  • Head rotation vector (9D)
  • 3D eye center positions (left/right)
  • Calibration embedding (optional)

Output

Regress pitch and yaw; optionally reconstruct the 3D gaze direction via a spherical-to-Cartesian transform.
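The spherical-to-Cartesian step can be written in a few lines. The sketch below uses a common gaze-estimation axis convention (as in MPIIGaze-style code); the paper's exact convention may differ.

```python
import math

def pitchyaw_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D gaze direction.

    Convention (assumed, common in gaze estimation): camera looks along -z,
    so zero pitch and yaw gives the vector (0, 0, -1).
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)
```

The output is unit-length by construction, so no normalization is needed before computing angular errors.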

Results

LightGazeNet is designed for strong accuracy–efficiency trade-offs and robust generalization. Below are key headline numbers from the paper.

  Dataset        Result    Notes
  MPIIFaceGaze   3.06°     Mean angular error (leave-one-subject-out); calibration improves this further.
  EyeDiap        2.91°     Mean angular error under the standard evaluation protocol.
  GazeCapture    1.69 cm   Overall distance error across devices (phone + tablet).
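The angular-error metric reported for MPIIFaceGaze and EyeDiap is the mean angle between predicted and ground-truth 3D gaze directions. A standard implementation (not the paper's code) looks like:

```python
import numpy as np

def mean_angular_error_deg(pred, gt):
    """Mean angle in degrees between predicted and ground-truth gaze vectors.

    pred, gt: (N, 3) arrays of 3D gaze directions (need not be unit length).
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # clip guards arccos against rounding
    return float(np.degrees(np.arccos(cos)).mean())
```

The GazeCapture number uses a different metric: Euclidean distance (in cm) between predicted and true on-screen gaze points.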

Calibration (MPIIFaceGaze)

  Calibration samples (k)   Angular error (°)   Improvement
  Uncalibrated              3.39
  1                         3.28                3.24%
  9                         3.15                7.08%
  16                        3.06                9.73%
  32                        2.99                11.80%
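The improvement column is the relative error reduction versus the uncalibrated baseline; a quick arithmetic check against the table:

```python
base = 3.39  # uncalibrated mean angular error (degrees)
for k, err in [(1, 3.28), (9, 3.15), (16, 3.06), (32, 2.99)]:
    improvement = 100.0 * (base - err) / base
    print(f"k={k:>2}: {err:.2f} deg, improvement {improvement:.2f}%")
```

This reproduces the reported 3.24%, 7.08%, 9.73%, and 11.80% figures.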

Read the paper

If GitHub Pages blocks embedded PDFs in some browsers, use the “View PDF” button above.

Citation



@InProceedings{Patel_2026_WACV,
    author    = {Patel, Heena and Chowdhury, Anirban and Choksy, Pooja Jigar and Pachade, Samiksha Pradeep and Puar, Ajinkya},
    title     = {LightGazeNet: A Lightweight GNN-based Architecture for Gaze Estimation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {3710-3719}
}
            

Contact

Questions, collaborations, or requests:

Email: eyelignai@akesoeyecare.com

Affiliation: Akeso Eyecare, Beijing, China