IndiVillage

Robotics · Egocentric video

Egocentric video annotation for humanoid robots.

First-person video is the single most important modality for humanoid training, and it is also the one most gig platforms struggle to deliver consistently. Our specialists have held a consistent labelling taxonomy across multi-quarter programmes.

Accuracy standard: 98.7%
Specialist retention: multi-quarter
Primary modality: first-person
Label types: grasp, action, scene

What we deliver

The data layer behind the demo video.

Grasp segmentation & action labelling

Per-frame segmentation of hands, gripped objects, and action boundaries. Calibrated rubrics for partial grasps, re-grasps, and object hand-offs. Consistent across multi-session training runs.

Scene parsing & affordance labelling

Object classification, affordance mapping, and navigable-space parsing from the first-person perspective. Built for both whole-scene understanding and object-specific interaction.

Action recognition & temporal segmentation

Temporally grounded action labels with clean start and end boundaries. Disagreement-aware sampling for ambiguous transitions. Suitable for training action-recognition models and vision-language-action (VLA) models.

Safety-critical flag review

Senior-reviewer tier for safety-critical scene classifications. On-call coverage available for deployed systems.

Platform-agnostic by default.

Encord. Labelbox. V7. Scale AI. Roboflow. Internal tooling. We deliver specialists on whichever platform your team runs — including the ones built specifically for robotics data.