Robotics · Egocentric video
Egocentric video annotation for humanoid robots.
First-person video is the single most important modality for humanoid training. It is also the modality most gig platforms cannot deliver consistently. Our specialists keep the labelling taxonomy consistent across multi-quarter programmes.
What we deliver
The data layer behind the demo video.
Grasp segmentation & action labelling
Per-frame segmentation of hands, gripped objects, and action boundaries. Calibrated rubrics for partial grasps, re-grasps, and object hand-offs. Consistent across multi-session training runs.
Scene parsing & affordance labelling
Object classification, affordance mapping, and navigable-space parsing from the first-person perspective. Built for both whole-scene understanding and object-specific interaction.
Action recognition & temporal segmentation
Temporally grounded action labels with clean start/end boundaries. Disagreement-aware sampling for ambiguous transitions. Suitable for action-recognition models and vision-language-action (VLA) training.
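As a rough illustration of what disagreement-aware sampling can look like, the sketch below ranks transitions by how far annotators' proposed boundary frames diverge and sends the most contested ones to review first. It is a minimal example under stated assumptions, not a description of our production tooling: the `BoundaryVotes` record, the `select_for_review` helper, the clip IDs, and the transition names are all hypothetical, and standard deviation is only one possible disagreement measure.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List, Tuple


@dataclass(frozen=True)
class BoundaryVotes:
    """Boundary frames proposed by different annotators for one transition (hypothetical schema)."""
    clip_id: str
    transition: str            # e.g. "reach->grasp"; label names are illustrative
    frames: Tuple[int, ...]    # one proposed boundary frame index per annotator


def disagreement(votes: BoundaryVotes) -> float:
    """Population standard deviation of the proposed boundary frames, in frames."""
    if len(votes.frames) < 2:
        return 0.0  # a single vote carries no disagreement signal
    return pstdev(votes.frames)


def select_for_review(items: List[BoundaryVotes], k: int) -> List[BoundaryVotes]:
    """Return the k transitions with the highest annotator disagreement.

    Ties are broken by clip_id and transition so the ordering is deterministic.
    """
    ranked = sorted(items, key=lambda v: (-disagreement(v), v.clip_id, v.transition))
    return ranked[:k]


# Made-up example votes: three annotators per transition.
items = [
    BoundaryVotes("clip_a", "reach->grasp", (120, 121, 120)),
    BoundaryVotes("clip_b", "grasp->regrasp", (300, 318, 309)),
    BoundaryVotes("clip_c", "grasp->handoff", (45, 45, 45)),
]

review_queue = select_for_review(items, k=1)
print([v.clip_id for v in review_queue])
```

In this toy data the re-grasp transition in `clip_b` has the widest spread of boundary votes, so it is the one queued for a second look; a real pipeline would likely combine this with label disagreement and a sampling budget.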
Safety-critical flag review
Senior-reviewer tier for safety-critical scene classifications. On-call coverage available for deployed systems.
Platform-agnostic by default.
Encord. Labelbox. V7. Scale AI. Roboflow. Internal tooling. Our specialists work on whichever platform your team runs, including the ones built specifically for robotics data.