LiDAR annotation vs 3D point cloud: when each wins

LiDAR and RGB-D point clouds require different annotation strategies. Learn when to use cuboid-first vs. segmentation-first approaches.
Author · Mark Pinnes · 19 April 2026 · 8 min read

LiDAR (light detection and ranging) and RGB-D depth sensors both produce 3D point clouds, but they require fundamentally different annotation strategies. Conflating them will either bloat your labelling budget or degrade model performance.

The sensor difference and why it matters for annotation

LiDAR emits near-infrared laser pulses and times their reflections, producing sparse point clouds (10K–100K points per frame, depending on resolution). RGB-D sensors (RealSense, Kinect) estimate depth per pixel using structured light, active stereo, or time-of-flight, producing denser clouds (300K–600K points). Sparse LiDAR points fall on object surfaces and edges; dense RGB-D clouds often contain noise, such as stray points between foreground and background at depth discontinuities.
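To see where the RGB-D density comes from, here is a minimal sketch of back-projecting a depth image into a point cloud with a pinhole camera model. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the synthetic 640×480 frame are illustrative assumptions, not values from any specific sensor.

```python
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (metres) into an (N, 3) point array."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row grids
    valid = depth_m > 0                             # zero depth marks pixels with no return
    z = depth_m[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.column_stack([x, y, z])

# Synthetic frame (assumption): a flat wall 2 m away, every pixel valid.
depth = np.full((480, 640), 2.0)
points = depth_to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```

Every valid pixel becomes one point, so a 640×480 depth image yields up to 307,200 points per frame, which is why RGB-D clouds land in the hundreds of thousands.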

For annotation, this changes everything. A sparse LiDAR point cloud makes object boundaries crisp; a dense RGB-D cloud makes it hard to isolate objects without false positives. Conversely, dense clouds preserve fine geometric detail that sparse clouds discard.

Cuboid-first annotation for autonomous vehicles

If your robotics application is autonomous navigation or vehicle perception — think mobile robots or autonomous delivery systems — cuboid-first annotation wins. Here's why:

Bounding box efficiency: 3D cuboids (axis-aligned or oriented) are fast to annotate, at 30–60 seconds per object. For a LiDAR point cloud with 5–10 objects per scene, this is roughly 2.5–10 minutes per frame.

Object-centric labels: You care about "vehicle," "pedestrian," "cyclist" at the frame level. Cuboids are sufficient; you don't need point-level segmentation.

Dataset scale: Autonomous vehicle datasets contain tens of thousands of frames. Cuboid annotation is the only way to reach that scale cost-effectively.

Model compatibility: 3D object detection models (PointPillars, CenterPoint) train on cuboid labels directly. No conversion pipeline needed.

Use cuboid-first for: mobile robot navigation, delivery vehicle perception, any task where object identity and rough location matter more than precise geometry.
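The time figures above can be worked through with a few lines of arithmetic. This is a rough sketch: the 20,000-frame dataset size is a hypothetical stand-in for "tens of thousands of frames", and the function names are ours.

```python
def minutes_per_frame(objects, seconds_per_object):
    """Cuboid annotation time for one frame, in minutes."""
    return objects * seconds_per_object / 60.0

def dataset_hours(frames, objects, seconds_per_object):
    """Total cuboid annotation time for a dataset, in hours."""
    return frames * minutes_per_frame(objects, seconds_per_object) / 60.0

best = minutes_per_frame(5, 30)     # 2.5 minutes: 5 objects at 30 s each
worst = minutes_per_frame(10, 60)   # 10.0 minutes: 10 objects at 60 s each

# 20,000 frames is a hypothetical dataset size chosen for illustration.
print(dataset_hours(20_000, 5, 30))    # about 833 hours
print(dataset_hours(20_000, 10, 60))   # about 3,333 hours
```

Even at the fast end, a dataset of this size is hundreds of annotator-hours, which is why per-object speed dominates the choice at this scale.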

Segmentation-first for manipulation and scene understanding

If your robot manipulates objects or needs to understand fine scene structure — industrial picking, assembly, humanoid manipulation — semantic or instance segmentation wins. Here's why:

Geometry precision: Segmentation captures object shape at the point level. When a robot arm must navigate around clutter or grasp a specific part of an object, precise geometry matters.

Contact prediction: Knowing which points contact which object during a manipulation task requires instance segmentation. Cuboids smear object boundaries.

Occlusion handling: In cluttered scenes, partially occluded objects need point-level segmentation to determine boundaries. Cuboids become unreliable when most of an object (say 70%) is hidden, because the annotator has to guess the unseen extent.

Post-processing accuracy: Semantic segmentation supports instance refinement — merge nearby points, remove noise, extract precise contact surfaces — which cuboids can't support.

Use segmentation-first for: bin picking, assembly, object rearrangement, any manipulation task where geometry precision directly impacts success.
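As one example of the post-processing that point-level labels allow, here is a minimal sketch of noise removal with a radius outlier filter: a point is kept only if enough other points sit close to it. The radius, neighbour threshold, and synthetic data are assumptions for illustration, and the brute-force distance matrix only suits small segments.

```python
import numpy as np

def radius_outlier_mask(points, radius, min_neighbors):
    """True for points with at least `min_neighbors` other points within `radius`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)                  # full pairwise distance matrix
    neighbor_counts = (dist <= radius).sum(axis=1) - 1   # exclude the point itself
    return neighbor_counts >= min_neighbors

rng = np.random.default_rng(0)
# Synthetic segment (assumption): a tight object plus two isolated noise points.
cluster = rng.normal(loc=[0.0, 0.0, 1.0], scale=0.01, size=(200, 3))
stray = np.array([[0.5, 0.5, 1.5], [-0.4, 0.3, 0.2]])
points = np.vstack([cluster, stray])

keep = radius_outlier_mask(points, radius=0.05, min_neighbors=5)
cleaned = points[keep]
print(len(points), len(cleaned))
```

The two stray points have no neighbours within 5 cm and are dropped, while the object itself survives. For full frames, a spatial index such as a k-d tree would replace the pairwise matrix.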

Practical annotation workflows

Cuboid workflow:

  1. Load LiDAR point cloud as 3D render
  2. Place cuboid, adjust position/rotation/size
  3. Assign class label, object ID, occlusion state
  4. Review: does the cuboid enclose the object? If yes, done.

Time: ~45 seconds per object. Requires specialised 3D annotation tool (Labelbox 3D, Encord, Scale).
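Steps 2 to 4 of the cuboid workflow can be sketched as a small label record plus an enclosure check. The field names and the `Cuboid` class are hypothetical, not the export format of any tool listed above, and rotation is limited to yaw about the vertical axis.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Cuboid:
    center: tuple     # (x, y, z) in metres
    size: tuple       # (length, width, height) in metres
    yaw: float        # rotation about the z axis, radians
    label: str
    object_id: int
    occluded: bool

def points_in_cuboid(points, box):
    """Boolean mask of points that fall inside an oriented cuboid."""
    local = points - np.asarray(box.center)
    c, s = np.cos(box.yaw), np.sin(box.yaw)
    # Rotate by -yaw so the box is axis-aligned in its own frame.
    x = c * local[:, 0] + s * local[:, 1]
    y = -s * local[:, 0] + c * local[:, 1]
    z = local[:, 2]
    half = np.asarray(box.size) / 2.0
    return (np.abs(x) <= half[0]) & (np.abs(y) <= half[1]) & (np.abs(z) <= half[2])

box = Cuboid(center=(10.0, 0.0, 1.0), size=(4.0, 2.0, 2.0), yaw=np.pi / 2,
             label="vehicle", object_id=7, occluded=False)
object_points = np.array([[10.0, -1.9, 1.0], [10.0, 0.0, 1.0], [10.0, 1.9, 1.0],
                          [10.5, 0.5, 1.5], [11.5, 0.0, 1.0]])
mask = points_in_cuboid(object_points, box)
enclosed_fraction = mask.mean()  # 4 of 5 points fall inside
```

A reviewer could flag any cuboid whose enclosed fraction falls below a chosen threshold. Here the last point lies outside because the box is rotated 90 degrees, so its long side runs along y rather than x.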

Segmentation workflow:

  1. Load LiDAR point cloud
  2. Use learned segmentation model to propose segments
  3. Annotator corrects: merge over-segmented regions, split under-segmented ones, remove noise
  4. Assign semantic label, object ID
  5. Extract geometry (convex hull, oriented bounding box) for downstream tasks

Time: 8–12 minutes per frame of roughly 1,000 points; denser frames take longer. Requires a point-cloud annotation tool with brush/eraser (CloudCompare, Supervisely, custom).
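Step 5 of the segmentation workflow (extracting an oriented bounding box) can be sketched with principal component analysis. This is one common approximation, not necessarily what any particular tool does: a PCA box follows the main axes of the points and is not guaranteed to be the minimum-volume box. The synthetic block is an assumption for illustration.

```python
import numpy as np

def pca_oriented_box(points):
    """Fit an oriented bounding box along the principal axes of the points."""
    center = points.mean(axis=0)
    centered = points - center
    # Eigenvectors of the covariance matrix give the principal directions
    # (columns of `axes`, ordered by ascending variance).
    _, axes = np.linalg.eigh(np.cov(centered.T))
    local = centered @ axes                       # coordinates in the principal frame
    mins, maxs = local.min(axis=0), local.max(axis=0)
    box_center = center + axes @ ((mins + maxs) / 2.0)
    extents = maxs - mins
    return box_center, extents, axes

# Synthetic segment: a 0.4 x 0.2 x 0.1 m block sampled on a regular grid,
# rotated 30 degrees about z and shifted to (1.0, 2.0, 0.5).
gx, gy, gz = np.meshgrid(np.linspace(-0.2, 0.2, 9),
                         np.linspace(-0.1, 0.1, 5),
                         np.linspace(-0.05, 0.05, 3), indexing="ij")
block = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
theta = np.radians(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
segment = block @ rot.T + np.array([1.0, 2.0, 0.5])

box_center, extents, axes = pca_oriented_box(segment)
print(np.round(extents, 3))  # shortest to longest side of the block
```

The recovered extents match the block's sides regardless of its rotation, which is what makes a segment-derived box usable for grasp planning or as a cuboid label for a detection model.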

Hybrid approaches: when to blend strategies

For complex scenes, a hybrid often makes sense:

  • Coarse cuboids as anchors: Place cuboids first for rough object localization
  • Dense segmentation in regions of interest: Segment only objects the robot will interact with
  • Sparse point labels for contact: For specific objects, label grasp points and contact surfaces without full segmentation

This balances cost (cuboids are cheap) and precision (segmentation is precise where it matters).
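A minimal sketch of the first two bullets, assuming an axis-aligned anchor box: the coarse cuboid selects the region of interest, and only the points inside it go to the segmentation annotator. The margin, box coordinates, and random scene are illustrative assumptions.

```python
import numpy as np

def crop_to_anchor(points, box_min, box_max, margin=0.05):
    """Return the points inside an axis-aligned anchor box grown by `margin` metres."""
    lo = np.asarray(box_min) - margin
    hi = np.asarray(box_max) + margin
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    return points[inside]

rng = np.random.default_rng(1)
# Synthetic scene (assumption): 50,000 points scattered through a 4 x 4 x 2 m volume.
scene = rng.uniform(low=[-2, -2, 0], high=[2, 2, 2], size=(50_000, 3))
roi = crop_to_anchor(scene, box_min=[0.0, 0.0, 0.5], box_max=[0.4, 0.3, 0.8])
print(len(scene), len(roi))
```

The margin keeps points that a loosely placed cuboid would otherwise clip. Because the crop is a small fraction of the frame, the expensive point-level pass only touches the objects the robot will actually handle.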

Cost and timeline implications

Cuboid annotation is significantly cheaper per frame than segmentation, but the break-even point depends on dataset size and model iteration cycles. Smaller datasets may see lower relative cost with segmentation due to fixed overhead. The choice between speed (cuboids) and precision (segmentation) is ultimately a trade-off between iteration cycles and upfront annotation investment, not labour cost alone.
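The break-even idea can be made concrete with a toy model in which total effort is setup time, plus per-frame annotation time, plus time spent on model debugging cycles. Every number below is a placeholder chosen for illustration, not a measured figure; substitute your own.

```python
def total_hours(frames, setup_hours, minutes_per_frame, debug_cycles, hours_per_cycle):
    """Annotation plus model-debugging effort, in hours."""
    return setup_hours + frames * minutes_per_frame / 60.0 + debug_cycles * hours_per_cycle

def break_even_frames(a, b):
    """Frame count at which strategies a and b cost the same, or None if they never cross."""
    fixed_a = a["setup_hours"] + a["debug_cycles"] * a["hours_per_cycle"]
    fixed_b = b["setup_hours"] + b["debug_cycles"] * b["hours_per_cycle"]
    rate_a = a["minutes_per_frame"] / 60.0
    rate_b = b["minutes_per_frame"] / 60.0
    if rate_a == rate_b:
        return None
    n = (fixed_b - fixed_a) / (rate_a - rate_b)
    return n if n > 0 else None

# Hypothetical inputs: cuboids are faster per frame but assumed to need more debugging cycles.
cuboid = dict(setup_hours=10, minutes_per_frame=5, debug_cycles=4, hours_per_cycle=40)
segmentation = dict(setup_hours=20, minutes_per_frame=10, debug_cycles=1, hours_per_cycle=40)

n = break_even_frames(cuboid, segmentation)
print(round(n))  # 1320 with these placeholder inputs
```

Under these assumptions segmentation costs less in total below about 1,320 frames and cuboids cost less above it. The crossover moves sharply with the assumed number of debugging cycles, which is the point of the paragraph above.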

What this means for you

Choose your annotation strategy based on downstream task, not on sensor type. LiDAR with cuboids is standard for navigation; RGB-D with segmentation is standard for manipulation. If you're unsure, run a pilot: annotate 50 frames both ways, train a small model on each, and measure which gives better performance on your validation set.

Annotation cost is real but often not the bottleneck — iteration speed is. A slower, more precise segmentation pipeline beats a fast, noisy cuboid pipeline if it reduces model debugging cycles by two weeks.
