What's a Typical SLA for White-Label Annotation Services?
You've chosen a white-label partner. Now the real question: what should you actually write into the contract? SLAs are the bridge between expectation and reality—and the difference between a partnership that strengthens or strains your business.
The Stakes
Bad SLAs are invisible until they break. You're committed to a timeline to your customer. Your partner hits a resource crunch and misses delivery by two weeks. You miss your revenue milestone. Your customer threatens to leave. Alternatively: you negotiate SLAs that are achievable but vague ("best effort quality"). Your partner delivers, you deploy, and six months later your model performance drops unexpectedly because the later batches had silent quality drift.
Typical SLAs cover four core dimensions: throughput (volume and pace), accuracy (quality with measurement), turnaround (first-delivery timing), and escalation (what happens when something breaks). The mistake most platforms make is treating SLAs as negotiation theatre—written but not enforced. IndiVillage's track record: consistent SLA delivery across high-volume workloads (2.3M+ records) and 18 months of zero-drift quality on autonomous-robotics workloads.
The Four SLA Components
Throughput SLA. This is the volume commitment: "10,000 images per week," or "500,000 annotations per month," measured from submission to delivery. It's measured by calendar date—if you deliver Thursday, it counts; Friday is a miss. The penalty structure matters: throughput miss typically triggers one of three responses: (a) automatic cost credit (e.g., 2% of monthly fees per week missed), (b) volume carve-out on subsequent months (we owe you an extra 15,000 images next cycle), or (c) escalation protocol (if consecutive weeks missed, contract renegotiation). Don't over-commit here. IndiVillage commits only to volumes it has proven capacity for—built from actual team headcount and training ramp curves.
Throughput SLAs break down when vendors overestimate capacity. A vendor with 50 annotators can theoretically handle 50,000 images/week (at 200 images/person/week, a conservative industry rate for complex work). But if 10% of the team leaves, is sick, or on training, capacity drops to 45,000. A well-written SLA accounts for this variance. "10,000 images per week ±5% tolerance, with escalation if two consecutive weeks fall below 9,500 images."
Accuracy SLA. This is the quality commitment: "98% accuracy maintained across random-sampling validation." The methodology matters. How do you measure it? Re-audit by a separate team? Third-party validation? Gold-set comparison? The threshold is typically 98%+ for commodity tasks (classification, bounding box); 95%+ for complex tasks (semantic segmentation, multi-class with rare categories); 99%+ for safety-critical (medical, autonomous systems). Accuracy breach triggers investigation—is this a systematic error (annotation guidelines misunderstood)? Is it a specific image type (rare classes, ambiguous boundaries)? Or is it team fatigue (scaling too fast)? The investigation informs the corrective action: retraining, process adjustment, or escalation to senior annotators.
A concrete scenario: a crop-imaging vendor commits to 96% accuracy on weed classification across 10 crop types. Six weeks into the programme, accuracy drops to 93%. The vendor investigates and discovers that rare weed species (3% of the dataset) are being miscategorised—new annotators confuse them with more common species. The vendor retrains on rare classes, increases QC sampling on rare species from 5% to 15%, and achieves 96% accuracy in week 8. Without investigation-and-escalation language in the SLA, this would have been a silent degradation, and your downstream model would have inherited the error.
Turnaround SLA. First-delivery timing: "2-4 weeks from project kickoff to first batch delivery." This includes your onboarding, taxonomy documentation, QC protocol setup, and team ramp. Then ongoing cadence: weekly or monthly batches thereafter. Turnaround extends when scope changes (new modality, complexity shift, additional languages). Frontload this in the contract to avoid disputes. "Turnaround extends by one week per 500K additional images or per new visual category."
Ongoing cadence also matters. If you deliver 100K images on Monday and expect delivery Friday, that's 5-day turnaround. Some vendors batch work and deliver monthly. If your timeline requires weekly delivery, specify it in the SLA with penalties for variance. "Weekly delivery of annotated batches every Friday EOD ±1 day. Delivery >1 day late triggers 1% cost credit."
Compliance and availability SLA. Continuous availability (24/7 operations across geographies so a regional issue doesn't stop delivery); disaster recovery (work continues if one office goes offline); audit trails (every annotation logged with timestamp and annotator ID); data security (GDPR/HIPAA if applicable); breach notification (24-72 hours if there's a data incident). IndiVillage operates across 11 offices—if Bengaluru faces power loss, work routes to Hyderabad or Delhi. This redundancy is the operational guarantee behind the uptime claim.
Availability SLAs matter most when your workload is time-sensitive. A robotics company delivering autonomous systems by Q4 cannot afford a 2-week outage mid-August. A healthcare company deploying diagnostic systems by a regulatory deadline cannot lose annotation capacity. The SLA should specify: "99.5% uptime (no more than 4.5 hours per month unavailable). If unavailability exceeds 4.5 hours in a month, vendor provides 2.5% cost credit. Planned maintenance window (max 2 hours/month, 24-hour notice required) does not count toward unavailability threshold."
SLA Comparison Table
| SLA Component | Typical Target | Acceptable Range | Penalty for Miss | Notes |
|---|---|---|---|---|
| Throughput | 10,000 images/week | ±5% variance | 2% cost credit per week under target | Variance accounts for illness, training, turnover |
| Accuracy | 98% (commodity) | 95-99% by modality | Re-audit + retraining + escalation if systematic | Rare classes typically 5-10 points lower |
| First Delivery | 2-4 weeks post-kickoff | Modality-dependent | 1 week late = 5% cost credit | Includes onboarding, taxonomy, QC setup |
| Ongoing Turnaround | Weekly batches | 1-4 weeks by volume | Delivery >1 day late = 1% credit | Frequency set per contract |
| Uptime | 99.5% | 99%+ preferred | 2.5% credit per 4.5hr outage | Planned maintenance excluded |
| Drift Detection | <2% quarterly drift | <3% acceptable | Retraining triggered; contract review if repeated | Measured via gold-set re-annotation |
Red Flags in Weak SLAs
Watch for vendor language that signals weak commitment: "best effort," "approximately," "subject to team availability," "typical timeline." These are escape clauses. Specific beats vague. "98% accuracy verified by re-audit of 5% of batches" beats "high quality." "10,000 images per week, ±0 variance" beats "around 10K per week."
Penalty clauses matter. Some vendors write SLAs with no cost to them if they breach—only escalation checklists. That's administrative, not accountability. Real SLAs have cost. A missed throughput week means a 2% credit to your invoice. A quality miss means retraining and loss of contract if it repeats.
Other red flags to watch:
"Best effort" without metrics. If the SLA says "we will do our best to deliver 10K images per week," that's not an SLA—it's a statement of intent. Real SLAs commit to a number and specify what happens if you miss. "Best effort" is a loophole vendors use when they're unsure of capacity.
No measurement methodology. An accuracy SLA without specifying how accuracy is measured is useless. "The vendor will use proprietary methodology to measure accuracy" lets them decide what passes. Insist on: "Accuracy measured by re-audit of random 5% sample weekly. Methodology published upfront, approved by both parties, and does not change mid-contract."
Cascading penalties that don't hurt. "If throughput miss occurs, vendor will add 2 hours of free support." That's marketing, not penalty. Real penalties affect margin. "2% cost credit" hurts. "5% cost credit on subsequent month" hurts. Vendors take SLAs seriously when they cost money.
Exemptions so broad they swallow the rule. "SLA does not apply if client provides incorrect taxonomy, or if client changes project scope, or if there are network issues, or if team illness exceeds 10%, or..." By the time you list all exemptions, there's no SLA left. Negotiate exemptions narrowly. "SLA suspended only if force majeure event (natural disaster, govt order) occurs; illness and scope changes do not suspend SLA."
No escalation ladder. If a vendor misses throughput one week and the contract says "nothing happens," what stops them missing again? An SLA escalation ladder protects you: Week 1 miss = 2% credit. Week 2 miss = 5% credit + investigation required. Week 3 miss = 10% credit + contract renegotiation or termination option. This forces vendors to fix problems before they compound.
IndiVillage's SLA Track Record
Consistent SLA delivery across high-volume workloads (2.3M+ records in delivery logistics)—not a single missed deadline in the lifetime of that programme. 18 months of zero-drift quality on autonomous-robotics workload, with complexity increasing month-over-month (new crop types, new visual conditions) and accuracy staying above 99.4%. This is proof that the SLAs are not theoretical.
The FAQ
Q: What if we don't know our volume in advance?
Commit to a minimum and a ceiling. "Minimum 5,000 images/week, maximum 25,000/week." The minimum becomes the guaranteed capacity; excess is first-offer (subject to availability). This protects both sides.
Q: Can SLAs change mid-contract?
Yes, but via amendment. If you hit a seasonal spike (harvest time for AgTech, product launch for eCommerce), renegotiate throughput for Q3-Q4, then reset in Q1. Treat it as a business decision, not an exception.
Q: Who measures accuracy?
Ideally a third party or your own validation team. If your partner measures, require them to publish methodology. IndiVillage publishes its QA protocols and invites client validation—transparency strengthens credibility.
Q: What's the cost of an SLA breach?
Depends on impact. A missed deadline on non-critical imagery costs 2-5% of monthly fees. A quality miss on safety-critical data (medical imaging, autonomous-systems training) costs more—potentially 10-20% or contract termination. Price the risk into your vendor selection.
Q: How often should SLAs be reviewed?
Quarterly. Measure actual performance against committed targets. If accuracy is consistently 99%+ but you committed to 98%, consider raising the baseline. If throughput bottlenecks appear, adjust team allocation.
Your Next Best Action
Before signing a white-label partnership, draft your SLAs in writing. Be specific on throughput, accuracy, turnaround, and escalation. Have your partner sign it. Then measure monthly. If they hit their SLAs reliably, you've de-risked your business. If not, you have contractual grounds to remediate or exit.
FAQ
What's the difference between a throughput SLA and a capacity commitment?
Throughput is a weekly/monthly guarantee. Capacity is what the team can theoretically handle at peak. SLA is the number you commit to; capacity is higher (the headroom). Don't confuse them—sell SLA, not capacity. A vendor with 100 annotators might have 50,000 image/week capacity (peak possible output) but commit to 40,000 image/week throughput SLA (sustainable rate). That 10,000-image buffer is the margin they need for illness, training, quality review, and unexpected demand surges.
Can SLAs cover accuracy on rare classes?
Yes, with caveats. "98% accuracy overall, 85% on classes representing <5% of data." Rare classes are harder; be realistic about the floor. If your problem includes 100+ weed species and only 5 of them appear in >1% of images, the SLA might read: "95%+ accuracy on common classes (>1% frequency), 80%+ on rare classes (<1% frequency)." This reflects reality—you cannot get 98% on rare classes without massive oversampling.
What if accuracy drifts gradually, not suddenly?
That's the scenario that kills partnerships. Implement monthly trend reporting: "accuracy trend: 99.2% → 99.1% → 98.9%." If the trend is downward two months running, investigate immediately. Don't wait for a crisis. A robotics company working with IndiVillage caught this via quarterly gold-set re-annotation (measuring consistency across 18 months). When one regional team's accuracy drifted 1.5 points, retraining was triggered immediately. The drift stopped, and the project stayed on track. Without the monitoring, silent degradation would have surfaced only when the robot started making mistakes in the field.
Is 48-hour turnaround on corrections realistic?
For minor corrections (taxonomy clarification): yes. For rework (re-annotating 10K images): no. Set expectations: "Clarifications within 24 hours, rework within two weeks." A healthcare company received clarification on ambiguous lesion definitions in 18 hours. They then resubmitted 5,000 unclear images with the new definition; vendor delivered rework in 10 days. The SLA allowed both timelines because they were separate categories.
Should SLAs include uptime guarantees?
If your partner is your single point of delivery, yes. 99.5% uptime (4.5 hours of downtime per month) is standard. Anything lower is risky for time-sensitive work. An eCommerce company refreshing 1M+ product images annually depends on continuous annotation pipeline. A 3-day vendor outage in Q4 (peak season) costs £100K+ in blocked processing. Their SLA mandates 99.5% uptime with 24-hour incident notification. It's been tested once (power failure in vendor's primary office), and work rerouted within 90 minutes to a backup office.
