5.5 Human Review and Confidence Calibration
Human Review คืออะไร
Human Review คือ checkpoint ที่ agent หยุดเพื่อให้ human ตรวจสอบก่อนดำเนินการต่อ — ไม่ใช่ทุก action ต้องผ่าน human review แต่สำหรับ high-stakes decisions, irreversible actions, หรือ low-confidence outputs จำเป็นต้องมี human-in-the-loop เพื่อ safety
When Human Review is Required
REVIEW_REQUIRED = {
"always": [
"deploy_production",
"delete_user_data",
"send_customer_email",
"modify_billing",
"change_permissions",
],
"conditional": {
"code_changes": lambda diff: diff.lines_changed > 100,
"api_calls": lambda call: call.cost > 1.00,
"data_modifications": lambda mod: mod.records_affected > 1000,
},
"never": [
"read_file",
"search_codebase",
"run_tests",
"format_code",
]
}
Permission Models
1. Ask-First (Most Conservative)
class AskFirstAgent:
async def execute(self, action):
approval = await ask_human(
f"May I execute: {action.description}?\n"
f"Impact: {action.impact}\n"
f"Reversible: {action.reversible}"
)
if approval.granted:
return await action.execute()
return {"status": "denied", "reason": approval.reason}
2. Act-Then-Report (Most Autonomous)
class AutonomousAgent:
async def execute(self, action):
if action.risk_level == "low" and action.reversible:
result = await action.execute()
report_to_human(f"Completed: {action.description}\nResult: {result}")
return result
else:
return await ask_first(action) # Fall back for risky actions
3. Tiered Permissions (Balanced)
class TieredAgent:
def __init__(self, permission_level):
self.level = permission_level # e.g., "read", "write", "admin"
async def execute(self, action):
required_level = action.required_permission
if self.level >= required_level:
return await action.execute()
else:
# Escalate — need higher permission
return await escalate_for_permission(action, required_level)
Confidence Calibration
What is Calibration?
Agent ที่ calibrated ดี = เมื่อบอกว่า “80% confident” จะถูก 80% จริงๆ
def calibrated_response(answer, confidence):
"""
Map confidence to action:
- 90%+ → Execute automatically
- 70-90% → Execute with disclaimer
- 50-70% → Present options, ask human
- <50% → Don't execute, escalate
"""
if confidence >= 0.9:
return {"action": "execute", "answer": answer}
elif confidence >= 0.7:
return {"action": "execute_with_note",
"answer": answer,
"note": f"I'm {confidence:.0%} confident. Please verify."}
elif confidence >= 0.5:
return {"action": "present_options",
"best_guess": answer,
"alternatives": generate_alternatives(answer)}
else:
return {"action": "escalate",
"reason": "Low confidence — need human judgment"}
Over-confidence Problem
Claude sometimes expresses high confidence when wrong — critical to detect:
# Anti-pattern: trusting self-reported confidence
response = "I'm 95% sure the answer is X" # May be wrong!
# Better: calibrate via multiple signals
def assess_confidence(question, answer):
signals = {
"consistency": check_consistency_across_attempts(question, n=3),
"source_backed": has_citation(answer),
"within_training": is_likely_in_training_data(question),
"hedging_language": detect_hedging(answer), # "might", "possibly"
}
calibrated = weighted_average(signals)
return calibrated
Review UX Patterns
Diff-Based Review
แสดง changes ในรูป diff ให้ human approve:
def present_for_review(original, modified, context):
diff = generate_diff(original, modified)
return {
"type": "diff_review",
"context": context,
"diff": diff,
"summary": f"Changed {diff.lines_added} lines, removed {diff.lines_removed}",
"actions": ["approve", "reject", "modify"]
}
Batch Review
รวมหลาย decisions เข้าด้วยกันเพื่อลด interruption:
async def batch_review(pending_actions, batch_size=5):
batches = chunk(pending_actions, batch_size)
for batch in batches:
decisions = await present_batch(batch)
for action, decision in zip(batch, decisions):
if decision == "approve":
await action.execute()
Key Concepts
- Human-in-the-loop — human review ณ จุดที่กำหนด ไม่ใช่ทุก step
- Risk-proportional review — high risk = more review; low risk = autonomous
- Confidence calibration — match expressed confidence กับ actual accuracy
- Review fatigue — ถ้าต้อง review มากเกิน human จะเริ่ม rubber-stamp → ลด review ให้เหลือเฉพาะที่จำเป็น
Exam Tips
- ข้อสอบจะถาม: action ไหนต้อง human review — ตอบ: irreversible, high-impact, low-confidence
- Permission models 3 แบบ: ask-first, act-then-report, tiered — แต่ละแบบเหมาะตอนไหน
- Confidence calibration = agent ต้อง “รู้ว่าตัวเองไม่รู้” — over-confident = dangerous
- Review fatigue เป็น real problem — ถ้า review ทุกอย่าง human จะ approve โดยไม่อ่าน
- Batch review ลด interruption แต่เพิ่ม risk ของ missed issues