15%

5.5 Human Review and Confidence Calibration

Human Review คืออะไร

Human Review คือ checkpoint ที่ agent หยุดเพื่อให้ human ตรวจสอบก่อนดำเนินการต่อ — ไม่ใช่ทุก action ต้องผ่าน human review แต่สำหรับ high-stakes decisions, irreversible actions, หรือ low-confidence outputs จำเป็นต้องมี human-in-the-loop เพื่อ safety

When Human Review is Required

REVIEW_REQUIRED = {
    "always": [
        "deploy_production",
        "delete_user_data",
        "send_customer_email",
        "modify_billing",
        "change_permissions",
    ],
    "conditional": {
        "code_changes": lambda diff: diff.lines_changed > 100,
        "api_calls": lambda call: call.cost > 1.00,
        "data_modifications": lambda mod: mod.records_affected > 1000,
    },
    "never": [
        "read_file",
        "search_codebase",
        "run_tests",
        "format_code",
    ]
}

Permission Models

1. Ask-First (Most Conservative)

class AskFirstAgent:
    async def execute(self, action):
        approval = await ask_human(
            f"May I execute: {action.description}?\n"
            f"Impact: {action.impact}\n"
            f"Reversible: {action.reversible}"
        )
        if approval.granted:
            return await action.execute()
        return {"status": "denied", "reason": approval.reason}

2. Act-Then-Report (Most Autonomous)

class AutonomousAgent:
    async def execute(self, action):
        if action.risk_level == "low" and action.reversible:
            result = await action.execute()
            report_to_human(f"Completed: {action.description}\nResult: {result}")
            return result
        else:
            return await ask_first(action)  # Fall back for risky actions

3. Tiered Permissions (Balanced)

class TieredAgent:
    def __init__(self, permission_level):
        self.level = permission_level  # e.g., "read", "write", "admin"
    
    async def execute(self, action):
        required_level = action.required_permission
        
        if self.level >= required_level:
            return await action.execute()
        else:
            # Escalate — need higher permission
            return await escalate_for_permission(action, required_level)

Confidence Calibration

What is Calibration?

Agent ที่ calibrated ดี = เมื่อบอกว่า “80% confident” จะถูก 80% จริงๆ

def calibrated_response(answer, confidence):
    """
    Map confidence to action:
    - 90%+ → Execute automatically
    - 70-90% → Execute with disclaimer
    - 50-70% → Present options, ask human
    - <50% → Don't execute, escalate
    """
    if confidence >= 0.9:
        return {"action": "execute", "answer": answer}
    elif confidence >= 0.7:
        return {"action": "execute_with_note", 
                "answer": answer,
                "note": f"I'm {confidence:.0%} confident. Please verify."}
    elif confidence >= 0.5:
        return {"action": "present_options",
                "best_guess": answer,
                "alternatives": generate_alternatives(answer)}
    else:
        return {"action": "escalate",
                "reason": "Low confidence — need human judgment"}

Over-confidence Problem

Claude sometimes expresses high confidence when wrong — critical to detect:

# Anti-pattern: trusting self-reported confidence
response = "I'm 95% sure the answer is X"  # May be wrong!

# Better: calibrate via multiple signals
def assess_confidence(question, answer):
    signals = {
        "consistency": check_consistency_across_attempts(question, n=3),
        "source_backed": has_citation(answer),
        "within_training": is_likely_in_training_data(question),
        "hedging_language": detect_hedging(answer),  # "might", "possibly"
    }
    
    calibrated = weighted_average(signals)
    return calibrated

Review UX Patterns

Diff-Based Review

แสดง changes ในรูป diff ให้ human approve:

def present_for_review(original, modified, context):
    diff = generate_diff(original, modified)
    return {
        "type": "diff_review",
        "context": context,
        "diff": diff,
        "summary": f"Changed {diff.lines_added} lines, removed {diff.lines_removed}",
        "actions": ["approve", "reject", "modify"]
    }

Batch Review

รวมหลาย decisions เข้าด้วยกันเพื่อลด interruption:

async def batch_review(pending_actions, batch_size=5):
    batches = chunk(pending_actions, batch_size)
    
    for batch in batches:
        decisions = await present_batch(batch)
        for action, decision in zip(batch, decisions):
            if decision == "approve":
                await action.execute()

Key Concepts

Human-in-the-loop — human review ณ จุดที่กำหนด ไม่ใช่ทุก step
Risk-proportional review — high risk = more review; low risk = autonomous
Confidence calibration — match expressed confidence กับ actual accuracy
Review fatigue — ถ้าต้อง review มากเกิน human จะเริ่ม rubber-stamp → ลด review ให้เหลือเฉพาะที่จำเป็น

Exam Tips

ข้อสอบจะถาม: action ไหนต้อง human review — ตอบ: irreversible, high-impact, low-confidence
Permission models 3 แบบ: ask-first, act-then-report, tiered — แต่ละแบบเหมาะตอนไหน
Confidence calibration = agent ต้อง “รู้ว่าตัวเองไม่รู้” — over-confident = dangerous
Review fatigue เป็น real problem — ถ้า review ทุกอย่าง human จะ approve โดยไม่อ่าน
Batch review ลด interruption แต่เพิ่ม risk ของ missed issues

← 5.6 Information Provenance and Multi-Source Synthesis 5.4 Codebase Exploration and Context Degradation →