SMETP v2 lifecycle

  0 · Scope: risk tier & fit
  1 · Capture: CDM + ACTA + thresholds
  2 · Analyze: graph + DRT
  3 · Codify: zones + safety
  4 · Validate: ECE & outcome-match
  5 · Deploy: skill + MCP
  6 · Monitor: drift signals

Phase 6 · Monitor & evolve

Skills decay silently. SMETP v2 watches the five signals that prove it.

Deployment is the start of the expensive part of a skill's life. Track these signals continuously against a rolling baseline; any breach triggers a re-elicitation review. Until you wire a real audit log, this page synthesizes a representative two-month window so the operator workflow is concrete.

60 days · synthetic series · seed 421 · threshold breach → re-elicit
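The page does not say how it generates its synthetic window. The sketch below assumes a simple model: a linear drift plus Gaussian noise. A seeded generator keeps such a demo reproducible. The function name and parameters are hypothetical, and the sketch will not reproduce the page's exact numbers.

```python
import random


def synthetic_series(days: int, base: float, drift_per_day: float,
                     noise: float, seed: int) -> list[float]:
    """Hypothetical daily metric: linear drift plus Gaussian noise.

    A dedicated random.Random(seed) keeps the series reproducible and
    independent of any other use of the global RNG.
    """
    rng = random.Random(seed)
    return [base + drift_per_day * day + rng.gauss(0.0, noise)
            for day in range(days)]


# 60 days of a PSI-like signal creeping toward the 0.20 threshold.
psi_series = synthetic_series(days=60, base=0.12, drift_per_day=0.0012,
                              noise=0.01, seed=421)
```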

Feature PSI (rolling 7d)

last 0.189 · 7d avg 0.190 · threshold 0.200

Within tolerance band over the last 7 days.
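PSI compares how a modeled input is distributed now against its baseline: PSI = Σ_i (a_i - e_i) · ln(a_i / e_i), where e_i and a_i are the baseline and current shares in bin i. The sketch below rests on two assumptions, both common conventions rather than anything SMETP specifies. First, the bin edges are fixed in advance from the baseline. Second, a small `eps` floors empty bins so the logarithm stays finite.

```python
import bisect
import math


def psi(baseline: list[float], current: list[float], edges: list[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index over fixed, sorted bin edges.

    Bin 0 is (-inf, edges[0]); bin k is [edges[k-1], edges[k]); the last bin
    is [edges[-1], inf). Empty bins are floored at `eps`.
    """
    def shares(values: list[float]) -> list[float]:
        if not values:
            raise ValueError("need at least one value")
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(baseline), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Run daily over each modeled input's trailing 7 days, this gives the rolling value shown on the card. The card compares that value against its 0.20 threshold.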

Prediction approve-rate (rolling 7d)

last 71.6% · 7d avg 71.3% · threshold 6.0%

Threshold breached: re-elicitation review queued.
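The approve-rate, must_escalate and outcome-match cards all report rolling proportions. The page does not say whether it averages daily rates or pools the counts. This sketch assumes pooling, so each decision carries equal weight and a low-volume day cannot swing the rate. `rolling_rate` is a hypothetical helper.

```python
from typing import Optional


def rolling_rate(hits: list[int], totals: list[int],
                 window: int = 7) -> list[Optional[float]]:
    """Pooled trailing-window rate: sum(hits) / sum(totals) over `window` days.

    Returns None for days before a full window exists, or when the window
    contains no decisions.
    """
    if len(hits) != len(totals):
        raise ValueError("hits and totals must have the same length")
    rates: list[Optional[float]] = []
    for day in range(len(hits)):
        start = day + 1 - window
        if start < 0:
            rates.append(None)
            continue
        total = sum(totals[start:day + 1])
        rates.append(sum(hits[start:day + 1]) / total if total else None)
    return rates
```

The inputs differ by card:
- **Approve-rate:** daily approvals over daily decisions, with `window=7`.
- **must_escalate:** daily escalations over daily decisions, with `window=7`.
- **Outcome-match:** daily matches over resolved decisions, with `window=14`.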

Outcome-match (rolling 14d)

last 0.851 · 7d avg 0.859 · threshold 0.850

Within tolerance band over the last 7 days.

Expected Calibration Error (rolling 14d)

last 0.070 · 7d avg 0.065 · threshold 0.080

Within tolerance band over the last 7 days.
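ECE measures the gap between stated confidence and observed accuracy. The sketch below uses the standard equal-width binning form, ECE = Σ_b (n_b / N) · |acc_b - conf_b|. Two details are assumptions, not taken from SMETP: the 10-bin default, and the input shape of one confidence plus one correctness flag per decision.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins over [0, 1]."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)  # weight by bin population
    return ece
```

Computed over the trailing 14 days of resolved decisions, this gives the value that the card and the ECE trigger compare against 0.08.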

must_escalate rate (rolling 7d)

last 14.3% · 7d avg 13.9% · threshold 18.0%

Within tolerance band over the last 7 days.
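Each card above reports the latest value, a 7-day average and a status line. The page does not specify how the status is decided. This sketch assumes a card is breached if any of the last 7 values crosses the threshold in the bad direction, which matches the "within tolerance band over the last 7 days" wording. `Card` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Card:
    last: float
    avg_7d: float
    threshold: float
    breached: bool


def summarize(series: list[float], threshold: float,
              higher_is_worse: bool = True, lookback: int = 7) -> Card:
    """Summarize a daily metric series as a monitoring card.

    Assumption: breached if any of the last `lookback` values crosses the
    threshold in the bad direction.
    """
    if not series:
        raise ValueError("series must be non-empty")
    recent = series[-lookback:]
    if higher_is_worse:
        breached = any(v > threshold for v in recent)
    else:
        breached = any(v < threshold for v in recent)
    return Card(last=series[-1], avg_7d=sum(recent) / len(recent),
                threshold=threshold, breached=breached)
```

Among these cards, outcome-match is the one where lower is worse, so it would use `higher_is_worse=False`. The formal re-elicitation triggers are stricter for some signals. For example, PSI must stay above 0.20 for 7 consecutive days.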

Re-elicitation triggers (skill-defined)

  • Time-based: every 12 months from last interview.
  • Drift-based: PSI > 0.20 on any modeled input for 7 consecutive days.
  • ECE-based: rolling 14-day ECE > 0.08.
  • Volume-based: every 5,000 decisions or 6 months, whichever comes first.
  • Override-based: human override rate > 5% in any 14-day window.
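The triggers above translate almost directly into checks. The sketch below makes four assumptions:
- "Months" means whole calendar months since the last interview.
- Volume is counted from that same interview.
- Overrides are pooled over each 14-day sliding window.
- The check runs daily, so testing the trailing 7 PSI values is enough to catch a 7-day run.

`MonitorState` and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MonitorState:
    """Hypothetical snapshot of what the monitor knows on a given day."""
    today: date
    last_interview: date
    decisions_since_interview: int
    psi_by_input: dict[str, list[float]]  # daily PSI per modeled input, oldest first
    ece_14d: float                        # latest rolling 14-day ECE
    daily_overrides: list[int]            # human overrides per day, oldest first
    daily_decisions: list[int]            # decisions per day, same length


def full_months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return months - 1 if end.day < start.day else months


def fired_triggers(s: MonitorState) -> list[str]:
    fired = []
    months = full_months_between(s.last_interview, s.today)
    if months >= 12:  # time-based
        fired.append("time")
    # Drift-based: the trailing 7 daily PSI values all exceed 0.20 for some input.
    if any(len(v) >= 7 and all(p > 0.20 for p in v[-7:])
           for v in s.psi_by_input.values()):
        fired.append("drift")
    if s.ece_14d > 0.08:  # ECE-based
        fired.append("ece")
    if s.decisions_since_interview >= 5000 or months >= 6:  # volume-based
        fired.append("volume")
    # Override-based: pooled override rate > 5% in any 14-day sliding window.
    for end in range(14, len(s.daily_decisions) + 1):
        total = sum(s.daily_decisions[end - 14:end])
        if total and sum(s.daily_overrides[end - 14:end]) / total > 0.05:
            fired.append("override")
            break
    return fired
```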

Sunset criteria

  • Outcome-match drops below 80% on a stratified test set.
  • Inter-SME agreement on validation cases falls below 0.75.
  • Re-elicitation produces > 30% rule churn (versioned diff).
  • Underlying regulation rewrites the decision space (e.g. EU AI Act tier change).
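The sunset list names a "versioned diff" but does not say how churn is counted. The sketch below assumes rules are keyed by a stable ID, and that churn is the share of IDs added, removed or reworded across both versions. Inter-SME agreement and the regulatory flag are computed or judged elsewhere and passed in. Both function names are hypothetical.

```python
def rule_churn(old: dict[str, str], new: dict[str, str]) -> float:
    """Fraction of rule IDs added, removed, or reworded between two versions.

    Assumes rules are keyed by a stable ID and compared by their text, with
    churn measured against the union of IDs in both versions.
    """
    ids = old.keys() | new.keys()
    if not ids:
        return 0.0
    changed = sum(1 for rule_id in ids if old.get(rule_id) != new.get(rule_id))
    return changed / len(ids)


def sunset_reasons(outcome_match: float, inter_sme_agreement: float,
                   churn: float, regulation_rewritten: bool) -> list[str]:
    """Return the sunset criteria that are met; an empty list means keep the skill."""
    reasons = []
    if outcome_match < 0.80:
        reasons.append("outcome-match")
    if inter_sme_agreement < 0.75:
        reasons.append("inter-SME agreement")
    if churn > 0.30:
        reasons.append("rule churn")
    if regulation_rewritten:  # a human judgment, passed in as a flag
        reasons.append("regulation")
    return reasons
```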