SMETP v2 lifecycle
- 0 · Scope · Risk tier & fit
- 1 · Capture · CDM + ACTA + thresholds
- 2 · Analyze · Graph + DRT
- 3 · Codify · Zones + safety
- 4 · Validate · ECE & outcome-match
- 5 · Deploy · Skill + MCP
- 6 · Monitor · Drift signals
Phase 6 · Monitor & evolve
Skills decay silently. SMETP v2 watches the five signals that prove it.
Deployment is the start of the expensive part of a skill's life. Track these signals continuously against a rolling baseline; any breach triggers a re-elicitation review. Until you wire a real audit log, this page synthesizes a representative two-month window so the operator workflow is concrete.
60 days · synthetic series · seed 421 · threshold breach → re-elicit
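A fixed seed is what makes a synthetic window like this repeatable from one page load to the next. The sketch below is one way to produce such a series and its rolling average. It assumes a clamped Gaussian random walk, and the names `synthetic_series` and `rolling_mean`, the base level, and the step size are all illustrative rather than the page's actual generator.

```python
import random


def synthetic_series(seed: int, days: int = 60, base: float = 0.15,
                     step: float = 0.01) -> list[float]:
    """Return a reproducible daily series as a clamped Gaussian random walk."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    value = base
    series = []
    for _ in range(days):
        # Assumed dynamics: small Gaussian steps, never dropping below zero.
        value = max(0.0, value + rng.gauss(0.0, step))
        series.append(value)
    return series


def rolling_mean(series: list[float], window: int = 7) -> list[float]:
    """Trailing mean; early points average over the days available so far."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Calling `synthetic_series(421)` twice returns the same 60 values, so an operator can compare screenshots or test alert wiring against a stable series before real audit data arrives.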
Feature PSI (rolling 7d)
last 0.189 · 7d avg 0.190 · threshold 0.200
Within tolerance band over the last 7 days.
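For readers who have not computed a Population Stability Index before, the sketch below shows the usual formula, the sum over bins of (actual − expected) × ln(actual / expected). The binning is an assumption: equal-width bins taken from the baseline range, out-of-range values clamped into the edge bins, and a small `eps` floor so empty bins do not produce a division by zero. The function name `psi` and its parameters are illustrative.

```python
import math


def psi(expected: list[float], actual: list[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp into the edge bins
            counts[idx] += 1
        total = len(values)
        # Floor each share at eps so the log ratio stays finite.
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples score exactly 0, and a sample shifted well outside the baseline range scores far above the 0.200 threshold shown on the card.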
Prediction approve-rate (rolling 7d)
last 71.6% · 7d avg 71.3% · threshold 6.0% (shift vs. rolling baseline)
Threshold breached: re-elicitation review queued.
Outcome-match (rolling 14d)
last 0.851 · 7d avg 0.859 · threshold 0.850
Within tolerance band over the last 7 days.
Expected Calibration Error (rolling 14d)
last 0.070 · 7d avg 0.065 · threshold 0.080
Within tolerance band over the last 7 days.
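Expected Calibration Error compares how confident the skill says it is with how often it turns out to be right. A minimal sketch follows, assuming equal-width confidence bins and a per-decision correctness flag. The function name and the default of 10 bins are illustrative choices, not something this page specifies.

```python
def expected_calibration_error(confidences: list[float],
                               correct: list[bool],
                               bins: int = 10) -> float:
    """Weighted mean gap between average confidence and accuracy per bin."""
    n = len(confidences)
    totals = [0] * bins
    conf_sum = [0.0] * bins
    hits = [0] * bins
    for c, ok in zip(confidences, correct):
        idx = min(int(c * bins), bins - 1)  # confidence 1.0 lands in the last bin
        totals[idx] += 1
        conf_sum[idx] += c
        hits[idx] += 1 if ok else 0
    ece = 0.0
    for t, s, h in zip(totals, conf_sum, hits):
        if t:  # skip empty bins
            ece += (t / n) * abs(h / t - s / t)
    return ece
```

A skill that reports 0.9 confidence and is right 9 times out of 10 scores close to 0; the same confidence with no correct decisions scores 0.9.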
must_escalate rate (rolling 7d)
last 14.3% · 7d avg 13.9% · threshold 18.0%
Within tolerance band over the last 7 days.
Re-elicitation triggers (skill-defined)
- Time-based: every 12 months from last interview.
- Drift-based: PSI > 0.20 on any modeled input for 7 consecutive days.
- ECE-based: rolling 14-day ECE > 0.08.
- Volume-based: every 5,000 decisions or 6 months, whichever comes first.
- Override-based: human override rate > 5% in any 14-day window.
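The triggers above can be sketched as a single check that returns which ones have fired. This is an illustration under stated assumptions: the `SkillStatus` fields are hypothetical names, the volume trigger's 6-month clock is assumed to run from the last review rather than the last interview, and "7 consecutive days" is read as any run of 7 daily values above the limit.

```python
from dataclasses import dataclass


@dataclass
class SkillStatus:
    months_since_interview: float
    daily_psi: dict[str, list[float]]  # feature name -> daily PSI, oldest first
    ece_14d: float
    decisions_since_review: int
    months_since_review: float         # assumed clock for the volume trigger
    override_rate_14d: float           # fraction, e.g. 0.05 for 5%


def longest_run_above(values: list[float], limit: float) -> int:
    """Length of the longest streak of consecutive values above `limit`."""
    best = current = 0
    for v in values:
        current = current + 1 if v > limit else 0
        best = max(best, current)
    return best


def fired_triggers(s: SkillStatus) -> list[str]:
    """Names of the re-elicitation triggers that currently fire."""
    fired = []
    if s.months_since_interview >= 12:
        fired.append("time")
    if any(longest_run_above(v, 0.20) >= 7 for v in s.daily_psi.values()):
        fired.append("drift")
    if s.ece_14d > 0.08:
        fired.append("ece")
    if s.decisions_since_review >= 5000 or s.months_since_review >= 6:
        fired.append("volume")
    if s.override_rate_14d > 0.05:
        fired.append("override")
    return fired
```

Returning the list of names, rather than a single boolean, keeps the reason for each review visible in whatever log or ticket the operator opens.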
Sunset criteria
- Outcome-match drops below 80% on a stratified test set.
- Inter-SME agreement on validation cases falls below 0.75.
- Re-elicitation produces > 30% rule churn (versioned diff).
- Underlying regulation rewrites the decision space (e.g. EU AI Act tier change).
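Rule churn is the one sunset criterion that needs a definition before it can be measured. One possible reading is sketched below: count rules added, removed, or changed between two versions and divide by the size of the old rule set. That formula, the rule-ID-to-text dictionaries, and the function names are assumptions for illustration. The regulation criterion is passed in as a manual flag, since it is a human judgment.

```python
def rule_churn(old: dict[str, str], new: dict[str, str]) -> float:
    """Share of rules added, removed, or changed, relative to the old rule count."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return (len(added) + len(removed) + len(changed)) / max(len(old), 1)


def sunset_reasons(outcome_match: float, sme_agreement: float,
                   churn: float, regulation_rewrite: bool) -> list[str]:
    """Names of the sunset criteria that are currently met."""
    reasons = []
    if outcome_match < 0.80:
        reasons.append("outcome-match")
    if sme_agreement < 0.75:
        reasons.append("sme-agreement")
    if churn > 0.30:
        reasons.append("rule-churn")
    if regulation_rewrite:  # set by a human reviewer, not computed
        reasons.append("regulation")
    return reasons
```

For example, a four-rule skill where one rule is reworded, one is dropped, and one is added has a churn of 0.75 under this definition, well past the 30% line.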