Skills decay silently. SMETP v2 watches the five signals that prove it.

Deployment is the start of the expensive part of a skill's life. Track these signals continuously against a rolling baseline; any breach triggers a re-elicitation review. Until you wire a real audit log, this page synthesizes a representative two-month window so the operator workflow is concrete.

60 days · synthetic series · seed 421 · threshold breach → re-elicit

Feature PSI (rolling 7d)

last 0.189 · 7d avg 0.190 · threshold 0.200

Within tolerance band over the last 7 days.
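A minimal sketch of the PSI computation, assuming the rolling baseline is a sample of the same modeled input and that bins are cut at baseline deciles (the page does not state a binning scheme). The function name `psi` and the `eps` floor for empty bins are illustrative choices, not part of the SMETP v2 spec.

```python
import bisect
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], current: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    ordered = sorted(baseline)
    # Interior cut points at baseline quantiles, so each baseline bin holds ~1/bins of the mass.
    cuts = [ordered[len(ordered) * k // bins] for k in range(1, bins)]

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(cuts, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(baseline)  # expected (baseline) proportions
    a = proportions(current)   # actual (current window) proportions
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Run it once per day over the trailing 7 days of each modeled input to produce the series this card plots. Every term is non-negative, so PSI is 0 only when the binned proportions match exactly.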

Prediction approve-rate (rolling 7d)

last 71.6% · 7d avg 71.3% · threshold 6.0%

Threshold breached — re-elicitation review queued.

Outcome-match (rolling 14d)

last 0.851 · 7d avg 0.859 · threshold 0.850

Within tolerance band over the last 7 days.
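A minimal sketch of the rolling 14-day outcome-match, assuming it is the share of resolved decisions whose output matched the realized outcome, and that daily counts of both are logged. `DailyOutcomes` and `rolling_match_rate` are hypothetical names, not SMETP v2 types. The sketch pools counts across the window rather than averaging daily rates, so low-volume days cannot swing the metric on their own.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class DailyOutcomes:  # hypothetical daily log record
    matched: int   # resolved decisions where the skill's output matched the outcome
    resolved: int  # decisions whose outcome is known


def rolling_match_rate(days: List[DailyOutcomes], window: int = 14) -> List[float]:
    """Pooled outcome-match over a trailing window of `window` days (NaN if nothing resolved)."""
    buf: Deque[DailyOutcomes] = deque()
    matched = resolved = 0
    rates: List[float] = []
    for day in days:
        buf.append(day)
        matched += day.matched
        resolved += day.resolved
        if len(buf) > window:  # drop the day that fell out of the window
            old = buf.popleft()
            matched -= old.matched
            resolved -= old.resolved
        rates.append(matched / resolved if resolved else float("nan"))
    return rates
```

With thirteen days at 90/100 followed by one day at 1/10, the pooled 14-day rate is 1171/1310 ≈ 0.894, while a plain mean of the daily rates would fall to about 0.843 and cross the 0.850 floor.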

Expected Calibration Error (rolling 14d)

last 0.070 · 7d avg 0.065 · threshold 0.080

Within tolerance band over the last 7 days.
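ECE is the count-weighted gap between stated confidence and observed accuracy, taken across confidence bins. A minimal sketch, assuming ten equal-width bins (a common default; the page does not specify the binning) and one boolean correctness label per resolved decision. The function name is illustrative.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool], bins: int = 10) -> float:
    """ECE = sum over bins b of (n_b / N) * |accuracy(b) - mean_confidence(b)|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    totals = [0] * bins
    conf_sum = [0.0] * bins
    hits = [0] * bins
    for p, ok in zip(confidences, correct):
        # Map p in [0, 1] to an equal-width bin; p == 1.0 lands in the top bin.
        b = min(int(p * bins), bins - 1)
        totals[b] += 1
        conf_sum[b] += p
        hits[b] += int(ok)
    ece = 0.0
    for b in range(bins):
        if totals[b]:
            gap = abs(hits[b] / totals[b] - conf_sum[b] / totals[b])
            ece += (totals[b] / n) * gap
    return ece
```

Feeding it the last 14 days of resolved decisions gives one point of the rolling series compared against 0.080.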

must_escalate rate (rolling 7d)

last 14.3% · 7d avg 13.9% · threshold 18.0%

Within tolerance band over the last 7 days.
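Each card above reduces a daily series to the same three numbers plus a status line. A minimal sketch of that reduction, assuming "within tolerance band over the last 7 days" means no daily value in the trailing week crossed the threshold. It also assumes the breach direction is set per metric: outcome-match has a floor, while PSI, ECE, and must_escalate rate have ceilings. `CardSummary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence


@dataclass
class CardSummary:
    last: float
    avg_7d: float
    threshold: float
    breached: bool


def summarize(series: Sequence[float], threshold: float,
              higher_is_worse: bool = True, lookback: int = 7) -> CardSummary:
    """Reduce a daily metric series to last value, trailing mean, and breach flag."""
    recent = list(series[-lookback:])
    if higher_is_worse:  # ceiling metrics such as PSI, ECE, must_escalate rate
        breached = any(v > threshold for v in recent)
    else:                # floor metrics such as outcome-match
        breached = any(v < threshold for v in recent)
    return CardSummary(last=recent[-1], avg_7d=fmean(recent), threshold=threshold, breached=breached)
```

A breached card is what queues the re-elicitation review described at the top of the page.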

Re-elicitation triggers (skill-defined)

• Time-based: every 12 months from the last interview.
• Drift-based: PSI > 0.20 on any modeled input for 7 consecutive days.
• ECE-based: rolling 14-day ECE > 0.08.
• Volume-based: every 5,000 decisions or 6 months, whichever comes first.
• Override-based: human override rate > 5% in any 14-day window.
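The five triggers can be evaluated together once per day. A minimal sketch, assuming the inputs are already computed: per-input daily PSI, the latest rolling 14-day ECE, and aligned daily decision and override counts. It reads "7 consecutive days" as the trailing seven days, counts decisions and months from the last interview, and takes "any 14-day window" over the available history. `SkillState` and `fired_triggers` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class SkillState:  # hypothetical snapshot of the signals the triggers read
    last_interview: date
    today: date
    decisions_since_interview: int
    psi_daily: Dict[str, List[float]]  # per modeled input, oldest first
    ece_14d: float                     # latest rolling 14-day ECE
    decisions_daily: List[int]         # decisions per day, oldest first
    overrides_daily: List[int]         # human overrides per day, aligned with decisions_daily


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return months - 1 if end.day < start.day else months


def fired_triggers(s: SkillState) -> List[str]:
    fired: List[str] = []
    months = months_between(s.last_interview, s.today)
    if months >= 12:
        fired.append("time")
    # Drift: any modeled input above 0.20 on each of the trailing 7 days.
    if any(len(v) >= 7 and all(x > 0.20 for x in v[-7:]) for v in s.psi_daily.values()):
        fired.append("drift")
    if s.ece_14d > 0.08:
        fired.append("ece")
    # Volume: 5,000 decisions or 6 months since the last interview, whichever comes first.
    if s.decisions_since_interview >= 5000 or months >= 6:
        fired.append("volume")
    # Override: slide a 14-day window over the history and test each pooled rate.
    for i in range(len(s.decisions_daily) - 13):
        total = sum(s.decisions_daily[i:i + 14])
        overrides = sum(s.overrides_daily[i:i + 14])
        if total and overrides / total > 0.05:
            fired.append("override")
            break
    return fired
```

A non-empty result is what the dashboard would surface as a queued re-elicitation review.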

Sunset criteria

• Outcome-match drops below 80% on a stratified test set.
• Inter-SME agreement on validation cases falls below 0.75.
• Re-elicitation produces > 30% rule churn (versioned diff).
• Underlying regulation rewrites the decision space (e.g. EU AI Act tier change).
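Rule churn needs a concrete definition before the 30% line can be checked. A minimal sketch, assuming each skill version is a mapping from a stable rule ID to the rule's text, and that churn is the share of rule IDs added, removed, or edited over the union of both versions. Other normalizations, such as dividing by the old version's size only, are equally defensible. The inter-SME agreement score is assumed to be computed elsewhere, and the regulation criterion stays a human judgment passed in as a flag. `rule_churn` and `sunset_reasons` are hypothetical names.

```python
from typing import List, Mapping


def rule_churn(old: Mapping[str, str], new: Mapping[str, str]) -> float:
    """Share of rule IDs added, removed, or edited between two versions, over the union of IDs."""
    ids = set(old) | set(new)
    if not ids:
        return 0.0
    changed = sum(1 for rid in ids if old.get(rid) != new.get(rid))
    return changed / len(ids)


def sunset_reasons(outcome_match: float, sme_agreement: float,
                   churn: float, regulation_rewrite: bool) -> List[str]:
    """Return every sunset criterion that currently holds (empty list means keep the skill)."""
    reasons: List[str] = []
    if outcome_match < 0.80:   # measured on a stratified test set
        reasons.append("outcome_match")
    if sme_agreement < 0.75:   # inter-SME agreement on validation cases
        reasons.append("sme_agreement")
    if churn > 0.30:           # versioned diff after re-elicitation
        reasons.append("rule_churn")
    if regulation_rewrite:     # human call, e.g. an EU AI Act tier change
        reasons.append("regulation")
    return reasons
```

For example, editing one rule, dropping another, and adding a third across four distinct rule IDs gives a churn of 0.75, well past the 30% line.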
