The standard · April 2026 revision

One portable SME package, every LLM

SMETP v0.5 is the open protocol for capturing what an expert knows and shipping it as a folder any modern agent runtime can execute — Anthropic Agent Skills, ChatGPT Custom GPTs, Gemini system instructions, OpenAI function-calling, MCP servers. One canonical package. Many derivative wrappers.

Spec version 0.5.0 · 7-phase lifecycle · Phase 0 · Phase 1 · Phase 2 · Phase 5 · Phase 6

v0.5.0 — what changed from v0.4

The April 2026 deep research report traced the gap between SMETP's prior artifact (a single skill.md) and what the elicitation, calibration, and agent-packaging literature requires. v0.5 closes that gap. The release is additive only: v0.4 documents still validate and round-trip cleanly.

  • Portable skill package is the canonical artifact — replacing single-file skill.md. The package is a folder with SKILL.md, MANIFEST.yaml, references/, providers/, monitors/, and scripts/.
  • Anthropic-style thin SKILL.md — frontmatter + when-to-use + when-not-to-use, with bulk material under references/ using progressive disclosure.
  • Per-LLM compiled wrappers ship in providers/: Claude system prompt, ChatGPT instructions, Gemini system_instruction JSON, OpenAI function-calling tool schemas, MCP server descriptor.
  • Validation manifest is now a JSON document (not a narrated paragraph) with SME-match, outcome-match, Brier, ECE, AUROC, group calibration, and adversarial pass-rate.
  • Confidence is a fitted object, not a hand-picked number. confidence_model.method is isotonic, platt, bayesian-posterior, or uncalibrated with bins from a held-out split.
  • Monitors are first-class: monitors/thresholds.yaml ships with default triggers (PSI, ECE rolling, decision-rate shift, override-rate, missing-data, policy refresh).
  • Deterministic, reproducible compile — same bag, same bytes. The package can be hashed, cached, and audit-compared without surprises.
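To make the "same bag, same bytes" claim concrete, here is a minimal sketch of how a compiled package folder could be hashed for caching or audit comparison. The function name package_digest and the choice of SHA-256 over sorted relative paths are assumptions made for illustration; the spec text above does not say how SMETP itself computes a package hash.

```python
import hashlib
from pathlib import Path


def package_digest(root: str) -> str:
    """Return a SHA-256 hex digest over every file under root.

    Hypothetical helper, not part of @smetp/cli or @smetp/runtime.
    """
    base = Path(root)
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so filesystem traversal order
    # never changes the result.
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    for path in files:
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix the name and the contents so file boundaries
        # are unambiguous in the hashed stream.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Two compiles of the same input should then yield the same digest, and any changed byte in any file should yield a different one. File metadata such as timestamps and permissions is deliberately left out of the hash in this sketch.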

The package layout

Every paid package compiles to this folder. The free tier ships only SKILL.md, README.md, and three reference markdowns — enough to paste into any LLM, but without the JSON schema, validation, providers, or monitors that paid tiers ship.

skill-name/
├── SKILL.md                          # Anthropic-style thin nav
├── README.md                         # package-level orientation
├── MANIFEST.yaml                     # name, version, owners, runs_on
├── CHANGELOG.md
├── references/
│   ├── skill-document.json           # canonical wire schema
│   ├── workflow.md                   # ordered decision flow
│   ├── graph.md                      # entities + mermaid
│   ├── dictionary.md                 # vocabulary the SME uses
│   ├── decision-logic.md             # zones, compensatory rules, disqualifiers
│   ├── elicitation-provenance.md     # CDM/ACTA sessions, coders, kappa
│   ├── edge-cases.md                 # known failure modes
│   ├── policy-sources.md             # regulations + jurisdictions
│   └── validation-manifest.json      # SME-match · outcome-match · Brier · ECE
├── providers/
│   ├── README.md
│   ├── claude-system-prompt.md       # Claude Projects / Anthropic Agent Skills
│   ├── chatgpt-instructions.md       # Custom GPT instructions
│   ├── gemini-system.json            # Vertex AI system_instruction + tools
│   ├── openai-tools.json             # function-calling tool schemas
│   └── mcp-server.json               # MCP server descriptor
├── monitors/
│   └── thresholds.yaml               # drift signals + re-elicitation triggers
└── scripts/
    └── execute.py                    # runtime stub (calls @smetp/runtime)
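As a rough illustration of the layout above, the sketch below checks a compiled folder for the files shown in the tree. The file list is copied from that tree; treating every entry as mandatory is an assumption, and missing_entries is a hypothetical helper rather than anything shipped with SMETP.

```python
from pathlib import Path
from typing import List

# Paths taken from the package tree above. Whether each one is strictly
# required by the spec is an assumption made for this sketch.
EXPECTED_FILES = [
    "SKILL.md",
    "README.md",
    "MANIFEST.yaml",
    "CHANGELOG.md",
    "references/skill-document.json",
    "references/workflow.md",
    "references/graph.md",
    "references/dictionary.md",
    "references/decision-logic.md",
    "references/elicitation-provenance.md",
    "references/edge-cases.md",
    "references/policy-sources.md",
    "references/validation-manifest.json",
    "providers/README.md",
    "providers/claude-system-prompt.md",
    "providers/chatgpt-instructions.md",
    "providers/gemini-system.json",
    "providers/openai-tools.json",
    "providers/mcp-server.json",
    "monitors/thresholds.yaml",
    "scripts/execute.py",
]


def missing_entries(root: str) -> List[str]:
    """Return the expected relative paths that are absent under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

An empty list means the folder matches the tree; anything else names the files to look for before handing the package to a runtime.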

The 7-phase lifecycle

v0.5 fuses the strongest pieces of CDM, ACTA, SHELF, IDEA, Anthropic Agent Skills, OpenAI traces, and NIST-style lifecycle management into one governed pipeline. Each phase has a primary output and a minimum exit gate.

Phase | Primary output | Exit gate
0 · Scope | Risk memo, regulatory map, success criteria | Use case stable, high-value, has historical cases
1 · Capture | Transcripts, incident timelines, ACTA artifacts, elicited thresholds | ≥3 concrete cases plus routine-cue audit
2 · Analyze | Coded factors, decision requirements, contradiction log, ontology crosswalk | Dual-coder review; contradictions explicit
3 · Codify | Canonical skill package + executable logic | Every rule has provenance, missing-data handling, safety
4 · Validate | Validation manifest + release recommendation | SME-match, outcome-match, calibration, adversarial pass
5 · Deploy | Shadow + rollout plan | Human-review pathway, rollback triggers configured
6 · Monitor | Drift dashboard, override review, re-elicitation plan | Owners + thresholds + recurrence assigned

Validation, calibration, monitoring

v0.5 separates fidelity from usefulness. Fidelity asks whether the package reproduces the expert; usefulness asks whether it predicts or improves outcomes. Both ship in the same manifest:

  • Fidelity — SME-match, dual-coder concordance, second-reviewer agreement (kappa).
  • Predictive validity — outcome-match, AUROC, AUPRC when labels exist.
  • Calibration — Brier score, ECE, reliability diagram (always shipped, never optional).
  • Robustness — adversarial pass-rate, OOD performance, shadow-run delta.
  • Fairness & compliance — group calibration, prohibited-feature checks, protected-proxy review.
  • Operations — latency, missing-data rate, escalation rate, override rate.
  • Auditability — rule IDs, evidence refs, trace completeness, reproducibility of every logged decision.
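The two calibration numbers named above have standard definitions, sketched here for the binary case. The function names and the choice of ten equal-width bins are assumptions for illustration; the manifest may bin differently (the confidence model refers to bins from a held-out split).

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    if not probs:
        raise ValueError("need at least one prediction")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average of |mean predicted probability - observed rate| per bin."""
    if not probs:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - observed)
    return ece
```

Brier rewards both sharpness and calibration in one number, while ECE isolates the calibration gap, which is one reason to report both alongside a reliability diagram.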

monitors/thresholds.yaml ships with sensible defaults: feature drift (PSI > 0.20), decision drift (rate shift > 15%), calibration drift (rolling ECE > 0.05), override drift (> 10% in any zone), missing-data spike, policy refresh, time-based revalidation. Tune per domain on the Ultimate tier.
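A sketch of how these triggers might be evaluated follows. The PSI formula is the standard population stability index over binned proportions. The dictionary keys are hypothetical (the actual schema of thresholds.yaml is not shown here), and rates are assumed to be expressed as fractions.

```python
import math
from typing import Dict, List, Sequence

# Default trigger values quoted in the text; the key names are assumptions.
DEFAULT_THRESHOLDS = {
    "psi": 0.20,
    "decision_rate_shift": 0.15,
    "rolling_ece": 0.05,
    "override_rate": 0.10,
}


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions.

    Both inputs are per-bin proportions over the same bins.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins so the logarithm stays defined.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def breached(signals: Dict[str, float],
             thresholds: Dict[str, float] = DEFAULT_THRESHOLDS) -> List[str]:
    """Return the names of signals that exceed their threshold, sorted."""
    return sorted(
        name for name, value in signals.items()
        if name in thresholds and value > thresholds[name]
    )
```

For example, a feature whose baseline bins were [0.5, 0.5] and whose live bins are [0.9, 0.1] has a PSI well above 0.20, so breached({"psi": psi([0.5, 0.5], [0.9, 0.1])}) reports the psi trigger.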

One protocol, two artifact tiers

The standard ships in two flavors so the wow moment is free:

  1. Free starter — three markdowns (skill.md, workflow.md, graph.md) you can paste into any LLM as a system prompt. No card, no signup.
  2. Full v0.5 package — the folder above, compiled deterministically from the same bag, with per-LLM wrappers, the validation manifest, and monitors. On Paid and Ultimate tiers.

The spec is the standard

@smetp/spec is the contract. It ships as MIT-licensed JSON Schema + zod under packages/spec. The semver of the package is the semver of the protocol. Major bumps are breaking changes; minor bumps are additive. v0.5 adds PackageLayout and ProviderWrappers schemas alongside the existing SkillDocument.
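One way to read the versioning rule is as a compatibility check: a consumer built against one spec version can accept documents from the same major line at the same or an earlier minor. The sketch below encodes that reading. It is an interpretation of the rule as stated, and parse_version and can_read are hypothetical names, not part of @smetp/spec.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def can_read(reader_version: str, document_version: str) -> bool:
    """True if a reader at reader_version should accept the document.

    Assumes major bumps break compatibility and minor bumps only add,
    so older minors within the same major remain valid.
    """
    r_major, r_minor, _ = parse_version(reader_version)
    d_major, d_minor, _ = parse_version(document_version)
    return r_major == d_major and d_minor <= r_minor
```

Under this reading, can_read("0.5.0", "0.4.0") holds, which matches the statement that v0.4 documents still validate, while a 0.4 reader makes no promise about 0.5 documents.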

The graph is the moat

The protocol is open. The corpus that compounds when thousands of experts run the protocol on the same hosted product is not. Read WHY_OPEN for why this split is the only honest way to ship a protocol.

Visit /graph for a live view of the public canonical layer.

Run it yourself in 60 seconds

# Capture an SME and compile the v0.5 package locally
npx @smetp/cli interview \
  --domain finance \
  --role "mortgage underwriter" \
  --sme "Jane Doe" \
  --years 18 \
  --out jane.json

npx @smetp/cli compile \
  --in jane.json \
  --tier paid \
  --out ./jane-package

# Drop it into Claude Projects, ChatGPT, Gemini, MCP …
ls jane-package/providers

That's the entire dependency on us: zero. The hosted product is what you graduate to when you want the cross-tenant graph, the live capture canvas, and the operator co-pilot in the loop.