Model-independent verification for AI-coupled work.
A Clarethium project. Standards and reference implementation for measuring AI output structure, fabrication, grounding, and specification compliance without depending on a model to judge a model.
LLM-as-judge approaches use AI to evaluate AI output. Touchstone instead uses regex pattern matching, structural analysis, source comparison, and arithmetic, so the measurement substrate does not depend on the model being measured.
This matters because the auditor should not be made of the same material as the audited. AI evaluating AI inherits the same biases and failure modes as the AI being evaluated. Touchstone breaks that loop by operating outside the model.
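To make "operating outside the model" concrete, here is a minimal sketch of a source comparison built only from a regex and set arithmetic. It is an illustration under stated assumptions, not the Touchstone implementation: the function names `extract_numbers` and `unsupported_numbers` are hypothetical, and the library has not been released.

```python
import re

# Hypothetical helpers for illustration; not a published Touchstone API.
# Matches integers and decimals, allowing thousands separators (e.g. 4,200 or 3.14).
NUMBER = re.compile(r"(?<![\w.])\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> set[str]:
    """Return numeric literals found in text, with thousands separators removed."""
    return {m.group().replace(",", "") for m in NUMBER.finditer(text)}


def unsupported_numbers(output: str, source: str) -> set[str]:
    """Return numbers that appear in the output but nowhere in the source."""
    return extract_numbers(output) - extract_numbers(source)


source = "Revenue grew 12% to 4,200 units in 2023."
output = "Revenue grew 15% to 4200 units in 2023."
print(sorted(unsupported_numbers(output, source)))  # prints ['15']
```

The check is deterministic: the same output and source always produce the same result, and no model is called at any point.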
Output measurement (eleven layers):
| Layer | Construct | Source required |
|---|---|---|
| 1 | Structural profile (heading defaultness, mechanism ratio, assertion ratio) | Optional (Layer 1a only) |
| 2 | Claim density | No |
| 3 | Temporal instability across versions | Comparisons required |
| 4 | Source matching (numerical claims) | Yes |
| 5 | Entity provenance | Yes |
| 6 | Vocabulary proximity | Yes |
| 7 | Presentation features | No |
| 8 | Epistemic calibration | Yes |
| 9 | Information novelty | No |
| 10 | Quality profile (composite) | Optional |
| 11 | Grounding decomposition (G/F/P) | Yes |
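As a rough illustration of how a source-dependent layer can be computed without a model, the sketch below scores vocabulary overlap between an output and its source. Jaccard similarity over lowercase word tokens is an assumption chosen for illustration; the Standard may define vocabulary proximity differently, and the names `tokens` and `jaccard` are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens, deduplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(output: str, source: str) -> float:
    """Shared vocabulary divided by combined vocabulary (an assumed proximity measure)."""
    out_vocab, src_vocab = tokens(output), tokens(source)
    if not out_vocab and not src_vocab:
        return 1.0  # two empty texts are treated as identical
    return len(out_vocab & src_vocab) / len(out_vocab | src_vocab)


score = jaccard("The cat sat on the rug", "The cat sat on the mat")
print(round(score, 3))  # 4 shared tokens out of 6 distinct tokens
```

A measure like this needs the source text as input, which is the sense of the "Source required" column above.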
Specification compliance verification (five layers):
| Layer | Construct |
|---|---|
| 1 | Requirement extraction (8 types) |
| 2 | Coverage mapping (type-routed verification) |
| 3 | Scope drift |
| 4 | Emphasis balance |
| 5 | Semantic coverage (opt-in, embedding-based) |
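The first two layers can be pictured with a small sketch: pull requirement sentences out of a specification with a regex, then check each against the output by word overlap. This is a simplification under loud assumptions. It keys on modal verbs only and does not reproduce the eight requirement types or the type-routed verification defined in the Standard. All names and the 0.5 threshold are hypothetical.

```python
import re

# Assumed signal for a requirement sentence: a modal verb.
MODAL = re.compile(r"\b(must not|must|shall|should|may)\b", re.IGNORECASE)
MODAL_WORDS = {"must", "shall", "should"}


def extract_requirements(spec: str) -> list[str]:
    """Return the sentences of spec that contain a modal verb."""
    sentences = re.split(r"(?<=[.!?])\s+", spec.strip())
    return [s for s in sentences if MODAL.search(s)]


def content_words(text: str) -> set[str]:
    """Lowercase words longer than three letters, excluding modal verbs."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 3 and w not in MODAL_WORDS}


def coverage(spec: str, output: str, threshold: float = 0.5) -> dict[str, bool]:
    """Mark a requirement covered when enough of its content words occur in output."""
    out_words = content_words(output)
    result = {}
    for req in extract_requirements(spec):
        req_words = content_words(req)
        share = len(req_words & out_words) / len(req_words) if req_words else 0.0
        result[req] = share >= threshold
    return result


spec = "The report must include a budget table. It should mention staffing risks. Thanks for reading."
output = "Below is the budget table for next year."
for requirement, covered in coverage(spec, output).items():
    print(covered, "-", requirement)
```

Here the budget requirement is marked covered and the staffing requirement is not. Plain word overlap misses paraphrases, which is presumably the gap the opt-in embedding-based layer is meant to address.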
See the Touchstone Standard 1.0 for full specifications.
Pre-launch. Standard 1.0 drafting in progress. Library extraction in progress. PyPI organization pending approval.
Expected first release: Q3 2026.
Both licenses permit commercial use with attribution.
Touchstone Standard 1.0 (2026), Clarethium.
https://github.com/Clarethium/touchstone/blob/main/STANDARDS/touchstone-1.0.md
For library citation, see CITATION.cff.