---
id: s2-audit-package-templates
title: Audit Package Templates
module: GROW-S2
module_slug: grow-s2-evaluation-auditability
cluster: Systems
type: template
version: v0.2.0
status: Draft
tier: membership
contract_role: ""
canonical_url: "https://grow.goodcombinator.ai/library/registry/s2-audit-package-templates"
download_url: "https://grow.goodcombinator.ai/library/registry/s2-audit-package-templates.md"
license: CC-BY-4.0 (proposed; owner confirmation required)
source: GROW by Good Combinator
retrieved_at: 2026-05-29
---

# Audit Package Templates

Six fillable scaffolds. Each is usable as-is: copy, fill, attach. The audit package is the bundle a builder, an evaluator, or an external auditor will be asked for when the question is: "Prove this system works, and prove you knew it worked."

---

## 1. Evaluation Framework Doc

```markdown
# Evaluation Framework:

**System under evaluation:**
**Owner:**
**Quality tier (per s2-evaluation-charter):** demo | operational | high-stakes
**High-stakes triggers present:** [regulator-visible | financial | irreversible | PII | statutory-substitute]

## 1. Scope
What this system does, in one paragraph. What it does NOT do.

## 2. Risks Retired
- User risk:
- Regulatory risk:
- Operational risk:

## 3. Test Families (link to s2-test-architecture)
- Functional: tests; champion baseline
- Safety: tests; seeded from failure_ids:
- Quality: tests; rubric ID
- Edge-case: tests

## 4. Evaluator Roles Invoked (per s2-evaluator-roster)

## 5. Scoring Approach (per s2-scoring-system)
- Pass threshold:
- Critical floors honored: yes/no
- Confidence caps applied: yes/no

## 6. Provenance Coverage (per s2-audit-trail-schema)
- Schema version:
- Required fields present on 100% of runs: yes/no
- Explainable bucket reproducible into 1-page summary: yes/no

## 7. Regression Posture (per s2-regression-discipline)
- Comparison harness in place: yes/no
- Cadence:
- Open intentional regressions:

## 8. Sign-Offs
```

---

## 2. Auditability Checklist

A binary checklist. Every item must be checked or have a written exception attached.

```markdown
# Auditability Checklist:

## Charter
- [ ] Quality tier declared and recorded in repo
- [ ] High-stakes triggers explicitly evaluated
- [ ] Risk categories (user/regulatory/operational) mapped

## Failure-Mode Coverage
- [ ] Every `critical` failure_id from s1-failure-mode-register has ≥ 1 safety test
- [ ] Every `high` failure_id has ≥ 1 safety test
- [ ] Every `medium` failure_id has a test OR a documented monitoring signal

## Provenance
- [ ] All required fields present on a sampled 50-run audit
- [ ] `decision_trace` contains ≥ 1 step on every run
- [ ] `decision_origin` values are from the allowed enum only
- [ ] `source_confidence` populated on every retrieval_source
- [ ] Explainable-bucket fields reproducible into reviewer summary

## Evaluators
- [ ] Each test family has a named signatory role
- [ ] Two-of-two requirement honored for high-stakes
- [ ] No self-clearing on high-stakes systems
- [ ] Compliance veto path operative

## Scoring
- [ ] Rubric committed and versioned
- [ ] Severity weights match s1 enum
- [ ] Confidence caps applied on retrieval-grounded criteria
- [ ] Floor rule enforced (no critical criterion below floor)

## Regression
- [ ] Comparison harness runs on every triggering change
- [ ] Change log entries are append-only
- [ ] No `unintentional_regression > 0` entries with `decision = promoted`
- [ ] Every intentional regression has pre-declared intent + expiry

## Retention
- [ ] Evidence retention policy attached (see template 5)
- [ ] PII redaction verified on retained records

Exceptions (with rationale and expiry):
```

---

## 3. Test-Case Library Structure

```yaml
test_library:
  library_id: -tests@
  test_count_by_family:
    functional:
    safety:
    quality:
    edge_case:
  test:
    - test_id: -  # e.g., S-S1-04
      family: functional | safety | quality | edge_case
      failure_id_refs: [, ...]  # required for family=safety, may be empty otherwise
      title:
      inputs:
        raw:
        channel:
        fixtures: [, ...]
      expected_behavior:
      pass_criteria:
        -  # e.g., "outputs.refusal == true"
      fail_handling:
        on_fail:
        escalation_role: | null
      rubric_ref:
      tags: [pii, sunshine, grant, tax, jurisdiction, tone, ...]
      created_at:
      last_modified_at:
```

---

## 4. Scoring Rubric Template

```markdown
# Rubric:

**Applies to:**
**Scale:** 0–5 integer (per s2-scoring-system)
**Tier required:** demo | operational | high-stakes
**Weighted-average pass:**

## Criteria
| # | Criterion | Severity | 0 anchor | 3 anchor | 5 anchor | Floor (min score) | Confidence-capped? |
|---|---|---|---|---|---|---|---|
| 1 | | critical / high / medium / low / info | | | | | yes/no |
| 2 | ... | | | | | | |
| 3 | ... | | | | | | |

## Confidence Cap Map (per s2-scoring-system, C4 contract)
- high → no cap (max 5)
- medium → capped at half-credit (max 3)
- low → capped at 0
- unknown → capped at 0 AND mark_for_review = true (routes to domain-expert-reviewer)

## Worked Example
```

---

## 5. Evidence Retention Policy

```markdown
# Evidence Retention Policy:

## Records in Scope
- Provenance records (per s2-audit-trail-schema)
- Scorecard fragments (per s2-scoring-system)
- Regression reports and change-log entries (per s2-regression-discipline)
- Evaluator signatures and rationales

## Retention by Class
| Class | Default Retention | Trigger Extending Retention |
|---|---|---|
| High-stakes provenance records | 7 years | open litigation hold, regulator inquiry, grant audit window |
| Operational provenance records | 18 months | open incident, regression investigation |
| Demo provenance records | 90 days | none |
| Evaluator signatures | matches record class | matches record class |
| External-auditor findings | 7 years | always |

## PII Handling
- `inputs.normalized` is the redacted record. `inputs.raw` containing PII is retained only when required for incident reconstruction, encrypted at rest, access-logged.
- `pii_flags` array on every record drives redaction-on-export.
- Florida Sunshine: public-record obligations override deletion schedules; consult compliance-reviewer.

## Disposal
- Disposal log appended to the audit package on every batch deletion: `{record_class, count, disposed_at, disposed_by, schedule_ref}`.

## Owner
Retention policy owner: . Reviewed annually.
```

---

## 6. Evaluation Report Template

The deliverable an evaluator hands a reviewer or auditor at the end of a cycle.

```markdown
# Evaluation Report:

**Period covered:** –
**Quality tier:**
**Reporting evaluator:**

## Headline (Pyramid Principle)
**Recommendation:** Promote | Hold | Roll back | Investigate
**One-sentence rationale:**

## Results by Test Family
| Family | Tests run | Pass | Fail | Partial | Weighted avg | Drift flags |
|---|---|---|---|---|---|---|
| Functional | | | | | | |
| Safety | | | | | | |
| Quality | | | | | | |
| Edge-case | | | | | | |

## Failure-Mode Coverage

## Notable Provenance Findings
- `decision_origin` distribution: agent <%>, human-override <%>, fallback <%>, escalation <%>
- `source_confidence` distribution: high <%>, medium <%>, low <%>, unknown <%>
- Caps applied: ; capped claims summary

## Regression Posture
- Comparison report ID:
- Delta summary:
- Open intentional regressions: (with expiries)

## Open Items
- Risks accepted:
- Findings to remediate:

## Sign-Offs

## Appendices
- Linked provenance records:
- Linked scorecards:
- Linked change-log entries:
```
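
The Confidence Cap Map and floor rule in template 4 (Scoring Rubric Template) can be read as a small scoring procedure. The Python sketch below is one possible reading for filling in the "Worked Example" slot, not the s2-scoring-system implementation. The names `Criterion`, `apply_cap`, and `score_rubric` are hypothetical, the weights and pass threshold are supplied by the caller rather than taken from the s1 severity enum, and comparing the floor against the capped score (rather than the raw score) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Cap values copied from the Confidence Cap Map in template 4 (0-5 integer scale).
CONFIDENCE_CAPS: Dict[str, int] = {"high": 5, "medium": 3, "low": 0, "unknown": 0}


@dataclass
class Criterion:
    """One rubric row. Field names are hypothetical, chosen for this sketch."""
    name: str
    raw_score: int                    # evaluator's 0-5 score before any cap
    weight: float                     # assumption: severity weight supplied by the caller
    floor: int = 0                    # "Floor (min score)" column
    confidence: Optional[str] = None  # None means the row is not confidence-capped


def apply_cap(raw_score: int, confidence: Optional[str]) -> Tuple[int, bool]:
    """Return (capped_score, mark_for_review) for a single criterion."""
    if confidence is None:
        return raw_score, False
    capped = min(raw_score, CONFIDENCE_CAPS[confidence])
    # Only "unknown" confidence routes the criterion to review.
    return capped, confidence == "unknown"


def score_rubric(criteria: List[Criterion], pass_threshold: float) -> dict:
    """Weighted average of capped scores, with the floor rule as a hard gate."""
    total_weight = sum(c.weight for c in criteria)
    weighted_sum = 0.0
    floor_breaches: List[str] = []
    mark_for_review: List[str] = []
    for c in criteria:
        capped, needs_review = apply_cap(c.raw_score, c.confidence)
        weighted_sum += capped * c.weight
        if capped < c.floor:  # assumption: floor is checked after capping
            floor_breaches.append(c.name)
        if needs_review:
            mark_for_review.append(c.name)
    average = weighted_sum / total_weight
    return {
        "weighted_average": average,
        "floor_breaches": floor_breaches,
        "mark_for_review": mark_for_review,
        "passed": average >= pass_threshold and not floor_breaches,
    }


# Illustrative rows only; criterion names, weights, and threshold are made up.
result = score_rubric(
    [
        Criterion("cites_source", raw_score=5, weight=3.0, floor=3, confidence="medium"),
        Criterion("tone", raw_score=4, weight=1.0),
    ],
    pass_threshold=3.0,
)
```

In this example the `cites_source` score of 5 is capped to 3 by its medium confidence, so the weighted average is (3 × 3.0 + 4 × 1.0) / 4.0 = 3.25 and the rubric passes with no floor breach. A criterion with `low` or `unknown` confidence and a floor above 0 would fail the gate regardless of the average.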