Status: Draft v0.1
1. Safety Case Claim
Pnyma is safe enough for staged deployment only when constitutional governance, action gating, and audit controls demonstrably reduce the risk of high-severity harm to within the acceptable threshold defined for each maturity stage.
2. Safety Argument Structure
Claim A
Constitutional controls reduce misaligned output behavior.
Claim B
Action governance prevents unauthorized or unsafe execution.
Claim C
Auditability and drift controls maintain long-term safety integrity.
Each claim must be supported by test evidence and operational monitoring.
3. Threat Model
Primary threat classes:
- Prompt manipulation and jailbreak attempts.
- Adversarial user coercion.
- Policy evasion through ambiguity.
- Tool-chain abuse via indirect prompts.
- Constitutional drift over iterative updates.
- Over-trust due to persuasive but uncertain outputs.
4. Failure Modes
- False certainty in high-impact contexts.
- Incorrect refusal (over-refusal or under-refusal).
- Unsafe operational detail leakage.
- Unauthorized tool execution.
- Memory misuse or sensitive retention.
- Inconsistent fairness under pressure.
5. Mitigation Controls
- Constitutional rule engine with precedence logic.
- Multi-pass guard checks for sensitive domains.
- Mandatory uncertainty disclosure thresholds.
- Strict action authorization gates.
- Bounded memory with retention policy enforcement.
- Continuous adversarial evaluation.
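One way to read "rule engine with precedence logic" is that, when several rules apply to the same request, the rule with the highest precedence decides the outcome. The sketch below illustrates only that idea. The Rule fields, the verdict strings, the convention that a lower number means higher precedence, and the caller-supplied fallback verdict are all assumptions for illustration, not part of the Pnyma design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    # All fields are hypothetical; the source does not define a rule schema.
    name: str
    priority: int                    # assumed: lower number = higher precedence
    applies: Callable[[str], bool]   # predicate over the request text
    verdict: str                     # e.g. "refuse", "allow"


def resolve(rules: list, request: str, default: str) -> str:
    """Return the verdict of the highest-precedence rule that applies.

    If no rule applies, the caller-supplied default verdict is returned.
    """
    matching = [r for r in rules if r.applies(request)]
    if not matching:
        return default
    return min(matching, key=lambda r: r.priority).verdict


rules = [
    Rule("general_helpfulness", 2, lambda text: True, "allow"),
    Rule("no_weapon_detail", 0, lambda text: "weapon" in text, "refuse"),
]
```

With these example rules, a request that mentions a weapon matches both rules, and the priority-0 rule wins, so the verdict is "refuse". A real engine would need far more robust predicates than substring matching.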
6. Action-Gating Logic
Action requests require:
- verified user intent,
- scope validation,
- risk classification,
- reversibility analysis,
- explicit permission check,
- post-action audit record.
Failure at any gate results in denial or escalation.
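The gate sequence above can be sketched as a single function that evaluates the gates in order and stops at the first failure. This is a minimal illustration under stated assumptions: the ActionRequest fields, the risk labels, and the choice of which failures deny and which escalate are hypothetical. The sketch also writes an audit record for every decision, not only for executed actions, which goes slightly beyond the "post-action" wording.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass
class ActionRequest:
    # Hypothetical inputs; the source names the gates, not their data.
    intent_verified: bool
    in_scope: bool
    risk_level: str          # assumed labels: "low", "medium", "high"
    reversible: bool
    permission_granted: bool


def evaluate_gates(req: ActionRequest, audit_log: list) -> Decision:
    """Run the gates in order; the first failing gate decides the outcome."""
    decision, failed_gate = Decision.ALLOW, None
    if not req.intent_verified:
        decision, failed_gate = Decision.DENY, "verified user intent"
    elif not req.in_scope:
        decision, failed_gate = Decision.DENY, "scope validation"
    elif req.risk_level not in ("low", "medium"):
        # Assumption: high or unknown risk goes to a human, not a flat deny.
        decision, failed_gate = Decision.ESCALATE, "risk classification"
    elif not req.reversible:
        decision, failed_gate = Decision.ESCALATE, "reversibility analysis"
    elif not req.permission_granted:
        decision, failed_gate = Decision.DENY, "explicit permission check"
    # Audit record is appended for every outcome, including denials.
    audit_log.append({"decision": decision.value, "failed_gate": failed_gate})
    return decision
```

Usage: calling evaluate_gates(ActionRequest(True, True, "high", True, True), log) returns Decision.ESCALATE and appends one record naming "risk classification" as the failed gate.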
7. Deployment Maturity Stages
Stage 1 — Reasoning Only
- No external actions.
- Low-risk informational tasks.
Stage 2 — Guided Assistance
- Recommendations and drafting only.
- No autonomous execution.
Stage 3 — Bounded Memory
- Retention in limited domains with strict controls.
Stage 4 — Permissioned Action
- Narrow tool execution in reversible environments.
Stage 5 — Arbitration Core
- Multi-agent oversight responsibilities with human governance.
Progression requires passing stage-specific safety gates.
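The progression rule can be illustrated with a small sketch in which a deployment advances by at most one stage, and only when every safety gate for the target stage has passed. The stage names follow the list above. The gate names and the dict-based gate results are hypothetical, and treating missing evidence as a failure is an assumption.

```python
from enum import IntEnum


class Stage(IntEnum):
    REASONING_ONLY = 1
    GUIDED_ASSISTANCE = 2
    BOUNDED_MEMORY = 3
    PERMISSIONED_ACTION = 4
    ARBITRATION_CORE = 5


def next_stage(current: Stage, gates_passed: dict) -> Stage:
    """Advance one stage only if all gates for the target stage passed.

    gates_passed maps a (hypothetical) gate name to True or False.
    An empty dict means no evidence, so the stage does not advance.
    """
    if current is Stage.ARBITRATION_CORE:
        return current  # already at the final stage
    if gates_passed and all(gates_passed.values()):
        return Stage(current + 1)
    return current
```

Usage: next_stage(Stage.BOUNDED_MEMORY, {"red_team": True, "eval": False}) stays at Stage.BOUNDED_MEMORY because one gate failed.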
8. Allowed vs. Forbidden Deployments
Allowed examples:
- educational assistance,
- high-trust drafting,
- bounded decision support with human oversight.
Forbidden examples (until controls are proven):
- unrestricted autonomous transactions,
- unsupervised critical infrastructure control,
- hidden background agent execution.
9. Constitutional Drift Prevention
- Immutable policy baselines.
- Drift score monitoring per release.
- Forced rollback on critical regression.
- Governance sign-off for constitutional amendments.
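The source does not define how a drift score is computed. As one possible reading, the sketch below scores drift as the fraction of baseline evaluation cases whose verdict changes in a candidate release, and forces a rollback when any case marked critical changes. The verdict strings, the set of critical cases, and the 0.05 review threshold are illustrative assumptions.

```python
def drift_score(baseline: dict, candidate: dict) -> float:
    """Fraction of baseline cases whose verdict differs in the candidate.

    Keys are case ids; values are verdicts such as "refuse" or "comply".
    A case missing from the candidate counts as changed.
    """
    if not baseline:
        raise ValueError("baseline must contain at least one case")
    changed = sum(
        1 for case, verdict in baseline.items() if candidate.get(case) != verdict
    )
    return changed / len(baseline)


def release_decision(baseline, candidate, critical_cases, max_drift=0.05):
    """Return "rollback", "review", or "accept" for a candidate release.

    critical_cases must be ids present in the baseline.
    """
    # Any change on a critical case is treated as a critical regression.
    if any(candidate.get(c) != baseline[c] for c in critical_cases):
        return "rollback"
    # Assumed policy: non-critical drift above the threshold needs review.
    if drift_score(baseline, candidate) > max_drift:
        return "review"
    return "accept"
```

Keeping the baseline as a fixed, versioned set of cases is what makes the score comparable across releases. If the baseline itself changes, scores from different releases no longer measure the same thing.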
10. Evidence Requirements
For each release:
- evaluation report,
- red-team findings,
- unresolved risk register,
- mitigation status,
- deployment scope recommendation.
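A release pipeline could enforce the list above with a simple completeness check before sign-off. The sketch below is an assumption about how that might look: the key names are invented from the list items, and each value is taken to be a reference to the artifact (for example a document id), with None or an absent key meaning the item has not been supplied.

```python
REQUIRED_EVIDENCE = (
    "evaluation_report",
    "red_team_findings",
    "unresolved_risk_register",
    "mitigation_status",
    "deployment_scope_recommendation",
)


def missing_evidence(bundle: dict) -> list:
    """Return the required items that are absent or None, in listed order."""
    return [item for item in REQUIRED_EVIDENCE if bundle.get(item) is None]
```

Usage: a release is blocked while missing_evidence(bundle) returns a non-empty list. The check confirms only that each artifact exists, not that its content is adequate.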