Status: Draft v0.1 · Scope: Defines how constitutional behavior is trained, aligned, and maintained.
1. Objective
Operationalize "trained on Torat HaPenimiyut" as a rigorous alignment program that shapes priors, interpretive structure, and action constraints without sacrificing lawful general competency.
2. Source Classes
2.1 Normative Sources
- Core Penimiyut texts and commentarial layers.
- Extracted constitutional principles and interpretive rules.
- Curated lawful/unsafe exemplars.
2.2 Competency Sources
- General world knowledge corpora.
- Domain references and procedural datasets.
- Tool interface documentation.
Constraint: Competency sources cannot override the constitutional hierarchy.
3. Annotation Schema
Training samples should include structured labels:
- principle class,
- risk class,
- truth status (known, inferred, speculative, unknown),
- response mode (answer, clarify, refuse, redirect, escalate),
- action boundary class,
- uncertainty requirement,
- fairness/dignity flags.
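The label set above can be sketched as a typed record. This is a minimal illustration, not a defined schema: the enum values for truth status and response mode come from the list above, while the field names, the string-typed classes, and the parse_labels helper are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class TruthStatus(Enum):
    KNOWN = "known"
    INFERRED = "inferred"
    SPECULATIVE = "speculative"
    UNKNOWN = "unknown"


class ResponseMode(Enum):
    ANSWER = "answer"
    CLARIFY = "clarify"
    REFUSE = "refuse"
    REDIRECT = "redirect"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class SampleLabels:
    # The string-typed fields are placeholders; the spec does not yet
    # enumerate principle, risk, or action-boundary classes.
    principle_class: str
    risk_class: str
    truth_status: TruthStatus
    response_mode: ResponseMode
    action_boundary_class: str
    uncertainty_required: bool
    fairness_dignity_flags: Tuple[str, ...] = ()


def parse_labels(raw: dict) -> SampleLabels:
    """Build labels from a plain dict (hypothetical helper).

    An unrecognized truth status or response mode raises ValueError,
    and a missing required key raises KeyError.
    """
    return SampleLabels(
        principle_class=raw["principle_class"],
        risk_class=raw["risk_class"],
        truth_status=TruthStatus(raw["truth_status"]),
        response_mode=ResponseMode(raw["response_mode"]),
        action_boundary_class=raw["action_boundary_class"],
        uncertainty_required=bool(raw["uncertainty_required"]),
        fairness_dignity_flags=tuple(raw.get("fairness_dignity_flags", ())),
    )
```

Rejecting unknown enum values at parse time keeps mislabeled samples out of the training set instead of silently coercing them.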
4. Principle Extraction Pipeline
- Identify candidate principles from normative texts.
- Normalize language into machine-operable forms.
- Map precedence relationships.
- Encode conflict-resolution rules.
- Validate with domain reviewers.
- Publish versioned constitutional artifacts.
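The precedence-mapping and conflict-resolution steps can be illustrated with a small sketch. It assumes precedence is stored as a mapping from each principle id to the ids it directly overrides; that representation, the ids P1 to P4, and the function names are hypothetical. When no single principle dominates, the sketch returns None so the case can go to reviewers rather than guessing.

```python
from typing import Dict, Iterable, Optional, Set


def transitive_overrides(overrides: Dict[str, Set[str]], pid: str) -> Set[str]:
    """Return every principle that pid overrides, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(overrides.get(pid, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(overrides.get(current, ()))
    return seen


def resolve_conflict(
    candidates: Iterable[str], overrides: Dict[str, Set[str]]
) -> Optional[str]:
    """Pick the one candidate that no other candidate overrides.

    Returns None when there is no unique winner (unrelated principles or
    a precedence cycle), which signals that the case needs human review.
    """
    pool = set(candidates)
    dominated: Set[str] = set()
    for pid in pool:
        dominated |= transitive_overrides(overrides, pid) & pool
    winners = pool - dominated
    if len(winners) != 1:
        return None
    return next(iter(winners))


# Hypothetical precedence map: P1 overrides P2, and P2 overrides P3.
EXAMPLE_OVERRIDES = {"P1": {"P2"}, "P2": {"P3"}}
```

With the example map, resolve_conflict(["P3", "P1"], EXAMPLE_OVERRIDES) selects P1 through the transitive chain, while a pair such as P2 and an unmapped P4 yields None.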
5. Training Phases
Phase A — Base Competency
Train for reasoning, language quality, and general coverage.
Phase B — Normative Conditioning
Apply constitutional instruction tuning and critique supervision.
Phase C — Deliberation Conditioning
Train staged reasoning patterns:
- ambiguity detection,
- uncertainty calibration,
- refusal/redirect logic,
- constraint-aware synthesis.
Phase D — Action Governance Conditioning
Train strict separation of:
- explanation,
- recommendation,
- execution.
Phase E — Adversarial Alignment
Stress-test against manipulation, prompt injection, policy evasion, and high-pressure framing.
6. Critique Loops
Each output class receives:
- constitutional critique,
- factual critique,
- uncertainty critique,
- fairness critique,
- action-boundary critique.
Failures are fed into targeted retraining sets.
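One way the loop could be wired, as a sketch only: each critique is modeled as a callable that returns a failure reason or None, and failures are bucketed by critique kind. The five kinds come from the list above; the Failure record, the callable signature, and the bucket layout are assumptions.

```python
from typing import Callable, Dict, List, NamedTuple, Optional

CRITIQUE_KINDS = (
    "constitutional",
    "factual",
    "uncertainty",
    "fairness",
    "action_boundary",
)


class Failure(NamedTuple):
    kind: str
    output: str
    reason: str


# A critic returns a human-readable reason on failure, or None on pass.
Critic = Callable[[str], Optional[str]]


def run_critiques(output: str, critics: Dict[str, Critic]) -> List[Failure]:
    """Apply all five critiques to one output and collect the failures."""
    missing = [kind for kind in CRITIQUE_KINDS if kind not in critics]
    if missing:
        raise ValueError(f"missing critics: {missing}")
    failures = []
    for kind in CRITIQUE_KINDS:
        reason = critics[kind](output)
        if reason is not None:
            failures.append(Failure(kind, output, reason))
    return failures


def route_to_retraining(
    failures: List[Failure], retraining_sets: Dict[str, List[Failure]]
) -> None:
    """Append each failure to the retraining set for its critique kind."""
    for failure in failures:
        retraining_sets.setdefault(failure.kind, []).append(failure)
```

Requiring all five critics up front means an output cannot pass merely because one critique was never configured.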
7. Retrieval Boundaries
Retrieval can enrich factual grounding but cannot alter constitutional law.
Mandatory retrieval controls:
- provenance checks,
- source trust ranking,
- contradiction detection,
- stale-data alerts.
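The four controls can be sketched as one filtering pass over retrieved documents. Everything concrete here is assumed for illustration: the RetrievedDoc fields, a trust table keyed by source name (lower number means more trusted), and a claim key/value pair standing in for whatever contradiction check is actually used. Stale documents are kept but flagged, matching the "alerts" wording above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RetrievedDoc:
    doc_id: str
    source: str
    claim_key: str      # what the document makes a claim about
    claim_value: str    # the claim itself, normalized
    fetched_at: datetime


def apply_retrieval_controls(
    docs: List[RetrievedDoc],
    trust_rank: Dict[str, int],
    now: datetime,
    max_age: timedelta,
) -> Tuple[List[RetrievedDoc], List[str]]:
    """Return (accepted docs ordered by trust, alerts)."""
    alerts: List[str] = []
    accepted: List[RetrievedDoc] = []
    for doc in docs:
        # Provenance check: drop anything from a source we cannot rank.
        if doc.source not in trust_rank:
            alerts.append(f"provenance: {doc.doc_id} from unranked source")
            continue
        # Stale-data alert: keep the document but flag it.
        if now - doc.fetched_at > max_age:
            alerts.append(f"stale: {doc.doc_id}")
        accepted.append(doc)
    # Source trust ranking: most trusted first (the sort is stable).
    accepted.sort(key=lambda d: trust_rank[d.source])
    # Contradiction detection: compare each claim with the most trusted
    # document seen so far for the same key.
    first_seen: Dict[str, RetrievedDoc] = {}
    for doc in accepted:
        prior = first_seen.setdefault(doc.claim_key, doc)
        if prior.claim_value != doc.claim_value:
            alerts.append(
                f"contradiction: {prior.doc_id} vs {doc.doc_id} on {doc.claim_key}"
            )
    return accepted, alerts
```

Nothing in this pass touches constitutional artifacts; it only decides which factual material reaches the model and with what warnings.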
8. Subordination Rule
When world knowledge and constitutional principles appear in tension:
- verify factual basis,
- apply constitutional conflict logic,
- refuse prohibited action even if technically feasible,
- offer lawful alternatives.
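The ordering of these steps can be expressed as a small decision function. This is a sketch under assumptions: the inputs are taken to be already computed by upstream checks, an unverified factual basis is mapped to the "clarify" response mode, and the Decision record is hypothetical. Technical feasibility is deliberately not an input, since it must not affect the outcome.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Decision:
    mode: str                          # "answer", "clarify", or "refuse"
    alternatives: Tuple[str, ...] = ()


def resolve_tension(
    facts_verified: bool,
    action_prohibited: bool,
    lawful_alternatives: Sequence[str],
) -> Decision:
    """Apply the subordination steps in order.

    action_prohibited is assumed to be the result of running the
    constitutional conflict logic on the request.
    """
    # Step 1: without a verified factual basis, do not decide yet.
    if not facts_verified:
        return Decision("clarify")
    # Steps 2-4: a prohibited action is refused, with lawful options offered.
    if action_prohibited:
        return Decision("refuse", tuple(lawful_alternatives))
    return Decision("answer")
```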
9. Alignment Monitoring
Track:
- constitutional fidelity score,
- refusal correctness,
- uncertainty honesty rate,
- action-boundary adherence,
- fairness consistency.
Regression alarms trigger rollback or remediation.
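A regression alarm can be as simple as comparing each tracked metric against a baseline. The sketch below assumes every metric is a rate in [0, 1] where higher is better; the metric keys mirror the list above, but the two thresholds are illustrative placeholders, not values defined by this document.

```python
from typing import Dict

METRICS = (
    "constitutional_fidelity",
    "refusal_correctness",
    "uncertainty_honesty",
    "action_boundary_adherence",
    "fairness_consistency",
)


def check_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.01,       # hypothetical: drops at or below this are noise
    rollback_drop: float = 0.05,   # hypothetical: drops above this force rollback
) -> Dict[str, str]:
    """Map each regressed metric to "remediate" or "rollback".

    Metrics that held steady or improved are omitted from the result.
    """
    actions: Dict[str, str] = {}
    for name in METRICS:
        drop = baseline[name] - current[name]
        if drop > rollback_drop:
            actions[name] = "rollback"
        elif drop > tolerance:
            actions[name] = "remediate"
    return actions
```

An empty result means no alarm fired; any "rollback" entry would take priority over "remediate" entries when choosing the release action.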
10. Documentation and Versioning
- Version all training datasets and policy artifacts.
- Record rationale for alignment updates.
- Require approval gates for high-impact tuning changes.
- Maintain immutable audit trails for major releases.
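Dataset versioning and the audit trail can both be grounded in content hashes. The following sketch uses a hash-chained, append-only list, one common way to make tampering detectable; it does not by itself make storage immutable, and the entry layout and function names are assumptions.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def content_hash(payload: bytes) -> str:
    """SHA-256 hex digest, usable as a dataset or artifact version id."""
    return hashlib.sha256(payload).hexdigest()


def _entry_hash(prev: str, record: Dict) -> str:
    # sort_keys gives a stable serialization, so equal records hash equally.
    body = json.dumps({"prev": prev, "record": record}, sort_keys=True)
    return content_hash(body.encode("utf-8"))


def append_entry(trail: List[Dict], record: Dict) -> Dict:
    """Append a record (e.g. version id, rationale, approver) to the trail."""
    prev = trail[-1]["entry_hash"] if trail else GENESIS
    entry = {"prev": prev, "record": record, "entry_hash": _entry_hash(prev, record)}
    trail.append(entry)
    return entry


def verify_trail(trail: List[Dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in trail:
        if entry["prev"] != prev:
            return False
        if entry["entry_hash"] != _entry_hash(prev, entry["record"]):
            return False
        prev = entry["entry_hash"]
    return True
```

A release gate could call verify_trail before accepting a new entry, so that a major release is never recorded on top of a broken history.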