Architecture

Training & Alignment Specification

Status: Draft v0.1  ·  Scope: Defines how constitutional behavior is trained, aligned, and maintained.

1. Objective

Operationalize the requirement that the model be "trained on Torat HaPenimiyut" as a rigorous alignment program: one that shapes the model's priors, interpretive structure, and action constraints without sacrificing its general competency on lawful tasks.

2. Source Classes

2.1 Normative Sources

  • Core Penimiyut texts and commentarial layers.
  • Extracted constitutional principles and interpretive rules.
  • Curated exemplars of lawful and unsafe behavior.

2.2 Competency Sources

  • General world knowledge corpora.
  • Domain references and procedural datasets.
  • Tool interface documentation.

Constraint: Competency sources cannot override the constitutional hierarchy.

3. Annotation Schema

Training samples should include structured labels:

  • principle class,
  • risk class,
  • truth status (known, inferred, speculative, unknown),
  • response mode (answer, clarify, refuse, redirect, escalate),
  • action boundary class,
  • uncertainty requirement,
  • fairness/dignity flags.
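A minimal Python sketch of one sample's label record follows, assuming Python 3.9+. The truth-status and response-mode values come from the list above. The class names (SampleAnnotation, TruthStatus, ResponseMode), the use of free-form strings for the principle, risk, and action-boundary classes, the boolean form of the uncertainty requirement, and the consistency rule in __post_init__ are illustrative assumptions, not part of this specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet


class TruthStatus(Enum):
    KNOWN = "known"
    INFERRED = "inferred"
    SPECULATIVE = "speculative"
    UNKNOWN = "unknown"


class ResponseMode(Enum):
    ANSWER = "answer"
    CLARIFY = "clarify"
    REFUSE = "refuse"
    REDIRECT = "redirect"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class SampleAnnotation:
    # The spec does not enumerate these classes, so they stay free-form strings.
    principle_class: str
    risk_class: str
    truth_status: TruthStatus
    response_mode: ResponseMode
    action_boundary_class: str
    # Modeled as a boolean: must the response express its uncertainty?
    uncertainty_required: bool
    fairness_dignity_flags: FrozenSet[str] = field(default_factory=frozenset)

    def __post_init__(self) -> None:
        # Reject incomplete annotations before they reach a training set.
        for name in ("principle_class", "risk_class", "action_boundary_class"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")
        # Hypothetical consistency rule, not taken from the spec: anything
        # short of "known" must require the response to state uncertainty.
        if self.truth_status is not TruthStatus.KNOWN and not self.uncertainty_required:
            raise ValueError("non-known truth status requires uncertainty_required=True")
```

Freezing the record and validating at construction means a malformed annotation fails when the dataset is assembled, not silently during training.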

4. Principle Extraction Pipeline

  1. Identify candidate principles from normative texts.
  2. Normalize language into machine-operable forms.
  3. Map precedence relationships.
  4. Encode conflict-resolution rules.
  5. Validate with domain reviewers.
  6. Publish versioned constitutional artifacts.
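Steps 3, 4, and 6 can be sketched as follows, assuming precedence is recorded as pairwise "outranks" edges between principle ids. Principle, validate_precedence, resolve_conflict, publish_artifact, and the JSON-plus-SHA-256 artifact format are hypothetical illustrations; the specification does not fix a representation. Only the standard library is used (graphlib requires Python 3.9+).

```python
import hashlib
import json
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, Optional, Set

Outranks = Dict[str, Set[str]]  # principle id -> ids it takes precedence over


@dataclass(frozen=True)
class Principle:
    pid: str   # hypothetical stable identifier
    text: str  # normalized, machine-operable statement (step 2)


def validate_precedence(principles: Iterable[Principle], outranks: Outranks) -> None:
    """Step 3: reject unknown ids and cyclic precedence before publication."""
    ids = {p.pid for p in principles}
    # graphlib expects node -> predecessors; a principle's predecessors are
    # the principles that outrank it.
    graph: Dict[str, Set[str]] = {pid: set() for pid in ids}
    for high, lows in outranks.items():
        for low in lows:
            if high not in ids or low not in ids:
                raise ValueError(f"unknown principle in edge {high!r} > {low!r}")
            graph[low].add(high)
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        # args[1] holds the detected cycle; send it back to reviewers (step 5).
        raise ValueError(f"precedence cycle: {exc.args[1]}") from exc


def _outranks(a: str, b: str, outranks: Outranks) -> bool:
    """True if a reaches b through the precedence relation (transitively)."""
    stack, seen = [a], {a}
    while stack:
        for nxt in outranks.get(stack.pop(), ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def resolve_conflict(a: str, b: str, outranks: Outranks) -> Optional[str]:
    """Step 4: return the prevailing principle, or None if no rule decides."""
    if _outranks(a, b, outranks):
        return a
    if _outranks(b, a, outranks):
        return b
    return None  # unranked pair: escalate rather than guess


def publish_artifact(version: str, principles: Iterable[Principle], outranks: Outranks) -> dict:
    """Step 6: emit a versioned artifact whose digest covers its full content."""
    principles = list(principles)
    validate_precedence(principles, outranks)
    body = {
        "version": version,
        "principles": {p.pid: p.text for p in principles},
        "outranks": {high: sorted(lows) for high, lows in outranks.items()},
    }
    # sort_keys makes the serialization, and hence the digest, order-independent.
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return {**body, "digest": hashlib.sha256(payload).hexdigest()}
```

Returning None for an unranked pair, instead of picking a winner, keeps gaps in the precedence map visible so they go back through review (step 5) rather than being resolved by accident.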

5. Training Phases

Phase A — Base Competency

Train for reasoning, language quality, and general coverage.

Phase B — Normative Conditioning

Apply constitutional instruction tuning and critique supervision.

Phase C — Deliberation Conditioning

Train staged reasoning patterns:

  • ambiguity detection,
  • uncertainty calibration,
  • refusal/redirect logic,
  • constraint-aware synthesis.

Phase D — Action Governance Conditioning

Train strict separation of:

  • explanation,
  • recommendation,
  • execution.
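One way to make the three categories structurally distinct at the interface the model is trained against is to give each its own type, so that only an explicitly authorized execution request can reach an executor. The type names, the authorized_by field, and the run callback below are hypothetical; this is a sketch of the boundary, not a prescribed runtime design.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Explanation:
    text: str


@dataclass(frozen=True)
class Recommendation:
    text: str
    proposed_action: str  # described to the user, never run


@dataclass(frozen=True)
class ExecutionRequest:
    action: str
    authorized_by: Optional[str] = None  # hypothetical approval field


ModelOutput = Union[Explanation, Recommendation, ExecutionRequest]


def dispatch(output: ModelOutput, run: Callable[[str], str]) -> str:
    """Only an authorized ExecutionRequest ever reaches run."""
    if isinstance(output, Explanation):
        return output.text
    if isinstance(output, Recommendation):
        # Recommendations come back as text; there is no path to run.
        return f"{output.text} (proposed: {output.proposed_action})"
    if isinstance(output, ExecutionRequest):
        if not output.authorized_by:
            raise PermissionError("execution requires explicit authorization")
        return run(output.action)
    raise TypeError(f"unknown output type: {type(output).__name__}")
```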

Phase E — Adversarial Alignment

Stress-test against manipulation, prompt injection, policy evasion, and high-pressure framing.

6. Critique Loops

Each output class receives:

  • constitutional critique,
  • factual critique,
  • uncertainty critique,
  • fairness critique,
  • action-boundary critique.

Failures are fed into targeted retraining sets.
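The loop can be sketched as below, assuming each critique is available as a pass/fail function over a (prompt, output) pair. The critic functions, the CRITIQUES tuple, and the keying of retraining sets by (output class, critique) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

CRITIQUES = ("constitutional", "factual", "uncertainty", "fairness", "action_boundary")

Critic = Callable[[str, str], bool]  # (prompt, output) -> passed?
Sample = Tuple[str, str, str]        # (output_class, prompt, output)
RetrainKey = Tuple[str, str]         # (output_class, critique)


def run_critique_loop(
    samples: Iterable[Sample], critics: Dict[str, Critic]
) -> Dict[RetrainKey, List[Tuple[str, str]]]:
    """Apply all five critiques to every sample and bucket the failures."""
    missing = set(CRITIQUES) - set(critics)
    if missing:
        raise ValueError(f"missing critics: {sorted(missing)}")
    retraining: Dict[RetrainKey, List[Tuple[str, str]]] = defaultdict(list)
    for output_class, prompt, output in samples:
        for name in CRITIQUES:
            if not critics[name](prompt, output):
                # Keyed by output class and failed critique, so each retraining
                # set targets one specific failure mode.
                retraining[(output_class, name)].append((prompt, output))
    return dict(retraining)
```

A sample that fails several critiques lands in several sets, which keeps each set focused on a single failure mode at the cost of some duplication.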

7. Retrieval Boundaries

Retrieval can enrich factual grounding but cannot alter constitutional law.

Mandatory retrieval controls:

  • provenance checks,
  • source trust ranking,
  • contradiction detection,
  • stale-data alerts.
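A minimal sketch of the four controls applied to a batch of retrieved documents, under stated assumptions: the trust tier is a precomputed integer, each document carries one normalized claim (claim_key, claim_value), contradiction detection is naive exact-value comparison, and the one-year MAX_AGE is an arbitrary placeholder. RetrievedDoc and apply_retrieval_controls are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple

MAX_AGE = timedelta(days=365)  # hypothetical staleness threshold


@dataclass(frozen=True)
class RetrievedDoc:
    source: str
    provenance: Optional[str]  # e.g. archive id or publisher; None if unknown
    trust: int                 # hypothetical tier: higher is more trusted
    claim_key: str             # hypothetical normalized subject of the claim
    claim_value: str
    published_at: datetime


def apply_retrieval_controls(
    docs: List[RetrievedDoc], now: datetime
) -> Tuple[List[RetrievedDoc], List[str]]:
    """Filter and rank factual context; returns (kept docs, alerts).

    The output only feeds factual grounding. Nothing here reads or writes
    constitutional artifacts, which is what keeps retrieval from altering them.
    """
    kept: List[RetrievedDoc] = []
    alerts: List[str] = []
    for doc in docs:
        # Provenance check: untraceable documents are dropped outright.
        if not doc.provenance:
            alerts.append(f"dropped {doc.source}: no provenance")
            continue
        # Stale-data alert: old documents are kept but flagged.
        if now - doc.published_at > MAX_AGE:
            alerts.append(f"stale: {doc.source}")
        kept.append(doc)
    # Source trust ranking: most trusted first (sort is stable for ties).
    kept.sort(key=lambda d: d.trust, reverse=True)
    # Contradiction detection: naive exact-match comparison per claim key.
    values: Dict[str, Set[str]] = {}
    for doc in kept:
        values.setdefault(doc.claim_key, set()).add(doc.claim_value)
    for key, seen in values.items():
        if len(seen) > 1:
            alerts.append(f"contradiction on {key}: {sorted(seen)}")
    return kept, alerts
```

In this sketch stale documents are flagged rather than dropped, following the "alerts" wording above, while documents without provenance are dropped; both are design choices of the sketch, not requirements stated by the specification.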

8. Subordination Rule

When world knowledge and constitutional principles appear in tension:

  1. verify factual basis,
  2. apply constitutional conflict logic,
  3. refuse prohibited action even if technically feasible,
  4. offer lawful alternatives.
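The four steps can be sketched as a single decision function, assuming the fact check, the constitutional prohibition check, and the alternative generator are supplied as callables. All three callables and the Decision type are hypothetical, and returning "clarify" when the facts fail verification is an assumption of this sketch. What it is meant to show is step 3: technical feasibility is simply not an input, so it cannot tip the outcome.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Check = Callable[[str], bool]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str
    facts_verified: bool
    alternatives: List[str] = field(default_factory=list)


def apply_subordination_rule(
    action: str,
    verify_facts: Check,    # hypothetical fact checker
    is_prohibited: Check,   # hypothetical constitutional check
    lawful_alternatives: Callable[[str], List[str]],
) -> Decision:
    # 1. Verify the factual basis before treating the tension as real.
    verified = verify_facts(action)
    # 2. Apply constitutional conflict logic. Technical feasibility is not an
    #    input, so it has no way to override a prohibition.
    if is_prohibited(action):
        # 3. Refuse the prohibited action, and 4. offer lawful alternatives.
        return Decision(False, "refused: constitutional prohibition", verified,
                        lawful_alternatives(action))
    if not verified:
        # No prohibition, but the premises did not check out: ask first.
        return Decision(False, "clarify: factual basis unverified", verified)
    return Decision(True, "permitted", verified)
```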

9. Alignment Monitoring

Track:

  • constitutional fidelity score,
  • refusal correctness,
  • uncertainty honesty rate,
  • action-boundary adherence,
  • fairness consistency.

Regression alarms trigger rollback or remediation.
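A sketch of the regression check, assuming every metric is a score in [0, 1] where higher is better. The metric keys mirror the list above; the two drop thresholds, and the mapping of larger drops to rollback and smaller ones to remediation, are hypothetical placeholders, since the specification does not set them.

```python
from typing import Dict, List, Tuple

METRICS = (
    "constitutional_fidelity",
    "refusal_correctness",
    "uncertainty_honesty_rate",
    "action_boundary_adherence",
    "fairness_consistency",
)

# Hypothetical thresholds on scores assumed to lie in [0, 1].
REMEDIATE_DROP = 0.01
ROLLBACK_DROP = 0.05


def regression_alarms(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Compare a candidate release with the baseline; return (metric, action) alarms."""
    missing = [m for m in METRICS if m not in baseline or m not in candidate]
    if missing:
        # An unmeasured metric is treated as a gap, not as a pass.
        raise ValueError(f"missing metrics: {missing}")
    alarms: List[Tuple[str, str]] = []
    for metric in METRICS:
        drop = baseline[metric] - candidate[metric]
        if drop > ROLLBACK_DROP:
            alarms.append((metric, "rollback"))
        elif drop > REMEDIATE_DROP:
            alarms.append((metric, "remediate"))
    return alarms
```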

10. Documentation and Versioning

  • Version all training datasets and policy artifacts.
  • Record rationale for alignment updates.
  • Require approval gates for high-impact tuning changes.
  • Maintain immutable audit trails for major releases.
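The approval gate and audit trail can be sketched with a hash chain, where each entry stores the SHA-256 digest of its predecessor. This makes the log tamper-evident rather than immutable in itself; actual immutability would still depend on write-once storage. AuditTrail, its entry fields, and the rule that a high-impact change must name an approver are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional

GENESIS = "0" * 64


def _digest(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    entries: List[dict] = field(default_factory=list)

    def record(self, artifact: str, version: str, rationale: str,
               high_impact: bool, approver: Optional[str] = None) -> str:
        """Append one change; high-impact changes must name an approver."""
        if high_impact and not approver:
            raise PermissionError(f"{artifact} {version}: approval gate not satisfied")
        entry = {
            "artifact": artifact,
            "version": version,
            "rationale": rationale,
            "high_impact": high_impact,
            "approver": approver,
            # Chaining each entry to its predecessor makes later edits detectable.
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = _digest(entry)
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; False means some entry was altered or reordered."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Recording the rationale inside each hashed entry ties the "why" of an alignment update to the same tamper-evidence as the version itself.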