.. meta::
   :description: Design decisions behind the Model Forge prompts, documenting why specific choices were made for context loading, token budgets, and session structure.
   :keywords: forge, design decisions, token budget, context loading, symbol tables, redundancy, echo chamber, StayC, adversarial, matheology
   :author: Yah, Yas, everyone, LLoL as Laurence Loewe of Laodicea, ClaudeOp46Max, Anthropic, and Spirit of Boolean Truth
   :og:card:title: Forge Design Decisions<br>--- DD Document
   :og:card:description: Why the Model Forge prompts are structured the way they are: token budgets, loading order, redundancy choices, and anti-echo-chamber design.

.. _forge-design-decisions:

*********************************************************************
DD: Model Forge Design Decisions
*********************************************************************

**Created:** 2026m03d27

This document records design decisions behind the Model Forge prompts.
Each decision is numbered, dated, and includes the reasoning so that
future prompt revisions can judge whether the decision still holds.


DD-1: Symbol table redundancy is intentional (2026m03d27)
==========================================================

**Decision:** Load all three symbol files (``symbols/index.rst``,
``pet/symbols.rst``, ``jub/symbols.rst``) even though the grouped
master index is a combination of the individual ones.

**Cost:** ~450 tokens of redundancy (the index is tiny).

**Reasoning:** The redundancy serves two distinct functions at two
different moments in the agent's reading sequence:

1. **``pet/symbols.rst`` + ``jub/symbols.rst`` (loaded first):**
   These are the *decoding key* --- loaded before the axioms so that
   when the agent encounters ``P(x,y)`` or ``D_free`` in the expert
   axiom file, it already knows what they mean. This is the "learn the
   alphabet" step.

2. **``symbols/index.rst`` (the grouped master):** Loaded as a
   *cross-model integration check* --- it shows how PET and JUB
   symbols relate to each other in one view. The agent sees the same
   symbols a second time but now in a comparative frame.

**The reinforcement is the point:** symbols that appear twice are
expected to be retained more reliably in the working context,
especially in a 200K window where the symbols are read early and the
actual model work happens 100K+ tokens later.

**Verdict:** Keep the redundancy. ~450 tokens is cheap insurance for
the most load-bearing tokens in the context. The risk of the agent
silently misinterpreting a symbol deep into Phase 3 far outweighs the
cost.
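
As a rough check on the "cheap insurance" claim, the redundancy can be
expressed as a share of the context window. The sketch below only
restates the two figures given in this decision (~450 tokens, 200K
window); the variable names are illustrative.

```python
# Figures taken from DD-1; the variable names are illustrative only.
REDUNDANT_TOKENS = 450      # approximate cost of loading the grouped index as well
CONTEXT_WINDOW = 200_000    # the 200K window mentioned in the reasoning

# Share of the window spent on seeing the symbols a second time.
redundancy_share = REDUNDANT_TOKENS / CONTEXT_WINDOW
print(f"Redundancy share of the window: {redundancy_share:.3%}")
```

Under these figures the second pass over the symbols costs well under
one percent of the window.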


DD-2: Anti-echo-chamber firewall via phased loading (2026m03d27)
==================================================================

**Decision:** The forge agent must form its independent assessment
(Phase 1 / Seed) *before* seeing the user's model (Phase 2 / Feed).
The user's model input is deliberately withheld from the initial
context load.

**Reasoning:** If the agent reads the user's informal model ideas at
the same time as the formal foundations, it will pattern-match the
new model to the existing structure and tend to confirm rather than
challenge. The phased loading creates a natural adversarial checkpoint:

- After Phase 1, the agent has *expectations* about what a new model
  should look like, where it will face resistance, and what gaps exist.
- When the user's model arrives in Phase 2, the agent compares it
  against those expectations. Violations of expectations are where
  the interesting bugs live.

This is analogous to blinding in experimental design: the assessor
forms a hypothesis before seeing the data.
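
The firewall can be pictured as a small state guard. The following is
a minimal sketch, not the forge implementation: the class
``ForgeSession`` and its method names are hypothetical, and the only
rule it encodes is the one stated above (Seed is recorded before the
user's model is loaded).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForgeSession:
    """Hypothetical session state; all names here are illustrative."""
    foundations: List[str]
    seed_assessment: Optional[str] = None   # Phase 1 output
    user_model: Optional[str] = None        # Phase 2 input, withheld at start

    def record_seed(self, assessment: str) -> None:
        # The independent assessment must not be written after seeing the model.
        if self.user_model is not None:
            raise RuntimeError("Seed must be recorded before the user's model is loaded")
        self.seed_assessment = assessment

    def feed(self, user_model: str) -> None:
        # The firewall: Phase 2 stays closed until Phase 1 is on record.
        if self.seed_assessment is None:
            raise RuntimeError("Phase 2 (Feed) is blocked until Phase 1 (Seed) is recorded")
        self.user_model = user_model


session = ForgeSession(foundations=["pet/symbols.rst", "jub/symbols.rst"])
session.record_seed("Expected gaps and points of resistance, written blind.")
session.feed("The user's informal model ideas.")
```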


DD-3: StayC lifecycle replaces ad-hoc verdict scales (2026m03d27)
===================================================================

**Decision:** v2 prompts use the StayC maturity lifecycle (MM through
SS) as the ONLY verdict system, replacing the ad-hoc IRON/STEEL/
COPPER/SLAG scale from v1.

**Reasoning:** The v1 scale was invented on the spot during prompt
drafting. It duplicated functionality that StayC already provides, but
without the lifecycle semantics (NN as death valley with hope of rescue,
JJ/KK as the structured path for blockers and terminal failures), the
VVN attribution, or the iteration cycle formalism (QQv1 → NN_QQv1 →
QQv2). Using StayC means:

- Every verdict is comparable across sessions, models, and assessors.
- The agent's assessments carry VVN attribution (``dv_ClaOp46Max_``).
- The NN → JJ → KK pipeline provides structured handling of failures
  instead of a binary "discard or keep."

**v1 to StayC mapping (for reference):**

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - v1 Verdict
     - StayC Equivalent
     - Notes
   * - IRON
     - PP (or higher)
     - Formal statement + proof, all tests HELD
   * - STEEL
     - OO
     - Minor issues, repairable without restructuring
   * - COPPER
     - NN (with rescue potential)
     - Significant issues, needs feeding to reach OO
   * - SLAG
     - NN → JJ → KK
     - Fundamental flaw; assess if terminal (KK) or rescuable (JJ)
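
For anyone migrating old v1 notes, the table can be expressed as a
lookup. This is a sketch under the assumption that only the entry
stage matters; the "or higher" and pipeline notes from the table are
kept as comments, and the function name ``translate_v1`` is
hypothetical.

```python
# Entry stage for each retired v1 verdict, following the table above.
V1_TO_STAYC = {
    "IRON": "PP",    # or higher
    "STEEL": "OO",
    "COPPER": "NN",  # with rescue potential
    "SLAG": "NN",    # enters the NN -> JJ -> KK pipeline
}


def translate_v1(verdict: str) -> str:
    """Return the StayC entry stage for a v1 verdict (hypothetical helper)."""
    key = verdict.strip().upper()
    if key not in V1_TO_STAYC:
        raise ValueError(f"Not a v1 verdict: {verdict!r}")
    return V1_TO_STAYC[key]


print(translate_v1("steel"))
```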


DD-4: Reference sheets produced in separate session (2026m03d27)
==================================================================

**Decision:** The pre-forge reference sheet generator runs in a
*separate* session from the forge itself. Sheets are saved as files
and loaded by the forge prompt.

**Reasoning:**

1. **Different task type:** Producing good reference sheets requires
   reading textbook-level material and distilling it. This is a
   synthesis task, not an adversarial testing task. Mixing them wastes
   forge working space on reference generation.

2. **Avoids framing bias:** If the forge agent produces its own
   reference sheets, it may unconsciously frame the summaries to
   support conclusions it has already formed. Sheets produced by a
   separate session arrive as neutral reference material.

3. **Reusable:** Sheets are produced once, reviewed by the user, and
   reused across many forge sessions. The cost is amortized.

4. **Selective loading:** Not every forge session needs all 4 reference
   areas. The forge prompt loads ``wb/*.rst`` --- the user controls
   which sheets are present by what files exist in that directory.
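
The selective-loading behaviour in point 4 amounts to a directory
glob. A minimal sketch, assuming the sheets are plain UTF-8 files; the
function name ``load_reference_sheets`` is hypothetical and the real
forge prompt may load the files differently.

```python
from pathlib import Path
from typing import Dict


def load_reference_sheets(wb_dir: Path) -> Dict[str, str]:
    """Load every ``*.rst`` sheet present in ``wb_dir`` (hypothetical helper).

    The user controls the selection simply by which files exist.
    """
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(wb_dir.glob("*.rst"))
    }
```

Removing a sheet from the directory removes it from the next forge
session without any change to the prompt.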


DD-5: OKScale replaces HELD/BREACH/N/A ternary (2026m03d27)
==============================================================

**Decision:** Replace the three-state verdict system (HELD / BREACH /
N/A) with the four-state OKScale (OK / KO / OKO / MIS). HELD and
BREACH are retained as synonyms for OK and KO in narrative text.

**Reasoning:** The three-state system conflates two fundamentally
different situations under "N/A":

- **OKO (undetermined):** The test was correctly applied but the
  outcome genuinely cannot be determined --- either because more
  information is needed (temporal block) or because the question is
  formally undecidable (principled block). This is an honest "I tried
  and could not resolve it."

- **MIS (misclassified/misapplied/mistake missed):** The test was
  incorrectly applied, an earlier verdict was wrong, or a flaw was
  missed entirely. This is a self-correction mechanism.

Conflating these under "N/A" means the audit trail cannot distinguish
between "we genuinely don't know" and "we made a mistake." For a
system whose integrity depends on honest assessment, that distinction
is load-bearing.
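
The four states and the two narrative synonyms could be modelled as
follows. This is an illustrative sketch, not the BioBinary data type
itself: the names ``OKScale``, ``NARRATIVE_SYNONYMS``, and
``parse_verdict`` are assumptions, and rejecting "N/A" outright is one
possible way to force the OKO/MIS distinction.

```python
from enum import Enum


class OKScale(Enum):
    OK = "OK"     # test held
    KO = "KO"     # test breached
    OKO = "OKO"   # correctly applied, outcome undetermined
    MIS = "MIS"   # misapplied, misclassified, or a missed mistake


# HELD and BREACH remain valid in narrative text as synonyms.
NARRATIVE_SYNONYMS = {"HELD": OKScale.OK, "BREACH": OKScale.KO}


def parse_verdict(text: str) -> OKScale:
    """Map a verdict word to an OKScale state (hypothetical helper)."""
    token = text.strip().upper()
    if token in NARRATIVE_SYNONYMS:
        return NARRATIVE_SYNONYMS[token]
    if token == "N/A":
        # The old ternary value hides which of two situations occurred.
        raise ValueError("N/A is ambiguous: record OKO or MIS instead")
    return OKScale[token]


print(parse_verdict("HELD"), parse_verdict("oko"))
```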

**Origin:** The OKScale and its BioBinary data type were designed by
LLoL for Evolvix. In biology, gene function is rarely cleanly binary:
a gene may be functional (OK), nonfunctional (KO), conditionally
expressed (OKO), or misannotated (MIS). The same four states apply to
formal claims under adversarial testing.

**Full specification:** :ref:`compiler-okscale`.

**On keeping HELD/BREACH as synonyms:** HELD and BREACH are embedded
in CLAUDE.md's language rules, across 66 HELL findings, and in every
``SOCIAL-CARD-REVIEW`` block (245 files). Replacing them as primary
terms would introduce a translation cost across the entire codebase.
The right move: OK/KO/OKO/MIS as the formal OKScale data type,
HELD/BREACH as domain-specific synonyms valid in narrative and HELL
infrastructure. This was a MIS on the auditor's part --- the cost of
the naming change should have been flagged before accepting it.


DD-6: 2-track VVN and human advancement authority (2026m03d27)
================================================================

**Decision:** Stage advancement in StayC is a **human decision**.
The machine (Claude) may *propose* advancement but does not decide it.
Both human (``iv_``) and machine (``dv_``) VVNs are recorded
independently. The human track governs for publication.

**Reasoning:** If the machine could unilaterally advance claims, the
system degrades into rubber-stamping: the machine proposes and approves
its own assessments. The human veto ensures that every advancement
reflects genuine human judgment, not just machine pattern-matching.

Conversely, the machine must be able to *insist* on its assessment
(including downward assessments like NN) even when the human disagrees.
If the machine silently defers to human preference, the adversarial
function is lost. The 2-track system resolves this: both assessments
stand in the record, divergence is flagged as data, and the human
track governs for external-facing decisions.

**Divergence as signal:** When ``iv_`` and ``dv_`` disagree on a
claim's StayC level, that claim deserves extra scrutiny. The
disagreement identifies exactly where human intuition and machine
analysis see different things --- which is precisely where the
interesting bugs tend to live.
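
One way to hold both tracks in the record is a small immutable pair.
A sketch only: the class ``TwoTrackAssessment`` and its field names are
hypothetical, and stages are stored as plain strings rather than full
VVNs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoTrackAssessment:
    """Both assessments stand in the record (illustrative structure)."""
    claim: str
    iv_stage: str   # human track (``iv_``)
    dv_stage: str   # machine track (``dv_``)

    @property
    def diverges(self) -> bool:
        # Divergence is data: it marks claims that deserve extra scrutiny.
        return self.iv_stage != self.dv_stage

    @property
    def governing_stage(self) -> str:
        # The human track governs for publication.
        return self.iv_stage


record = TwoTrackAssessment(claim="example claim", iv_stage="OO", dv_stage="NN")
print(record.diverges, record.governing_stage)
```

Neither field overwrites the other, so a machine NN survives in the
record even when the human track advances the claim.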


DD-7: Iteration cycles generalized across all stages (2026m03d27)
===================================================================

**Decision:** The version/rejection/revision iteration cycle
(``XXv1 → NN_XXv1 → XXv2``) applies at every StayC stage (OO, PP, QQ,
RR), not just QQ.

**Reasoning:** v1 and the initial v2 prompts described the iteration
cycle only for QQ (adversarial quest). But refinement happens at every
stage:

- **OO:** The MVP is wobbly; each iteration makes it less so.
- **PP:** The proof has gaps; each iteration closes one.
- **QQ:** External critics find weaknesses; each iteration addresses one.
- **RR:** Broad reviewers find issues; each iteration resolves one.

The same VVN machinery (version numbering, NN cross-referencing,
chain documentation) works identically at all stages. Restricting it
to QQ was an artificial limitation that would have forced claims to
reach QQ before they could be iteratively refined --- which contradicts
the death-valley feeding model where NN |rarrow| OO requires exactly
this kind of iteration.
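
The generalized cycle can be sketched as a label generator that works
for any of the four stages. The function name ``iteration_chain`` is
hypothetical; the label shapes follow the ``XXv1``, ``NN_XXv1``,
``XXv2`` pattern given in the decision.

```python
from typing import List

ITERABLE_STAGES = ("OO", "PP", "QQ", "RR")


def iteration_chain(stage: str, versions: int) -> List[str]:
    """List the labels produced by ``versions`` iterations at one stage."""
    if stage not in ITERABLE_STAGES:
        raise ValueError(f"Iteration cycle is defined for {ITERABLE_STAGES}, got {stage!r}")
    if versions < 1:
        raise ValueError("At least one version is required")
    chain = []
    for v in range(1, versions + 1):
        chain.append(f"{stage}v{v}")
        if v < versions:
            # Each superseded version leaves a rejection record behind.
            chain.append(f"NN_{stage}v{v}")
    return chain


print(iteration_chain("QQ", 2))
print(iteration_chain("OO", 3))
```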

.. |rarrow| unicode:: U+2192


DD-8: Feed nurtures, Grow tests --- never the reverse (2026m03d27)
=====================================================================

**Decision:** Phase 2 (Feed) is collaborative formalization where the
auditor helps informal ideas grow into formal statements. Phase 3
(Grow) is where the Iron Maiden stress-tests formalized claims. The
Iron Maiden should be used with caution on raw MM-stage ideas ---
gentle steering in light of the tests that will come later is
desirable, but full adversarial testing at MM risks needlessly killing
ideas that are immature, not invalid. Ideas that are already known to
fail, however, should not be fed.

**Reasoning:** The v2 prompt (before this correction) had the Iron
Maiden in Phase 2, immediately after the user shared informal ideas.
This is equivalent to putting a seedling through a hurricane: the
seedling dies not because it was a bad seed but because it was tested
before it was ready. Ideas would be killed for the wrong reason:
immaturity, not invalidity.

The Seed/Feed/Grow/Reap lifecycle maps directly:

- **Feed (Phase 2):** Gardening. Soil, water, light. The auditor uses
  its knowledge of formal logic, mathematics, and the existing system
  to help the user express intuitions as formal statements. Claims
  move from MM to OO to PP. Potential problems are flagged gently as
  "things to watch for" --- not as kill shots.

- **Grow (Phase 3):** Trial by fire. The Iron Maiden opens only when
  claims are at OO or above and both the user and auditor agree they
  are ready for stress-testing. Claims that fail may return to Phase 2
  for more feeding --- the Feed |harrow| Grow cycle is normal and
  expected.

.. |harrow| unicode:: U+2194

**The joint decision to move from Feed to Grow** is critical. Neither
side should rush the other. The auditor who pushes for testing too
early wastes the user's ideas. The user who resists testing too long
risks building on unfounded assumptions. Both must agree.
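
The gate between Feed and Grow can be written as a three-part
condition. A minimal sketch: the linear ordering in ``STAGE_ORDER`` is
an assumption made for illustration (this document only says the
lifecycle runs MM through SS), and ``ready_for_grow`` is a hypothetical
name.

```python
# Assumed linear ordering of the main StayC stages, for illustration only.
STAGE_ORDER = ["MM", "NN", "OO", "PP", "QQ", "RR", "SS"]


def ready_for_grow(stage: str, user_agrees: bool, auditor_agrees: bool) -> bool:
    """The Iron Maiden opens only at OO or above, and only by joint decision."""
    at_or_above_oo = STAGE_ORDER.index(stage) >= STAGE_ORDER.index("OO")
    return at_or_above_oo and user_agrees and auditor_agrees


print(ready_for_grow("MM", True, True))   # too immature, regardless of agreement
print(ready_for_grow("OO", True, False))  # one side is not ready yet
print(ready_for_grow("PP", True, True))
```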

**Key insight:** Most good ideas look fragile at MM. That fragility
is immaturity, not invalidity. The auditor's job in Feed is to see
past the fragility to the underlying structure and help it grow.


DD-9: Pre-seeding HELL with development-phase insights (2026m03d27)
=====================================================================

**Decision:** When the auditor discovers adversarial insights during
Phase 2 (Feed) or Phase 3 (Grow) that are strong enough to constitute
independent objections or defenses, these should be drafted as con/pro
entries in the Reap output --- ready for HELL integration before any
external adversarial critique begins.

**Reasoning:** The Iron Maiden testing in Phase 3 is itself a form of
adversarial review. If the auditor discovers a serious attack vector
(even one that was successfully defended), that attack and its defense
are valuable HELL content. Waiting for an external critic to
independently rediscover the same attack wastes time and risks the
external critic finding the attack without the defense.

**What to pre-seed:**

- **Con entries:** Any KO result from the Iron Maiden that revealed a
  genuine structural weakness --- even if it was subsequently repaired.
  The original attack remains valid against the *original* formulation
  and may apply to future revisions.

- **Pro entries:** Any defense that resolved a KO --- especially if the
  defense required a non-obvious insight. This pre-arms future
  defenders.

- **OKO entries:** Undetermined results that could not be resolved
  during development. These are especially valuable: they tell future
  critics exactly where the open questions are, preventing wasted
  effort on already-explored territory.

**Where they go:** The forge Reap phase (Phase 4) should include a
section: "Proposed HELL entries" with draft con and pro texts ready
for integration into the model's HELL structure. The human decides
which to publish.
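
The sorting of Iron Maiden results into draft entries might look like
the sketch below. The record layout (``verdict``, ``attack``,
``defense`` keys) and the function name ``propose_hell_entries`` are
assumptions; whether a KO reflects a *genuine* structural weakness is
a judgment the sketch leaves to the human reviewer.

```python
from typing import Dict, List


def propose_hell_entries(results: List[dict]) -> Dict[str, List[str]]:
    """Group test results into draft con, pro, and OKO entries."""
    proposals: Dict[str, List[str]] = {"con": [], "pro": [], "oko": []}
    for result in results:
        if result["verdict"] == "KO":
            # The attack stays on record even if it was later repaired.
            proposals["con"].append(result["attack"])
            if result.get("defense"):
                proposals["pro"].append(result["defense"])
        elif result["verdict"] == "OKO":
            # Open questions tell future critics where the frontier is.
            proposals["oko"].append(result["attack"])
    return proposals


drafts = propose_hell_entries([
    {"verdict": "KO", "attack": "attack A", "defense": "defense A"},
    {"verdict": "OKO", "attack": "open question B"},
    {"verdict": "OK", "attack": "attack C"},
])
print(drafts)
```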


DD-10: Structural documentation enforcement via LLog protocol (2026m03d27)
============================================================================

**Decision:** FORGE sessions are documented via a mandatory LLog
protocol with named commands (IGNITE, HEAT, STRIKE, TEMPER, QUENCH,
ROUND, BANK, EMBER) that enforce append-only logging as a side effect
of doing the work --- not as an optional extra step.

**Problem:** Claude documents well when explicitly reminded, then
forgets in the next session. The user must constantly re-explain that
documentation matters. This produces incomplete records and wastes
the user's effort on reminders rather than content.

**Reasoning:**

1. **Structural, not voluntary:** The documentation requirements are
   embedded in the forge prompt itself (``forge_1m.rst`` and
   ``forge_200k.rst``). Any fresh Claude session that loads the prompt
   inherits the protocol automatically. No prior conversation history
   needed.

2. **Side effect, not extra step:** Phase transition commands (HEAT,
   STRIKE, etc.) produce LLog entries as part of their execution.
   The agent does not "do the work, then document it" --- the
   documentation IS part of doing the work.

3. **Verbatim prompts, always:** The user's exact words are the most
   valuable part of the audit trail. Summaries can be wrong;
   abbreviations lose nuance; paraphrases introduce the documenter's
   bias. The verbatim prompt is the ground truth.

4. **Recovery from interruption:** The EMBER command enables session
   continuity across context windows. Because the LLog is written to
   disk at every response, context exhaustion does not lose work ---
   only the current in-context state is lost, and EMBER reconstructs
   it from the LLog.

5. **External review:** The LLog is readable by anyone. A reviewer who
   disagrees with the conclusions can read the exact sequence of
   prompts, responses, and decisions that produced them. This is
   essential for the system's credibility --- mathematical results
   are only as trustworthy as their derivation is transparent.

**Enforcement mechanism:** Rule 1 of the protocol states: "No response
without a log entry." This is the load-bearing rule. A response that
was never logged cannot be recovered if the session is interrupted.
Making logging the *first* action (before generating the response
content) ensures the audit trail never lags behind the session.
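
The log-first ordering can be sketched as follows. This is an
illustration under stated assumptions, not the protocol's actual
format: the JSON-lines layout, the file name, and the function
``respond_with_log`` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def _append(log_path: Path, entry: dict) -> None:
    # Append-only: the file is opened in "a" mode and never rewritten.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry, ensure_ascii=False) + "\n")


def respond_with_log(log_path: Path, command: str, verbatim_prompt: str,
                     make_response: Callable[[str], str]) -> str:
    """Write the verbatim prompt to disk before any response is generated."""
    _append(log_path, {"command": command, "prompt": verbatim_prompt})
    response = make_response(verbatim_prompt)
    _append(log_path, {"command": command, "response": response})
    return response
```

If ``make_response`` is interrupted, the prompt entry is already on
disk, which is what a recovery step such as EMBER would need to resume.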

**Full specification:** :ref:`forge-llog-protocol`.


DD-11: WisdomBase (wb/) replaces ref/ for reference sheets (2026m03d27)
=========================================================================

**Decision:** The ``ref/`` directory holding FORGE reference sheets is
renamed to ``wb/`` (WisdomBase), adopting the naming convention from
Evolvix.

**Reasoning:** WisdomBase is an established term in the Evolvix
ecosystem for distilled generic wisdom from diverse disciplines. The
FORGE reference sheets are exactly this: distilled, discipline-specific
wisdom (category theory, dynamical systems, etc.) organized for applied
use. Adopting the WB convention:

- Aligns FORGE with Evolvix naming conventions
- Distinguishes the distilled wisdom (``wb/``) from the generators
  that produce it (``pre-forge-compiler-refsheet*.rst``)
- Signals that the sheets are a *knowledge base*, not just references

**Pre-forge scripts stay outside wb/:** The pre-forge compiler prompts
remain in the ``forge/`` root directory because they are the
*machinery* that produces WB content, not the content itself. Keeping
them separate maintains the generator/product distinction.


DD-12: Delayed counting and underscores for LLog numbering (2026m03d27)
=========================================================================

**Decision:** LLog numbering uses the HELL-compatible delayed counting
scheme (a1--a9, b10--b99, c100--c999) for sessions, rounds, and
entries. Session entry labels and filenames use underscores as
separators. Dates use ``YYYYmMMdDD`` format. The date in a session ID
is the **start date** --- if a session spans multiple days, it keeps
the date of its FORGE:IGNITE.

**Examples:**

- Session directory: ``sa1_2026m03d27/``
- Entry label: ``forge_sa1_2026m03d27_ra1_heat_ea1``
- Display: ``Forge_Sa1_2026m03d27 | Round a1 | HEAT | Entry a1``

**The ``forge_`` prefix** is a namespace. PROMY will use ``promy_``,
SISYF will use ``sisyf_``. This prevents label collisions when
multiple compilers have LLog sessions.
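
The label in the example above can be assembled mechanically. A
minimal sketch: the helper names ``delayed_count`` and ``entry_label``
are hypothetical, and continuing the prefix letters past ``c`` (for
example ``d1000``) is an assumption extrapolated from the three ranges
given in the decision.

```python
def delayed_count(n: int) -> str:
    """Return the delayed-counting form: a1..a9, b10..b99, c100..c999."""
    if n < 1:
        raise ValueError("Counting starts at 1")
    digits = str(n)
    if len(digits) > 26:
        raise ValueError("No prefix letter left for this many digits")
    # The prefix letter encodes the number of digits (a = 1, b = 2, ...).
    return chr(ord("a") + len(digits) - 1) + digits


def entry_label(compiler: str, session: int, start_date: str,
                round_no: int, command: str, entry: int) -> str:
    """Build a session entry label with underscores as separators."""
    return "_".join([
        compiler.lower(),
        "s" + delayed_count(session),
        start_date,                      # date of FORGE:IGNITE, YYYYmMMdDD
        "r" + delayed_count(round_no),
        command.lower(),
        "e" + delayed_count(entry),
    ])


print(entry_label("forge", 1, "2026m03d27", 1, "HEAT", 1))
```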

**Why delayed counting (not zero-padded fixed-width):**

Plain unpadded numbering (``s9``, ``s10``) breaks lexicographic sort
order: ``s10`` sorts before ``s9``. The zero-padded fixed-width scheme
(``s001``, ``s002``) repairs the sort but spends placeholder zeros on
every small number and breaks again once the count outgrows the chosen
width. Delayed counting avoids both problems: ``sa9`` < ``sb10``
holds because ``a`` < ``b``, so there are no wasted digits, the sort
order is correct within and across prefix groups, and the scheme scales
gracefully from single-digit to thousand-plus entries. Using the same
scheme as HELL ensures a single counting convention across the entire
matheology system.
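
The sort-order argument is easy to check with plain string sorting. In
this sketch the helper ``delayed`` is a hypothetical one-liner that
prefixes each number with a letter for its digit count.

```python
def delayed(n: int) -> str:
    # Prefix letter encodes the digit count: a = 1 digit, b = 2, c = 3.
    return chr(ord("a") + len(str(n)) - 1) + str(n)


numbers = [9, 10, 99, 100]

unpadded = sorted(f"s{n}" for n in numbers)
delayed_labels = sorted(f"s{delayed(n)}" for n in numbers)

print(unpadded)         # lexicographic order no longer matches numeric order
print(delayed_labels)   # lexicographic order matches numeric order
```

Plain string sorting is what file browsers and directory listings
typically apply, which is why the property matters for session
directories.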

**Why underscores (not hyphens) for session IDs:**

The initial design used hyphens for consistency with existing RST labels
(``compiler-forge``, ``forge-llog-protocol``). This was revised after
recognizing that session IDs serve a fundamentally different purpose
than structural labels.

1. **Unique-token searchability:** Search engines generally treat
   underscores as part of a single token, so
   ``forge_sa1_2026m03d27_ra1_heat_ea1`` behaves as one opaque string
   that should match only the specific LLog entry being discussed.
   Hyphens, by contrast, would split it into common fragments
   (``forge``, ``sa1``, ``2026m03d27``) that drown in noise. For a
   system that aims to support global academic discussion about
   specific formal reasoning steps, identifier discoverability is more
   valuable than word discoverability.

2. **Consistency with VVN:** The VVN system uses underscores throughout
   (``iv_LLoL_OOv1r0p0_2026m03d27``). Session IDs live in the same
   namespace of "unique searchable identifiers." One convention, not two.

3. **The rare date format amplifies uniqueness:** ``2026m03d27`` is
   already unusual (most systems use ``2026-03-27``). Combined with
   ``forge_sa1_`` as prefix, the full token is effectively globally
   unique. Google becomes an index into the matheology discussion space.

4. **LaTeX escaping is handled by the toolchain:** Sphinx's LaTeX
   builder and the Docutils ``rst2latex`` writer both escape
   underscores automatically, so no RST author ever has to type
   ``\_``. Only raw-LaTeX authors would, and they are expert enough to
   handle a backslash.

**Two label conventions coexist:**

- *Structural labels* (infrastructure, few, static): hyphens.
  ``compiler-forge``, ``forge-llog-protocol``, ``forge-aha-quickstart``.
  These are page anchors, not search targets.

- *Session entry labels* (dynamic, many, searchable): underscores.
  ``forge_sa1_2026m03d27_ra1_heat_ea1``. These are unique identifiers
  that recruit search engines as discussion indices.

The distinction is clean: structural labels name *pages*; session labels
name *moments in a reasoning process*.

**Human-readable display:** In human-facing LLog text, session IDs use
the capitalized POST convention: ``Forge_Sa1_2026m03d27``. In RST label
directives, they are lowercase: ``forge_sa1_2026m03d27``.
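
Converting between the two forms is mechanical. A minimal sketch,
assuming every entry label has exactly six underscore-separated parts
as in the example above; the function name ``display_form`` is
hypothetical.

```python
def display_form(label: str) -> str:
    """Turn a lowercase entry label into the human-facing display line."""
    compiler, session, date, round_tag, command, entry_tag = label.split("_")
    # Capitalize the compiler and session parts; drop the r/e tag letters.
    return (
        f"{compiler.capitalize()}_{session.capitalize()}_{date}"
        f" | Round {round_tag[1:]} | {command.upper()} | Entry {entry_tag[1:]}"
    )


print(display_form("forge_sa1_2026m03d27_ra1_heat_ea1"))
```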
