The Iron Maiden — 10 Formal Tests

Created: 2026m03d27

The Iron Maiden is the adversarial testing protocol used in Phase 3 (Grow) of the Model Forge. It applies 10 formal tests to each claim that has reached at least OO (OperatesOddly) in the StayC lifecycle.

Warn before applying the Iron Maiden to raw MM-stage ideas. MM claims need feeding (Phase 2), not the extensive testing described here. Gentle steering in light of the tests to come is still desirable, and ideas that are already known to fail should not be fed. There is a fine line between testing too much (needless discouragement) and testing too little (false hope). See DD-8.

Each test produces an OKScale verdict: OK / KO / OKO / MIS, always with documented reasoning.
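One way to enforce the "always with documented reasoning" rule is to make the reasoning a required field of the verdict record. The sketch below is only an illustration: the four labels come from the text above, while the `Verdict` class, its field names, and the claim identifier are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class OKScale(Enum):
    # Labels taken from the text; their precise meanings are defined elsewhere.
    OK = "OK"
    KO = "KO"
    OKO = "OKO"
    MIS = "MIS"


@dataclass(frozen=True)
class Verdict:
    test: str        # e.g. "I" for Test I (Consistency)
    claim: str       # hypothetical claim identifier
    verdict: OKScale
    reasoning: str   # required: an empty string is rejected below

    def __post_init__(self):
        if not self.reasoning.strip():
            raise ValueError("every verdict needs documented reasoning")


v = Verdict("I", "claim-1", OKScale.OKO, "No contradiction found; no model exhibited yet.")
print(v.verdict.value, "-", v.reasoning)
```

Constructing a `Verdict` with blank reasoning raises `ValueError`, so an undocumented verdict cannot enter the record.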

Test I — Consistency

Question: Does the new claim contradict anything in the existing system?

Procedure:

  1. Check the new claim against each existing axiom ax1–ax25. Can both be true simultaneously? If you find a pair that cannot, you have a minimal inconsistent subset.

  2. Check against each theorem th1–th11. Theorems are derived from axioms, so a contradiction with a theorem implies a contradiction with the axioms — but checking theorems directly is faster because they are more specific.

  3. Check against every OTHER new claim in the model being developed. Internal consistency within the new model matters as much as consistency with the existing system.

  4. If no contradiction found: attempt to construct a model (in the logical sense — a satisfying interpretation) where all axioms old and new are simultaneously true. If you can exhibit such a model, consistency is established (for the properties checked).

What it catches: Outright contradictions, hidden inconsistencies where two claims that seem compatible at a glance actually rule each other out under certain valuations, and “stealth contradictions” where the inconsistency only emerges through a chain of 3+ axioms.

Common pitfall: Checking only pairwise consistency. Three claims can be pairwise consistent but jointly inconsistent (the triangle problem). Always check the full set.
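The triangle problem can be shown with a brute-force truth-table check. This is a minimal sketch under a simplifying assumption: the claims are propositional formulas over two variables, which is far weaker than the logics the system actually uses. The three claims and the `satisfiable` helper are illustrative, not part of the system.

```python
from itertools import combinations, product


def satisfiable(claims, variables):
    """Return True if some truth assignment makes every claim true."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(claim(world) for claim in claims):
            return True
    return False


variables = ["p", "q"]
claims = {
    "A: p": lambda w: w["p"],
    "B: p implies q": lambda w: (not w["p"]) or w["q"],
    "C: not q": lambda w: not w["q"],
}

# Every pair of claims has a satisfying assignment...
pairwise_ok = all(
    satisfiable([claims[a], claims[b]], variables)
    for a, b in combinations(claims, 2)
)
# ...but the full set of three does not.
jointly_ok = satisfiable(list(claims.values()), variables)

print("pairwise consistent:", pairwise_ok)
print("jointly consistent:", jointly_ok)
```

Each pair passes, yet the three together fail, which is why step 4 asks for a single model of the whole set rather than a table of pairwise checks.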

Test II — Independence

Question: Is the new claim already derivable from the existing system?

Procedure:

  1. Attempt to derive the claim from ax1–ax25 using the logic systems already in use (S5 modal logic, CEM, first-order predicate calculus).

  2. If derivation succeeds: the claim is a THEOREM, not an axiom. This is not a failure — it means the existing system is stronger than realized. Promote the claim to theorem status with the proof.

  3. If derivation is blocked: identify precisely where it is blocked (which step requires an assumption not available in ax1–ax25). That blocking point is what the new axiom adds.

What it catches: Redundant axioms that add nothing to the system’s deductive power. Redundancy is not fatal (Euclid’s system had redundancies), but it weakens the system’s economy and can mask hidden dependencies.

Why it matters: If a claim is derivable, adding it as an axiom creates a false impression that the system needs it. Future developers may incorrectly believe that removing it would weaken the system, creating artificial rigidity.
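For a toy propositional system, the derive-or-find-the-block procedure can be approximated semantically: a claim is derivable exactly when no valuation satisfies all axioms while falsifying the claim. The sketch below assumes that propositional setting (where derivability and semantic entailment coincide); the two axioms and both claims are hypothetical stand-ins, not ax1–ax25.

```python
from itertools import product


def countermodel(axioms, claim, variables):
    """Return a valuation where all axioms hold and the claim fails, or None."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(ax(world) for ax in axioms) and not claim(world):
            return world
    return None


variables = ["p", "q", "r"]
axioms = [
    lambda w: (not w["p"]) or w["q"],  # p implies q
    lambda w: (not w["q"]) or w["r"],  # q implies r
]

derivable_claim = lambda w: (not w["p"]) or w["r"]  # p implies r
underivable_claim = lambda w: w["p"]                # p

theorem_block = countermodel(axioms, derivable_claim, variables)
axiom_block = countermodel(axioms, underivable_claim, variables)

print("p implies r:", "theorem" if theorem_block is None else theorem_block)
print("p:", "theorem" if axiom_block is None else axiom_block)
```

When a countermodel exists, it plays the role of the "blocking point" in step 3: it is a concrete situation the existing axioms allow and the new claim rules out.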

Test III — Necessity

Question: What does the new claim ADD that the existing system cannot express?

Procedure:

  1. Identify the set of statements, beyond the claim itself and its trivial restatements, that become provable with the new claim added but are unprovable without it.

  2. If this set is empty: the claim is decorative. It may be true, but the system does not need it. Challenge the user: why include it?

  3. If this set is non-empty: characterize it. What kind of new reasoning does the claim unlock? Does it open a genuinely new domain of discourse, or does it merely add a convenient shorthand for something already expressible (but verbose)?

What it catches: Decorative axioms that look important but do no formal work. Also catches axioms that are doing work the user didn’t intend — the “surprise expressiveness” problem, where adding an axiom inadvertently makes the system strong enough to prove things the user would rather leave open.
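The "what becomes provable" set can be computed directly for a small propositional toy, given a list of candidate statements of interest. The sketch assumes that setting; the axiom, the new claim, and the candidate list are hypothetical examples chosen so that one statement is unlocked, one stays open, and one was already provable.

```python
from itertools import product


def entails(premises, statement, variables):
    """True if every valuation satisfying all premises also satisfies statement."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(p(world) for p in premises) and not statement(world):
            return False
    return True


variables = ["p", "q", "r"]
axioms = [lambda w: (not w["p"]) or w["q"]]  # existing system: p implies q
new_claim = lambda w: w["p"]                  # candidate axiom: p

candidates = {
    "q": lambda w: w["q"],
    "r": lambda w: w["r"],
    "q or not q": lambda w: w["q"] or not w["q"],
}

# Statements provable with the new claim but not without it.
unlocked = [
    name
    for name, stmt in candidates.items()
    if entails(axioms + [new_claim], stmt, variables)
    and not entails(axioms, stmt, variables)
]

print("unlocked by the new claim:", unlocked)
```

An empty `unlocked` list over a well-chosen candidate set would suggest a decorative claim; an unexpectedly long one is a hint of the "surprise expressiveness" problem.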

Test IV — Modal Soundness (S5)

Question: Does the claim hold across all accessible possible worlds in the S5 framework?

Procedure:

  1. Is the claim intended as necessarily true (□) or contingently true? If necessary: it must hold in every accessible world. If contingent: it holds in at least one but not all.

  2. Check that necessity and possibility operators (□, ◇) are used correctly. Common errors:

    • Claiming □ P when P is only contingently true.

    • Using ◇ P when the claim requires □ P.

    • Accidentally collapsing the necessary/contingent distinction by making a contingent truth follow necessarily from the axioms.

  3. Check the accessibility relation. S5 uses an equivalence relation (reflexive, symmetric, transitive), so every possible world can “see” every other world in its equivalence class (all worlds, in the usual single-class presentation). Does the claim rely on a weaker accessibility relation (which would require a different modal system)?

What it catches: Claims that are true in the actual world but not in all possible worlds (contingent truths masquerading as necessary truths), and the reverse (unnecessary restrictions on possible worlds).
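The necessary/contingent distinction and the accessibility check can be made concrete on a small Kripke model. This is a minimal sketch with three hypothetical worlds and two hypothetical atomic propositions `P` and `Q`; it uses the universal relation as the S5 accessibility relation, which is an assumption of the example.

```python
worlds = {"w1", "w2", "w3"}

# Which atomic propositions hold in which world (hypothetical valuation).
valuation = {
    "w1": {"P", "Q"},
    "w2": {"P"},
    "w3": {"P", "Q"},
}

# Universal accessibility: every world sees every world.
access = {(u, v) for u in worlds for v in worlds}


def is_equivalence(worlds, access):
    """Check reflexivity, symmetry, and transitivity of the relation."""
    reflexive = all((w, w) in access for w in worlds)
    symmetric = all((v, u) in access for (u, v) in access)
    transitive = all(
        (u, x) in access
        for (u, v) in access
        for (v2, x) in access
        if v == v2
    )
    return reflexive and symmetric and transitive


def box(prop, world):
    """Necessarily prop: prop holds in every world accessible from `world`."""
    return all(prop in valuation[v] for (u, v) in access if u == world)


def diamond(prop, world):
    """Possibly prop: prop holds in some world accessible from `world`."""
    return any(prop in valuation[v] for (u, v) in access if u == world)


print("S5-compatible relation:", is_equivalence(worlds, access))
print("box P at w1:", box("P", "w1"))          # P holds everywhere
print("box Q at w1:", box("Q", "w1"))          # Q fails in w2
print("diamond Q at w1:", diamond("Q", "w1"))  # Q holds somewhere
```

Here `Q` is true at `w1` but only contingently so; asserting □Q on that basis is the first error listed in step 2.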

Test V — Mereological Coherence (CEM)

Question: Are part-whole relationships respected?

Procedure:

  1. Does the claim introduce entities whose mereological status is undefined? Every entity in the system should have a defined relationship to the mereological whole (Reality, in this system).

  2. Is there circularity in containment? (A is part of B, B is part of A — only legitimate if A = B.)

  3. Does the claim respect the existing mereological structure? In this system, the key relationships are:

    • W ≤ G (World is part of God — panentheism axiom ax1)

    • W < G (World is a proper part of God — axiom ax2)

    • Parthood is transitive, reflexive, antisymmetric.

  4. Does the new claim create mereological “orphans” — entities that exist in the system but have no defined part-whole relationship to anything else?

What it catches: Containment contradictions (X is inside Y but Y is also inside X), undefined entities floating outside the mereological hierarchy, and violations of the axioms’ core claim about the God-World part-whole relationship.
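Checks 2 and 4 are mechanical once the stated parthood facts are written down as pairs. The sketch below takes the reflexive-transitive closure of the stated pairs, then reports containment cycles (antisymmetry violations) and orphans. All entity names, including `Whole`, are hypothetical placeholders rather than entities of the actual system.

```python
def closure(entities, stated):
    """Reflexive-transitive closure of stated (part, whole) pairs."""
    rel = set(stated) | {(e, e) for e in entities}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(rel):
            for (y2, z) in list(rel):
                if y == y2 and (x, z) not in rel:
                    rel.add((x, z))
                    changed = True
    return rel


def check(entities, stated, whole):
    rel = closure(entities, stated)
    # Distinct entities that are parts of each other violate antisymmetry.
    # The string comparison x < y only ensures each pair is reported once.
    cycles = sorted((x, y) for (x, y) in rel if x < y and (y, x) in rel)
    # Orphans have no parthood path to the designated whole.
    orphans = sorted(e for e in entities if (e, whole) not in rel)
    return cycles, orphans


entities = {"Whole", "a", "b", "c", "d"}
stated = {
    ("a", "Whole"),
    ("b", "a"),
    ("c", "b"),
    ("b", "c"),  # circular containment with the pair above
}
# "d" is deliberately given no parthood relation at all.

cycles, orphans = check(entities, stated, "Whole")
print("containment cycles:", cycles)
print("orphans:", orphans)
```

A reported cycle is legitimate only if the two entities are then identified as one (A = B); otherwise it is a containment contradiction.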

Test VI — Game-Theoretic Stability

Question: If the claim describes agents, incentives, or cooperation mechanisms, is the described equilibrium actually stable?

Procedure:

  1. Identify the game: who are the players, what are their strategy sets, what are the payoffs?

  2. Is the described outcome a Nash equilibrium? (No player can improve by unilaterally changing strategy.)

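  3. Is it stable under iterated play? Repetition changes the equilibrium set: under an infinite horizon with sufficiently patient players, outcomes that are not one-shot equilibria (such as mutual cooperation) can be sustained (Folk Theorem), while under a known finite horizon, cooperation that depends on future punishment can unravel by backward induction. Check both finite and infinite horizon.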

  4. Can a rational defector exploit the mechanism? If the claim describes a cooperation mechanism (e.g., Jubilee-System cycles), check whether a player who pretends to cooperate but defects at the optimal moment can gain at others’ expense.

  5. Does the mechanism satisfy incentive compatibility? (Is truthful behavior optimal, or can players gain by misrepresenting their preferences?)

What it catches: Utopian mechanisms that assume cooperation without providing incentives for it. Social choice impossibilities (Arrow, Gibbard-Satterthwaite). Mechanisms vulnerable to strategic manipulation or free-riding.
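Steps 2 and 3 can be illustrated on a textbook Prisoner's Dilemma. The payoff numbers are the conventional illustrative values (temptation 5, reward 3, punishment 1, sucker 0), not anything taken from the system's models, and the repeated-game check assumes a grim-trigger strategy with a common discount factor.

```python
from itertools import product

STRATEGIES = ["cooperate", "defect"]

# (row player's payoff, column player's payoff) for each strategy profile.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def is_nash(profile):
    """No player can gain by unilaterally switching strategy."""
    a, b = profile
    best_a = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in STRATEGIES)
    best_b = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in STRATEGIES)
    return best_a and best_b


def grim_trigger_sustains(discount, T=5, R=3, P=1):
    """Infinite horizon: is cooperating forever at least as good as
    defecting once and being punished with mutual defection afterwards?"""
    cooperate_forever = R / (1 - discount)
    defect_once = T + discount * P / (1 - discount)
    return cooperate_forever >= defect_once


nash = [p for p in product(STRATEGIES, repeat=2) if is_nash(p)]
print("one-shot Nash equilibria:", nash)
print("cooperation sustainable at discount 0.6:", grim_trigger_sustains(0.6))
print("cooperation sustainable at discount 0.4:", grim_trigger_sustains(0.4))
```

With these payoffs, mutual cooperation is not a one-shot equilibrium, and in the repeated game it survives only when players value the future enough. A claimed cooperation mechanism should state which of these regimes it relies on.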

Test VII — Computability and Decidability

Question: Can the claim’s truth be checked by a finite procedure?

Procedure:

  1. Is the claim decidable? Can an algorithm determine its truth in finite time for any given input?

  2. If undecidable: is the undecidability acknowledged and bounded? Many interesting claims are undecidable in general but decidable for specific cases. The claim should specify which cases are intended.

  3. Does the claim accidentally require solving a halting-problem equivalent? This is more common than expected: claims about “all possible behaviors of a system” or “eventual convergence to a state” can hide undecidable quantification.

  4. If the claim involves infinite structures (all possible worlds, all future time steps, all members of humanity): is the quantification well-founded? Does it avoid the paradoxes of unrestricted quantification?

What it catches: Claims that sound meaningful but cannot be checked even in principle. Claims that require infinite verification. Hidden halting-problem equivalents. Poorly bounded universal quantification.
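An "eventual convergence" claim can only be checked up to a budget, and the honest result of an exhausted budget is "undetermined", not "false". The sketch below makes that explicit. The Collatz step function is used purely as a familiar example of a process whose convergence for all inputs is an open problem; the function names and the result labels are hypothetical.

```python
def converges_within(step, start, target, max_steps):
    """Apply `step` repeatedly; report whether `target` is reached within the budget."""
    state = start
    for n in range(max_steps + 1):
        if state == target:
            return ("converged", n)
        state = step(state)
    # Budget exhausted: this says nothing about what happens later.
    return ("undetermined", max_steps)


def collatz(n):
    return n // 2 if n % 2 == 0 else 3 * n + 1


small = converges_within(collatz, 6, 1, 100)
tight = converges_within(collatz, 27, 1, 50)

print("start 6, budget 100:", small)
print("start 27, budget 50:", tight)
```

A claim phrased as "the system always converges" quantifies over all starts and all budgets; a claim phrased as "converges within N steps for these cases" is the bounded, checkable form that step 2 asks for.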

Test VIII — Real-World Grounding

Question: Can the claim be connected to observable phenomena?

Procedure:

  1. Does the claim make predictions, even in principle? A claim that has no observable consequences is unfalsifiable — not necessarily wrong, but worth flagging.

  2. Would a working scientist (physicist, economist, biologist, sociologist) find the claim meaningful? Or would they say “this is not even wrong” (Pauli)?

  3. Does the claim survive the agnostic-scientist critique? (See the HELL adversarial landscape for this position: someone who accepts empirical reasoning but rejects the axiomatic starting assumptions about purpose and consciousness.)

  4. Are there historical or contemporary examples that illustrate the claim’s content? Formal claims grounded in real examples are stronger than purely abstract ones.

What it catches: Vacuous claims that are technically consistent but say nothing about the world. Claims that are meaningful only within the formal system and have no purchase outside it. Also catches the opposite: claims that are too specific to particular real-world conditions and lack the generality needed for an axiom.

Test IX — Cross-Model Coherence

Question: How does the new claim relate to analogous structures in the existing models (PET, JUB)?

Procedure:

  1. Is there an alignment echo? (The same underlying concept appears in the new model as in PET or JUB, formalized differently.) If so: is the echo intentional? Is it structural (functorial) or accidental?

  2. Is there a genuine divergence? (The new model makes a claim that contradicts or is absent from PET/JUB.) If so: is the divergence justified? Does the new model explicitly acknowledge where it parts ways with existing models?

  3. Does the new claim interact with the existing cross-model infrastructure? (5D link naming, BEST Names, PoR field registry.) Can the claim be labeled and cross-referenced within the existing architecture, or does it require architectural extension?

  4. If the new model introduces entities or relationships not present in PET/JUB: do they enrich the system or fragment it? A new model that shares no structure with existing models is a silo, not an extension.

What it catches: Unintentional contradictions between models. Missed alignment echoes (the same insight rediscovered under a different name). Architectural incompatibilities that would prevent the new model from being compiled by SISYF.

Test X — Known-Attack Resilience

Question: Does the new claim fall to any of the existing 33 con objections, or does it open a new attack surface?

Procedure:

  1. Scan all 33 existing con findings. For each: does the attack apply to the new claim? Many con findings are specific to JUB mechanisms, but some (e.g., the agnostic-scientist position, the Arrow impossibility, the knowledge problem) are general enough to apply to any model in the system.

  2. If an existing con applies: check whether the corresponding pro defense also applies. If yes: the new claim inherits both the attack and the defense. If the defense does NOT transfer: the new claim is vulnerable where JUB is not.

  3. Does the new claim open a NEW attack surface not covered by any existing con? If so: draft the con entry (what the attack would look like) and assess whether a pro defense exists. This becomes input for the Reap phase’s proposed HELL entries.

What it catches: Claims that unknowingly repeat mistakes already addressed in the HELL landscape. Claims that are vulnerable to known attacks without the known defenses. And most valuably: genuinely new attack surfaces that the existing system has not yet encountered.

Note: This test is most powerful in the 1M prompt where all 66 HELL findings are loaded. In the 200K prompt (where only the HELL index is loaded), this test operates at reduced power and should be flagged as OKO for any claim where the specific con/pro content would be needed to give a definitive verdict.
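The triage in steps 1 and 2 of the procedure can be sketched as a small table lookup. Everything in this example is a loud assumption: the tag-based model (an attack applies if the claim has any targeted feature; a defense transfers if the claim has every feature the defense relies on), the entry ids, and the tags are all hypothetical and do not correspond to actual HELL findings.

```python
def triage(claim_tags, cons):
    """Classify each con entry against a claim described by a set of feature tags."""
    report = {}
    for con in cons:
        if not (con["targets"] & claim_tags):
            report[con["id"]] = "attack does not apply"
        elif con["defense_requires"] is not None and con["defense_requires"] <= claim_tags:
            report[con["id"]] = "inherits attack and defense"
        else:
            report[con["id"]] = "vulnerable: defense does not transfer"
    return report


# Hypothetical claim and con entries, for illustration only.
claim_tags = {"aggregates-preferences", "voluntary"}
cons = [
    {"id": "con-A", "targets": {"aggregates-preferences"}, "defense_requires": {"voluntary"}},
    {"id": "con-B", "targets": {"central-planning"}, "defense_requires": None},
    {"id": "con-C", "targets": {"aggregates-preferences"}, "defense_requires": {"periodic-reset"}},
]

report = triage(claim_tags, cons)
for con_id, status in report.items():
    print(con_id, "->", status)
```

The judgment of whether an attack applies or a defense transfers still has to be made by reading the actual con and pro content; the table only records the outcome. Step 3 (new attack surfaces) is not covered by this sketch.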