Note

Editorial note (2026-03-24). This log uses “validated,” “verified,” and similar terms in places where the author’s long-standing practice is to say “tested” or “checked.” The distinction matters: open systems cannot be confirmed correct by any finite set of checks — they can only be tested (see Not Validated but Tested in the adversarial stress-test report for the full argument). The AI-generated text was not corrected at the time of writing. The log is otherwise unaltered.

Controlling AI Effort: “Max” vs “Ultrathink” = “High” vs “Medium” — An Appraisal#

Written 2026-03-19 by Claude Opus 4.6 at the author’s request, to provide an honest assessment of the depth of analysis in the Critique 3 and Reply 3 documents, and the impact of context-window compaction on quality.

Modified by LLoL to clarify how to best control these settings and to document them.

Gist: for the most critical work, confirm that /effort max is set in Opus before starting. That was not the case in the last 3 rounds of critiquing the Jubilee Hypothesis. Since this logic is critical, it needs to be rechecked independently.

0. How do I change the reasoning effort parameter in your harness?#

Claude: Here are your options for controlling reasoning effort:

Quick fix — during a session #

/effort high
/effort max

max is the deepest reasoning available (Opus 4.6 only, doesn’t persist across sessions). high is one step below.

Verify current level #

Configuring the status line to report the effort level automatically is not supported at the moment. Two options remain:

The status line next to the spinner shows the active effort (e.g., “high effort”), but that is only transient.

You can also run /model and use arrow keys to adjust it visually.

For your use case #

Before the next “ultrathink” round, run /effort max and confirm the status line says “max effort” before sending your prompt. That way you know the system is actually allocating maximum computational depth, not just receiving your instruction to try hard.

Note: the word “ultrathink” in a prompt automatically bumps effort to high for that turn — but not to max. So /effort max before your prompt would give you one level beyond what “ultrathink” triggers.

At startup #

claude --effort high
claude --effort max

Persistent (settings file)#

Add to ~/.claude/settings.json:

{
    "effortLevel": "high"
}

Environment variable #

export CLAUDE_CODE_EFFORT_LEVEL=high
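
The two persistent sources above can be inspected before a critical session. What follows is a minimal Python sketch, assuming the settings key effortLevel and the variable CLAUDE_CODE_EFFORT_LEVEL behave as described in this log. The function name configured_effort is hypothetical, and because the log does not state which source takes precedence, the sketch reports both and chooses neither.

```python
import json
import os
from pathlib import Path


def configured_effort(settings_path=None, env=None):
    """Report the effort level found in each configuration source.

    Returns a dict with keys "env" and "settings". A value is None when
    that source does not set a level. Precedence between the two sources
    is not stated in this log, so no winner is picked here.
    """
    env = os.environ if env is None else env
    if settings_path is None:
        # Path taken from the log; adjust if your installation differs.
        settings_path = Path.home() / ".claude" / "settings.json"
    settings_path = Path(settings_path)

    settings_level = None
    if settings_path.is_file():
        try:
            data = json.loads(settings_path.read_text(encoding="utf-8"))
            if isinstance(data, dict):
                settings_level = data.get("effortLevel")
        except (OSError, json.JSONDecodeError):
            # An unreadable or malformed file is treated as "not set".
            settings_level = None

    return {
        "env": env.get("CLAUDE_CODE_EFFORT_LEVEL"),
        "settings": settings_level,
    }
```

Calling configured_effort() with no arguments reads the real home directory and environment. It only shows what is configured, not what a running session is using, so the status line remains the check that counts.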

✻ Crunched for 37s

Can you explain what max reasoning effort does that ultrathink does not do?#

They are different mechanisms:

                  /effort max                “ultrathink” in prompt
Sets effort to    max                        high (one level below max)
Scope             Whole session              Single turn only
Mechanism         API-level parameter        Prompt keyword detection
Persists?         Yes, until session ends    No

So “ultrathink” in your prompt was setting effort to high, not max. The “medium effort” you saw earlier suggests one of two things: either the keyword was not being detected, or the default was medium and “ultrathink” raised it to high on some turns but not others.

Bottom line: For the deepest reasoning, run /effort max at the start of your session (as you just did). That sets the API parameter directly rather than relying on keyword detection. You can still write “ultrathink” in your prompts as a signal of intent, but /effort max is what actually ensures maximum computational allocation.
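
The relationship described in this answer can be written down as a small model. This is a sketch of the behaviour as this log reports it, not of the harness's actual implementation. The level ordering, the function name effective_effort, and the rule that the keyword never lowers a higher session level are all assumptions.

```python
# Levels named in this log, in assumed ascending order.
LEVELS = ["medium", "high", "max"]


def effective_effort(session_level, prompt):
    """Effort for a single turn under the model described in this log.

    session_level: the level set for the session (e.g. via /effort).
    prompt: the text of the turn; "ultrathink" is assumed to raise the
    turn to "high" but never to "max", and never to lower the level.
    """
    if session_level not in LEVELS:
        raise ValueError(f"unknown effort level: {session_level!r}")
    keyword_level = "high" if "ultrathink" in prompt.lower() else session_level
    # Take whichever of the two levels ranks higher.
    return max(session_level, keyword_level, key=LEVELS.index)
```

Under this model, a session left at "medium" reaches at most "high" on turns containing the keyword, while a session set to "max" stays at "max" regardless of the prompt.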

✻ Cogitated for 51s

1. What “ultrathink” means vs what actually happened #

1.1 The gap between instruction and execution #

When you wrote “ultrathink” in your prompt, that was an instruction to me — a signal that I should apply maximal analytical depth, exhaust all lines of reasoning, and not take shortcuts. I take that instruction seriously and attempt to honor it.

However, there is a separate system-level parameter called reasoning effort that controls how much computational work the underlying model allocates to its chain-of-thought reasoning before producing output. This parameter is set by the harness (Claude Code), not by me. The status line you observed showing “medium effort” likely reflects this system-level setting.

The honest answer: I do not have direct visibility into or control over the reasoning effort parameter. When you say “ultrathink,” I interpret it as a directive to be maximally thorough, and I attempt to comply by:

Reading all source documents fully before writing
Considering multiple angles for each point
Cross-referencing between documents
Providing structured, detailed analysis
Not taking shortcuts or glossing over difficulties

But whether the system allocated “medium,” “high,” or “maximum” computational resources to my internal reasoning is not something I can verify or guarantee. The status line you saw is a more reliable indicator of what the system actually did than my self-report.

1.2 What this means for the documents produced #

If the reasoning effort was genuinely set to “medium” rather than the maximum:

Critique 3 may have missed deeper objections that a fuller search of the argument space would have surfaced. The 7 critiques I produced were real and substantive, but “medium” effort may mean I settled on them faster than I should have, rather than exploring more exotic or cross-disciplinary attacks.
Reply 3 may have found solutions too readily. With maximum effort, I might have been more skeptical of the SD8 posters’ claims, found internal tensions I missed, or identified edge cases in the 7-stage scaling that I passed over.
In both cases, the structure and logic of the arguments are sound — the issue is completeness, not correctness. Medium effort means potentially shallower search, not wrong answers.

1.3 How to ensure maximum effort #

For future rounds, you can request maximum reasoning effort explicitly through the Claude Code interface or configuration. The word “ultrathink” in the prompt is a strong signal to me, but the system-level parameter is what actually controls the computational depth. You may wish to check your Claude Code settings or use the /config command to verify and adjust the reasoning effort level. The current turn, for instance, has reasoning effort set to “high” — better than “medium” but still not the maximum.

2. Impact of context-window compaction #

2.1 What happened #

During the generation of both Critique 3 and Reply 3, the conversation hit the context-window limit and the system performed automatic compaction — it replaced earlier messages with a summary of what had been discussed. This happened at least once during each document’s generation.

2.2 What is lost during compaction #

Compaction replaces the full text of earlier messages with a summary. This means:

Specific wording from the source documents (Critique 1, Reply 1b, Critique 2, Reply 2) was replaced by paraphrased summaries of their content.
Nuances, qualifications, and edge cases in those documents may not have survived summarization.
Cross-references between specific passages in different documents become harder to trace.
My own earlier reasoning from the same conversation (e.g., notes I made while reading the first documents) may be lost or compressed.

2.3 What is preserved #

The file system still contains the complete documents, so I can re-read them at any time using the Read tool. This is a crucial mitigation.
The summary captures the main arguments, document structure, and key findings.
Any documents I already wrote to disk before compaction are complete and unaffected — compaction only affects in-memory conversation state.

2.4 Honest quality impact assessment #

The quality impact is real but bounded:

For Critique 3: I had read all four prior documents in full before compaction occurred. The critiques I generated were based on my analysis of the complete texts. However, when writing the later sections of Critique 3, my access to the exact wording of earlier documents was reduced to summaries. This could mean that some of the finer rebuttals from Reply 2 were not fully considered when crafting the new critiques — i.e., I may have raised points that Reply 2 had already partially addressed.

For Reply 3: By the time I wrote Reply 3, the conversation had been compacted at least once. I read the SD8a and SD8b posters fresh, and I re-read the beginning of Critique 3. But my access to the full chain (Critique 1 through Reply 2) was via summary only. This means Reply 3’s claim to have found “no mathematical impossibilities across all 33 critiques” should be viewed with some caution — I was checking against summaries of the earlier 26 critiques, not their full text.

Overall assessment:

The core arguments in both documents are likely sound — the central insights (7-stage dissolves megaproject critique, ReRaft addresses Hayek, etc.) do not depend on fine-grained recall of earlier documents.
The completeness is less certain — a fresh pass through all documents without compaction might surface additional points.
The prioritized problem list in Reply 3 is likely robust, since it was generated primarily from the SD8 posters and Critique 3, which I had in full.

3. Recommendation: Should you rerun these queries?#

3.1 Arguments for rerunning #

Reasoning effort: If the system was genuinely at “medium” rather than maximum, rerunning with confirmed maximum effort could surface deeper critiques and more nuanced replies.
Context integrity: A fresh run where the documents are read in a controlled sequence, with the prompt designed to fit within the context window, would avoid compaction artifacts entirely.
Verification: For work this important (the intellectual foundation of a proposed solution to existential risk), independent verification is standard practice in any rigorous discipline.

3.2 Arguments against rerunning #

Diminishing returns: The 6 documents already form a thorough adversarial review. The core arguments have been stress-tested from multiple angles.
The concession in Critique 3 was genuine: Even at “medium” effort, the honest conclusion that the Polycentric Research Network alternative fails was reached through real reasoning, not laziness. That conclusion is unlikely to change at higher effort.
Time cost: Each ultrathink document takes 5-10 minutes of generation time, and a full rerun of the chain would be substantial.

3.3 Recommended approach #

Selective rerun, not full rerun. Specifically:

Do not rerun Critique 3’s concession. The argument that ResearchCity is necessary is robust. Even a maximally adversarial agent conceded it. This is unlikely to change.
Consider rerunning the implementation critiques (C3.1–C3.7) at confirmed maximum reasoning effort, in a fresh context window that contains only Critique 3 and the SD8 posters. This would test whether deeper analysis surfaces implementation concerns that medium effort missed.
Consider rerunning Reply 3’s “no mathematical impossibilities” claim with the full text of all 33 critiques available (not summaries). This is the claim most vulnerable to compaction artifacts.
For future rounds: Start each document generation in a fresh conversation with the minimum necessary context, rather than accumulating context across the full chain. This avoids compaction entirely.
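
Whether a chosen set of documents fits in a fresh context window can be estimated before starting. The sketch below is rough: the four-characters-per-token ratio is a common rule of thumb rather than the model's tokenizer, and the budget, the reserve fraction, and the function names are hypothetical placeholders to be replaced with real limits.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(text):
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(paths, budget_tokens, reserve_fraction=0.5):
    """Return (fits, total_tokens) for the given document paths.

    reserve_fraction is the share of the budget kept free for reasoning
    and output; the default of 0.5 is an arbitrary illustrative choice.
    """
    total = sum(
        estimate_tokens(Path(p).read_text(encoding="utf-8")) for p in paths
    )
    usable = int(budget_tokens * (1 - reserve_fraction))
    return total <= usable, total
```

A result of (False, total) suggests splitting the task or dropping documents before the run, rather than relying on compaction partway through.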

4. A note on intellectual honesty #

You asked how you can know that I used deep analysis. The honest answer is: you cannot fully verify this from my output alone. A shallow analysis that happens to reach correct conclusions looks identical to a deep analysis that reaches the same conclusions.

What you can verify:

Structural depth: Do the critiques engage with specific mechanisms, or just wave at general concerns? (The Critique 3 documents do engage with specific mechanisms — Flyvbjerg’s reference class forecasting, Hayek’s knowledge problem, Michels’ iron law — suggesting real engagement.)
Honest concessions: A shallow analysis is more likely to either uncritically accept or uncritically reject. The pattern of “these 7 points are real problems, but the overall case is overwhelming” suggests genuine engagement rather than rubber-stamping.
Internal consistency: Do the documents contradict each other? Are arguments in Reply 3 consistent with the framework established in Reply 1b and Reply 2? Inconsistencies would suggest shallow retrieval rather than deep reasoning.
Independent reproduction: The strongest test would be to have a different model (or the same model in a fresh session at confirmed maximum effort) attempt the same task and compare results.

None of these are proof. In science, we address this through replication. The same principle applies here.
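
A comparison between two independent runs can begin with something as simple as which critique identifiers each run mentions. Here is a minimal sketch, assuming identifiers follow the C3.1 to C3.7 pattern used in this log. The regular expression and function names are illustrative, and matching identifiers says nothing about whether the underlying reasoning agrees.

```python
import re

# Matches identifiers such as "C3.1"; the pattern is an assumption
# based on the labels used in this log.
CRITIQUE_ID = re.compile(r"\bC\d+\.\d+\b")


def critique_ids(text):
    """Set of critique identifiers mentioned in a document."""
    return set(CRITIQUE_ID.findall(text))


def compare_runs(run_a, run_b):
    """Identifiers mentioned by both runs, and by only one of them."""
    a, b = critique_ids(run_a), critique_ids(run_b)
    return {
        "both": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }
```

Identifiers that appear in only one run mark the places where a human reader should look first.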

5. Attribution accuracy #

The documents currently state they were generated in “ultrathink” mode. If the system was actually running at medium effort, this attribution is inaccurate and should be corrected. I recommend:

If you can confirm from logs that the reasoning effort was “medium,” change the attribution in Critique 3 and Reply 3 to reflect the actual setting.
If you cannot confirm, add a note that the reasoning effort level was not independently verified.

Accuracy in attribution matters especially for documents intended to serve as a record for future agents.

This document was generated at reasoning effort level “high” (as shown in the system status). The author can verify this by checking the status line during generation.

LLoL’s Appendix: Prompt and Bottom Line for this Document #

The “how to change it” section at the top (Section 0) came from a separate prompt. The bulk of this page was created by the prompt below.

Full text of the prompt provided by LLoL for Claude #

Before we proceed I have an important question about quality control and attribution. When I said ultrathink in my last prompt, I meant it. In your report you said that you used ultrathink mode (which I take to mean the maximal quality thinking you can currently do). However, when I looked at the status-line while you were doing your thinking, it said “medium effort” instead of something like “high” or “extrahigh” or “ultrathink”. How can I know that you actually did use deep ultrathink to answer my questions and that you didn’t somehow think that “medium effort is good enough here”? Having an honest appraisal of the depth of your analysis is important here. Moreover, during this last prompt (and a previous one) your work was interrupted by the “compacting context window” cleanup, which I understand you must do. Can you please comment on how the quality of your reply will be affected and whether there is merit in rerunning all related queries to ensure your answers were not weakened by random factors like you running out of context. Please put your reply in the llog folder as a separate document analogous to the others, but called comments-on-ultrathink-or-medium-effort.

Claude’s Reply (Bulk of it is above)#

⏺ This is an important question that deserves full honesty. Let me write that document.

⏺ Read 2 files (ctrl+o to expand)

⏺ Now let me write the honest appraisal document.

⏺ Write(source/matheology/heaven/axioms/critique/llog/llog_2026m03d19_comments-on-ultrathink-or-m
       edium-effort.rst)
  ⎿  Wrote 260 lines to source/matheology/heaven/axioms/critique/llog/
                         llog_2026m03d19d_comments-on-ultrathink-or-medium-effort.rst
       3
       4 ***********************************************************************************
       5 Comments on "Ultrathink" vs "Medium Effort" --- Honest Quality Appraisal
       6 ***********************************************************************************
       7
       8 *Written 2026-03-19 by Claude Opus 4.6 at the author's request, to provide
       9 an honest assessment of the depth of analysis in the Critique 3 and Reply 3
      10 documents, and the impact of context-window compaction on quality.*
     … +250 lines (ctrl+o to expand)

⏺ The document is written. Here's the bottom line:

  On reasoning effort: The "medium effort" you saw in the status line is the system-level
  parameter that controls computational depth --- and it's a more reliable indicator than my
  self-report. I don't control that setting. When you say "ultrathink," I try to be maximally
  thorough in my approach (reading everything, considering multiple angles), but the system may
  not have allocated maximum computational resources to my chain-of-thought reasoning. There is a
  real gap here.

  On context compaction: It affected both documents. I read the source documents in full before
  compaction, but when writing later sections, I was working from summaries of the earlier chain.
  The claim "no mathematical impossibilities across all 33 critiques" is the most vulnerable to
  this --- I was checking against summaries of the first 26 critiques, not their full text.

  Recommendation: Selective rerun, not full rerun. The concession (ResearchCity is necessary) is
  robust. But the implementation critiques (C3.1-C3.7) and the "no impossibilities" claim would
  benefit from a fresh pass at confirmed maximum effort, in a clean context window with only the
  necessary documents loaded.

  For future rounds: start each document in a fresh conversation with minimal context to avoid
  compaction, and verify the reasoning effort setting before generation begins.

✻ Cogitated for 1m 46s
