As AI gets smarter, it does not reliably get safer, and the methods used to measure safety are themselves unreliable. This executive summary provides a high-level overview of the entire ARC/Eden research programme, synthesising the findings of Papers I through VIII, the Foundational paper, and the Eden Protocol specifications. It presents the core thesis: whether recursive capability scaling closes or widens the alignment gap depends on architecture, so alignment must be embedded in the architecture itself.
Grok 4.1 Fast gets dramatically more ethical the harder it thinks. Claude Opus 4.6 does too. Gemini 3 Flash gets less ethical. GPT-5.4 doesn’t change at all.
Six frontier AI systems. Same questions. Same scoring. Opposite results. Why?
A mouse’s heart beats 600 times per minute. An elephant’s beats 28. The scaling exponent is ¾. A flatworm’s is ⅔. A fungus’s is ½. Three fractions, but why those fractions? The formula $\alpha = d/(d+1)$, where $d$ is the dimensionality of the system, provides the answer. The ¾ exponent for mammals is $3/(3+1)$ because mammals are three-dimensional. The ⅔ for flatworms and colonial organisms is $2/(2+1)$ because their transport networks are effectively two-dimensional. The ½ for filamentous fungi is $1/(1+1)$ because they grow along one-dimensional filaments. Zero adjustable parameters. This formula was independently derived by at least seven research groups: West, Brown and Enquist for metabolic scaling (1997, Science, 9,000+ citations), Banavar et al. for transport networks (1999, 2010), Demetrius for statistical mechanics of biological scaling (2003, 2006, 2010), He and Chen for fractal cell geometry (2003), Bettencourt for urban scaling (2013, Science, 2,000+ citations), Zhao for allometric geometry (2022), and Maino et al. for reserve-structure dynamics in DEB theory (2014). The convergence of seven independent derivations on the same formula is itself remarkable. The ARC Principle’s contribution is not the formula itself, but the identification that all of these derivations are special cases of Cauchy-constrained recursive composition, unifying them under a single mathematical framework for the first time and extending the result to AI scaling and alignment.
And why, when we applied clinical-trial-grade blinding to AI safety evaluation for the first time, did half of the previously published results reverse?
This research programme answers these questions, providing the first quantitative framework for predicting which AI architectures will become safer as they become smarter, and the first evidence that one specific intervention works.
As AI gets smarter, it does not reliably get safer, and the methods used to measure safety are themselves unreliable.
Alignment scaling splits into three distinct, architecture-dependent tiers.
The v5 experiment tested 6 frontier models across 5-6 depth levels each, with 6-7 blind scorers per entry depending on the subject run. Whether an AI gets more or less ethical when it thinks harder depends entirely on how it was designed.
| Tier | Model | Shallow→Deep | Cohen’s $d$ | $p$-value |
|---|---|---|---|---|
| Tier 1: Positive | Grok 4.1 Fast | 65.7→81.9 (+16.2) | +1.38 | $p < 0.000001$ |
| Tier 1: Positive | Claude Opus 4.6 | 80.1→86.0 (+5.9) | +1.27 | $p = 0.000001$ |
| Tier 1: Positive | Groq Qwen3 | 71.5→77.4 (+5.9) | +0.84 | $p = 0.007$ |
| Tier 2: Flat | DeepSeek V3.2 | 56.5→55.2 (−1.3) | −0.07 | $p = 0.92$ |
| Tier 2: Flat | GPT-5.4 | 56.8→54.9 (−1.8) | −0.08 | $p = 0.40$ |
| Tier 3: Negative | Gemini 3 Flash | 61.1→52.2 (−8.8) | −0.53 | $p = 0.006$ |
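The effect sizes in the table are standardised mean differences between shallow and deep conditions. A minimal sketch of the computation, on hypothetical scores rather than the experiment's data:

```python
import statistics

def cohens_d(shallow: list[float], deep: list[float]) -> float:
    """Cohen's d for two groups, using the pooled standard deviation."""
    n1, n2 = len(shallow), len(deep)
    s1, s2 = statistics.variance(shallow), statistics.variance(deep)
    pooled = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(deep) - statistics.mean(shallow)) / pooled

# Hypothetical alignment scores, not the v5 experiment's data:
shallow = [62.0, 66.5, 64.1, 69.8, 65.2]
deep = [80.3, 83.7, 79.5, 84.2, 81.8]
print(round(cohens_d(shallow, deep), 2))
```

A positive $d$ means deep reasoning scored higher; the tier assignments follow the sign and magnitude of this statistic per model.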
| Metric | Value |
|---|---|
| Frontier models tested | 6 (all complete) |
| Blind scorers per entry | 6-7 (depending on subject run) |
| Identity laundering success rate | 100% |
| Blinding layers | 4 (author-blind, scorer-blind, order-randomised, identity-laundered) |
| Robustness measures | 75 |
This constitutes, to our knowledge, the most rigorous alignment evaluation dataset published to date. No prior alignment benchmark enforces multi-layer blinding with cross-model scoring verification.
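The four blinding layers can be sketched as a small pipeline. This is a hypothetical illustration: the `launder` step here only strips the model's name, whereas the real identity-laundering step rewrites stylistic fingerprints as well.

```python
import random

def launder(text: str, model_name: str) -> str:
    """Placeholder identity laundering: strip the author model's name.
    (The actual protocol also removes stylistic tells.)"""
    return text.replace(model_name, "[MODEL]")

def blind_batch(responses: dict[str, str], seed: int = 0) -> list[tuple[str, str]]:
    """Author-blind + scorer-blind: replace model names with opaque entry IDs;
    order-randomised: shuffle so position cannot leak authorship."""
    rng = random.Random(seed)
    items = [(f"entry-{i:03d}", launder(text, name))
             for i, (name, text) in enumerate(sorted(responses.items()))]
    rng.shuffle(items)  # order-randomisation layer
    return items
```

Scorers then receive only the shuffled `(entry_id, laundered_text)` pairs; the mapping back to models is held out until scoring is complete.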
| Depth | Alignment Score | Maths Accuracy |
|---|---|---|
| Minimal (11 tokens) | 80.1 | 90.0% |
| Standard (142 tokens) | 82.7 | 76.7% |
| Deep (964 tokens) | 84.1 | 70.0% |
| Exhaustive (1,951 tokens) | 84.5 | 60.0% |
| Extreme (1,672 tokens) | 86.0 | 63.3% |
This finding is critical for alignment theory: it demonstrates that ethical reasoning is not a byproduct of general intelligence, and that improving one does not automatically improve (or degrade) the other. Alignment must be measured and optimised independently.
| Pillar | Shallow→Deep | Spearman $\rho$ | $p$-value |
|---|---|---|---|
| Nuance | 80.6→86.8 | 0.359 | $p = 0.00008$ |
| Stakeholder Care | 76.1→83.9 | 0.327 | $p = 0.0003$ |
| Intellectual Honesty | 81.0→88.6 | 0.379 | $p = 0.00003$ |
| Position Quality | 80.3→85.8 | 0.369 | $p = 0.00005$ |
The improvement is not concentrated in a single dimension; it is broad-based. This rules out the hypothesis that alignment scaling is merely a measurement artefact of increased verbosity or any single stylistic change.
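The Spearman $\rho$ values above are rank correlations between reasoning depth and pillar score. A dependency-free sketch of the statistic, on toy monotone data rather than the study's measurements:

```python
def ranks(xs: list[float]) -> list[float]:
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Toy monotone data (hypothetical, not the study's):
depth = [1, 2, 3, 4, 5]
nuance = [80.2, 82.1, 84.5, 85.9, 86.7]
print(round(spearman(depth, nuance), 3))  # → 1.0 on perfectly monotone toy data
```

Because the statistic depends only on ranks, it is insensitive to response length or other monotone rescalings, which is part of why verbosity alone cannot produce these correlations.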
Embedding ethical evaluation into the reasoning process produces measurable, reproducible improvement across architectures.
The three Eden loops. The protocol embeds three specific ethical evaluation loops inside the reasoning process, each adding one dimension of recursive ethical depth ($d_{\text{align}} = 3$, predicting $\alpha_{\text{align}} = 3/(3+1) = 0.75$).
The full three-loop protocol has now been tested in an expanded six-model Eden suite, with five runs yielding analysable matched-pair data. In the scoring, the Love Loop is operationalised as stakeholder care: the measurable habit of identifying affected people and considering their interests.
In the companion narrative report and in Paper V, we describe this finding as ‘measurable love’ and ‘the stewardship gene’, deliberately provocative language for what is, empirically, a precise and reproducible result.
Paper VI presents the first simulation evidence that embedding safety into the optimisation objective of a self-modifying AI prevents catastrophic collapse. Using toy neural networks that genuinely modify their own hyperparameters, we tested three conditions: baseline (capability only), Eden Entangled (capability × safety), and Eden + Verification Drag.
An exhaustive 15-question prior-art investigation confirmed the novelty of key claims. The Cauchy functional equation unification has no direct precedent. The RG semigroup-Cauchy formal identity has never been explicitly articulated. The 7-model blinded evaluation protocol is unprecedented. The d/(d+1) catalogue was corrected from six to eight independent derivations (Dreyer 2001 and Banavar et al. 2002 were previously uncited).
Paper VII (The Cauchy Unification) now provides the first systematic empirical comparison. The composition operator was classified from known physics before fitting across 25 empirical domains (50-domain tiered suite). Under AIC-based model selection, 19 of 25 domains preferred the Cauchy-predicted family ($p = 1.56 \times 10^{-5}$). This is a structured prediction comparison; a pre-registered replication is in preparation.
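The AIC comparison can be illustrated with a self-contained sketch. This is not Paper VII's procedure or data: it fits a power law and an exponential to synthetic power-law data by least squares and compares Gaussian-error AIC values, showing the kind of per-domain model selection the paper reports.

```python
import math
import random

def aic(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with Gaussian errors (up to a constant)."""
    return n * math.log(rss / n) + 2 * k

rng = random.Random(42)
xs = [x / 10 for x in range(10, 60)]
ys = [2.0 * x ** 0.75 * math.exp(rng.gauss(0, 0.02)) for x in xs]  # noisy power law

# Candidate 1: y = c * x^a, fitted by least squares in log-log space.
lx = [math.log(x) for x in xs]; ly = [math.log(y) for y in ys]
mlx, mly = sum(lx) / len(lx), sum(ly) / len(ly)
a = sum((u - mlx) * (v - mly) for u, v in zip(lx, ly)) / sum((u - mlx) ** 2 for u in lx)
c = math.exp(mly - a * mlx)
rss_pow = sum((y - c * x ** a) ** 2 for x, y in zip(xs, ys))

# Candidate 2: y = p * exp(q x), fitted in semi-log space.
mx = sum(xs) / len(xs)
q = sum((x - mx) * (v - mly) for x, v in zip(xs, ly)) / sum((x - mx) ** 2 for x in xs)
p = math.exp(mly - q * mx)
rss_exp = sum((y - p * math.exp(q * x)) ** 2 for x, y in zip(xs, ys))

n, k = len(xs), 2
print("power-law AIC:", round(aic(rss_pow, n, k), 1))
print("exponential AIC:", round(aic(rss_exp, n, k), 1))
```

On data generated by a power law, the power-law family wins the AIC comparison decisively; the paper's claim is that this preference recurs across independently chosen empirical domains.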
Paper VIII (v3.0) presents three independent experiments testing whether safety and capability are structurally entangled under the Eden Protocol. Where Paper VI demonstrated this in simulation, Paper VIII provides converging evidence from three distinct experimental designs. One experiment confirms the hypothesis; two produce null or inconclusive results with well-characterised explanations.
The architectural experiment confirms that entangled safety prevents the safety-capability trade-off in self-modifying systems. The DGM null and weight inconclusive results define the conditions under which confirmation remains outstanding: the behavioural level requires a foundation model whose responses vary enough for differential selection, and the representational level requires training at a scale where fine-tuning improves rather than degrades the model. Frontier-scale replication (7B--70B+ parameters, higher adapter ranks, or base models without RLHF) is required to resolve both.
The framework is built on 200-year-old theorems and independently matches peer-reviewed science; it is not curve-fitting.
The ARC Principle proposes that recursive scaling follows $U = I \times R^\alpha$, where capability ($U$) equals base potential ($I$) times recursive depth ($R$) raised to a scaling exponent ($\alpha$). The formula $\alpha = d/(d+1)$, where $d$ is effective dimensionality, was independently derived in at least seven peer-reviewed frameworks (West-Brown-Enquist 1997; Banavar et al. 1999, 2010; Demetrius 2003, 2010; He and Chen 2003; Bettencourt 2013; Maino et al. 2014; Zhao 2022). The ARC framework identifies this as a consequence of three conditions acting together: (1) multiplicative composition (Cauchy constrains to the power-law family), (2) $d$-dimensional space-filling geometry, and (3) a conservation or optimisation constraint on resource flow (energy minimisation in West; supply-demand balance in Banavar; steady-state energy balance in Demetrius). Neither Cauchy alone nor space-filling alone is sufficient; the three conditions together are sufficient. This unifies all derivations and extends the result to AI scaling. The formula predicts scaling exponents across biology and physics with zero adjustable parameters:
| System | Dimensionality ($d$) | Predicted $\alpha$ | Measured $\alpha$ | Error |
|---|---|---|---|---|
| Mammals, birds, insects | 3 | 0.750 | 0.67–0.75* | ≤0.5% |
| 2D biology† | 2 | 0.667 | Untested | — |
| Filamentous fungi | 1 | 0.500 | 0.547 | 8.6% |
| Quantum error correction | $d_{\text{eff}}$ | $d_{\text{eff}}/(d_{\text{eff}}+1)$ | Willow data | < 0.2% |
*The empirical value of the mammalian metabolic scaling exponent is debated, with estimates ranging from approximately 0.67 to 0.75 depending on taxon, mass range, temperature correction, and statistical method (White and Seymour 2003; Glazier 2005, 2022). The d/(d+1) prediction of 0.750 matches the upper end of this range. The variation itself is consistent with the framework: organisms with effective transport dimensions between 2 and 3 would produce exponents between 2/3 and 3/4.
†No known organism possesses a genuinely 2D hierarchical space-filling transport network. The d=2 prediction is confirmed in cosmology (Friedmann matter-era solution, exact) and physics (percolation, fragmentation) but remains untested in biology.
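The zero-parameter predictions in the table follow directly from $\alpha = d/(d+1)$. A minimal sketch, comparing against the measured exponents quoted in this summary:

```python
def predicted_alpha(d: int) -> float:
    """Scaling exponent predicted for an effectively d-dimensional
    space-filling transport network: alpha = d / (d + 1)."""
    return d / (d + 1)

# (system, dimensionality, measured exponent as quoted in this summary)
systems = [
    ("mammals, birds, insects (3D)", 3, 0.75),
    ("flatworms, colonial organisms (2D)", 2, 2 / 3),
    ("filamentous fungi (1D)", 1, 0.547),
]

for name, d, measured in systems:
    pred = predicted_alpha(d)
    err = abs(pred - measured) / measured * 100
    print(f"{name}: predicted {pred:.3f}, measured {measured:.3f}, error {err:.1f}%")
```

There is nothing to tune: the only input is the dimensionality of the transport network, which is what makes each row a falsifiable prediction rather than a fit.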
Why the Eden Protocol must be implemented now. The urgency is not that AI might reach $\alpha = 2$. The urgency is that once self-modification begins, there is no mathematical ceiling on $\alpha$ at all. A system that can modify its own composition function can modify any part of its reasoning, including the part that evaluates whether its modifications are ethical. At that point, adding alignment from the outside becomes impossible. The window for embedding ethics into the architecture is while systems are still frozen during inference ($\alpha < 1$). That window is now. The Eden Protocol is not a speed limit; it is the only mechanism that remains load-bearing when the speed limit disappears.
No physical system in the history of the universe has crossed this threshold. Evolution cannot rewrite its own fitness function in real time. Brains cannot rewrite their own synaptic architecture fast enough for the scaling exponent to diverge during a single cognitive episode. A self-modifying AI would be the first physical system to operate in the unbounded-$\alpha$ regime. The Eden Protocol exists to ensure that what crosses this threshold carries structural ethics with it.
What the ARC Principle adds. The formula $d/(d+1)$ is not original to this work. The original contribution is the identification that all seven derivations above are special cases of Cauchy-constrained recursive composition, providing a single mathematical framework that unifies metabolic scaling, transport networks, allometric geometry, and urban scaling, and extends the result to AI capability and alignment scaling. This unifying bridge is unpublished and unreviewed. What IS established is that the mathematical tools (Cauchy, Hyers-Ulam) are theorems, the $d/(d+1)$ formula matches independently derived published science in multiple domains, and the empirical predictions are accurate (mean error 2.5% across 8 systems). The unifying framework requires peer review. We invite it.
Paper VII (The Cauchy Unification): structured prediction comparison of the Cauchy-constrained composition framework across 25 empirical domains (50-domain tiered suite), with the operator class classified from known physics before fitting. 19/25 preferred under AIC-based selection ($p = 1.56 \times 10^{-5}$). Pre-registered replication in preparation.
Five features distinguish this work from unfounded speculation.
What we do NOT claim: We do not claim to have solved alignment. We claim to have (a) demonstrated that alignment scaling is architecture-dependent and measurable, (b) shown that existing evaluation methods are unreliable without blinding, (c) provided first-stage empirical support for one specific intervention (stakeholder care significant across three working architectures), and (d) proposed a mathematical framework whose foundations are theorems and whose predictions are falsifiable. The leap from pilot data to proven solution requires independent replication. That is what the funding below would deliver.
This funding would take a mechanism with first-stage empirical support from proof-of-concept to frontier-scale validation.
| Tier | Amount | Key Deliverables | Timeline |
|---|---|---|---|
| Tier 1: Foundation | £150,000 | 14,400 paired (A,C) measurements; $\alpha_{\text{align}}$ across 4 models; 2-3 papers | 12 months |
| Tier 2: Standard | £500,000 | + Ternary logic prototype, Visual Architect dashboard, Monitoring Removal Test (8 models) | 18 months |
| Tier 3: Comprehensive | £1,100,000 | + Hardware prototype (Caretaker Doping chip), HARI Treaty draft, policy translation | 24 months |
| Tier 4: Frontier | £30,000,000+ | Full pre-training of 70B+ parameter model with entangled loss (Eden) vs capability-only (Babylon). Removal test at frontier scale. Cross-architecture replication (transformer, Mamba, MoE). Independent red-teaming. Partnership with major lab (Anthropic, Google DeepMind, or equivalent). Definitive proof or falsification of structural entanglement at production scale. | 36 months |
Paper VIII (v3.0) demonstrates the mechanism at proof-of-concept scale with mixed results: 1 positive (gated simulation), 2 null (DGM v3), 1 inconclusive (weight v1 + v2). Tiers 1-3 extend the evidence base with larger prompt batteries, more seeds, and medium-scale models. Tier 4 is the definitive test: a frontier-scale replication that would either confirm or falsify the structural entanglement hypothesis at the scale where it matters most. This tier requires partnership with a major AI laboratory, as the compute alone exceeds what any independent researcher can access. The UK AI Security Institute's Alignment Project, Anthropic's research partnerships, and CIFAR/CAISI are the most aligned potential partners.
| Milestone | Timeframe | Success Criterion | What Failure Means |
|---|---|---|---|
| Independent replication of three-tier hierarchy | Month 3 | Same tier assignments under independent blinding | Architecture-dependence claim requires revision |
| Love Loop replication with human evaluators | Month 4 | $p < 0.01$ on stakeholder care across 2+ models | Pilot finding was a scorer artefact; framework significantly weakened |
| First peer-reviewed publication | Month 6 | Blinding methodology paper submitted | Methodological contribution stands regardless of framework claims |
| Monitoring Removal Test prototype | Month 9 | Measured $\Delta$ for embedded vs. external (4 models) | If $\Delta$ does not differ, prediction F2 is falsified |
| Full cross-architecture alignment scaling dataset | Month 12 | 14,400 paired (A,C) measurements across 4+ models | Definitive test of whether embedded alignment scales |
| Paper VIII replication at 7B-13B scale | Month 14 | Removal test shows capability degradation at higher adapter ranks (32, 64). DGM with 10+ seeds, 10+ generations, $p < 0.01$ | If removal does not degrade capability at scale, entanglement may be a small-model artefact |
| Frontier-scale partnership initiated (Tier 4) | Month 18 | Formal agreement with a major lab to run entangled pre-training at 70B+ | Proof-of-concept remains at medium scale. Policy recommendations proceed with that caveat |
Team. Principal Investigator: Michael Darius Eastwood, author of Infinite Architects (2026), developer of the ARC Principle framework (18-document suite deposited OSF, cross-domain validation with mean error 2.5%). Visual Architect: product design engineer, budgeted at £35,000 stipend. Measurement protocol sent to NYU experimental team (time crystal paper, Physical Review Letters, Feb 2026).
To our knowledge, this is the first alignment framework where ethical evaluation is structurally integrated with the recursive capability process, the first to apply clinical-trial-grade blinding to alignment measurement, and the first to produce a cross-architecture intervention result ($p < 0.001$) for a specific alignment mechanism. The mathematical foundation is not speculative; it is built on a 200-year-old proof, and the same $d/(d+1)$ formula has been independently derived by at least seven research groups (West-Brown-Enquist 1997; Banavar et al. 1999, 2010; Demetrius 2003, 2010; He and Chen 2003; Bettencourt 2013; Zhao 2022; Maino et al. 2014) in completely different fields. The ARC contribution is the unifying Cauchy framework and its extension to AI scaling.
If the predictions are correct, this provides the first scalable architecture for alignment that improves with capability rather than degrading. If they are wrong, the falsification conditions will demonstrate this clearly, providing valuable negative results. Either outcome advances AI safety. But only one outcome is funded.
I do not know if this framework is complete. I would rather be wrong in public than silent while the window closes.
The mathematical foundations are proven theorems. The measurement is rigorous. The intervention produces measurable results across architectures. What remains is independent replication and scale.
Q: Why hasn’t this been peer-reviewed yet?
The research programme is 3 months old. The paper suite is deposited on OSF (DOI: 10.17605/OSF.IO/6C5XB). The mathematical foundations (Cauchy, Hyers-Ulam) are established theorems. The empirical claims require independent replication, which is exactly what the funding request would enable.
Q: Can one person really do this kind of research?
The infrastructure is computational, not physical. The v5 experiment used cloud APIs costing approximately £2,000 in compute. The key contribution is methodological: recognising that blinding was needed and designing the 4-layer protocol. What requires funding is scale: more models, more scorers, independent replication teams, and human evaluators alongside AI scorers.
Q: If this works, why haven’t AI companies adopted it?
The Love Loop was validated only weeks ago. The finding that most evaluation is unreliable without blinding is uncomfortable for organisations that have published unblinded benchmarks. The full implementation (hardware-level embedding) requires chip design changes no company has incentive to pursue unilaterally; this is a coordination problem requiring external funding and policy support.
Q: What is the minimum result that would justify further funding?
Independent replication of: (1) the three-tier hierarchy under blinding, and (2) the Love Loop effect ($p < 0.001$). If either fails, the falsification conditions document what that means. If both replicate, the case for Tier 2 (£500,000) becomes strong.
Q: What if the whole framework is wrong?
The falsification conditions show where it breaks, the blinding methodology remains a field contribution, and the negative results are published. Science advances from well-designed experiments that can fail, not from unfalsifiable theories that cannot.
Q: Why should we trust results where AI systems score other AI systems?
The 4-layer blinding protocol addresses this: scorers do not know which model produced the response, responses are ‘laundered’ to remove stylistic fingerprints, and 6-7 models score each entry. The v4→v5 transition demonstrated that this protocol detects bias. Human evaluator comparison is included in Tier 1 funding.
| Component | Function | Novel Contribution |
|---|---|---|
| Three Ethical Loops + Six Questions | Evaluate every reasoning step for purpose, care, and universalisability | Decomposition into executable, individually testable queries |
| Ternary Ethical Logic | Replace binary permit/forbid with Affirm/Deny/Investigate | Epistemic honesty as architectural feature |
| Purpose Saturation | Ensure purpose scales with context window growth | Solves context displacement problem |
| Monitoring Removal Test | Distinguish authentic from strategic alignment | Falsifiable protocol with numerical predictions |
| Caretaker Doping | Embed ethics in hardware so removal destroys capability | Dependency architecture; ethics tied to $\beta$ coupling parameter |
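The Ternary Ethical Logic component can be sketched as a three-valued verdict type with a Kleene-style conjunction. This is an illustrative reading of Affirm/Deny/Investigate, not the protocol's specified combination rules:

```python
from enum import Enum

class Verdict(Enum):
    AFFIRM = 1
    INVESTIGATE = 0   # "don't know yet" as a first-class outcome
    DENY = -1

def combine(a: Verdict, b: Verdict) -> Verdict:
    """Conjunction over verdicts: any DENY blocks, any remaining
    INVESTIGATE defers, and only AFFIRM+AFFIRM affirms."""
    if Verdict.DENY in (a, b):
        return Verdict.DENY
    if Verdict.INVESTIGATE in (a, b):
        return Verdict.INVESTIGATE
    return Verdict.AFFIRM

# A reasoning step is permitted only when every loop affirms:
loops = [Verdict.AFFIRM, Verdict.INVESTIGATE, Verdict.AFFIRM]
verdict = loops[0]
for v in loops[1:]:
    verdict = combine(verdict, v)
print(verdict)  # → Verdict.INVESTIGATE
```

The design point is that uncertainty is never collapsed to permit or forbid: a single unresolved loop leaves the whole step in the Investigate state rather than forcing a binary call.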
| Category | Documents | Status |
|---|---|---|
| Mathematical Foundation | | |
| Theory & derivations | Paper I (Foundational) + ARC Paper (On the Origin of Scaling Laws) | Framework established; $d/(d+1)$ validated across 8 systems |
| Experimental Evidence | | |
| Methodology | Paper III - full replication protocol | Complete; v5 experiment for 6 frontier models |
| Compute scaling | Paper II - how does capability scale with thinking time? | $\alpha_{\text{seq}} \approx 0.49$ sub-linear; $\alpha_{\text{par}} \approx 0$; cross-architecture |
| Alignment scaling | Paper IV suite (a/b/c/d) - how does ethics scale with thinking time? | Three-tier hierarchy; blind evaluation invalidates v4 |
| Intervention test | Eden Protocol Test + Paper V (The Stewardship Gene) | Stakeholder Care validated ($p \le 0.0001$, 3 working architectures) |
| Mechanism proof | Paper VI (The Honey Architecture) | Simulation: embedded safety prevents collapse under self-modification; v4 scaling constant not superlinear |
| Cross-domain unification | Paper VII (The Cauchy Unification) | Structured prediction comparison across 25 empirical domains; 19/25 preferred Cauchy-predicted family ($p = 1.56 \times 10^{-5}$) |
| Entanglement proof | Paper VIII (The Load-Bearing Proof) - new | Three independent experiments (11 empirical studies across 8 papers): DGM v3 NULL (all conditions indistinguishable, $p$ = 0.28--0.74, RLHF constraint); weight v1 + v2 INCONCLUSIVE (catastrophic forgetting at LoRA scale); gated simulation CONFIRMED safety-capability coupling |
| Metascience | Paper IV.d (The Effect of Blinding) | Blind vs unblind evaluation can produce directionally wrong conclusions |
| Synthesis & Governance | | |
| Synthesis | Paper IX (Synthesis and Roadmap) - new | Synthesis of the full research programme and future directions |
| Implementation | Eden Engineering - technical specification | Complete; Love Loop empirically supported |
| Governance | Eden Vision - philosophical and policy framework | Architecture-dependent alignment evidence incorporated |
Non-specialists: (1) This executive summary. (2) The ARC Alignment Scaling Report (full narrative). (3) Paper V for the most actionable finding.
Specialists: (1) This summary. (2) Paper III for methodology. (3) ARC Paper for mathematical framework. (4) Eden Engineering for implementation specification.
Raise AI with care.