Working Paper

AI, Alpha, and Beyond

A working paper on the role of AI in education — what Alpha Schools demonstrated, where ÆRA converges and diverges, what the current evidence supports, what it does not, and what the arrival of more capable AI systems may change.

Version 1 · May 2026 · Open document

Audience: Educators, researchers, school founders, policymakers, and anyone thinking seriously about AI's role in children's development

This is a working paper, not a position statement. It thinks in public about questions the ÆRA methodology does not yet have settled answers to. It will be revised as evidence accumulates and as the technology develops. The honest position — that we are at an early stage of understanding — is the only defensible one.

1. Why this paper exists

The introduction to the ÆRA framework acknowledges Alpha Schools and credits the structural logic they demonstrated. That acknowledgment is brief by design. The introduction is not the place for the deeper reckoning with what AI in education actually means — what the evidence shows, where it runs out, what the risks are, and what the arrival of substantially more capable AI systems may change.

This paper is that reckoning.

It is not a technology paper. It is an education paper that takes the technology seriously. The distinction matters, because most writing about AI in education falls into one of two failure modes: uncritical enthusiasm that treats AI as a solution to everything, or defensive scepticism that treats it as a threat to everything. Both are wrong. Both are intellectually lazy. The honest position is harder and more interesting: AI tools are genuinely powerful, the evidence on their educational use is genuinely thin in places, the risks are genuinely specific, and the arrival of more capable systems creates genuinely new questions that the current evidence base cannot answer.

This paper tries to be honest about all of it.

2. What Alpha Schools demonstrated — and what it actually proves

Alpha Schools is an American private school network, founded in Austin, Texas, that builds its model around two hours of daily adaptive digital instruction in foundational literacy and numeracy. The claim: children achieve competency in these domains in roughly a third of the time conventional schooling allocates, freeing the remaining hours for entrepreneurship, life skills, community engagement, and self-directed exploration.

The evidence for the efficiency claim is real but requires careful reading.

What the evidence shows. Adaptive learning software, when genuinely individualised, well designed, and used with sufficient fidelity, produces faster skill acquisition in foundational domains than class-average instruction. This is not a new finding. The research on intelligent tutoring systems (VanLehn, 2011) found that step-based tutoring systems reached an effect size of roughly 0.76 standard deviations, close to the 0.79 the same review found for human tutoring, though well short of the two-sigma effect Bloom reported for one-to-one tutoring. The mechanism is the same as in one-to-one tutoring: instruction that meets a learner at exactly their current position, advances when they are ready, and reviews when they need it is more efficient than instruction pitched at a class average. The efficiency gain is real.
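The advance-and-review logic described above can be illustrated with a simple mastery rule. This is a minimal sketch under stated assumptions: the thresholds, the ten-attempt window, and the names `Learner` and `next_step` are hypothetical choices made for clarity, and the code does not describe Alpha's software or any specific tutoring product.

```python
from dataclasses import dataclass, field

# Hypothetical values chosen for illustration; not taken from Alpha or any product.
ADVANCE_AT = 0.85    # recent accuracy at or above this moves the learner forward
REVIEW_BELOW = 0.60  # recent accuracy below this sends the learner back a step
WINDOW = 10          # number of most recent attempts used as evidence


@dataclass
class Learner:
    skills: list[str]   # ordered skill sequence, easiest first
    position: int = 0   # index of the skill currently being practised
    attempts: list[bool] = field(default_factory=list)  # True = correct answer


def recent_accuracy(learner: Learner) -> float:
    """Share of correct answers among the last WINDOW attempts."""
    recent = learner.attempts[-WINDOW:]
    return sum(recent) / len(recent) if recent else 0.0


def next_step(learner: Learner) -> str:
    """Decide 'advance', 'review' or 'practise' and move the learner accordingly."""
    if len(learner.attempts) < WINDOW:
        return "practise"  # too little evidence to move in either direction
    accuracy = recent_accuracy(learner)
    if accuracy >= ADVANCE_AT and learner.position < len(learner.skills) - 1:
        learner.position += 1
        learner.attempts.clear()  # evidence restarts on the new skill
        return "advance"
    if accuracy < REVIEW_BELOW and learner.position > 0:
        learner.position -= 1
        learner.attempts.clear()  # evidence restarts on the prerequisite
        return "review"
    return "practise"


# Usage: nine correct answers out of the last ten move a learner from "count" to "add".
child = Learner(skills=["count", "add", "subtract"])
child.attempts = [True] * 9 + [False]
print(next_step(child), child.skills[child.position])  # advance add
```

Production systems usually replace a fixed accuracy window with a probabilistic estimate of mastery (Bayesian knowledge tracing is a common example), but the structure is the same: the next task depends on the learner's own evidence, not on the class average.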

What the evidence does not show. The published evidence on Alpha Schools specifically is, at the time of writing, largely internal and promotional. Independent longitudinal studies of Alpha's model — tracking cohorts across years, comparing outcomes with matched controls, measuring the quality of the freed time rather than just the efficiency of the instructed time — do not yet exist in peer-reviewed form. The efficiency claim is plausible and consistent with the broader adaptive learning literature. It is not yet independently validated at the scale and longevity that would justify high confidence. This is an honest gap, not a damning one. It is what early-stage implementation looks like. But it should be named.

The most important thing Alpha demonstrated. The structural argument, that the conventional school day is mostly overhead and that genuinely individualised instruction could free substantial time for richer learning, is the significant claim. Whether Alpha's specific numbers hold up under rigorous scrutiny matters less than the architectural insight: the conventional school day is not efficiently structured, and the time freed by better individualisation is real. The mechanism behind that insight is consistent with a much wider body of evidence. Hattie's meta-synthesis has documented for decades that formative assessment and individualised feedback are among the highest-impact interventions available. The Alpha model is, in essence, an attempt to systematise those interventions at scale using adaptive software. The insight is sound. The specific implementation is still accumulating evidence.

3. Where ÆRA converges with Alpha — and where it deliberately diverges

The convergence. ÆRA shares Alpha's structural logic entirely. The conventional school day is inefficient. Individualised instruction is substantially more efficient than class-average instruction. The time freed by that efficiency should be used for something richer than more instruction. These claims are the foundation of ÆRA's daily structure just as they are Alpha's.

The divergence — mechanism. Alpha's efficiency gain is delivered through adaptive digital tools used directly by the child. Children work on iPads with adaptive software for their daily literacy and numeracy instruction. ÆRA achieves the same compression through coach-mediated individualisation: the Aptitude Map, the Seminar model, the continuous developmental feedback loop maintained by coaches with AI backstage.

This is not a minor difference. It is the central design decision of Phase I, and it requires an honest account.

The neurodevelopmental case for the ÆRA mechanism runs as follows. Executive function (working memory, inhibitory control, attentional flexibility) develops rapidly through early and middle childhood and continues to mature into adolescence, and ÆRA treats the six-to-ten window as a particularly sensitive period for its consolidation. Multiple meta-analyses report negative associations between passive screen time and executive function in this age range, though associations of this kind do not by themselves establish causation. Language development in the primary years depends on back-and-forth verbal interaction. The prefrontal cortex, the neural substrate of critical judgment, remains sensitive to the quality of cognitive input throughout the primary years, and ÆRA's working assumption is that screen-mediated interaction supports this development less well than face-to-face interaction. The Craft Judgment Protocol in Phase II, which builds the capacity to assess AI output critically, depends on the analogue foundations built in Phase I. A child who has spent six years having their thinking partially done for them by an adaptive algorithm may be faster at foundational skills. The question is what they are not building in the process.

This is the honest version of the argument. It is a sequencing argument, not a technology-rejection argument. ÆRA is not claiming that adaptive digital tools cannot produce the efficiency gain Alpha claims. It is claiming that the specific developmental window of six to ten has properties that make the coach-mediated path a better trade-off — and that the craft judgment capacity built on analogue foundations is worth the cost of a somewhat slower mechanism.

The honest acknowledgment. ÆRA's mechanism is slower per session than a direct child-AI interface. A child working with an adaptive algorithm that responds in real time to every input will, in the pure skill acquisition sense, progress faster than a child in a group where the coach mediates AI backstage and pulls Seminar groups when the Aptitude Map signals readiness. The trade-off is real and should be named rather than obscured. ÆRA is making a deliberate choice to accept somewhat lower instructional efficiency in exchange for the developmental properties of analogue, coach-mediated learning in the six-to-ten window. Whether this trade-off is the right one will ultimately be answered by longitudinal evidence. That evidence does not yet exist at the scale required to be definitive.

4. What the current evidence on AI in education actually shows

The literature on AI in education has expanded dramatically since 2020. What follows is an honest summary of where the evidence is strong, where it is thin, and where it is actively contested.

Strong evidence. Intelligent tutoring systems produce consistent and substantial learning gains in well-defined skill domains (mathematics, reading, grammar) compared with class-average instruction. The VanLehn (2011) review remains one of the most comprehensive treatments; effect sizes in the range of 0.6–0.8 standard deviations recur across multiple studies. Formative feedback delivered at high frequency, which AI systems can provide at lower cost than human tutors, is one of the highest-impact interventions in Hattie's database. These findings are not in serious dispute.
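Effect sizes in standard deviations are easier to interpret as percentiles. A minimal sketch, assuming normally distributed scores with equal spread in the tutored and comparison groups (the assumption behind Cohen's U3); real test-score distributions may depart from this, so the percentages are approximate. The function name `u3` is an illustrative choice.

```python
from math import erf, sqrt


def u3(d: float) -> float:
    """Cohen's U3: proportion of the comparison group scoring below the
    average tutored student, assuming normal scores with equal spread."""
    return 0.5 * (1.0 + erf(d / sqrt(2.0)))  # standard normal CDF evaluated at d


for label, d in [
    ("lower end of the ITS range", 0.6),
    ("VanLehn (2011) estimate", 0.76),
    ("upper end of the ITS range", 0.8),
    ("Bloom's two sigma", 2.0),
]:
    print(f"{label}: d = {d:.2f}, above {u3(d):.0%} of the comparison group")
```

Under these assumptions, effects of 0.6 to 0.8 place the average tutored student above roughly 73 to 79 per cent of the comparison group, while Bloom's two sigma corresponds to roughly 98 per cent.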

Thin evidence. The evidence on AI tools for broader developmental outcomes — creativity, collaboration, democratic judgment, ecological literacy, inner attention, cultural memory, relational depth — is thin to nonexistent. This is not surprising: these outcomes are harder to measure, the tools are newer, and the longitudinal studies required to assess them have not had time to accumulate. The problem is that the tools optimised for measurable skill gains tend to be the tools adopted at scale, creating a risk that what gets measured gets developed and what does not gets crowded out. This is not a problem with AI specifically. It is a problem with the way educational research defines outcomes — and AI amplifies it.

Contested evidence. The question of whether early and sustained use of adaptive AI tools in the foundational years affects the development of executive function, intrinsic motivation, and the capacity for sustained independent attention is genuinely contested. The mechanism — that outsourcing cognitive effort to an algorithm before the relevant capacities are consolidated might impair their development — is theoretically coherent and consistent with the executive function literature. But controlled longitudinal studies in naturalistic school settings are scarce. Claims in either direction — that early AI use damages development, or that it is benign — outrun the evidence. Intellectual honesty requires holding this uncertainty rather than resolving it prematurely in either direction.

The engagement question. There is a body of evidence showing that gamified, algorithmically adaptive learning systems produce short-term engagement gains. There is a much thinner body of evidence on whether this engagement persists, whether it transfers to self-directed learning outside the system, and whether it displaces the development of intrinsic motivation, which is among the most reliable predictors of long-term learning. Deci and Ryan's self-determination theory would predict that extrinsically regulated engagement, even gamified engagement, is less robust over time than intrinsically motivated engagement. Testing this prediction in the specific context of AI-adaptive educational tools is important work that has not yet been done at scale.

5. What to be genuinely careful about

This section names specific risks that the methodology takes seriously and that any honest account of AI in education must address.

The capability substitution risk. The most significant risk of AI tools in education is not that they will teach children the wrong things. It is that they will do children's thinking for them at the developmental moment when doing their own thinking is what builds the capacity to think. This risk is highest in the foundational years — the six-to-ten window where executive function is consolidating and where the habit of sustained, self-directed cognitive effort is either built or not. A child who has spent these years having an adaptive algorithm identify the next task, calibrate the difficulty, and provide immediate feedback has been protected from the productive difficulty that consolidates these capacities. The algorithmic system is not malicious. It is optimised for a different thing. The risk is structural, not intentional.

The measurement displacement risk. AI systems are powerful at measuring what can be measured and optimising for what can be specified. The domains most important to the ÆRA methodology — craft judgment, democratic participation, ecological literacy, inner attention, cultural memory, relational depth — are among the hardest to measure and the least likely to be captured by the metrics an adaptive system is optimised for. Institutions under pressure to show results will reach for AI tools that produce measurable gains. The risk is that measurable gains crowd out the development of capacities that matter more but are harder to measure. This is not a new risk — standardised testing has produced the same dynamic for decades. AI amplifies it.

The dependency risk. AI tools may produce a specific form of learned helplessness that is different in kind from earlier forms. A student who has relied on an AI to identify what to work on next, scaffold the difficulty, and provide immediate feedback may not have developed the capacity to identify their own learning edge, sit with productive difficulty, or assess their own progress. These are not trivial capacities. They are exactly what self-directed learning in Phase II depends on. Dependency on AI scaffolding in the foundational years may not be visible until the scaffolding is removed, which in the ÆRA model happens at the Atelier Passage. This risk is real, and it is one of the reasons for ÆRA's delayed interface model.

The data risk. Adaptive learning systems require data. The more granular the adaptation, the more granular the data required. A child's learning data — their pace, their error patterns, their emotional responses to difficulty, their engagement over time — is among the most sensitive personal data that exists. The risks of this data being held by commercial entities under foreign legal jurisdictions, used for purposes other than the child's development, or retained long enough to be used against the child in later life are not hypothetical. The ÆRA Data Sovereignty Truth Document addresses this architecturally. Any school adopting AI tools in the foundational years needs a data governance position that is as rigorous as its pedagogical position. These are not separate questions.
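A data governance position ultimately becomes concrete in what a system is allowed to keep and for how long. The sketch below is a minimal, hypothetical illustration of two such rules: minimisation at the point of collection and a fixed retention window. The field whitelist, the 365-day window, and the names `Event`, `minimise`, and `purge` are assumptions for illustration; they are not drawn from the ÆRA Data Sovereignty Truth Document or from any legal standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy values, for illustration only.
RETENTION = timedelta(days=365)
ALLOWED_FIELDS = ("skill", "correct", "timestamp")  # no affect or free-text fields


@dataclass(frozen=True)
class Event:
    skill: str
    correct: bool
    timestamp: datetime  # assumed to be timezone-aware


def minimise(raw: dict) -> Event:
    """Keep only whitelisted fields; everything else in the raw record is dropped
    before storage. Raises KeyError if a required field is missing."""
    return Event(**{name: raw[name] for name in ALLOWED_FIELDS})


def purge(events: list[Event], now: datetime) -> list[Event]:
    """Return only the events still inside the retention window."""
    return [e for e in events if now - e.timestamp <= RETENTION]


# Usage: an emotional-response score present in the raw log never reaches storage.
now = datetime(2026, 5, 1, tzinfo=timezone.utc)
stored = minimise({"skill": "add", "correct": False, "timestamp": now,
                   "frustration_score": 0.8})
print(stored)
```

Minimising at collection, rather than filtering later, means the most sensitive fields are never stored, which removes the question of who might later gain access to them.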

The equity risk. A recurring pattern among early adopters is that AI-enhanced education benefits children who already have strong foundational skills more than children with weaker ones, because the tools assume a baseline of executive function, language fluency, and self-regulatory capacity that not all children bring. This pattern is documented in parts of the intelligent tutoring systems literature. Adaptive tools that are genuinely beneficial across the ability distribution are harder to build and less commercially attractive than tools that show impressive gains in the middle of the distribution. Schools adopting AI tools without awareness of this risk may inadvertently use them in ways that widen rather than close gaps.
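One practical check for this risk is to compare learning gains across baseline bands rather than reporting a single average. A minimal sketch, assuming each student has a baseline and a follow-up score on the same scale; the function names `gains_by_baseline` and `gap_widening` and the cut-point approach are hypothetical choices for illustration.

```python
from statistics import mean


def gains_by_baseline(records, cut_points):
    """Mean gain (follow-up minus baseline) per baseline band.

    records: iterable of (baseline, follow_up) score pairs on the same scale.
    cut_points: ascending baseline thresholds; n cut points give n + 1 bands.
    Returns one value per band, lowest band first, or None for an empty band.
    """
    bands = [[] for _ in range(len(cut_points) + 1)]
    for baseline, follow_up in records:
        band = sum(baseline >= c for c in cut_points)  # number of cut points cleared
        bands[band].append(follow_up - baseline)
    return [mean(b) if b else None for b in bands]


def gap_widening(band_gains):
    """True if the highest non-empty band gained more than the lowest one."""
    observed = [g for g in band_gains if g is not None]
    return len(observed) >= 2 and observed[-1] > observed[0]
```

Raw gain comparisons are only a first signal: regression to the mean tends to inflate gains for students who start low, and ceiling effects can suppress gains for students who start high, so a credible analysis also needs a comparison group.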

6. The AGI question

Artificial General Intelligence — AI that performs across cognitive domains at or above human level — does not yet exist. As of May 2026, the most capable AI systems are highly competent within specific domains and genuinely impressive across a wider range of domains than their predecessors. They are not AGI in the technical sense that researchers use the term. Whether AGI will arrive, when, and in what form are genuinely open questions on which experts disagree in good faith.

The methodology's honest position on AGI is: we do not know.

This is not a comfortable position for an educational framework making twelve-year bets on what children will need. But it is the correct position. What can be said with more confidence is the following.

What changes if AGI arrives. If a system exists that can perform any cognitive task a human can perform, at or above human level, across all domains, then the logic of most educational frameworks — including the ÆRA framework — requires fundamental revision. The argument for developing children's capacity to perform specific cognitive tasks loses force if those tasks can be performed by a non-human system at negligible cost. The argument for developing craft judgment as the capacity to assess AI output loses force if there is no human who can assess AGI output better than another AI system.

This is not a comfortable thing to write in a document produced by an educational institution. It is honest.

What does not change if AGI arrives. Several capacities the ÆRA methodology builds appear robust across a wide range of scenarios, including scenarios where AI systems are substantially more capable than they are now.

Democratic governance. The capacity to participate in collective decision-making, to reason in public, to build coalitions, to live with decisions that went against you — these are social and relational capacities that do not become less important if AI systems become more capable. If anything, the governance of AI systems will require exactly these capacities in greater measure.

Ecological relationship. A person who has spent years in a specific place, understanding its ecology across seasons, contributing to its monitoring and care, developing the felt relationship that makes stewardship genuine rather than performed — this does not become less important if AI systems become more capable of ecological modelling. The relationship is not a cognitive task. It is a form of human engagement with the world that exists whether or not AI can model the same landscape more accurately.

Inner attention. The capacity to observe one's own thoughts, emotions, and impulses with clarity and without judgment does not become less important if AI systems become more capable. It may become more important. A world in which AI systems continuously supply apparent answers, apparent emotions, apparent relationships, and apparent meaning places an unprecedented premium on the capacity to distinguish one's own inner experience from its AI-mediated simulations.

Cultural memory and place. The capacity to inhabit a specific place fully — ecologically and culturally, knowing its human stories and its ecological rhythms — is a form of knowledge that does not transfer to AI systems regardless of their capability. The griot's knowledge is not information. It is embodied, relational, place-specific, and accumulated across a life of genuine engagement. The child who grows up in deep relationship with a specific place and its stories has something that no AI system, however capable, can replicate or remove.

Relational depth. The capacity for genuine human relationship — the kind built through years of proximity, shared governance, shared making, shared difficulty — is not a cognitive task. It becomes more valuable, not less, as AI systems become more capable of simulating relationship. The child who has learned to govern alongside others, make things alongside others, and care for a place alongside others has a form of relational depth that is not threatened by AI capability.

The bet the methodology is making. Given genuine uncertainty about AGI, the methodology's response is to build the capacities most resilient across the widest range of possible futures. Inner attention. Relational depth. Place-specific knowledge. Democratic governance. Cultural memory. And craft judgment, for as long as human assessment of AI output remains meaningful. These are the capacities the seven convergent principles are designed to develop. They are useful in an AI world, and they may be among the few things that retain irreplaceable human value in an AGI world.

This is not a claim that the methodology has solved the AGI problem. It is a claim that the methodology has made a specific bet about what to build, given uncertainty, and that the bet is coherent.

7. What the methodology does not yet know

Epistemic honesty requires naming the specific gaps in the methodology's own knowledge base.

No longitudinal evidence for Phase I at scale. ÆRA Sintra opens in September 2027. The first longitudinal evidence on Phase I outcomes will not be available for at least four years after that, and will not be sufficient for high-confidence conclusions for considerably longer. The methodology's current confidence in its Phase I design rests on the convergent evidence from the traditions it draws from and the shorter-term evidence from comparable implementations. This is a reasonable basis for proceeding. It is not the same as validated longitudinal evidence of the specific ÆRA model.

No evidence on the coach-mediated vs. direct-interface trade-off. The claim that coach-mediated AI backstage produces better developmental outcomes than direct child-AI interface in the six-to-ten window is theoretically grounded and consistent with the neurodevelopmental evidence. A controlled study comparing the two approaches — same population, same duration, same outcome measures — does not exist. Building this evidence is a priority for the ÆRA research programme.

No evidence on Phase II at the model's intended depth. The Craft Judgment Protocol, the Founding Project, the Journeyman placement model, the cooperative membership at eighteen — none of these have been implemented long enough to produce outcome data. The evidence from adjacent models (Swiss Berufslehre, Big Picture Learning, Sudbury alumni) is encouraging and structurally relevant. It is not evidence on ÆRA's specific Phase II design.

No settled answer to the Screen Passage threshold. The minimum age of thirteen for direct AI interaction is grounded in UNESCO guidance and the neurodevelopmental literature on prefrontal cortex development. The specific age is a reasonable line drawn from available evidence, not a precise threshold derived from controlled studies of AI interaction specifically. As the research on children's cognitive development in the context of AI interaction accumulates, this threshold may need revision.

No settled answer to the AGI timeline. As stated above. The methodology proceeds on reasonable projections. The arrival of substantially more capable AI systems may require fundamental revision of assumptions that currently feel stable.

8. How the methodology responds to uncertainty

The methodology's response to these uncertainties is not to pretend they do not exist. It is to make them explicit, build the research infrastructure to address them, and design the framework so it can be revised as evidence accumulates.

Three structural features reflect this.

The Concerns Register. A published, regularly updated document that names every known limitation, unresolved tension, and gap between current capability and stated ambition. The Concerns Register is not an admission of failure. It is the condition of intellectual honesty. A methodology without a Concerns Register either claims to have no limitations or is not looking for them. Both are worse than the alternative.

The research programme. ÆRA is designed as a longitudinal educational research institution, not only a practitioner network. The data generated across the network is studied by affiliated researchers under academic ethical oversight. The explicit aim is to build the evidence base that the methodology currently lacks — not as a validation exercise, but as a genuine inquiry into what works and what does not. The methodology is expected to change as the evidence accumulates. This is a feature, not a bug.

The revision commitment. This paper is version one. It will be wrong in specific ways that are not yet apparent. When those ways become apparent — through evidence, through the development of the technology, through the honest critique of people who read it carefully — it will be revised. The version number is not a formality. It is an acknowledgment that the argument is ongoing.

9. The honest summary

What Alpha Schools demonstrated: that the conventional school day is structurally wasteful, and that genuine individualisation produces an efficiency gain large enough to free substantial time for richer learning. This is the structural insight ÆRA shares.

Where ÆRA diverges: in the mechanism for achieving that efficiency in the six-to-ten window, and in the specific trade-off it makes between instructional efficiency and developmental integrity. The reasons for this divergence are documented and grounded in available evidence. The evidence is not yet sufficient to be definitive.

What the current evidence on AI in education shows: strong evidence for efficiency gains in foundational skill domains; thin evidence for broader developmental outcomes; genuinely contested evidence on the developmental risks of early AI interface; real and specific risks around capability substitution, measurement displacement, dependency, data, and equity.

What AGI changes: potentially, a great deal — possibly the fundamental logic of much educational planning. The honest position is that we do not know. The methodology's response is to build the capacities most resilient across the widest range of possible futures, and to hold the uncertainty explicitly rather than resolving it prematurely.

What the methodology does not yet know: the longitudinal evidence on its own specific design. This is the most important gap. It will take years to close. The Concerns Register names it. The research programme is designed to address it.

The methodology proceeds with appropriate confidence in the evidence it has, and appropriate humility about the evidence it lacks. This is the only intellectually honest position available.

ÆRA — AI, Alpha, and Beyond · Working Paper · Version 1 · May 2026 Open document. Freely available. Revisions expected as evidence accumulates. Annotations and critique from researchers and practitioners are welcome.
