Factlen Explainer · AI in Education · Jun 8, 2026, 5:45 AM · 7 min read · #3 of 3 in education

How AI Tutors Are Reshaping K-12 Education: The Evidence So Far

Generative AI tutors are scaling personalized, 1-on-1 instruction to millions of K-12 students. Early efficacy studies show significant learning gains, but researchers caution that AI still struggles to replicate the deep instructional dialogue and empathy of human teachers.

By Factlen Editorial Team

EdTech Optimists 40% · Human-Centric Educators 35% · Implementation Realists 25%

EdTech Optimists: Believe AI tutors can finally solve the two-sigma problem by scaling personalized instruction to every student.
Human-Centric Educators: Emphasize that AI lacks the empathy, emotional intelligence, and executive functioning support required for deep learning.
Implementation Realists: Focus on the physical and systemic barriers—like Wi-Fi access and device equity—that prevent AI tools from reaching their full potential.

What's not represented

· Students with Learning Disabilities
· Data Privacy Advocates

Why this matters

For decades, the 'two-sigma problem' has haunted education: 1-on-1 tutoring is vastly superior to classroom learning, but too expensive to scale. AI tutors are beginning to narrow this gap, offering personalized academic support to millions of students at a fraction of the cost of a human tutor.

Key points

AI tutors like Khanmigo are scaling rapidly, reaching over one million K-12 students by the 2025-26 school year.
Recent randomized controlled trials show AI tutoring can produce learning gains of up to 1.3 standard deviations compared to traditional classrooms.
Modern educational AI acts as a Socratic guide, withholding direct answers to encourage active problem-solving and critical thinking.
Researchers caution that AI lacks the empathy and executive functioning support required for deep instructional dialogue and student motivation.
Infrastructural barriers, such as unreliable Wi-Fi and device shortages, remain significant hurdles to equitable AI implementation in underfunded districts.

1 million+: Estimated K-12 students using Khanmigo in 2025-26
0.73–1.3: Standard deviation effect size of AI tutoring vs traditional learning
20%: Greater-than-expected learning gains with 30+ mins/week of AI practice

In 1984, educational psychologist Benjamin Bloom identified a phenomenon that would frustrate educators for decades: the 'two-sigma problem.' Bloom found that students who received one-on-one tutoring performed two standard deviations better than students in traditional classrooms—meaning an average tutored student outperformed 98% of their peers. The problem was simple economics. Providing a dedicated human tutor for every child in a public school system was financially impossible, relegating the most effective form of instruction to those who could afford private help. For forty years, the two-sigma benchmark remained an unattainable holy grail of education, a theoretical ceiling that public schools could never reach.[8]

That calculus began to shift dramatically in 2023 with the advent of advanced large language models, and by 2026, the landscape of K-12 education has been fundamentally altered. Artificial intelligence tutors—most notably Khan Academy's Khanmigo—have moved from experimental pilots to widespread deployment. Between the 2023-24 and 2025-26 school years, the number of students using Khanmigo skyrocketed from 40,000 to over one million. This rapid adoption has turned the modern classroom into a massive, real-time laboratory testing whether silicon can finally solve Bloom's problem, offering personalized academic support at a fraction of the cost of a human educator.[7]

Unlike the rigid, rule-based educational software of the 2000s, modern AI tutors do not simply dispense correct answers. They are explicitly engineered to act as Socratic guides. When a student inputs a math problem and types 'I don't know,' the system is trained to withhold the solution. Instead, it analyzes the student's previous work, identifies the specific conceptual bottleneck, and asks a guiding question: 'What is the first step we need to take to isolate the variable?' This scaffolding mimics the pedagogical techniques used by expert human educators, forcing the student to engage in active cognitive work rather than passive consumption.[1][7]

To achieve this delicate balance, developers have had to heavily modify base models like GPT-4. Raw language models are prone to 'hallucinations' and often prioritize helpfulness over pedagogy, eagerly giving away answers to please the user. Educational AI tools use specialized system prompts and human-curated content guardrails to restrict the AI's behavior. They are tethered to specific state standards and curriculum maps, so that the hints and clues align with what teachers are covering in the physical classroom. This human-in-the-loop design is intended to keep the AI from going off-script and confusing students with advanced or irrelevant methodologies.[7]

As these tools have matured, a wave of rigorous efficacy studies published in 2025 and 2026 has begun to quantify their impact. The results are striking. A landmark 2025 study from Harvard University examined 'PS2 Pal,' a custom AI tutor built on evidence-based pedagogical principles. The researchers found that students using the AI tutor achieved more than twice the learning gains compared to peers in traditional active learning classrooms. Furthermore, the AI-assisted students reported significantly higher levels of engagement and motivation, demonstrating that the technology can capture student interest as effectively as it delivers content.[3]

These findings were corroborated by a peer-reviewed randomized controlled trial published in Scientific Reports in June 2025. The study measured the effect size of AI tutoring against traditional in-class learning, finding an improvement of between 0.73 and 1.3 standard deviations. While not quite reaching Bloom's legendary two-sigma threshold, an effect size of 1.0 is considered massive in educational research, effectively moving a student from the 50th percentile to the 84th percentile. Crucially, the study noted that students using the AI tutor achieved these higher post-test scores in less time—averaging 49 minutes on task compared to 60 minutes for the control group.[2]
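
To make those percentile claims concrete, the conversion from effect size to percentile can be sketched in a few lines of Python. This is a minimal illustration, not the study's own analysis: it assumes test scores are normally distributed with equal spread in both groups, and the function name `percentile_from_effect_size` is invented here for the example.

```python
from statistics import NormalDist


def percentile_from_effect_size(d: float) -> float:
    """Percentile of the comparison group reached by the average treated student.

    Assumption (not from the article): scores in both groups follow a normal
    distribution with the same standard deviation.
    """
    return 100 * NormalDist().cdf(d)


# Effect sizes quoted in the article, in standard deviations.
examples = [
    ("Lower bound of the trial", 0.73),
    ("Benchmark effect size", 1.0),
    ("Upper bound of the trial", 1.3),
    ("Bloom's two sigma", 2.0),
]

for label, d in examples:
    pct = percentile_from_effect_size(d)
    print(f"{label}: d = {d} -> about the {pct:.0f}th percentile")
```

Under that assumption, the trial's range of 0.73 to 1.3 standard deviations corresponds to roughly the 77th to 90th percentile, while Bloom's two sigma lands near the 98th.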

Large-scale observational data is consistent with these controlled trials. Khan Academy's own efficacy research, analyzing hundreds of thousands of students across grades 3 through 8, showed a clear dose-response relationship. Students who used the platform's personalized learning tools for just 30 minutes a week (roughly 18 hours over the entire school year) experienced 20% greater-than-expected learning gains on the nationally normed MAP Growth Assessment. These gains held consistent across various demographic groups, suggesting the technology has the potential to lift the floor for all learners, regardless of their starting proficiency level.[4]
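
The weekly and yearly usage figures can be cross-checked with simple arithmetic. The sketch below uses only the two numbers quoted above. The function name is hypothetical, and the school-year length it produces is an implication of those numbers, not a figure reported by Khan Academy.

```python
def implied_school_weeks(total_hours: float, minutes_per_week: float) -> float:
    """Number of weeks needed to reach total_hours at a steady weekly pace."""
    return total_hours * 60 / minutes_per_week


# Figures quoted in the article: 30 minutes a week, roughly 18 hours a year.
weeks = implied_school_weeks(total_hours=18, minutes_per_week=30)
print(f"Implied school year: {weeks:.0f} weeks")
```

The two figures are consistent with a school year of about 36 weeks.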

The impact of educational AI extends beyond student-facing applications; it is also reshaping the role of the teacher. A 2026 review by the Stanford SCALE Initiative highlighted how AI tools are acting as 'co-pilots' for educators. Systems like Tutor CoPilot and LearnLM are being used to provide real-time instructional suggestions to teachers, draft Socratic questions, and deliver automated feedback on student progress. By automating routine grading and lesson preparation, these tools free up teachers to focus on the deeply human aspects of their profession: mentorship, emotional support, and complex behavioral interventions.[1]

Despite the impressive metrics, researchers are increasingly vocal about the limitations of current AI tutoring systems. A critical 2025 study by Zheng and Li compared transcripts of AI tutoring sessions with human-led sessions. They found that while AI excelled at procedural scaffolding, it lacked the capacity for 'deep instructional dialogue.' The AI tended to follow predictable response patterns and struggled to adjust its strategy in real time when a student required a completely different conceptual approach, a sudden redirection, or emotional encouragement to push through a difficult problem.[6]

This points to the most significant barrier facing AI in education: the absence of empathy and executive function support. Human tutors do much more than explain math; they read frustration in a student's posture, detect hesitation in their voice, and build long-term motivational relationships. AI cannot read emotion. It cannot tell when a student is having a bad day outside of school, nor can it provide the nuanced executive functioning support required to keep a deeply distracted child on task. For students with complex emotional or academic needs, human intervention remains irreplaceable.[6]

Educational psychologists have also identified a phenomenon known as the 'expertise reversal effect' when evaluating AI tools. The Stanford SCALE review noted that the effectiveness of AI support heavily depends on the learner's prior knowledge. Beginners often benefit from explicit, highly structured AI guidance. However, for advanced learners, overly intrusive AI scaffolding can actually hinder progress, as they benefit more from minimal intervention and open-ended exploration. Designing AI systems that can accurately gauge a student's expertise and dynamically adjust their level of intrusiveness remains a major technical challenge for developers.[1]

Furthermore, the promise of AI tutoring is frequently bottlenecked by physical infrastructure, raising serious equity concerns. The 'Estudia Khanmigo' pilot project, conducted by Digital Promise in Puerto Rican classrooms, illustrated this starkly. While the AI tool showed immense promise in improving student motivation and math self-efficacy, the implementation was severely hampered by pervasive infrastructural limitations. Unreliable internet connectivity, district-wide Wi-Fi blocking policies, and hardware shortages prevented students from using the tool to its full potential, highlighting the gap between software capability and classroom reality.[5]

The Puerto Rico study serves as a crucial reminder that software innovations cannot bypass hardware realities. If AI tutoring requires high-speed internet and modern devices, there is a risk that it could exacerbate the digital divide rather than close it. Wealthier districts with 1-to-1 device programs and robust broadband will seamlessly integrate these tools, while underfunded districts may struggle to provide consistent access. Without targeted investments in basic infrastructure, the students who stand to benefit the most from personalized tutoring may be the ones left furthest behind.[5][8]

As the 2026 school year progresses, a consensus is emerging among education researchers and technologists. AI tutoring will not replace human teachers, nor is it a silver bullet for the systemic challenges facing public education. However, it represents the most significant leap forward in personalized learning in a generation. By providing scalable, patient, and highly effective academic scaffolding, AI tutors are democratizing access to a level of individualized support that was previously reserved for the wealthy. The goal is no longer to replace the classroom, but to ensure that when a student raises their hand, someone—or something—is always there to help them find the answer.[1][8]

How we got here

1984: Educational psychologist Benjamin Bloom identifies the 'two-sigma problem,' reporting that 1-on-1 tutoring is vastly superior to classroom instruction but too expensive to scale.
2023: Generative AI models like GPT-4 are released, sparking initial experiments in AI-driven educational tools.
2024-2025: Khan Academy's Khanmigo sees massive adoption, jumping from 40,000 to 700,000 K-12 students.
June 2025: A landmark randomized controlled trial in Scientific Reports demonstrates AI tutoring yields an effect size of 0.73 to 1.3 standard deviations.
Spring 2026: Major reviews, including from Stanford SCALE, synthesize early data, confirming significant learning gains while highlighting ongoing equity and infrastructure barriers.

Viewpoints in depth

EdTech Optimists

Believe AI tutors are the most significant educational breakthrough in decades.

This camp points to randomized controlled trials showing massive effect sizes and argues that scaling 1-on-1 Socratic tutoring via AI is the only mathematically viable way to close global achievement gaps. They view the technology not as a replacement for teachers, but as a necessary tool to elevate the baseline of academic support for every student, regardless of their zip code.

Human-Centric Educators

Argue that education is fundamentally a relational endeavor, not just an information-transfer process.

These educators cite studies showing AI's inability to read frustration or provide deep instructional dialogue. They warn that an over-reliance on software ignores the emotional and executive functioning support children need to succeed, arguing that true learning requires the empathy and nuanced understanding that only a human mentor can provide.

Implementation Realists

Focus on the physical realities and infrastructural limits of public school systems.

This perspective argues that debating the software's efficacy is secondary to the fact that underfunded districts lack the reliable Wi-Fi, modern devices, and teacher training required to deploy these tools equitably. They warn that without massive investments in basic infrastructure, AI tutoring will simply exacerbate the existing digital divide.

What we don't know

How sustained AI tutor use affects long-term student motivation and self-regulation skills over multiple academic years.
Whether the 'expertise reversal effect' can be mitigated by more advanced models that dynamically adjust their scaffolding based on real-time student proficiency.
How the widespread adoption of AI grading and lesson planning will alter the day-to-day retention and job satisfaction of human educators.

Key terms

Two-Sigma Problem: The educational challenge of replicating the massive learning gains of 1-on-1 tutoring in a scalable, cost-effective way.
Socratic Questioning: A pedagogical method where a teacher (or AI) asks a series of guiding questions to lead a student to discover the answer themselves.
Effect Size: A statistical concept that measures the strength of the relationship between two variables, often used to quantify how much a specific educational intervention improves learning.
Expertise Reversal Effect: A learning principle where instructional techniques that are highly effective for beginners lose their effectiveness and can even have negative consequences for advanced learners.
Scaffolding: A teaching method that involves providing temporary support to a student to help them achieve a learning goal, gradually removing the support as they become more independent.

Frequently asked

What is the 'two-sigma problem'?

It is a 1984 finding by Benjamin Bloom showing that students receiving 1-on-1 tutoring perform two standard deviations better than classroom students. AI tutors aim to replicate this success at scale.

Does AI tutoring just give students the answers?

No. Modern AI tutors are designed as Socratic guides. They withhold direct answers and instead ask guiding questions to help students solve problems themselves.

Will AI replace human teachers?

Researchers emphasize that AI cannot replace teachers. AI lacks empathy and cannot read a student's emotional state, making human educators essential for motivation and complex support.

What is the 'expertise reversal effect'?

It is a phenomenon where beginners benefit from heavy AI guidance, but advanced learners can be hindered by it and tend to do better with minimal intervention and open-ended exploration.

Sources

[1] Stanford SCALE Initiative (Implementation Realists). The Evidence Base on AI in K-12: A 2026 Review.
[2] Scientific Reports (EdTech Optimists). Randomized controlled trial of AI tutoring vs traditional active learning.
[3] Harvard University (EdTech Optimists). PS2 Pal: AI tutoring produces 2x learning gains vs. traditional instruction.
[4] Khan Academy (EdTech Optimists). Khan Academy Efficacy Results.
[5] Digital Promise (Implementation Realists). Estudia Khanmigo: An equity-focused pilot exploration of artificial intelligence tutoring in Puerto Rican classrooms.
[6] Brainfuse (Human-Centric Educators). Personalized Learning in an AI Era: Why Human Support Drives Better Results.
[7] K-12 Dive (Human-Centric Educators). 3 questions for K-12 leaders to consider amid the AI tutoring boom.
[8] Factlen Editorial Team (Implementation Realists). Synthesis by Factlen editorial team.
