The Evidence on AI Tutors: How Universities Are Closing the STEM Achievement Gap
Recent randomized controlled trials suggest that purpose-built AI teaching assistants can substantially improve student test scores and engagement in higher education, and that giving students unrestricted access may work better than rationing it.
By Factlen Editorial Team
- AI Adoption Advocates
- Believe AI tutors are the key to scaling personalized education and solving historical achievement gaps.
- Pedagogical Skeptics
- Argue that AI's impact on actual learning outcomes is often overstated compared to traditional methods.
- Behavioral Economists
- Focus on how the design and restriction of AI access dictate a student's effort and self-regulation.
What's not represented
- First-Generation College Students
- University Financial Officers
Why this matters
Dropout rates in university STEM programs have historically hovered near 50 percent, and limited access to personalized academic support is one frequently cited reason. If AI tutors prove effective at scale, millions of students could gain access to 24/7, individualized guidance, potentially transforming graduation rates and the future technical workforce.
Key points
- AI tutors are designed to guide students through problem-solving rather than providing direct answers.
- A Harvard trial found that physics students using an AI tutor achieved nearly double the learning gains of peers taught with in-class active learning.
- A randomized experiment in Berlin found that unrestricted access to an AI tutor yielded better test scores than restricted access.
- While AI improves engagement, at least one study found its impact on test scores comparable to that of a standard web search.
- Human educators remain essential for deep instructional dialogue and complex emotional scaffolding.
The holy grail of education has long been Bloom’s "two sigma" problem: the 1984 finding that average students tutored one-to-one perform two standard deviations better than students in traditional classrooms. For decades, scaling this level of personalized instruction was economically impossible for universities. Now, higher education is deploying generative artificial intelligence to bridge the gap, and the empirical evidence suggests it is working.[7]
Across university campuses in 2025 and 2026, AI teaching assistants have moved from experimental novelties to integrated pedagogical tools. These systems, ranging from custom-built university models to commercial platforms like Khan Academy's Khanmigo, are designed not to give students the answers, but to guide them through the productive struggle of learning.[2][3]
The empirical evidence regarding their efficacy is now emerging from rigorous academic studies, painting a complex but highly promising picture of how AI alters the learning curve in demanding science, technology, engineering, and mathematics (STEM) courses.[7]
The most striking data comes from controlled environments where AI is purpose-built for specific curricula. In a randomized controlled trial conducted in a notoriously difficult undergraduate physics course at Harvard University, researchers tested a custom AI tutor against in-class active-learning instruction.[1]

The results were dramatic. Students who learned with the AI tutor achieved learning gains, measured by pre- and post-tests, nearly twice as large as those of their peers in the active-learning classroom. The AI-tutored cohort also mastered the material in less time and reported significantly higher engagement and motivation.[1]
This success is largely attributed to the AI's instructional design. Rather than functioning as an answer engine, the Harvard physics tutor used expert-designed instructional scaffolds. It guided students through step-by-step reasoning and employed built-in safeguards to ensure its explanations supported deep conceptual understanding rather than algorithmic shortcuts.[1][7]
Harvard has also deployed the "CS50 Duck," an AI-powered virtual rubber duck, at scale in its introductory computer science course. Designed to approximate a 1:1 teacher-to-student ratio, the bot provides 24/7 interactive assistance, code explanations, and style suggestions. In surveys of students in the course, 94 percent found the tool helpful and effective, and many described the experience as having a "personal tutor" available at any hour.[5]
Beyond elite institutions, the impact of AI tutors is being measured across diverse university populations. At the WZB Berlin Social Science Center, researchers conducted a randomized experiment involving 334 university students preparing for an incentivized exam to understand how different modes of AI access affect learning outcomes.[4]
The WZB study divided students into three groups: a control group with only textbook material, a restricted-access group that had to complete initial independent reading before unlocking the AI tutor, and an unrestricted-access group that could use the AI throughout the entire study period.[4]
The findings challenged a common assumption that limiting students' access to technology encourages independent effort. Access to the AI tutor raised overall test performance by 0.23 standard deviations relative to the control group. Surprisingly, the unrestricted-access cohort significantly outperformed the restricted-access group, by 0.21 standard deviations.[4]

Behavioral analysis by the Berlin researchers suggests why. Unrestricted access fostered a gradual, seamless integration of AI support into students' workflows. In contrast, restricting the AI induced intensive, disruptive bursts of prompting once the tool was unlocked, breaking students' learning flow. The data suggest that continuous availability aligns better with self-regulated learning than artificially structured delays.[4]
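To put the standard-deviation figures in this article in more intuitive terms, the short Python sketch below converts an effect size into a percentile, a conversion often called Cohen's U3. It assumes test scores are normally distributed with similar spread in both groups, which the cited studies may not report; the function names are illustrative and not taken from any of the studies.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def u3_percentile(effect_size_sd: float) -> float:
    """Percentile of the comparison group reached by the average treated
    student, ASSUMING normally distributed scores with equal spread
    in both groups (Cohen's U3)."""
    return 100.0 * normal_cdf(effect_size_sd)


# Effect sizes quoted in this article, in standard-deviation units.
effects = {
    "Bloom one-to-one tutoring (1984)": 2.0,
    "WZB: AI access vs. textbook-only control": 0.23,
    "WZB: unrestricted vs. restricted access": 0.21,
}

for label, d in effects.items():
    print(f"{label}: d = {d:.2f} -> {u3_percentile(d):.0f}th percentile")
```

Under those assumptions, an effect of 0.23 SD would place the average student with AI access at roughly the 59th percentile of the textbook-only group, while Bloom's two sigma corresponds to roughly the 98th percentile. Real score distributions are rarely perfectly normal, so these are rough translations rather than findings.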
However, the evidence is not uniformly triumphant across all contexts. A mixed-methods study published in the Journal of Teaching and Learning investigated the effectiveness of Khanmigo in an undergraduate physics module on lunar phases. The study compared students using the AI tutor with a cohort using a standard Google search.[6]
While quantitative analysis revealed significant learning gains across all conditions, the researchers found no statistically significant differences in the final learning outcomes between the Khanmigo group and the Google search group. The immediate impact on test scores was comparable to traditional digital research methods.[6]
Despite the statistical tie in outcomes, the qualitative findings from the Khanmigo study were highly favorable. Students appreciated the AI's step-by-step guidance and personalized interactions, viewing it as a powerful supplementary tool. This suggests that while AI may not immediately transform test scores in every context, it can noticeably improve the student experience and reduce the friction of independent study.[6]
At Michigan State University, researchers at the Evidence Learning Innovation Research Center are currently tracking Khanmigo's deployment across various campus programs, including the Dow STEM Scholars and the College Assistance Migrant Program. Their ongoing work aims to isolate exactly which factors make AI most effective for different demographic groups, particularly first-generation college students who may lack traditional academic support networks.[3]

As universities scale these tools, a consensus is forming around their limitations. AI-generated tutoring still lacks the deep instructional dialogue characteristic of expert human educators. Comparative studies indicate that while AI excels at routine problem-solving and immediate feedback, it often follows predictable response patterns and struggles to adjust in real-time when a student requires complex emotional scaffolding or nuanced redirection.[7]
Human tutors remain essential for fostering higher-order critical thinking, drawing out multi-step explanations, and challenging deeply held misconceptions. The consensus among educational researchers is that AI will not replace the professor or the teaching assistant, but rather reallocate their time.[7]
Offloading routine administrative queries, late-night debugging, and basic conceptual explanations to AI tutors frees human educators to focus on curriculum development, complex mentoring, and interactive classroom simulations. The university course of the future is shifting from a broadcast model of generic lectures to a personalized, mastery-based environment where evaluation and feedback happen continuously.[1][2]
How we got here
1984
Educational psychologist Benjamin Bloom publishes the "two sigma" problem, highlighting the large benefits of one-to-one tutoring.
Summer 2023
Harvard University pilots the CS50 Duck AI tutor with 70 students to approximate a 1:1 teacher ratio.
2024-2025
Michigan State University rolls out a Khanmigo pilot across various STEM and first-generation student programs.
Early 2025
The WZB Berlin Social Science Center publishes an RCT showing that access to an AI tutor raises test scores, with unrestricted access outperforming restricted access.
Viewpoints in depth
AI Adoption Advocates
Believe AI tutors are the key to scaling personalized education.
This camp, heavily represented by computer science departments and educational technologists, argues that AI is finally solving Bloom's "two sigma" problem. By providing 24/7, personalized, Socratic guidance, AI tutors can dramatically increase student engagement and mastery, particularly in notoriously difficult STEM courses where dropout rates are high. They point to a Harvard RCT in which AI-tutored students achieved nearly double the learning gains of their peers as evidence that the era of the generic lecture is ending.
Pedagogical Skeptics
Argue that AI's impact on actual learning outcomes is often overstated.
Educational researchers in this camp caution against viewing AI as a silver bullet. They highlight studies where AI tutors produced no statistically significant difference in test scores compared to conventional search engines. Furthermore, they emphasize that AI currently lacks the capacity for deep instructional dialogue, emotional scaffolding, and dynamic adjustment to a student's nuanced misconceptions, skills they see as the exclusive domain of human educators.
Behavioral Economists
Focus on the mechanics of how students interact with AI tools.
This perspective is less concerned with the technology itself and more focused on student behavior and self-regulation. Their research suggests that the design of AI access shapes its success. Counterintuitively, they have found that placing restrictions on AI to force independent reading actually disrupts the learning flow, whereas unrestricted, seamless access allows students to gradually integrate the tool into their study habits, leading to higher test scores.
What we don't know
- Whether the learning gains observed in STEM courses translate equally well to humanities and social science curricula.
- How long-term reliance on AI tutors throughout a four-year degree affects a student's independent research skills.
Key terms
- Generative AI Tutor
- An artificial intelligence system designed to guide students through problem-solving using Socratic questioning rather than providing direct answers.
- Productive Struggle
- The educational concept where students expend effort to grapple with a difficult concept, which builds deeper understanding and long-term retention.
- Standard Deviation (SD)
- A statistical measure of how widely values spread around their mean. Education researchers express effect sizes in standard-deviation units to quantify the impact of a specific teaching intervention.
- Randomized Controlled Trial (RCT)
- A scientific study design that randomly assigns participants to an experimental group or a control group to measure the causal impact of an intervention.
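As a rough illustration of how such effect sizes are calculated, the sketch below computes one common measure, Cohen's d, by dividing the difference between two group means by their pooled standard deviation. The scores are invented for illustration, and the studies cited here may use different estimators or adjust for covariates such as pre-test scores.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# HYPOTHETICAL exam scores for illustration only; not data from any cited study.
ai_tutor_group = [72.0, 80.0, 68.0, 75.0, 85.0]
control_group = [70.0, 74.0, 65.0, 71.0, 80.0]

print(f"Effect size: {cohens_d(ai_tutor_group, control_group):.2f} SD")
```

Pooling the two groups' variances gives a single yardstick for spread, so a result such as 0.23 reads as "the treated group's average sits 0.23 of a typical student-to-student spread above the control group's average."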
Frequently asked
Does using an AI tutor make students lazy?
Not according to current evidence. In the WZB Berlin experiment, unrestricted access to an AI tutor improved test performance without crowding out students' reading effort.
Are AI tutors replacing human professors?
No. AI tutors handle routine questions and step-by-step problem-solving, which frees professors to focus on complex mentoring, emotional scaffolding, and curriculum design.
Is Khanmigo better than ChatGPT for students?
Khanmigo is designed differently. Educational AI tools like Khanmigo are built to act as tutors, asking guiding questions to foster understanding rather than simply generating the final answer like a general-purpose chatbot. The studies cited here compare Khanmigo with Google search, not with ChatGPT.
Sources
[1] University Affairs (AI Adoption Advocates): "AI changes the economics of higher education"
[2] EdTech Magazine (AI Adoption Advocates): "AI Teaching Assistants Improve Student Performance"
[3] The State News (Pedagogical Skeptics): "MSU researchers study Khanmigo AI tutor efficacy"
[4] WZB Berlin Social Science Center (Behavioral Economists): "AI Tutoring Enhances Student Learning Without Crowding Out Reading Effort"
[5] Harvard University (AI Adoption Advocates): "CS50's AI-powered chatbot and continuous improvement"
[6] Journal of Teaching and Learning (Pedagogical Skeptics): "Leveraging Khanmigo Generative AI-Powered Tool for Personalized Tutoring to Learn Scientific Concepts"
[7] Factlen Editorial Team: "Synthesis by Factlen editorial team"