How Indigenous Communities Are Building Sovereign AI to Save Endangered Languages
Faced with the rapid extinction of global languages, Indigenous technologists are developing custom AI tools to preserve their linguistic heritage on their own terms.
By Factlen Editorial Team
- Indigenous Technologists: Advocate for using AI to revitalize languages while strictly maintaining data sovereignty and community ownership.
- Academic Researchers: Focus on overcoming the technical limitations of sparse data and polysynthetic grammar in machine learning.
- Commercial Tech Partners: Provide the underlying generative AI and cloud infrastructure to scale community-led language platforms.
What's not represented
- Elders hesitant about digitizing sacred knowledge
- Big tech companies building general-purpose translation models
Why this matters
As the world loses an Indigenous language every two weeks, AI is providing a critical lifeline to preserve cultural heritage. By building their own sovereign AI tools, Indigenous communities are not only saving their languages but also reshaping how artificial intelligence is developed and governed.
Key points
- Over 40% of the world's 7,000 languages are currently endangered, with one disappearing every two weeks.
- Traditional AI models require massive datasets, making them ineffective for minority languages with sparse digital footprints.
- Indigenous technologists are building custom AI tools, such as Te Hiku Media's 92%-accurate speech recognition model for te reo Māori.
- Communities are prioritizing "data sovereignty" to ensure their cultural knowledge remains protected and locally owned.
The paradox of the digital age is that the very technology often blamed for homogenizing global culture is now being used to save its most vulnerable voices. For years, artificial intelligence has been criticized for its linguistic bias, because its models are trained overwhelmingly on dominant languages like English, Mandarin, and Spanish.[7]
But a quiet revolution is underway. Indigenous technologists, linguists, and community elders are flipping the script, building bespoke AI models to rescue endangered languages from the brink of extinction.[3]
The stakes are existential. Of the roughly 7,000 languages spoken worldwide today, UNESCO estimates that more than 40 percent are endangered. At the current rate, one Indigenous language disappears every two weeks, taking with it centuries of ecological knowledge, cultural memory, and unique ways of understanding the world.[7]

Historically, the barrier to applying machine learning to minority languages has been a problem of scale. Traditional automatic speech recognition (ASR) systems require massive datasets to function effectively: a model can need as much as 50,000 hours of transcribed audio to accurately recognize and process a language.[3]
For most endangered languages, that volume of data simply does not exist. Furthermore, many Indigenous tongues, such as Cheyenne or Blackfeet, are polysynthetic. This means that prefixes, roots, and suffixes blend into long, complex words that standard natural language processing algorithms struggle to parse.[3]
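To make the parsing problem concrete, here is a rough sketch of how combining word parts multiplies the number of distinct words a whole-word model has to learn. The morphemes are invented placeholders, not real Cheyenne or Blackfeet forms, and real polysynthetic languages have more slots and sound changes at the boundaries, so this likely understates the effect.

```python
from itertools import product

# Hypothetical placeholder morphemes, invented for illustration only.
# They are NOT real Cheyenne or Blackfeet forms.
PREFIXES = ["pre1", "pre2", "pre3"]
ROOTS = ["root1", "root2", "root3", "root4"]
SUFFIXES = ["suf1", "suf2", "suf3"]
SLOTS = [PREFIXES, ROOTS, SUFFIXES]


def surface_words(slots):
    """Build every prefix+root+suffix combination as one unbroken word."""
    return ["".join(parts) for parts in product(*slots)]


words = surface_words(SLOTS)
morphemes = {m for slot in SLOTS for m in slot}

# A whole-word vocabulary needs one entry per surface word...
print("distinct whole words:", len(set(words)))
# ...while a morpheme-level vocabulary only needs the building blocks.
print("distinct morphemes:", len(morphemes))
```

With only ten building blocks, a system that treats each word as a single unit already faces dozens of separate word forms, and each one appears only rarely in a small corpus.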
Despite these hurdles, breakthroughs are happening, led by the communities themselves. In Aotearoa New Zealand, Te Hiku Media, an iwi-led broadcaster, has successfully built an ASR model for te reo Māori that operates at a 92 percent accuracy rate.[1]
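The sources do not say how the 92 percent figure was measured. One common yardstick for speech recognition is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the system's transcript into the correct one, divided by the length of the correct transcript. The sketch below assumes accuracy is reported as 1 minus WER, and uses placeholder tokens rather than real te reo Māori text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder transcript of 25 tokens (assumption: not real language data).
reference_words = [f"w{i}" for i in range(25)]
hypothesis_words = list(reference_words)
hypothesis_words[3] = "x"   # one misrecognized word
del hypothesis_words[10]    # one dropped word

wer = word_error_rate(" ".join(reference_words), " ".join(hypothesis_words))
accuracy = 1 - wer
print(f"WER: {wer:.2f}, word accuracy: {accuracy:.0%}")
```

Under this assumed definition, a 92 percent accuracy rate would correspond to roughly two word-level errors in every 25 words of reference transcript.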

Built using open-source toolkits, the Te Hiku model actually outperforms transcription attempts by multinational big tech companies. The secret to their success was not just algorithmic, but social: they spent years carefully recording elders and second-language learners in a culturally grounded way, ensuring the nuances of the language were captured authentically.[1]
This success has sparked a wave of innovation. In late 2025, the University of Auckland secured a $1 million grant to develop an AI-powered coaching tool specifically designed to help learners master the pronunciation of te reo Māori. The tool provides real-time, personalized feedback, helping users develop the muscle memory required to speak confidently.[2]
Similar efforts are taking root globally. In Quebec, the First Languages AI Reality (FLAIR) initiative is working to drastically reduce the data requirements needed to train AI for endangered languages. Led by Indigenous computer scientists, the project aims to create custom models that can power language learning, audio transcription, and voice-controlled technology for communities with very few fluent speakers.[4]
Beyond transcription, generative AI is being used to make static archives interactive. A platform called CultureQ, developed by Kiwa Digital in partnership with cloud providers, allows users to have dynamic conversations with digitized cultural records. Instead of merely reading a transcript, a younger generation can ask questions and hear the language spoken aloud, simulating the experience of sitting with an elder.[6]

Hardware is also adapting to the mission. Educational tools like the Skobot—a motion-activated toy that uses real children's voices to answer questions in Indigenous languages—are bringing language revitalization into the playroom. By making the language accessible and fun, these tools target the most critical demographic for language survival: the youth.[5]
Yet, the integration of AI into cultural preservation is not without friction. The central tension revolves around "data sovereignty"—the right of Indigenous communities to own and control their linguistic data. General-purpose AI models have a history of scraping the internet for training data without permission, leading to fears that sacred stories or cultural knowledge could be commodified or misrepresented.[3]
To counter this, organizations like Te Hiku Media and FLAIR explicitly design their systems to protect data sovereignty. They ensure that the data remains on community-owned servers and that the AI models serve the people who provided the knowledge, rather than being absorbed into a corporate black box.[1][4]

Technologists are quick to temper expectations about what AI can actually achieve. As Michael Running Wolf, a former big tech engineer and co-founder of the FLAIR initiative, notes, AI is ultimately just a tool. "It's just going to be like a pencil. It's useful but it's not going to save our language," he explained.[3][5]
The ultimate goal of all these technological interventions is profoundly human: creating more speakers. By bridging the gap between traditional knowledge and modern digital habits, AI is giving endangered languages a fighting chance to not just survive in archives, but to thrive in the conversations of the next generation.[8]
How we got here
2024
Te Hiku Media successfully deploys a te reo Māori speech recognition model with 92% accuracy.
Mid 2025
The Skobot, an AI-powered toy teaching Indigenous languages, gains attention as an educational tool.
Aug 2025
The University of Auckland receives a $1 million grant to build an AI pronunciation coach for te reo Māori.
Dec 2025
Custom D and Kiwa Digital launch CultureQ, using generative AI to make Indigenous archives conversational.
Viewpoints in depth
Indigenous Technologists' View
AI must be built by and for the community, prioritizing data sovereignty.
This camp argues that big tech's approach to AI—scraping vast amounts of data without permission—is a form of digital colonialism. They insist that language revitalization tools must be built on secure, community-owned infrastructure. For them, the process of building the AI is as important as the product, ensuring that elders are respected and that the resulting technology serves the community's specific cultural needs rather than a corporate bottom line.
Linguistic Researchers' View
AI is a powerful tool to overcome the bottleneck of manual transcription and sparse data.
Academic linguists focus on the technical breakthroughs required to make AI work for minority languages. They highlight the challenge of polysynthetic languages and the lack of massive audio datasets. By developing new machine learning techniques that require drastically less data, they believe AI can accelerate the documentation of languages that might otherwise disappear before human linguists could fully record them.
Community Educators' View
Technology is only useful if it creates new, fluent speakers.
Educators and community leaders view AI pragmatically. They appreciate tools like pronunciation coaches and interactive archives, but they stress that technology cannot replace human connection. Their primary metric for success is not algorithmic accuracy, but whether these digital tools successfully encourage youth to speak the language in their daily lives and homes.
What we don't know
- Whether AI tools can successfully bridge the gap to create fully fluent, conversational speakers in the long term.
- How smaller communities with virtually no funding or technical infrastructure will be able to access these bespoke AI solutions.
Key terms
- Automatic Speech Recognition (ASR): Technology that converts spoken language into written text, a foundational tool for digital language processing.
- Polysynthetic language: A type of language where words are composed of many tightly linked parts (prefixes, roots, suffixes), making them highly complex for standard AI to analyze.
- Data sovereignty: The right of a community or nation to govern the collection, ownership, and application of its own data.
- Generative AI: Artificial intelligence capable of generating text, audio, or other media in response to prompts, used here to create interactive cultural archives.
Frequently asked
Can AI automatically translate endangered languages?
It is difficult because standard AI requires massive amounts of data, which endangered languages lack. However, custom models are now being built to work specifically with sparse data.
What is data sovereignty in this context?
It is the principle that Indigenous communities retain ownership and control over their linguistic and cultural data, preventing it from being exploited by outside tech companies.
Will AI replace human language teachers?
No. Technologists emphasize that AI is merely a tool to assist learning and transcription; human interaction and intergenerational transmission remain essential for a language to survive.
Sources
[1] NVIDIA (Indigenous Technologists): Māori Speech AI Model Helps Preserve and Promote New Zealand Indigenous Language
[2] University of Auckland (Academic Researchers): AI tool to transform te reo Māori learning
[3] CBC News (Indigenous Technologists): Indigenous language experts say AI a useful tool, but data ownership key
[4] Mila (Indigenous Technologists): First Languages AI Reality (FLAIR)
[5] Smithsonian Magazine (Academic Researchers): How Indigenous Researchers Are Using A.I. to Help Save Endangered Languages
[6] Amazon Web Services (Commercial Tech Partners): A GenAI Approach to Revitalizing Indigenous Language for the Digital Age
[7] BBC (Commercial Tech Partners): Could AI stop endangered languages from going extinct?
[8] Factlen Editorial Team: Synthesis by Factlen editorial team