Evidence Pack: How AI is Decoding the 'Phonetic Alphabet' of Sperm Whales and Other Species
Advances in machine learning have revealed complex, combinatorial structures in animal vocalizations, moving science closer to understanding interspecies communication.
By Factlen Editorial Team
- Bioacoustics Researchers
- Focused on using AI to uncover the structural complexity of animal vocalizations.
- Conservation Ethicists
- Concerned with the moral implications and potential exploitation of interspecies communication.
- Tech Industry Innovators
- Applying foundation models to scale up interspecies translation efforts.
What's not represented
- Indigenous communities with traditional ecological knowledge of animal behavior
- Marine policymakers tasked with drafting regulations for AI-generated animal calls
Why this matters
Decoding animal communication could fundamentally shift humanity's relationship with nature, transforming conservation efforts from human-centric management to ecocentric stewardship while raising new ethical questions about animal welfare.
Key points
- Machine learning has revealed that sperm whales use a complex, combinatorial 'phonetic alphabet' built from rhythm, tempo, rubato, and ornamentation.
- While AI has mapped the structural grammar of these vocalizations, the semantic meaning of the calls remains unknown.
- New foundation models, such as NatureLM-audio and DolphinGemma, are scaling up bioacoustics research across thousands of species.
- A global survey indicates strong public support for interspecies research, but 85% of respondents want strict rules for companies that profit from the technology.
For decades, the barrier between human language and animal communication seemed impenetrable. While humans evolved a vast combinatorial system of vowels and consonants to express infinite ideas, animal vocalizations were largely viewed as simple, hardcoded responses to environmental stimuli. Today, that wall is beginning to crack. Driven by the same neural network architectures that power modern large language models, artificial intelligence is doing for bioacoustics what it recently did for human text: finding the hidden, complex grammar in mountains of unstructured data.[7]
The vanguard of this interspecies translation effort centers on the sperm whale. These highly social, deep-diving mammals possess the largest brains on Earth and live in complex matriarchal societies. They communicate in the pitch-black ocean using rhythmic sequences of clicks known as codas. For years, marine biologists could only categorize a few dozen basic coda types. But by applying advanced machine learning to massive acoustic datasets, researchers have uncovered a communication system far more intricate than previously imagined.[2][3]
The central claim emerging from this new wave of bioacoustics research is that sperm whales utilize a combinatorial phonetic alphabet. The prevailing scientific consensus has shifted from viewing whale codas as rigid, holistic signals to recognizing them as modular constructs. Researchers now argue that sperm whales build a vast repertoire of distinct vocalizations by combining a finite set of acoustic building blocks, much like humans combine vowels and consonants.[1][2]
The foundational evidence for this phonetic alphabet stems from a landmark study published in Nature Communications by researchers from Project CETI and MIT's Computer Science and Artificial Intelligence Laboratory. The team analyzed a massive dataset of 8,719 codas recorded over a decade from the sperm whale families of the Eastern Caribbean clan. By feeding this audio into machine learning algorithms designed to detect latent patterns, the researchers successfully mapped the full structural variance of the clan's communications.[1][2]
The mechanism behind this communication relies on four distinct acoustic features. The AI revealed that the whales' coda repertoire is built from two context-independent features, rhythm and tempo, alongside two context-sensitive features, rubato and ornamentation. Rubato involves smoothly varying the duration of the calls, while ornamentation refers to adding extra clicks to a standard sequence. Sperm whales freely combine these four features to generate an enormous inventory of distinguishable codas.[1][2]
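To make the four features concrete, here is a minimal Python sketch. It is not the researchers' code. It assumes a coda can be represented as a list of click timestamps in seconds, and it uses simplified working definitions: tempo as total duration, rhythm as the duration-normalized pattern of gaps between clicks, rubato as the change in duration between successive codas, and ornamentation as one or more clicks beyond an expected count. All function names and the sample numbers are hypothetical.

```python
from typing import List

def inter_click_intervals(click_times: List[float]) -> List[float]:
    """Gaps (in seconds) between consecutive clicks of one coda."""
    return [b - a for a, b in zip(click_times, click_times[1:])]

def tempo(click_times: List[float]) -> float:
    """Simplified tempo: total duration of the coda in seconds."""
    return click_times[-1] - click_times[0]

def rhythm(click_times: List[float]) -> List[float]:
    """Simplified rhythm: gaps divided by total duration, so the
    pattern stays the same when the whole coda is sped up or slowed down."""
    total = tempo(click_times)
    return [round(gap / total, 3) for gap in inter_click_intervals(click_times)]

def rubato(durations: List[float]) -> List[float]:
    """Simplified rubato: change in duration between successive codas."""
    return [round(b - a, 3) for a, b in zip(durations, durations[1:])]

def has_ornament(click_times: List[float], base_clicks: int) -> bool:
    """Simplified ornamentation: more clicks than the base pattern expects."""
    return len(click_times) > base_clicks

def inventory_size(rhythms: int, tempos: int, rubato_states: int, ornament_states: int) -> int:
    """Upper bound on distinct codas if the four features combine freely."""
    return rhythms * tempos * rubato_states * ornament_states

# Two hypothetical codas: same rhythm, different tempo.
fast = [0.0, 0.25, 0.5, 0.75, 1.0]
slow = [0.0, 0.5, 1.0, 1.5, 2.0]
print(rhythm(fast) == rhythm(slow))   # same normalized pattern
print(tempo(fast), tempo(slow))       # different overall durations
print(inventory_size(4, 3, 3, 2))     # made-up feature counts, for illustration only
```

The point of the sketch is the multiplication in the last function: a handful of values for each feature yields a much larger set of possible codas, which is what "combinatorial" means here.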

Further evidence of this mechanism's complexity is found in the whales' conversational dynamics. The algorithms detected remarkable precision in how these features are deployed during social exchanges. Sperm whales make sub-second adjustments to their rubato and ornamentation to match one another, effectively taking turns and modulating their calls based on the immediate social context. This level of systematic, combinatorial vocalization was previously thought to be uniquely human.[2][3]
Despite these structural breakthroughs, considerable uncertainty remains about what the codas actually mean. The evidence for structural complexity is robust, but the semantics of this alphabet remain opaque. AI has mapped the syntax of sperm whale communication, but it has not provided a dictionary. Researchers can describe how the whales construct their codas, but they do not yet know what the whales are saying: whether they are discussing foraging strategies, negotiating social hierarchies, or simply announcing their presence.[1][3]
A second major claim in the field is that AI foundation models can now generalize across many non-human species at once. The breakthrough with sperm whales is not an isolated incident. The bioacoustics field is rapidly moving toward universal foundation models: AI systems trained on vast amounts of multi-species audio that can classify and generate animal vocalizations across the animal kingdom without needing a bespoke model for every new animal.[4][6]
The primary evidence for this cross-species generalization came in 2025, when the nonprofit Earth Species Project released NatureLM-audio, the first large-scale audio language model tailored specifically for animals. In rigorous testing, the model demonstrated the ability to accurately analyze calls, count individuals, and distinguish traits such as sex and life stage in species ranging from zebra finches to marine mammals. The model proved that AI can scale its pattern-recognition capabilities across dramatically different evolutionary lineages.[4][6]

Additional evidence comes from the tech industry's entry into bioacoustics. Google DeepMind recently detailed its DolphinGemma project, a large language model developed in collaboration with Georgia Tech and the Wild Dolphin Project. DolphinGemma ingests raw dolphin audio, separates the sounds, and tokenizes them into a format that a large language model can process. The ultimate goal of the project is to generate synthetic sounds to communicate back to the pod in real time.[5]
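As a rough illustration of what "tokenizing" audio can mean, the Python sketch below maps short frames of acoustic features to the index of the nearest entry in a small codebook, a technique known as vector quantization. This is an assumption chosen for illustration, not a description of DolphinGemma's actual tokenizer, and the codebook, frames, and function name are all hypothetical.

```python
from typing import List, Sequence

def tokenize(frames: List[Sequence[float]], codebook: List[Sequence[float]]) -> List[int]:
    """Replace each feature frame with the index of its nearest codebook vector."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: squared_distance(frame, codebook[i]))
        for frame in frames
    ]

# Hypothetical 2-D features (for example, pitch and loudness on arbitrary scales).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (4.0, 6.0), (0.4, 0.4)]
print(tokenize(frames, codebook))
```

Once sound is reduced to a sequence of integers like this, it can be fed to the same kind of sequence model that processes text tokens.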
The mechanism driving these multi-species models is self-supervised learning. Rather than requiring human scientists to manually label thousands of hours of audio—an impossible bottleneck—the AI ingests raw acoustic data and learns the underlying distribution of the sounds. It learns which clicks, squeaks, or whistles are likely to follow one another, building a mathematical representation of the species' acoustic environment without any human-provided context.[5][7]
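The idea of learning "which sounds are likely to follow one another" from unlabeled data can be shown with a toy Python example. Real foundation models use large neural networks; the sketch below swaps in a simple bigram counter as a stand-in, and the sound labels are hypothetical. No human-provided meaning is attached to any token: the only training signal is the order in which the sounds occur.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def train_bigram(sequences: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each token is followed by each other token."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_probs(counts: Dict[str, Counter], prev: str) -> Dict[str, float]:
    """Estimated probability of each token following `prev`."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # token never seen with a follower
    return {tok: n / total for tok, n in followers.items()}

# Hypothetical unlabeled recordings, already converted to tokens.
recordings = [
    ["click", "whistle", "click", "whistle"],
    ["click", "whistle", "squeak"],
]
model = train_bigram(recordings)
print(next_token_probs(model, "whistle"))
```

A model like this can predict the next sound without any notion of what the sound means, which is the distinction the researchers draw between prediction and comprehension.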
However, researchers acknowledge significant uncertainty regarding the gap between tokenization and true comprehension. An AI model might perfectly predict the next sound a dolphin will make, or successfully generate a synthetic whistle that prompts a response, without actually understanding the social or emotional intent behind the signal. True translation requires mapping these acoustic tokens to physical behaviors and environmental contexts, a monumental task that requires years of synchronized video and audio field data.[5][7]
The third major claim surrounding this research is that interspecies communication poses unprecedented ethical risks that require immediate regulatory action. As the technology to decode and potentially generate animal language matures, the ability to speak to animals is no longer just a scientific curiosity; it is a looming reality that carries profound moral implications.[4]
The evidence for widespread concern is documented in a first-of-its-kind global survey conducted in late 2025 by the Earth Species Project and The Collective Intelligence Project. Polling over 1,000 people across 67 countries, the data revealed deep public apprehension regarding the commercialization of the technology. A striking 85 percent of respondents stated that companies profiting from animal communication technology should face strict regulatory rules.[4]

Conservation ethicists warn that the stakes of unregulated two-way AI communication are incredibly high. If an AI can generate the perfect synthetic mating call or distress signal, it could be easily exploited by poachers to lure endangered species, or by the tourism industry to force unnatural interactions with wildlife. The public's optimism for the technology's potential is heavily tempered by the historical reality of human ecological misuse.[4][7]
Conversely, researchers argue that if deployed responsibly, these tools could revolutionize animal welfare. Understanding the specific distress vocalizations of livestock could vastly improve agricultural conditions by allowing farmers to monitor animal comfort in real time. In the wild, AI-generated acoustic deterrents could be used to steer whales away from deadly shipping lanes or prevent train collisions with deer by broadcasting highly specific, species-understood warnings.[6][7]
The quest to decode animal communication represents a paradigm shift in how humanity interacts with the natural world. By revealing the hidden sophistication of species like the sperm whale, artificial intelligence is forcing a reevaluation of animal intelligence and our ethical obligations to other species. The challenge for the coming decade will not just be building the algorithms to listen, but developing the wisdom to handle what we might hear.[4][6]
How we got here
1971
Biologist Roger Payne publishes 'Songs of Humpback Whales', proving whales sing and sparking global conservation efforts.
2020
Project CETI is founded to apply advanced machine learning to decode sperm whale communication.
May 2024
Researchers publish the discovery of the 'sperm whale phonetic alphabet' in Nature Communications.
Jan 2025
The Earth Species Project introduces NatureLM-audio, a foundation model for animal vocalizations.
Jun 2025
Google DeepMind details DolphinGemma, an AI model aimed at decoding dolphin communication.
Viewpoints in depth
Bioacoustics Researchers
Focused on using AI to uncover the structural complexity of animal vocalizations.
This camp views AI as a revolutionary lens for biology, similar to the invention of the microscope. By applying large language models to massive datasets of animal audio, they argue we can finally bypass human perceptual biases and map the true combinatorial depth of species like sperm whales and dolphins. Their primary goal is to map the syntax of the animal kingdom before attempting direct translation.
Conservation Ethicists
Concerned with the moral implications and potential exploitation of interspecies communication.
Ethicists warn that decoding animal language opens a Pandora's box. While it could foster empathy and ecocentric policies, it also creates vectors for exploitation—such as poachers using AI to broadcast deceptive mating calls, or the commercialization of wildlife interactions. They advocate for strict regulatory frameworks before two-way communication tools are deployed in the wild.
Skeptical Linguists
Cautious about equating complex animal vocalizations with human language.
Many traditional linguists maintain that while AI has revealed impressive combinatorial structure in whale codas, structure does not automatically equal semantics. They argue that until we can map these vocalizations to abstract, displaced concepts—a hallmark of human language—we should avoid anthropomorphizing animal communication as a true 'language' equivalent to our own.
What we don't know
- The semantic meaning behind the vast majority of sperm whale codas.
- Whether animals possess abstract concepts and displacement (talking about the past or future) in their communication.
- How wild animals will react to AI-generated synthetic calls designed to mimic their species.
Key terms
- Coda
- A rhythmic sequence of clicks used by sperm whales for social communication.
- Rubato
- A musical term used by researchers to describe how whales smoothly vary the duration of their calls depending on the conversational context.
- Ornamentation
- The addition of extra clicks to a standard coda, used by whales to add complexity to their vocalizations.
- Combinatorial structure
- A system where a finite set of basic elements can be combined in various ways to create a vast number of distinct expressions.
- Bioacoustics
- The scientific study of the production, transmission, and reception of sounds by animals.
Frequently asked
What is a sperm whale coda?
A coda is a short, rhythmic sequence of clicks, with varying intervals between them, that sperm whales use to communicate in the ocean.
Did AI translate what the whales are saying?
No. AI has revealed the complex grammatical structure of their calls, but the actual semantic meaning of these vocalizations remains unknown.
What is NatureLM-audio?
It is a large-scale audio language model developed by the Earth Species Project designed specifically to analyze and classify animal vocalizations across thousands of species.
Why are people worried about this technology?
Ethicists fear that two-way AI communication could be exploited for commercial gain, such as disruptive wildlife tourism or precision poaching.
Sources
[1] Nature Communications (Bioacoustics Researchers): Contextual and combinatorial structure in sperm whale vocalisations
[2] MIT News (Bioacoustics Researchers): Exploring the mysterious alphabet of sperm whales
[3] National Geographic (Conservation Ethicists): We're one step closer to understanding the sperm whale 'alphabet'
[4] Earth Species Project (Conservation Ethicists): What the World Thinks About AI and Animal Communication: Findings from Our First Global Survey
[5] Business Insider (Tech Industry Innovators): AI is learning how animals talk to each other, and could someday help humans talk to animals
[6] Open Data Science (Bioacoustics Researchers): AI-Powered Breakthroughs in Animal Communication Open Doors to Deeper Conservation Efforts
[7] The Times of India (Tech Industry Innovators): Research claims AI could eventually lead to humans communicating directly with animals