Bioacoustics · Evidence Pack · Jun 8, 2026, 4:50 AM · 6 min read

Evidence Pack: How AI is Decoding the 'Phonetic Alphabet' of Sperm Whales and Other Species

Advances in machine learning have revealed complex, combinatorial structures in animal vocalizations, moving science closer to understanding interspecies communication.

By Factlen Editorial Team

Bioacoustics Researchers 40% · Conservation Ethicists 35% · Tech Industry Innovators 25%
Bioacoustics Researchers
Focused on using AI to uncover the structural complexity of animal vocalizations.
Conservation Ethicists
Concerned with the moral implications and potential exploitation of interspecies communication.
Tech Industry Innovators
Applying foundation models to scale up interspecies translation efforts.

What's not represented

  • Indigenous communities with traditional ecological knowledge of animal behavior
  • Marine policymakers tasked with drafting regulations for AI-generated animal calls

Why this matters

Decoding animal communication could fundamentally shift humanity's relationship with nature, transforming conservation efforts from human-centric management to ecocentric stewardship while raising new ethical questions about animal welfare.

Key points

  • Machine learning has revealed that sperm whales use a complex, combinatorial 'phonetic alphabet' built from rhythm, tempo, rubato, and ornamentation.
  • While AI has mapped the structural grammar of these vocalizations, the semantic meaning of the calls remains unknown.
  • New foundation models, such as NatureLM-audio and DolphinGemma, are scaling up bioacoustics research across thousands of species.
  • A global survey indicates strong public support for interspecies research, but 85% demand strict regulations to prevent commercial exploitation.

8,719: sperm whale codas analyzed by Project CETI
156: distinct coda types identified
85%: survey respondents calling for strict rules on companies profiting from animal communication technology

For decades, the barrier between human language and animal communication seemed impenetrable. While humans evolved a vast combinatorial system of vowels and consonants to express infinite ideas, animal vocalizations were largely viewed as simple, hardcoded responses to environmental stimuli. Today, that wall is beginning to crack. Driven by the same neural network architectures that power modern large language models, artificial intelligence is doing for bioacoustics what it recently did for human text: finding the hidden, complex grammar in mountains of unstructured data.[7]

The vanguard of this interspecies translation effort centers on the sperm whale. These highly social, deep-diving mammals possess the largest brains on Earth and live in complex matriarchal societies. They communicate in the pitch-black ocean using rhythmic sequences of clicks known as codas. For years, marine biologists could only categorize a few dozen basic coda types. But by applying advanced machine learning to massive acoustic datasets, researchers have uncovered a communication system far more intricate than previously imagined.[2][3]

The central claim emerging from this new wave of bioacoustics research is that sperm whales use a combinatorial phonetic alphabet. Rather than viewing whale codas as rigid, holistic signals, researchers increasingly treat them as modular constructs. They argue that sperm whales build a vast repertoire of distinct vocalizations by combining a finite set of acoustic building blocks, much like humans combine vowels and consonants.[1][2]

The foundational evidence for this phonetic alphabet stems from a landmark study published in Nature Communications by researchers from Project CETI and MIT's Computer Science and Artificial Intelligence Laboratory. The team analyzed a dataset of 8,719 codas recorded over a decade from the sperm whale families of the Eastern Caribbean clan. By feeding this audio into machine learning algorithms designed to detect latent patterns, the researchers mapped the structure underlying the clan's coda repertoire.[1][2]

The mechanism behind this communication relies on four distinct acoustic features. The AI revealed that the whales' coda repertoire is built from two context-independent features, rhythm and tempo, alongside two context-sensitive features, rubato and ornamentation. Rubato involves smoothly varying the duration of the calls, while ornamentation refers to adding extra clicks to a standard sequence. Sperm whales freely combine these four features to generate an enormous inventory of distinguishable codas.[1][2]
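To make the four features concrete, here is a minimal Python sketch of how a single coda might be summarized. It assumes that a coda arrives as a list of click timestamps and uses simplified definitions. The function `describe_coda` and the `expected_clicks` parameter are hypothetical illustrations, not taken from the study. Rhythm is the set of inter-click intervals normalized by total duration, tempo is the total duration, ornamentation means extra clicks beyond what the rhythm type expects, and rubato is the change in duration from the previous coda. This is not the study's analysis pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodaFeatures:
    rhythm: List[float]      # inter-click intervals divided by total duration (sums to ~1.0)
    tempo: float             # total coda duration in seconds
    ornamented: bool         # True if the coda has more clicks than its rhythm type expects
    rubato: Optional[float]  # duration change vs. the previous coda (None if there is none)


def describe_coda(clicks: List[float], expected_clicks: int,
                  previous_duration: Optional[float] = None) -> CodaFeatures:
    """Summarize one coda under simplified, hypothetical feature definitions."""
    if len(clicks) < 2:
        raise ValueError("a coda needs at least two clicks")
    duration = clicks[-1] - clicks[0]
    if duration <= 0:
        raise ValueError("the last click must come after the first")
    # Inter-click intervals (ICIs) between consecutive clicks.
    icis = [later - earlier for earlier, later in zip(clicks, clicks[1:])]
    # Normalizing by duration separates the pattern (rhythm) from the speed (tempo).
    rhythm = [ici / duration for ici in icis]
    rubato = None if previous_duration is None else duration - previous_duration
    return CodaFeatures(rhythm=rhythm, tempo=duration,
                        ornamented=len(clicks) > expected_clicks,
                        rubato=rubato)
```

For example, `describe_coda([0.0, 0.2, 0.4, 0.6, 1.0], expected_clicks=5, previous_duration=0.9)` gives a tempo of 1.0 s, no ornament, and a rubato of roughly +0.1 s. Because rhythm is normalized, two codas with the same pattern at different speeds share a rhythm and differ only in tempo. That independence is what lets the features combine freely.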

Sperm whales build their codas using four distinct acoustic features, allowing for a vast inventory of distinguishable calls.

Further evidence of this mechanism's complexity is found in the whales' conversational dynamics. The algorithms detected remarkable precision in how these features are deployed during social exchanges. Sperm whales make sub-second adjustments to their rubato and ornamentation to match one another, effectively taking turns and modulating their calls based on the immediate social context. This level of systematic, combinatorial vocalization was previously thought to be uniquely human.[2][3]
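One simple, hypothetical way to quantify this matching is to treat rubato as the coda-to-coda change in duration for each whale and then correlate the two whales' series. The plain-Python sketch below does exactly that. The duration values are invented for illustration, and the study's own statistical methods may differ.

```python
from math import sqrt
from typing import List


def rubato_series(durations: List[float]) -> List[float]:
    """Coda-to-coda change in duration; positive values mean the whale is slowing down."""
    return [later - earlier for earlier, later in zip(durations, durations[1:])]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


def rubato_matching(durations_a: List[float], durations_b: List[float]) -> float:
    """Hypothetical matching score: correlation of two whales' rubato series."""
    return pearson(rubato_series(durations_a), rubato_series(durations_b))


# Invented coda durations (seconds) for two whales in one exchange.
whale_a = [1.00, 1.05, 1.12, 1.10, 1.02]
whale_b = [0.98, 1.04, 1.10, 1.09, 1.00]
score = rubato_matching(whale_a, whale_b)
```

A score near +1 means the two whales speed up and slow down together, while a score near 0 means they share no drift. Correlation alone does not reveal who is following whom. Answering that would require a lagged comparison.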

Despite these structural breakthroughs, considerable uncertainty remains about what the codas actually mean. The evidence for structural complexity is robust, but the semantic content of this alphabet remains opaque. AI has mapped the structure of sperm whale communication without providing a dictionary. Researchers can describe how the whales construct their codas, but they do not yet know what the whales are saying: whether they are discussing foraging strategies, negotiating social hierarchies, or simply announcing their presence.[1][3]


A second major claim in the field is that AI foundation models can now generalize across many non-human species at once. The breakthrough with sperm whales is not an isolated case. The bioacoustics field is moving rapidly toward universal foundation models. These are AI systems trained on vast amounts of multi-species audio that can classify and generate vocalizations across a wide range of animals, without needing a bespoke model for every new species.[4][6]

The primary evidence for this cross-species generalization came in 2025, when the nonprofit Earth Species Project released NatureLM-audio. The organization describes it as the first large-scale audio language model tailored specifically for animals. In benchmark testing, the model was able to analyze calls, count individuals, and distinguish traits such as sex and life stage in species ranging from zebra finches to marine mammals. The results suggest that AI's pattern-recognition capabilities can scale across dramatically different evolutionary lineages.[4][6]

Foundation models like NatureLM-audio have successfully analyzed the vocalizations of species ranging from marine mammals to zebra finches.

Additional evidence comes from the tech industry's entry into bioacoustics. Google DeepMind recently detailed its DolphinGemma project, a large language model developed in collaboration with Georgia Tech and the Wild Dolphin Project. DolphinGemma ingests raw dolphin audio, separates the sounds, and tokenizes them into a format that a large language model can process. The project's ultimate goal is to generate synthetic sounds that communicate back to the pod in real time.[5]
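The tokenization step can be illustrated with vector quantization, a common way to turn continuous audio into discrete symbols. The sketch below is not DolphinGemma's tokenizer. It assumes a toy feature set (RMS energy and zero-crossing rate per frame) and a hand-written, hypothetical three-entry codebook, whereas real systems learn their codebooks from data.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, float]  # (RMS energy, zero-crossing rate)


def frame_features(samples: Sequence[float], frame_len: int) -> List[Feature]:
    """Compute toy features for each non-overlapping frame (a trailing partial frame is dropped)."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings / (frame_len - 1)))
    return features


def quantize(features: List[Feature], codebook: List[Feature]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry (Euclidean)."""
    return [min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k]))
            for f in features]


# Hypothetical codebook: 0 = silence, 1 = loud and slowly varying, 2 = loud and rapidly varying.
CODEBOOK = [(0.0, 0.0), (0.7, 0.1), (0.7, 0.9)]

clip = [0.0] * 8 + [1.0, -1.0] * 4 + [1.0] * 4 + [-1.0] * 4
tokens = quantize(frame_features(clip, frame_len=8), CODEBOOK)
```

With `frame_len=8`, the silent frame maps to token 0, the rapidly alternating frame to token 2, and the slowly alternating frame to token 1. The clip therefore becomes the sequence `[0, 2, 1]`, a form a language model can consume.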

The mechanism driving these multi-species models is self-supervised learning. Manually labeling thousands of hours of audio would be a prohibitive bottleneck for human scientists. Instead, the AI ingests raw acoustic data and learns the underlying distribution of the sounds. It learns which clicks, squeaks, or whistles are likely to follow one another, building a mathematical representation of the species' acoustic environment without human-provided labels.[5][7]
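The self-supervised idea is easiest to see in a bigram model. The training targets are simply the next tokens in unlabeled sequences, so no human annotation is needed. Foundation models use deep neural networks over long contexts rather than counts of adjacent pairs, so treat this as an illustration of the objective, not of the architecture. The token names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def train_bigram(sequences: Sequence[Sequence[Hashable]]) -> Dict[Hashable, Counter]:
    """Count how often each token is followed by each other token.

    The "labels" are simply the next tokens in the data, which is what makes
    this objective self-supervised.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def next_token_probs(model: Dict[Hashable, Counter], token: Hashable) -> Dict[Hashable, float]:
    """Estimated P(next token | current token); empty if the token was never followed."""
    following = model.get(token)
    if not following:
        return {}
    total = sum(following.values())
    return {t: c / total for t, c in following.items()}


# Hypothetical, unlabeled token sequences (names stand in for integer token IDs).
recordings = [
    ["click", "click", "whistle"],
    ["click", "whistle", "squeak"],
    ["click", "click", "click"],
]
model = train_bigram(recordings)
probs = next_token_probs(model, "click")
```

Here "click" is followed by another "click" three times out of five and by "whistle" twice. The model therefore predicts those continuations with probabilities 0.6 and 0.4, purely from the structure of the data.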

However, researchers acknowledge significant uncertainty regarding the gap between tokenization and true comprehension. An AI model might perfectly predict the next sound a dolphin will make, or successfully generate a synthetic whistle that prompts a response, without actually understanding the social or emotional intent behind the signal. True translation requires mapping these acoustic tokens to physical behaviors and environmental contexts, a monumental task that requires years of synchronized video and audio field data.[5][7]
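Mapping tokens to behavior starts with aligned observations: each acoustic token is paired with the behavior annotated from synchronized video at the same moment. A hypothetical first check is the mutual information between the two, sketched below with invented annotations. A high score shows only that a token co-occurs with a behavior. It is the kind of statistical link that still falls short of understanding intent.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple

# One aligned observation: (acoustic token, behavior annotated from synchronized video).
Observation = Tuple[Hashable, Hashable]


def mutual_information_bits(observations: List[Observation]) -> float:
    """Mutual information I(token; behavior) in bits, from empirical frequencies."""
    n = len(observations)
    if n == 0:
        raise ValueError("no aligned observations")
    joint = Counter(observations)
    token_counts = Counter(token for token, _ in observations)
    behavior_counts = Counter(behavior for _, behavior in observations)
    mi = 0.0
    for (token, behavior), count in joint.items():
        p_joint = count / n
        p_token = token_counts[token] / n
        p_behavior = behavior_counts[behavior] / n
        mi += p_joint * math.log2(p_joint / (p_token * p_behavior))
    return mi


# Invented annotations: token 0 always co-occurs with foraging, token 1 with socializing.
aligned = [(0, "foraging"), (1, "socializing")] * 5
```

For `aligned`, the score is 1 bit, the maximum possible for two equally common behaviors. Tokens whose use is unrelated to the annotated behavior would score near 0.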

The third major claim surrounding this research is that interspecies communication poses unprecedented ethical risks that require immediate regulatory action. As the technology to decode and potentially generate animal language matures, the ability to speak to animals is no longer just a scientific curiosity; it is a looming reality that carries profound moral implications.[4]

The evidence for widespread concern is documented in a first-of-its-kind global survey conducted in late 2025 by the Earth Species Project and The Collective Intelligence Project. Polling over 1,000 people across 67 countries, the data revealed deep public apprehension regarding the commercialization of the technology. A striking 85 percent of respondents stated that companies profiting from animal communication technology should face strict regulatory rules.[4]

A 2025 global survey revealed strong public demand for ethical regulations on interspecies communication technology.

Conservation ethicists warn that the stakes of unregulated two-way AI communication are incredibly high. If an AI can generate the perfect synthetic mating call or distress signal, it could be easily exploited by poachers to lure endangered species, or by the tourism industry to force unnatural interactions with wildlife. The public's optimism for the technology's potential is heavily tempered by the historical reality of human ecological misuse.[4][7]

Conversely, researchers argue that, if deployed responsibly, these tools could revolutionize animal welfare. Understanding the specific distress vocalizations of livestock could vastly improve agricultural conditions by letting farmers monitor animal comfort in real time. In the wild, AI-generated acoustic deterrents could steer whales away from deadly shipping lanes or prevent train collisions with deer by broadcasting warnings tailored to the target species.[6][7]

The quest to decode animal communication represents a paradigm shift in how humanity interacts with the natural world. By revealing the hidden sophistication of species like the sperm whale, artificial intelligence is forcing a reevaluation of animal intelligence and our ethical obligations to other species. The challenge for the coming decade will not just be building the algorithms to listen, but developing the wisdom to handle what we might hear.[4][6]

How we got here

  1. 1971

    Biologist Roger Payne publishes 'Songs of Humpback Whales', proving whales sing and sparking global conservation efforts.

  2. 2020

    Project CETI is founded to apply advanced machine learning to decode sperm whale communication.

  3. May 2024

    Researchers publish the discovery of the 'sperm whale phonetic alphabet' in Nature Communications.

  4. Jan 2025

    The Earth Species Project introduces NatureLM-audio, a foundation model for animal vocalizations.

  5. Jun 2025

    Google DeepMind details DolphinGemma, an AI model aimed at decoding dolphin communication.

Viewpoints in depth

Bioacoustics Researchers

Focused on using AI to uncover the structural complexity of animal vocalizations.

This camp views AI as a revolutionary lens for biology, similar to the invention of the microscope. By applying large language models to massive datasets of animal audio, they argue we can finally bypass human perceptual biases and map the true combinatorial depth of species like sperm whales and dolphins. Their primary goal is to map the syntax of the animal kingdom before attempting direct translation.

Conservation Ethicists

Concerned with the moral implications and potential exploitation of interspecies communication.

Ethicists warn that decoding animal language opens a Pandora's box. While it could foster empathy and ecocentric policies, it also creates vectors for exploitation—such as poachers using AI to broadcast deceptive mating calls, or the commercialization of wildlife interactions. They advocate for strict regulatory frameworks before two-way communication tools are deployed in the wild.

Skeptical Linguists

Cautious about equating complex animal vocalizations with human language.

Many traditional linguists maintain that while AI has revealed impressive combinatorial structure in whale codas, structure does not automatically equal semantics. They argue that until we can map these vocalizations to abstract, displaced concepts—a hallmark of human language—we should avoid anthropomorphizing animal communication as a true 'language' equivalent to our own.

What we don't know

  • The semantic meaning behind the vast majority of sperm whale codas.
  • Whether animals possess abstract concepts and displacement (talking about the past or future) in their communication.
  • How wild animals will react to AI-generated synthetic calls designed to mimic their species.

Key terms

Coda
A rhythmic sequence of clicks used by sperm whales for social communication.
Rubato
A musical term used by researchers to describe how whales smoothly vary the duration of their calls depending on the conversational context.
Ornamentation
The addition of extra clicks to a standard coda, used by whales to add complexity to their vocalizations.
Combinatorial structure
A system where a finite set of basic elements can be combined in various ways to create a vast number of distinct expressions.
Bioacoustics
The scientific study of the production, transmission, and reception of sounds by animals.

Frequently asked

What is a sperm whale coda?

A coda is a rhythmic sequence of clicks, separated by varying intervals, that sperm whales use to communicate in the ocean.

Did AI translate what the whales are saying?

No. AI has revealed the complex combinatorial structure of their calls, but the semantic meaning of these vocalizations remains unknown.

What is NatureLM-audio?

It is a large-scale audio language model developed by the Earth Species Project designed specifically to analyze and classify animal vocalizations across thousands of species.

Why are people worried about this technology?

Ethicists fear that two-way AI communication could be exploited for commercial gain, such as disruptive wildlife tourism or precision poaching.

Sources

  [1] Nature Communications (Bioacoustics Researchers): "Contextual and combinatorial structure in sperm whale vocalisations"
  [2] MIT News (Bioacoustics Researchers): "Exploring the mysterious alphabet of sperm whales"
  [3] National Geographic (Conservation Ethicists): "We're one step closer to understanding the sperm whale 'alphabet'"
  [4] Earth Species Project (Conservation Ethicists): "What the World Thinks About AI and Animal Communication: Findings from Our First Global Survey"
  [5] Business Insider (Tech Industry Innovators): "AI is learning how animals talk to each other, and could someday help humans talk to animals"
  [6] Open Data Science (Bioacoustics Researchers): "AI-Powered Breakthroughs in Animal Communication Open Doors to Deeper Conservation Efforts"
  [7] The Times of India (Tech Industry Innovators): "Research claims AI could eventually lead to humans communicating directly with animals"