How AI Foundation Models Are Decoding the Languages of Whales, Crows, and Elephants
Advanced machine learning models are mapping the complex syntax of animal communication, revealing phonetic alphabets and challenging the legal boundaries of human exceptionalism.
By Factlen Editorial Team
- Bioacoustics Researchers
- Focused on using machine learning to map the structural syntax of animal calls and understand their complex social hierarchies.
- Conservationists & Legal Scholars
- Argue that proving animals possess language should fundamentally alter environmental law and grant non-human species legal standing.
- AI Technologists
- Focused on the engineering challenges of self-supervised learning and bridging the sensory 'domain gap' between humans and animals.
What's not represented
- Indigenous communities with traditional ecological knowledge
- Commercial marine industries facing potential new regulations
Why this matters
By mapping structure in the communication of other species, AI is challenging the idea of human exceptionalism. Legal scholars argue that these findings could reshape environmental law and support legal rights for animals based on their demonstrated capacity for language and culture.
Key points
- AI foundation models are bypassing manual human tagging to map the complex syntax of animal communication.
- The Earth Species Project's NatureLM-audio model is identifying universal communication patterns across diverse species.
- Project CETI has used natural language processing to discover a highly structured 'phonetic alphabet' used by sperm whales.
- Harvard engineers have developed open-source bio-loggers to capture the physical context needed to train these AI models.
- Legal scholars argue that proving animals possess language could fundamentally rewrite environmental law and grant them legal standing.
For centuries, the ability to understand the intricate languages of the animal kingdom has been relegated to the realm of mythology and children's literature. Human exceptionalism—the belief that our species alone possesses the cognitive machinery for complex, structured language—has shaped our relationship with the natural world. But the rapid acceleration of artificial intelligence is fundamentally rewriting that narrative. By deploying the same underlying transformer architectures that power modern large language models, scientists are now decoding the vocalizations of whales, crows, and elephants. This is not a parlor trick of simple pattern matching; it is the systematic mapping of alien syntax. AI is throwing open the aperture of human perception, allowing us to listen to the Earth's diverse intelligences on their own terms and potentially bridging the deepest communication divide on the planet.[7]
The traditional bottleneck in bioacoustics has always been human limitation. For decades, field researchers spent thousands of hours manually listening to recordings, attempting to tag and categorize animal sounds by ear. This painstaking process meant that only a fraction of collected data was ever analyzed, and subtle, high-dimensional acoustic patterns that human listeners cannot readily perceive were missed. Today, researchers are sidestepping much of that manual annotation through self-supervised learning. Fed massive, unlabeled datasets of animal vocalizations, neural networks learn the underlying structural relationships of the sounds without needing a human to define what those sounds mean. The AI maps the acoustic space, clustering similar calls and identifying regularities in how sequences are constructed.[1]
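The clustering step described above can be illustrated with a toy example. This is a minimal sketch, not the method any of the cited projects use: it runs plain k-means on a handful of made-up feature vectors, and the feature choice (call duration and peak frequency), the values, and the function name `kmeans` are all assumptions for illustration. Real systems learn their features from raw audio with neural networks rather than using two hand-picked numbers.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group feature vectors into k clusters without using any labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical features per call: (duration in seconds, peak frequency in kHz).
calls = [(0.20, 1.1), (0.22, 1.0), (0.21, 1.2), (0.80, 3.0), (0.82, 3.1), (0.79, 2.9)]
labels, centers = kmeans(calls, k=2)
print(labels)
```

The first three calls end up in one group and the last three in another, even though nothing told the algorithm what either kind of call means. That is the sense in which structure can be mapped before meaning is known.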
At the forefront of this terrestrial effort is the Earth Species Project, a non-profit organization building foundation models explicitly designed for non-human communication. Their flagship model, NatureLM-audio, represents a paradigm shift in how we approach bioacoustics. Unlike previous algorithms that were narrowly trained on a single species, NatureLM-audio is trained on a vast, multimodal corpus that includes human speech, music, and a wide spectrum of environmental and animal noise. By absorbing this diverse acoustic data, the model learns universal patterns of communication across taxa. Early results indicate that the structural lessons the AI learns from human language can actually accelerate its ability to parse the vocalizations of entirely different species, suggesting a shared mathematical architecture to communication across the tree of life.[1]
The practical application of these foundation models is already yielding unprecedented insights in the field. In the dense forests of northern Spain, the Earth Species Project has partnered with biologists studying carrion crows. These highly intelligent birds engage in cooperative breeding, a complex social structure where extended families work together to raise chicks and defend territory. Coordinating this collective behavior requires a nuanced, highly specific communication network. For years, researchers deployed audio recorders to capture these interactions, but the sheer volume of data quickly overwhelmed their capacity to analyze it. A single microphone would capture a week's worth of continuous audio, creating a backlog of noise that would take lifetimes to manually decode.[1][5]

By applying custom AI models to this acoustic mountain, the research team achieved a breakthrough. The algorithms successfully isolated and categorized more than 127,000 distinct crow vocalizations, mapping a communication network far more intricate than previously understood. The AI identified specific call types used for individual recognition, territorial defense, and coordinated foraging. This scale of analysis allows scientists to move beyond asking whether the crows are communicating, to understanding exactly how information flows through their social hierarchy. It provides a real-time window into the inner lives of a cooperative species, proving that advanced AI can untangle the dense web of terrestrial animal societies.[5]
While terrestrial models map the forests, the most ambitious interspecies communication project is focused on the deep ocean. Project CETI is an international, multi-disciplinary initiative dedicated entirely to decoding the language of the sperm whale. The choice of species is highly intentional. Sperm whales possess the largest brains of any known animal, featuring a neocortex heavily optimized for acoustic processing. They live in complex, multi-generational matrilineal societies, and they communicate across vast oceanic distances using rhythmic sequences of clicks known as codas. Because their communication is entirely acoustic and highly structured, it presents a well-suited dataset for advanced machine learning models to analyze.[2]
For decades, the scientific consensus viewed these sperm whale clicks through a reductive lens. Dating back to the 1960s, marine biologists largely treated the codas as a form of aquatic Morse code: simple, repetitive signals used primarily for echolocation and basic identification. The prevailing theory was that the clicks were closer to fixed signals than to speech, lacking the combinatorial complexity required for true language. However, Project CETI's application of advanced natural language processing has challenged this assumption. By feeding thousands of recorded codas into AI models, researchers discovered that the whales were subtly varying the tempo, rhythm, and added 'ornamentation' of their clicks in highly predictable, rule-bound ways.[2]
This revelation culminated in a landmark study published in Nature Communications, where researchers officially detailed the 'sperm whale phonetic alphabet.' The AI identified 156 distinct codas, demonstrating that the whales combine basic acoustic components—much like human phonemes—to construct complex phrases. The models revealed that these codas are not static; they are dynamically altered depending on the conversational context and the specific individuals involved in the exchange. This combinatorial structure is a hallmark of advanced language, proving that sperm whales possess a systemic communication framework capable of expressing a vast array of distinct meanings.[2]
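To make the idea of combinable acoustic components concrete, here is a small sketch of how a single coda might be summarized from its click timestamps. It is an illustrative simplification, not the study's actual analysis: the function name `coda_features`, the sample timestamps, and the reading of 'tempo' as total duration and 'rhythm' as the normalized pattern of gaps are assumptions made for this example.

```python
def coda_features(click_times):
    """Summarize one coda from the timestamps (in seconds) of its clicks."""
    if len(click_times) < 2:
        raise ValueError("a coda needs at least two clicks")
    # Inter-click intervals: the gaps between consecutive clicks.
    icis = [b - a for a, b in zip(click_times, click_times[1:])]
    duration = click_times[-1] - click_times[0]
    # Rhythm: each gap as a share of the total duration, so two codas with
    # the same pattern played at different speeds share one rhythm vector.
    rhythm = [round(ici / duration, 3) for ici in icis]
    return {"clicks": len(click_times), "duration": round(duration, 3), "rhythm": rhythm}

# Hypothetical codas: the same click pattern at two different speeds.
slow = coda_features([0.0, 0.2, 0.4, 0.8])
fast = coda_features([0.0, 0.1, 0.2, 0.4])
print(slow)
print(fast)
```

Under these assumptions the two codas share a rhythm but differ in duration, which shows how separate features could vary independently and be combined into a larger inventory of signals.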

Gathering the pristine data required to train these models is an immense engineering challenge. To solve this, Harvard University engineers collaborating with Project CETI developed a state-of-the-art, open-source bio-logger. Designing a device that can safely adhere to a massive, deep-diving marine mammal without causing harm or altering its behavior required radical innovation. The team engineered specialized suction cups inspired by the anatomy of the clingfish, allowing the bio-loggers to stay firmly attached to the whale's textured skin even as it plunges into the crushing pressures of the deep ocean. These devices represent a massive leap forward in non-invasive marine observation.[3]
The Harvard bio-loggers do much more than just record audio; they capture the vital physical context necessary for machine learning. The devices record high-fidelity, multi-channel sound while simultaneously tracking the whale's depth, velocity, three-dimensional orientation, and proximity to other members of the pod. This contextual data is crucial because language does not exist in a vacuum. To eventually understand what a specific coda means, the AI must correlate the acoustic signal with the physical behavior occurring at that exact moment. By explicitly designing the hardware to feed structured, multi-modal data directly into neural networks, the team has created the ultimate training ground for interspecies AI.[3]
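Correlating a sound with what the animal was doing at that moment comes down to aligning two time series. The sketch below pairs each coda onset with the nearest depth sample by timestamp. It assumes both streams share one clock and that sensor timestamps are sorted; the sample values and the helper name `nearest_reading` are hypothetical, and the real bio-logger data format is not described in the source.

```python
from bisect import bisect_left

def nearest_reading(sensor_times, readings, t):
    """Return the sensor reading whose timestamp is closest to time t."""
    i = bisect_left(sensor_times, t)
    if i == 0:
        return readings[0]
    if i == len(sensor_times):
        return readings[-1]
    before, after = sensor_times[i - 1], sensor_times[i]
    # Pick whichever neighboring sample is closer in time (ties go to the earlier one).
    return readings[i] if after - t < t - before else readings[i - 1]

# Hypothetical tag data: depth in meters, sampled once per second.
sensor_times = [0.0, 1.0, 2.0, 3.0, 4.0]
depths = [5.0, 12.0, 30.0, 55.0, 80.0]
# Hypothetical coda onsets, in seconds, on the same clock.
coda_onsets = [0.4, 2.6, 3.9]

paired = [(t, nearest_reading(sensor_times, depths, t)) for t in coda_onsets]
print(paired)
```

Once each coda carries its behavioral context in this way, a model can in principle look for call types that recur in particular situations, such as at the start of a dive.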

The technology is now advancing from passive listening to generative prediction. Project CETI researchers have developed models like WhAM, an AI originally trained to generate music, which has been fine-tuned to process and predict sperm whale clicks. In demonstrations, the model can take an entirely foreign audio input, such as a human snapping their fingers or speaking, and render it as a predicted sequence of whale codas. While these generated clicks do not yet carry verified semantic meaning, the architecture suggests that AI can learn the acoustic rules of a non-human communication system well enough to 'speak' it back in a structurally plausible form.[2]
The implications of these foundation models are rippling across the broader academic landscape, fundamentally expanding the boundaries of linguistics. At UC Berkeley, researchers are leveraging these exact AI tools to study the communication systems of jumping spiders, honeybees, and elephants. Historically, linguistics was a discipline strictly confined to human speech, but AI is transforming it into a universal science of communication. By treating animal vocalizations and physical gestures as formal languages, linguists are using machine learning to uncover how infant animals learn their first 'words' and how different species construct their own unique versions of syntax and grammar.[4]
Despite these breathtaking advances, a profound barrier remains: the 'domain gap.' AI excels at mapping syntax—the structural rules of how sounds are arranged—but semantics, the actual meaning behind the sounds, is vastly more difficult to decode. Human language is grounded in human experience; we have words for trees, fire, and the sky. A sperm whale's lived experience is defined by deep-ocean pressure, total darkness, and the sensory reality of echolocation. Even if an AI perfectly maps their grammar, translating their concepts into human terms may be impossible because we simply do not share their physical reality. The true frontier of this research lies in deciphering the parts of their world that have no human equivalent.[7]
As the science accelerates, the legal and philosophical stakes are coming into sharp focus. In 2025, the NYU More-Than-Human Life (MOTH) Program partnered with Project CETI to publish a groundbreaking legal framework exploring the consequences of AI-assisted animal translation. If science definitively proves that cetaceans and other animals possess complex language, culture, and individual names, it shatters the legal justification for treating them as mere property or resources. The paper argues that recognizing animal language could disrupt the entire legal landscape, forcing courts and policymakers to reevaluate the fundamental rights of non-human species.[6]

This interdisciplinary collision of AI, bioacoustics, and law opens the door to unprecedented environmental protections. Legal scholars are actively exploring mechanisms where AI-translated animal communication could be used as a form of 'testimony' in environmental courts. If a foundation model can definitively prove that a pod of whales is communicating distress due to commercial shipping noise or deep-sea mining, that data could be leveraged to enforce habitat protections. The ability to empirically demonstrate the cognitive and cultural richness of these animals provides a powerful new weapon for conservationists fighting to preserve global biodiversity.[6]
Ultimately, the quest to decode animal communication is about much more than scientific curiosity; it is a profound test of human empathy. For generations, the extract-use-discard model of human industry has driven the planet toward ecological collapse, largely because we viewed the natural world as silent and unfeeling. By using our most advanced technology to finally listen, we have the opportunity to spark a systemic shift from domination to dialogue. As these foundation models continue to scale, they offer humanity a rare chance to step back from the center of the universe and recognize that we are surrounded by diverse, intelligent cultures that have been speaking to us all along.[1][2][7]
How we got here
1960s–2010s
Marine biologists primarily view sperm whale codas as simple, repetitive signals akin to Morse code.
2020
Project CETI is founded to apply advanced machine learning and natural language processing to sperm whale communication.
2024
Researchers publish a landmark study detailing a 'sperm whale phonetic alphabet' with 156 distinct codas.
2025
The Earth Species Project releases NatureLM-audio, a foundation model trained across human and animal vocalizations.
April 2025
The NYU MOTH Program publishes a legal framework exploring the impact of AI-assisted animal translation on environmental law.
Viewpoints in depth
Bioacoustics Researchers
Mapping the social networks of the animal kingdom.
For field biologists, the integration of AI is less about having a 'conversation' with animals and more about understanding their social architecture. By categorizing hundreds of thousands of vocalizations, researchers can map how information flows through a crow family or a whale pod. This data reveals who is leading the group, how they negotiate conflict, and how they pass down cultural knowledge across generations, fundamentally shifting biology from observational to analytical.
Conservationists & Legal Scholars
Redefining legal rights through the lens of language.
Legal experts view AI translation as a wedge to break human exceptionalism in the courtroom. If a foundation model can empirically prove that a species possesses a phonetic alphabet, individual names, and cultural dialects, it becomes legally indefensible to classify them merely as property or natural resources. Scholars are laying the groundwork for a future where AI-decoded distress calls could serve as actionable evidence to halt commercial activities that destroy habitats.
AI Technologists
Solving the ultimate unsupervised learning problem.
For computer scientists, animal communication represents the purest test of self-supervised learning. Unlike human language models, which are trained on billions of words of written text with clear semantic meanings, animal AI must build a language model from scratch using only raw audio and movement data. Technologists are focused on overcoming the 'domain gap'—the reality that even if an AI perfectly maps a whale's grammar, translating concepts rooted in echolocation into human terms may require entirely new mathematical frameworks.
What we don't know
- Whether mapping the syntax of animal communication will ever allow us to truly understand their semantic meaning.
- How courts and international law will actually respond to empirical evidence of non-human language.
- If generative AI models producing animal sounds could inadvertently disrupt natural ecosystems or social structures.
Key terms
- Bioacoustics
- The cross-disciplinary science that combines biology and acoustics to study how animals produce and perceive sound.
- Foundation Model
- A large-scale AI model trained on vast amounts of unlabeled data that can be adapted to a wide range of downstream tasks.
- Coda
- A patterned, rhythmic series of clicks used by sperm whales to communicate across long distances.
- Self-Supervised Learning
- An AI training method where the model learns underlying patterns and structures in data without needing humans to label every example.
- Domain Gap
- The fundamental difference in lived experience and sensory perception between humans and other species, which makes direct semantic translation difficult.
Frequently asked
Will we be able to talk back to animals?
Projects like CETI are developing generative models that can produce grammatically correct animal sounds. However, meaningful two-way dialogue remains a distant goal due to the vast differences in our sensory experiences and environments.
How does AI learn animal language without a dictionary?
Using self-supervised learning, AI maps the structural relationships between sounds, similar to how it learns human language grammar. It clusters patterns and identifies rules without initially knowing what the specific 'words' mean.
Why are researchers focusing on sperm whales?
Sperm whales have the largest brains on Earth, live in complex, multi-generational societies, and communicate using rhythmic acoustic clicks that are highly structured and ideal for machine learning analysis.
Sources
[1] Earth Species Project (Conservationists & Legal Scholars): The Next Frontier of Understanding Life on Earth
[2] Project CETI (AI Technologists): Applying advanced machine learning to translate the communication of sperm whales
[3] Harvard University (AI Technologists): Listening to whales: Harvard engineers build open-source bio-logger for Project CETI
[4] UC Berkeley (Bioacoustics Researchers): With AI and linguistics, this professor is decoding how animals and humans communicate
[5] Mongabay (Bioacoustics Researchers): From caws to code: AI helps decrypt animal communication
[6] NYU MOTH Program (Conservationists & Legal Scholars): What if We Understood what Animals are Saying? The Legal Impact of AI-assisted Studies of Animal Communication
[7] Factlen Editorial Team (AI Technologists): Synthesis by Factlen editorial team