Communication between animals might seem simplistic at first glance. Compared with human communication, animals do not appear to use any particular language, only noises. Many of these noises are less a conversation about the present than a signal about the surroundings, such as a warning of coming rain, the location of water, or the presence of food some distance away.
When it comes to artificial intelligence, considerable progress has been made in applying machine learning and neural networks to the study of animals and their behaviour. However, understanding the language of animals and communicating with them remains one of the longest-running problems in technology and the biological sciences alike.
Recently, the California-based organisation Earth Species Project (ESP) introduced the Bioacoustic Cocktail Party Problem Network (BioCPPNet), which uses machine learning to decode non-human communication. BioCPPNet is a modular, U-Net-based network optimised for bioacoustic source separation across diverse biological taxa.
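To make “U-Net-based” concrete, here is a toy, untrained, one-level U-Net forward pass on a raw waveform, written in NumPy. It is a sketch under stated assumptions, not BioCPPNet's architecture: the layer sizes, the function names `conv1d` and `toy_unet`, and the random weights are all hypothetical. It shows the two features that define a U-Net: an encoder that downsamples the signal, and a decoder that upsamples it again while concatenating the encoder's features through a skip connection, so the output has one channel per source at the input's full length.

```python
import numpy as np

def conv1d(x, w):
    """Length-preserving 1D convolution (cross-correlation, as in most DL libraries).
    x: (in_ch, T), w: (out_ch, in_ch, k) with odd k."""
    out_ch, in_ch, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad so 'valid' output has length T
    y = np.zeros((out_ch, x.shape[1]))
    for o in range(out_ch):
        for i in range(in_ch):
            y[o] += np.correlate(xp[i], w[o, i], mode="valid")
    return y

def relu(x):
    return np.maximum(0.0, x)

def toy_unet(waveform, n_sources=2, ch=8, k=5, seed=0):
    """Hypothetical one-level U-Net: returns an (n_sources, T) array. T must be even."""
    rng = np.random.default_rng(seed)  # random, untrained weights
    x = waveform[None, :]                                                  # (1, T)
    # Encoder: convolve, then halve the time resolution by average pooling
    e1 = relu(conv1d(x, rng.normal(scale=0.1, size=(ch, 1, k))))           # (ch, T)
    down = e1.reshape(ch, -1, 2).mean(axis=2)                              # (ch, T/2)
    # Bottleneck at the coarser resolution
    b = relu(conv1d(down, rng.normal(scale=0.1, size=(2 * ch, ch, k))))   # (2ch, T/2)
    # Decoder: upsample back to T and concatenate the skip connection from e1
    up = np.repeat(b, 2, axis=1)                                           # (2ch, T)
    cat = np.concatenate([up, e1], axis=0)                                 # (3ch, T)
    # Output head: one waveform per separated source
    return conv1d(cat, rng.normal(scale=0.1, size=(n_sources, 3 * ch, k)))

mixture = np.random.default_rng(1).normal(size=1024)
estimates = toy_unet(mixture, n_sources=2)
print(estimates.shape)  # (2, 1024)
```

Because the weights are random, the outputs are not meaningful separations; training would adjust them so that each output channel matches one clean source. Changing `n_sources` only changes the output head, which is one plausible reading of “modular” rather than a description of ESP's design.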
You can find the link to the code here.
What is bioacoustic source separation?
Informally referred to as the “cocktail party problem”, bioacoustic source separation is the problem of detecting, recognising and extracting information from specific signals in noisy environments. While separating human speech with deep neural networks (DNNs) is a well-studied subject, the bioacoustic cocktail party problem remains difficult because animal recordings contain overlapping sounds from multiple, often unidentifiable, sources.
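The simplest way to see what “separation” means is with an oracle: if the clean sources are known, an ideal ratio mask computed from their spectra can be applied to the mixture to pull each one back out. The NumPy sketch below mixes two hypothetical “callers” (pure tones at 50 Hz and 120 Hz, an assumption chosen for clarity) into a single channel and recovers each. It uses one FFT over the whole clip rather than a short-time transform, which is a simplification. A real separator such as BioCPPNet must estimate the sources without having access to them.

```python
import numpy as np

fs = 1000                                  # hypothetical sample rate in Hz
t = np.arange(fs) / fs                     # one second of audio
s1 = np.sin(2 * np.pi * 50 * t)            # hypothetical caller A: 50 Hz tone
s2 = 0.5 * np.sin(2 * np.pi * 120 * t)     # hypothetical caller B: 120 Hz tone
mix = s1 + s2                              # what a single microphone records

S1, S2, X = np.fft.rfft(s1), np.fft.rfft(s2), np.fft.rfft(mix)
eps = 1e-12                                # avoids division by zero in empty bins
# Ideal ratio masks: each bin's share of magnitude belonging to each source
mask1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)
mask2 = np.abs(S2) / (np.abs(S1) + np.abs(S2) + eps)

# Apply each mask to the mixture spectrum and return to the time domain
est1 = np.fft.irfft(mask1 * X, n=len(mix))
est2 = np.fft.irfft(mask2 * X, n=len(mix))
```

This works here only because the two tones occupy different frequencies. When two animals call in the same frequency band at the same time, their spectra overlap and a mask of this kind cannot cleanly assign the shared energy to one caller, which is part of why the bioacoustic version of the problem is hard.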
BioCPPNet is a lightweight neural network that acts as an end-to-end source separation system: it works on raw waveforms taken directly from recordings to identify and reconstruct the individual sources. Extracting information from a herd of animals is difficult. For example, 58% of vocal recordings of African elephants contained two concurrent signals, which were hard to separate.
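A figure such as the share of recordings containing concurrent signals can be computed from annotated call boundaries. The following is a minimal Python sketch under stated assumptions: each recording is a list of hypothetical `(start, end)` call times in seconds, and a recording counts as overlapping if any two of its calls share time (calls that merely touch do not count). The data and function names are illustrative and are not taken from the elephant study.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one annotated call

def has_overlap(calls: List[Interval]) -> bool:
    """True if any two calls in one recording overlap in time."""
    latest_end = float("-inf")
    for start, end in sorted(calls):   # sweep calls in order of start time
        if start < latest_end:         # begins before an earlier call has ended
            return True
        latest_end = max(latest_end, end)
    return False

def overlap_fraction(recordings: List[List[Interval]]) -> float:
    """Share of recordings containing at least one pair of concurrent calls."""
    if not recordings:
        return 0.0
    return sum(has_overlap(r) for r in recordings) / len(recordings)

# Hypothetical annotations for four recordings
recordings = [
    [(0.0, 2.0), (1.5, 3.0)],              # two calls overlap
    [(0.0, 1.0), (2.0, 3.0)],              # calls take turns
    [(0.5, 4.0), (1.0, 1.5), (5.0, 6.0)],  # second call nested inside the first
    [(0.0, 0.8)],                          # single caller
]
print(overlap_fraction(recordings))  # 0.5
```

Sorting by start time and tracking the latest end seen so far detects any overlapping pair in a single pass, including a short call nested inside a longer one.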
Aza Raskin, founder of ESP, said that the idea for BioCPPNet came from recent advances in machine learning that have made it possible to translate between distant human languages in real time without any prior knowledge.
How does it work?
Recently, Elodie Briefer, associate professor at the University of Copenhagen, developed an algorithm that analyses pig grunts to assess whether pigs are experiencing positive or negative emotions. Though a significant development, the algorithm works only on pigs and cannot analyse the communication of other animals such as dolphins, primates or bees.
Raskin says that their network aims to understand communication across entire biodiverse ecosystems. The model was tested on macaques, Egyptian fruit bats and bottlenose dolphins. The supervised model performed two tasks: grouping sequences of input signals according to the source that produced them, and integrating simultaneous harmonic or quasi-harmonic sounds produced by a given signaller.
ESP used a CNN-based classifier in BioCPPNet to label the individual identity behind each vocalisation. Using data already available from previous studies, the aim was to apply a self-supervised machine learning algorithm that relates the animals' physical behaviour and actions to the audio data, to verify whether the two could be tied together.
The algorithm worked best in a closed-speaker regime, where the test recordings come from the same individuals that appear in the training data. In an open-speaker regime, where the test individuals were not seen during training, the system struggled with bottlenose dolphins and macaques. In the case of bats, however, the model yielded comparable results in both regimes, suggesting that larger datasets are needed for the model to generalise.
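The difference between the two regimes lies in how recordings are divided between training and testing. The Python sketch below assumes each sample is a hypothetical `(individual_id, clip)` pair: the closed split puts clips from every individual on both sides, while the open split holds out whole individuals, so the model is tested on animals it has never heard. The function names, split fraction and toy data are assumptions, not ESP's evaluation code.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Tuple

Sample = Tuple[Hashable, str]  # (individual_id, clip identifier)

def closed_split(samples: List[Sample], test_frac: float = 0.2, seed: int = 0):
    """Every individual appears in both train and test (needs >= 2 clips each)."""
    rng = random.Random(seed)
    by_individual = defaultdict(list)
    for s in samples:
        by_individual[s[0]].append(s)
    train, test = [], []
    for ind, clips in by_individual.items():
        if len(clips) < 2:
            raise ValueError(f"individual {ind!r} needs at least two clips")
        clips = clips[:]
        rng.shuffle(clips)
        # At least one clip in test, at least one left for training
        n_test = min(len(clips) - 1, max(1, int(len(clips) * test_frac)))
        test += clips[:n_test]
        train += clips[n_test:]
    return train, test

def open_split(samples: List[Sample], test_frac: float = 0.2, seed: int = 0):
    """Held-out individuals never appear in training."""
    rng = random.Random(seed)
    ids = sorted({s[0] for s in samples})
    rng.shuffle(ids)
    held_out = set(ids[: max(1, int(len(ids) * test_frac))])
    train = [s for s in samples if s[0] not in held_out]
    test = [s for s in samples if s[0] in held_out]
    return train, test

# Hypothetical dataset: 5 bats with 10 clips each
samples = [(f"bat{i}", f"clip{i}_{j}") for i in range(5) for j in range(10)]
tr, te = open_split(samples)
print({s[0] for s in tr} & {s[0] for s in te})  # set()
```

A model can score well on the closed split by memorising individual voices; only the open split measures whether it generalises to new animals, which is where more training individuals would be expected to help.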
Can we talk to animals?
Because the model will have to be applied to larger datasets recorded in animals' natural environments, Raskin says the method would benefit from relying less on supervised training. This reliance is currently a limitation, since the models worked best with larger training datasets.
Raskin points to ongoing studies that apply CNNs and develop self-supervised machine learning algorithms that do not require human experts to label and input data. Christian Rutz, professor of biology at the University of St Andrews, said that Hawaiian crows, a species that makes and uses tools for foraging, are believed to have a more complex set of vocalisations than other crow species. Another study, by Ari Friedlaender of the University of California, uses data from underwater sound recorders to observe the behavioural patterns of marine animals.
Robert Seyfarth, professor of psychology at the University of Pennsylvania, points out the problem of inferring meaning from animal sounds. He argues that, among animals, the same sound can have different meanings in different contexts. “Applying AI analyses to human language, with which we are so intimately familiar, is one thing, but it can be different and difficult doing it to other species,” said Seyfarth.
Raskin acknowledges the concern and says that AI alone cannot unlock communication with other species, but research has shown that animal communication is far more complex than mere noises and actions. This research opens up large datasets of overlapping signals that were previously unusable, and enables researchers to use ML-based models to design management and conservation strategies for animal species.