Over the last few decades, speech-to-speech (S2S) translation technologies have been developed to ease communication between people who speak different languages. S2S technology matters because it lets speakers of different languages around the world communicate directly, bridging the language gap in global commerce and cross-cultural communication. Recent predictions have even identified speech-to-speech translation as one of the top 10 technologies that will transform the world.
Numerous efforts have been made to build S2S translation models. Below is a list of the top S2S translation models.
MASTOR stands for Multilingual Automatic Speech-to-Speech Translator. IBM developed it for DARPA's CAST program, which aims to enable fast deployment of real-time S2S translation on mobile devices for low-resource languages. The MASTOR system is composed of three components: Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech synthesis (TTS).
This pipelined strategy enables the system to reuse standard speech and language processing techniques while also addressing the issues specific to speech-to-speech translation.
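The cascaded ASR → MT → TTS design can be sketched as three chained functions. The components below are placeholders invented for illustration (a real system would call actual recognition, translation, and synthesis engines), but the data flow mirrors the MASTOR-style pipeline:

```python
# Hypothetical stand-ins for the three cascade components; in a real
# system each would wrap an ASR engine, an MT model, and a TTS voice.

def recognise_speech(audio: bytes) -> str:
    """ASR: convert source-language audio into text (placeholder)."""
    return "hello world"  # pretend the recording said this

def translate_text(text: str, src: str, dst: str) -> str:
    """MT: translate recognised text (toy lookup table)."""
    toy_lexicon = {("en", "es"): {"hello": "hola", "world": "mundo"}}
    table = toy_lexicon[(src, dst)]
    return " ".join(table.get(word, word) for word in text.split())

def synthesise_speech(text: str) -> bytes:
    """TTS: render translated text as audio (placeholder waveform)."""
    return text.encode("utf-8")

def speech_to_speech(audio: bytes, src: str, dst: str) -> bytes:
    """Chain the three components, as in a cascaded S2S pipeline."""
    text = recognise_speech(audio)
    translated = translate_text(text, src, dst)
    return synthesise_speech(translated)

print(speech_to_speech(b"...", "en", "es"))  # b'hola mundo'
```

Because each stage only consumes the previous stage's output, every component can be developed, trained, and swapped independently, which is exactly why pipelined systems can reuse off-the-shelf ASR, MT, and TTS technology.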
Verbmobil is a two-way, speaker-independent S2S translation system designed to translate spontaneous dialogues in mobile settings. It first recognises the input speech, then analyses and translates it before delivering the spoken translation. The system is multilingual, handling dialogues in three business-oriented domains with context-dependent translation between German, English, and Japanese.
Google Translate is the most widely used online text translator. Initially, the service used Statistical Machine Translation (SMT), trained on text pairs drawn from documents and transcripts of the United Nations and the European Parliament. Because SMT predicts translations statistically, it struggled to capture grammar. Google later replaced it with a sentence-level neural machine translation (NMT) system.
Google Translate now supports 109 languages. The translation process involves three steps, with the model searching millions of documents for language patterns to pick the most likely translation. Despite criticism of its quality in specific languages, Google Translate remains the leading translation service. The Google Translate API can also be accessed from Python through free wrapper libraries.
Moses is a free, open-source translation system that employs an encoder-decoder approach and can be trained for any language pair. Training requires a collection of parallel sentence pairs, from which the system learns to search for the best possible translation. To reduce misalignment, the training words and phrases are word-aligned using heuristics, and a tuning process is applied to the decoder's final output.
During tuning, several statistical models are weighed against one another to determine the best-scoring translation model. Even so, word- or phrase-level translations can still violate grammar. Additionally, as of September 2020, Moses does not offer an end-to-end architecture for speech-to-speech translation.
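The heuristic word alignment mentioned above can be illustrated with a toy co-occurrence rule: align each source word to the target word it appears with most often across the parallel pairs. This is only a simplified illustration; Moses itself relies on GIZA++-style IBM-model alignments, not this shortcut:

```python
from collections import Counter
from itertools import product

# Tiny invented parallel corpus (English -> Spanish) for illustration.
pairs = [
    ("the house", "la casa"),
    ("the cat", "el gato"),
    ("a house", "una casa"),
]

# Count how often each (source, target) word pair co-occurs in a sentence pair.
cooc = Counter()
for src, tgt in pairs:
    for s, t in product(src.split(), tgt.split()):
        cooc[(s, t)] += 1

def align(word: str) -> str:
    """Align `word` to the target word it co-occurs with most often."""
    candidates = {t: c for (s, t), c in cooc.items() if s == word}
    return max(candidates, key=candidates.get)

print(align("house"))  # casa
```

"house" co-occurs with "casa" in two sentence pairs but with "la" and "una" only once each, so the heuristic picks "casa". Real alignment models refine exactly this kind of co-occurrence signal with probabilistic reasoning over whole sentences.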
Microsoft Translator is a cloud-based translation service for personal or business use. It offers a cloud-based REST API for speech and text translation that can be used to add translation features to websites or mobile apps; its default translation method is neural machine translation. The service's text-translation website, Bing Microsoft Translator, was formerly known as Bing Translator.
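A call to the Translator Text REST API (v3 `translate` endpoint) can be sketched as below. The key and region values are placeholders; actually sending the request requires a valid Azure Cognitive Services subscription, so this sketch only constructs the request:

```python
import json
from urllib.request import Request

# Public endpoint of the Microsoft Translator Text API, version 3.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text: str, to_lang: str,
                            key: str, region: str) -> Request:
    """Construct (but do not send) a POST request for one translation."""
    url = f"{ENDPOINT}?api-version=3.0&to={to_lang}"
    body = json.dumps([{"Text": text}]).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,        # placeholder credential
            "Ocp-Apim-Subscription-Region": region,  # e.g. "westeurope"
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_translate_request("Hello", "de", "YOUR_KEY", "YOUR_REGION")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` (with a real key) would return a JSON array containing the translation, e.g. the German rendering of "Hello" under `translations[0].text`.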
Skype Translator, which leverages Microsoft's Statistical Machine Translation engine, delivers an end-to-end, stand-alone speech-to-speech translation service as a mobile and desktop application. Skype Translator also provides real-time translation of direct text messages into over 70 languages.
Translatotron is a translation model from Google Research. According to the tech giant, its single sequence-to-sequence architecture is the first end-to-end framework to convert speech in one language directly into speech in another. The model can generate synthesised translations that preserve the sound of the original speaker's voice. However, this capability could be abused to manufacture speech in a different voice and create audio deep fakes.
This month, Google researchers published a paper describing 'Translatotron 2', an improved version that addresses the deep-fake concerns. The newer model also surpassed the original Translatotron in translation quality and predicted speech naturalness by a "significant margin", and improved robustness by reducing chatter and long pauses.
The first network-based S2S translation system on a handheld device was introduced in 2009, demonstrating the project's feasibility. The next phase will add more languages from the Middle East and Europe, and standards for network communication have already been drafted. The Consortium successfully launched "VoiceTra4u-M" in London ahead of the 2012 Olympic Games. The current technology still needs optimisation; the collaboration's next milestones are to bring in further countries and languages, share research activities across multilingual communities, and make the technology accessible and usable for its intended audience.