SeamlessM4T: Meta artificial intelligence model translates speech from one language to another, recognizing 101 languages

A research team at the American technology company Meta has created a model that can translate speech directly from one language to another. Most existing machine-learning translation systems are text-oriented or work in multiple steps: speech recognition, then text translation, then conversion of the translated text into speech. In addition, the language coverage of existing speech-to-speech models falls short of that of text-to-text models.

To address these limitations, the new model, called SeamlessM4T, performs direct translation for up to 101 languages and could pave the way for fast translation, according to the team's study. Specifically, it supports speech-to-speech translation (recognizing 101 input languages and translating into 36), speech-to-text translation (101 languages into 96), text-to-speech translation (96 languages into 36), text-to-text translation (96 languages) and automatic speech recognition (96 languages).

According to the research team, SeamlessM4T is up to 23% more accurate than existing systems at speech-to-speech translation.

In an accompanying article in the same journal commenting on the research, Tanel Alumäe, an associate professor at Tallinn University of Technology in Estonia, notes that the model's greatest strength is that all the data and code needed to run and fine-tune the technology are publicly available. However, he sees some obstacles that remain, such as limits on the languages it can translate and difficulty translating conversations in noisy environments or between people with strong accents, situations that human interpreters handle more easily.

Allison Koenecke, an assistant professor in the Department of Information Science at Cornell University, singles out as particularly interesting the fact that the researchers quantified the toxic, harmful or offensive language that translations may introduce, and looked for any gender bias the model may produce in its translations.
“Although speech technologies can be more efficient and cost-effective than humans at speech recognition and translation (humans are also prone to bias and errors), it is imperative to understand the ways in which these technologies fail, especially when they fail disproportionately for certain demographic groups,” she notes.