The Covid-19 pandemic has enormously boosted the popularity of video communication, but poor transmission quality, dropouts, and connection failures during meetings or conference calls sometimes tax the participants’ patience. Researchers at Karlsruhe Institute of Technology (KIT) and Carnegie Mellon University (CMU) have developed a method for transmitting video conferences over very low bandwidth connections, enabling such transmissions even under extreme conditions. It has now been tested during a dive to the wreck of the Titanic, which lies at a depth of nearly 4,000 meters in the North Atlantic.
“Transmitting data from a depth of four kilometers through salt water without any loss is extremely difficult,” says Professor Alex Waibel, who conducts research on speech translation at KIT and CMU, describing the challenge. Because radio communication does not work in salt water, sonar is the only way to transmit from the submersible to the mother ship at the surface. The researchers have therefore developed methods that convert the video data into text and later synthesize the video again. The sound recording is first converted to text in the submersible. The text is then transmitted to the surface by sonar sound pulses, and the video is reconstructed from it there. “The video then features a synthetic voice that is mapped to the voice of the person who is speaking, so that it sounds like the voice of that person. In addition, the video synthesis is controlled in such a way that the lips of the speaker move in sync with the sound,” explains Waibel, who has been doing research in speech recognition, speech processing, and speech translation for decades.
Method Facilitates Video Conferencing at Very Low Bandwidth
The researchers in the submersible used a powerful laptop computer that first converted the speech of the different speakers taking part in the dialogue into text. Selected text fragments were then sent to the surface via sonar, where the text was converted back to video. Two elements are new: the conversion of a synthetic, neutral voice into the individual voices of the respective speakers, and the video synthesis, which lip-syncs the video of each speaker to the generated sound. This method allows video conferences to be transmitted over low bandwidth connections: “In the future, this will facilitate remote communication in spoken language,” says Waibel. The method is also suitable for synthesizing videos in a different language or for lip-syncing videos.
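The bandwidth saving from sending text instead of recorded sound can be illustrated with a rough calculation. The following Python sketch is not the researchers' implementation. The sample sentence, the speaking time, the audio format, and the link rate are all assumed values chosen for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only: every figure here is an assumption,
# not a measurement from the Titanic dive.


def text_payload_bytes(utterance: str) -> int:
    """Size of a transcript when encoded as UTF-8."""
    return len(utterance.encode("utf-8"))


def raw_audio_bytes(duration_s: float,
                    sample_rate_hz: int = 16_000,
                    bytes_per_sample: int = 2) -> int:
    """Size of uncompressed mono PCM audio (assumed 16 kHz, 16 bit)."""
    return int(duration_s * sample_rate_hz * bytes_per_sample)


def transmission_time_s(payload_bytes: int, link_bits_per_s: float) -> float:
    """Time needed to push a payload through a link of the given rate."""
    return payload_bytes * 8 / link_bits_per_s


# Hypothetical utterance and assumed speaking time.
utterance = "We are now approaching the bow of the wreck."
duration_s = 3.0

# Hypothetical acoustic link rate in bits per second.
link_bps = 100.0

text_size = text_payload_bytes(utterance)
audio_size = raw_audio_bytes(duration_s)

print(f"text:  {text_size} bytes, "
      f"{transmission_time_s(text_size, link_bps):.1f} s on the link")
print(f"audio: {audio_size} bytes, "
      f"{transmission_time_s(audio_size, link_bps):.1f} s on the link")
```

Under these assumptions the transcript is several orders of magnitude smaller than the uncompressed sound, which is why reconstructing voice and video at the receiving end can be attractive on a very slow link.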
Technology Enabled by Pioneering Work on Speech Translation at KIT
The technology tested by Waibel at the wreck of the Titanic builds on decades of pioneering work in speech translation. Waibel’s developments include the “Lecture Translator”, which is used at KIT to automatically record the lecturer’s speech and simultaneously translate it into written English text. This means that students can follow the lecture on their laptop, smartphone, or tablet.
Being “The Research University in the Helmholtz Association”, KIT creates and imparts knowledge for society and the environment. Its objective is to make significant contributions to meeting the global challenges in the fields of energy, mobility, and information. To this end, about 10,000 employees cooperate in a broad range of disciplines in the natural sciences, engineering sciences, economics, and the humanities and social sciences. KIT prepares its 22,800 students for responsible tasks in society, industry, and science by offering research-based study programs. Innovation efforts at KIT build a bridge between important scientific findings and their application for the benefit of society, economic prosperity, and the preservation of our natural basis of life. KIT is one of the German universities of excellence.