Department Digital Phonetics

Department Digital Phonetics, Head Prof. Dr. Thang Vu

Welcome to the Digital Phonetics research group at IMS, University of Stuttgart. The group has been headed by Prof. Dr. Thang Vu since June 2018.

Research areas

Our research interests span various areas, including speech, natural language processing, machine learning, and human-machine interaction. These can be divided into the following four categories: Perception, Interaction, Learning, and Reasoning.

Perception covers multiple senses (e.g., hearing, vision, and touch); our research focuses on the sub-area of processing and extracting information from speech. We aim to improve the robustness and fairness of speech processing systems, such as speech recognition and speech emotion recognition, in real-world conditions, including spontaneous conversations, noisy environments, and multilingual settings (e.g., code-switching).

Interaction research focuses on methods to teach systems which strategy to apply for intelligently reacting to user inputs (from both lexical inputs and social signals), which linguistic realization should be used (even across languages), and how to utter it (e.g., via controllable speech synthesis) to increase human acceptance. One of the main objectives of this area is to make systems adaptive in supporting users to efficiently fulfill specific goals, while also being friendly and likable by using the right words and tones.

Learning research focuses on methods to equip systems with the ability to learn quickly (e.g., using meta-learning techniques) and to learn continually. This enables faster fine-tuning that is closer to human learning than the traditional supervised learning framework, and allows systems to keep improving under data distribution shifts (e.g., in fake audio detection).

Reasoning research investigates methods that give systems the capability to provide trustworthy outputs (i.e., ideally free of hallucinations and bias) and, where possible, an explanation along with their decisions. We focus on identifying supporting facts rather than logical reasoning. Furthermore, we are also interested in methods that allow systems to estimate and communicate reliable uncertainty values for their decisions.

Our research group adheres to the ethical guidelines provided by various research organizations, including ISCA, ACL, and IEEE, particularly in the context of artificial intelligence (AI) systems.

Team

1. Interpretable and explainable cognitive inspired machine learning systems, Exzellenzcluster SimTech PN6 - Machine Learning for Simulation (2022 - 2025)

2. Multilingual Controllable Voice Privacy (VoiPY) - DFG (2024 - 2027)

3. MEKI - Mehr erreicht mit KI - BMBF (2024 - 2027)

4. Trustworthy Chatbot for the Administration @Uni Stuttgart (2025 - )

1. KI-B^3 - KI in die berufliche Bildung bringen - BMBF (2020 - 2024)

2. Data-integrated Simulation of Human Perception and Cognition, Exzellenzcluster SimTech PN6 - Machine Learning for Simulation (2019 - 2022)

3. Digital Phonetics (main focus: speech processing and dialog systems) funded by Carl-Zeiss-Stiftung (2018 - 2023)

4. Methods for Explainability in Natural Language Understanding, in collaboration with Bosch AI (2020 - 2023)

5. Truth on a Journey through Deep Waters: Attacked by DeepFake—Deep Learning to Rescue @Uni Stuttgart (2020 - 2021)

6. Language-Knowledge Interaction (responsibilities: AI Automation in Question Answering) funded by IBM (2020 - 2023)

7. Investigating the Interaction between Speech and Language Processing for Spoken Language Understanding: A Case Study for Sentiment Analysis - SFB 732 A8 (2016 - 2018)

8. Spoken Language Understanding funded by Sony Europe, Stuttgart Technology Center (2018 - 2021)