PROGRAM 2

Speech & Language Intelligence

RP2-1

AI-driven human-computer Conversational Interfaces

Speech and Language Intelligence Laboratory

The research in CPII’s Speech and Language Intelligence Program aims to enhance human-machine interactions and machine-mediated human-human interactions through advanced speech and language processing technologies. First, we have developed machine speech perception technologies, namely an automatic speech recognition (ASR) system for the local Hong Kong Cantonese dialect, based on a state-of-the-art neural architecture. The system has also been adapted and applied to the recognition of elderly speech, which supports research on healthcare for older adults. Second, we have developed an interactive dialog system that supports knowledge-grounded question answering: users with domain-specific questions can retrieve information from a document corpus and obtain automatically generated answers that are relevant and precise. This question-answering system competed in the international challenge MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents, held at ACL 2022. Our system ranked first on the automatic evaluation leaderboards and first in overall performance, which includes human evaluation. Third, to support machine speech production, we are developing Cantonese text-to-speech (TTS) synthesis technologies. We have also demonstrated the possibility of applying synthesis technologies to reconstruct dysarthric speech. Dysarthric speech is laboured and has low intelligibility, caused by neuromotor control problems that affect articulation. We have developed a speech reconstruction system that can transform dysarthric speech into normal-sounding speech, to enhance communication between speech-impaired individuals and the people around them. This reconstruction system won the championship in the open category of the SciTech Challenge 2021. Last, but not least, we have compiled a sizeable Cantonese pronunciation lexicon, which supports the development of the ASR and TTS systems.