The important role that listening behavior plays in conversation, such as head-nods and vocal backchannels ("um-huh", "ok"), has been demonstrated in previous work. In recent years, more and more embodied conversational agents (ECAs) have implemented listening behavior to enrich human-computer interaction.
Generating listening behavior requires multimodal information, including acoustic and visual signals, to help a virtual agent decide when to give feedback and what the content of that feedback should be. However, few systems have taken the rapport level or conversational strategies into account.
I designed and implemented a new real-time algorithm for listening behavior generation in SARA, which leverages traditional multimodal features together with rapport scores and conversational strategies. SARA was demoed at the World Economic Summer Forum 2016, SIGDIAL 2017, and the World Economic Forum 2017.
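To illustrate the kind of decision this algorithm makes at each frame, the sketch below shows a minimal rule-based step that combines acoustic cues, a visual cue, a rapport score, and a conversational-strategy label to decide whether to backchannel and with what content. This is not SARA's actual implementation; all feature names, thresholds, and the rapport/strategy rules are illustrative assumptions.

```python
# Hypothetical sketch of a per-frame listening-behavior decision.
# Feature names, thresholds, and rules are assumptions, not SARA's real logic.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    pause_ms: float            # silence since the user's last word (acoustic)
    pitch_falling: bool        # falling pitch contour (acoustic cue)
    user_gaze_at_agent: bool   # user looking at the agent (visual cue)
    rapport: float             # estimated rapport score, assumed range [0, 7]
    strategy: str              # detected conversational strategy, e.g. "self-disclosure"

def choose_backchannel(f: Frame) -> Optional[str]:
    """Return a backchannel action for this frame, or None to stay silent."""
    # Timing: only respond at a plausible pause boundary with at least one cue.
    if f.pause_ms < 200 or not (f.pitch_falling or f.user_gaze_at_agent):
        return None
    # Content: let rapport level and strategy modulate the feedback type.
    if f.strategy == "self-disclosure" and f.rapport >= 4.0:
        return "head-nod + vocal backchannel ('um-huh')"
    if f.rapport >= 4.0:
        return "head-nod"
    return "vocal backchannel ('ok')"
```

In this toy version, timing is gated by acoustic/visual cues while content is chosen from rapport and strategy, mirroring the separation between "when to give feedback" and "what the feedback should be" described above.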