Silence Is Golden - Making Pauses in Human Utterances Comprehensible for Social Robots in Human-Robot Interaction
Thomas Sievers
Abstract
People pause when speaking for a variety of reasons, often in the middle of a sentence. It is not easy for a machine to tell the difference between a pause for thought and an intended handover of the turn, but smooth turn-taking is essential for flawless communication. Pauses within a sentence can also reveal something about the current emotional state of the speaker, and a correct interpretation of emotions is crucial for mutual understanding between the actors in human-robot interaction (HRI). How can we assess what the pauses a person makes in a dialogue with a social robot tell us about their emotional state? The speech-to-text tool Whisper from OpenAI enables robust speech recognition across different languages and the measurement of pauses between words. These pauses can be used to improve the assessment of the speaker’s emotional state: a Large Language Model (LLM) from OpenAI (ChatGPT) evaluates the human utterances, including the speech pauses, by means of sentiment analysis. The inclusion of pauses as a non-verbal cue provides a helpful component for such an analysis.
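To make the idea of measuring pauses between words concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that each recognized word is available as a dictionary with `word`, `start`, and `end` fields (times in seconds), which is the shape of word-level timestamps that Whisper can be configured to return. The function name `annotate_pauses`, the 0.5 s threshold, and the `[pause: …]` marker format are hypothetical choices made for illustration only.

```python
def annotate_pauses(words, min_pause=0.5):
    """Return the transcript with '[pause: X.Xs]' markers inserted
    wherever the silence between two consecutive words is at least
    min_pause seconds.

    `words` is a list of dicts with 'word', 'start', 'end' keys
    (assumed shape; times in seconds).
    """
    parts = []
    prev_end = None
    for w in words:
        if prev_end is not None:
            # Silence between the end of the previous word and the
            # start of the current one.
            gap = w["start"] - prev_end
            if gap >= min_pause:
                parts.append(f"[pause: {gap:.1f}s]")
        parts.append(w["word"].strip())
        prev_end = w["end"]
    return " ".join(parts)


# Hypothetical word timings for illustration, not measured data.
words = [
    {"word": "I", "start": 0.0, "end": 0.2},
    {"word": "think", "start": 0.25, "end": 0.6},
    {"word": "so", "start": 2.1, "end": 2.4},
]
annotated = annotate_pauses(words)
print(annotated)
```

An annotated string of this kind could then be passed to an LLM together with the instruction to take the pause markers into account when judging the sentiment of the utterance.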