Speaker Orientation-Aware Privacy Control to Thwart Misactivation of Voice Assistants

Abstract

Smart home voice assistants (VAs) such as Amazon Echo and Google Home have become popular because of the convenience they provide through voice commands. VAs continuously listen to detect the wake command and send the subsequent audio data to the manufacturer-owned cloud service for processing to identify actionable commands. However, research has shown that VAs are prone to replay attacks and accidental activations when the wake words are spoken in the background (either by a human or played through a mechanical speaker). Existing privacy controls are not effective in preventing such misactivations. This raises privacy and security concerns for users, as their conversations can be recorded and relayed to the cloud without their knowledge. Recent studies have shown that visual gaze plays an important role when interacting with conversational agents such as VAs, and users tend to turn their heads or bodies toward the VA when invoking it. In this paper, we propose a device-free, non-obtrusive acoustic sensing system called HeadTalk to thwart the misactivation of VAs. The proposed system leverages the user's head direction information and verifies that a human generated the sound to minimize accidental activations. Our extensive evaluation shows that HeadTalk can infer a speaker's head orientation with an average accuracy of 96.14% and distinguish a human voice from a mechanical speaker with an equal error rate of 2.58%. We also conduct a user interaction study to assess how users perceive our proposed approach compared to existing privacy controls. Our results suggest that HeadTalk can not only enhance the security and privacy controls for VAs but do so in a usable way without requiring any additional hardware.

Publication
In Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)