Speaker Detection via Interaural Difference with Earables

Large streams of audio data facilitate research on everyday human behavior. A major challenge for continuous audio recording is data protection: third parties who have not consented may be recorded as well.

In this thesis, we will develop a privacy-preserving speaker recognition system based on the interaural time difference (ITD) or the interaural level difference (ILD). ITD and ILD describe the difference between audio signals captured at different positions, such as the two ears: ITD is the difference in arrival time, and ILD is the difference in amplitude. Both are also psychoacoustic cues that humans use to locate sound sources.
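To make the two measures concrete, the following is a minimal sketch of how ITD and ILD could be estimated from a pair of microphone signals. It is not the method of the existing prototype: the function names `estimate_itd` and `estimate_ild_db`, the brute-force cross-correlation, and the synthetic test burst are all assumptions chosen for illustration.

```python
import math


def estimate_itd(left, right, sample_rate, max_lag):
    """Estimate the delay of `right` relative to `left` in seconds.

    Brute-force cross-correlation over lags in [-max_lag, max_lag] samples.
    A positive result means the sound reached the left microphone first.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both left[i] and right[i + lag] exist.
        score = sum(
            left[i] * right[i + lag]
            for i in range(max(0, -lag), min(n, n - lag))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def estimate_ild_db(left, right):
    """Level difference in dB; positive means the left channel is louder.

    Assumes that neither channel is completely silent.
    """
    return 20 * math.log10(rms(left) / rms(right))


# Synthetic example (hypothetical values): a 500 Hz burst that reaches the
# right microphone 12 samples later and at half the amplitude.
sample_rate = 48_000
burst = [
    0.5 * (1 - math.cos(2 * math.pi * i / 479))      # Hann-shaped envelope
    * math.sin(2 * math.pi * 500 * i / sample_rate)  # 500 Hz tone
    for i in range(480)
]
left = burst + [0.0] * 40
right = [0.0] * 12 + [0.5 * v for v in burst] + [0.0] * 28

itd = estimate_itd(left, right, sample_rate, max_lag=40)
ild = estimate_ild_db(left, right)
print(f"ITD: {itd * 1e6:.0f} microseconds, ILD: {ild:.2f} dB")
```

For this synthetic input the estimates are 250 microseconds (12 samples at 48 kHz) and about 6 dB. Real recordings would likely need band-pass filtering and a more robust correlation measure, which this sketch leaves out.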

A prior bachelor's thesis at TECO used ITD to estimate the user's head position and found the best performance with audio signals in the human voice range. Prior work combined interaural difference measures with voice activity detection for privacy-preserving voice commands [1]. Further optimization would be possible based on audio-based activity recognition [2, 3]. These findings pave the way for examining the feasibility of ITD for speaker recognition.

The goal is to separate the user's voice signal from other audio sources, especially third-party speakers. Off-the-shelf devices with built-in air-conduction microphones (eSense, Rode Lavalier) will be used. Further challenges involve:

  • Finding the best sensor locations for two or more microphones in a wearable setting.
  • Optimizing the existing prototype, such as the calculation of the wave angle, establishing physical constraints (e.g., the cone of confusion), and voice activity recognition.
  • Examining the robustness of the device under environmental noise or movement.
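As an illustration of the wave angle calculation and the physical constraints mentioned above, here is a minimal sketch under a far-field model, where ITD = spacing · sin(angle) / c. The microphone spacing of 0.18 m, the speed of sound of 343 m/s, and the function names are assumptions and do not describe the existing prototype.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees Celsius (assumed)
MIC_SPACING = 0.18      # m, assumed distance between the two microphones


def max_physical_itd(spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Largest ITD that a single sound source can physically produce."""
    return spacing / c


def itd_to_angle_deg(itd, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Convert an ITD in seconds to an angle of arrival in degrees.

    Far-field model: itd = spacing * sin(angle) / c.
    0 degrees lies on the plane midway between the microphones; the sign
    of the angle follows the sign of the ITD.
    Returns None if the ITD violates the physical constraint.
    """
    ratio = c * itd / spacing
    if abs(ratio) > 1.0 + 1e-9:
        return None  # not explainable by one source; likely an estimation error
    ratio = max(-1.0, min(1.0, ratio))  # absorb floating-point rounding
    return math.degrees(math.asin(ratio))


print(f"max ITD: {max_physical_itd() * 1e6:.0f} microseconds")
print(itd_to_angle_deg(0.0))
print(itd_to_angle_deg(0.001))  # exceeds the physical limit
```

With only two microphones, the resulting angle is ambiguous: all source positions on a cone around the microphone axis produce the same ITD (the cone of confusion), so front and back cannot be distinguished without further cues.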

Keywords: Earables; Audio Data Processing; Speaker Recognition; Machine Learning

Task

Conduct a literature review (e.g., audio signal processing, noise cancelling);
Implement the wearable setup for multiple microphones;
Implement an algorithm that integrates diverse sensor data for speaker recognition;
Conduct a pilot study on performance, usability, robustness, etc.;

What we offer

Access to a large pool of participants;
Professional advice on data science and hardware;
A pleasant working atmosphere and constructive cooperation;
The chance to publish your work at a top conference;
Research at the intersection of psychology and technology;

Qualification

Proactive and communicative work style (frequent updates, prepared meetings and ideas);
Good English reading and writing skills;
Interest in working with earable devices and in interdisciplinary work;

Interested? Please contact: Tim Schneegans (schneegans@teco.edu)

References

[1] Yan, Y., Yu, C., Shi, Y., & Xie, M. (2019, October). PrivateTalk: Activating Voice Input with Hand-On-Mouth Gesture Detected by Bluetooth Earphones. In Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (pp. 1013-1020).

[2] Laput et al. (2018, October). Ubicoustics: Plug-and-Play Acoustic Activity Recognition. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (pp. 213-224). Berlin, Germany: ACM. ISBN: 978-1-4503-5948-1.

[3] Stork, J. A., et al. (2012, September). Audio-based human activity recognition using Non-Markovian Ensemble Voting. In 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication (pp. 509-514). ISSN: 1944-9437.