Master's Thesis: Scalable Silent Speech Using Phoneme Recognition
Background
Earables are earbuds equipped with multiple sensors, enabling advanced interaction and sensing capabilities. Among them, OpenEarable 2.0 (Röddiger et al., 2024) stands out as the most sensor-rich device of its kind, enabling a wide range of interaction possibilities. Prior research has demonstrated that earables can detect various physiological and behavioral signals (Röddiger et al., 2022; Hummel et al., 2025; Hu et al., 2025), including silent speech, i.e., the articulation of speech without producing audible sound. This capability has promising applications in privacy-preserving interaction, accessibility, and communication in noisy environments.
To date, the scientific literature on earable-based silent speech systems primarily reports solutions that rely either on inertial measurement units (IMUs) (e.g., Srivastava et al., 2022; Srivastava et al., 2024) or on speaker-microphone combinations (e.g., Jin et al., 2022; Dong et al., 2024). While IMU-based systems typically require custom hardware extending to the temporomandibular joint (TMJ), speaker-microphone approaches have been demonstrated with standard earbud form factors, such as those found in OpenEarable 2.0.
Of particular relevance is the work by Srivastava et al. (2022), who moved beyond fixed word dictionaries by successfully predicting phonemes, thereby enabling open-vocabulary word recognition. However, their system also relied on a custom device reaching the TMJ. The goal of this thesis is to adapt this phoneme-level prediction strategy to ultrasound reflections captured by a speaker-microphone pair, which is built into OpenEarable 2.0 as well as into virtually every standard earphone. This modality offers a compelling alternative to IMU-based systems, with greater potential for integration into scalable, commercially viable hardware.
The feasibility of predicting letter-level speech units with a speaker-microphone combination has already been demonstrated by Dong et al. (2024) and Sun et al. (2024), with the latter even showing that vowels can be detected in general. Given the anatomical variability of ear canal geometry, it is anticipated that a generalized model combined with lightweight per-user fine-tuning will be necessary.
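To make this sensing principle more concrete, the sketch below shows one conventional way such reflections can be turned into features: the known near-ultrasonic stimulus is cross-correlated with the in-ear microphone recording frame by frame, yielding an "echo profile" whose changes over time reflect deformations of the ear canal and jaw during articulation. The sampling rate, chirp band, and frame length are illustrative assumptions, not parameters of OpenEarable 2.0.

```python
# Minimal sketch (not a thesis deliverable): computing per-frame "echo profiles"
# by matched-filtering the microphone signal with the known ultrasonic stimulus.
# All parameters below are assumed for illustration only.
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000   # assumed microphone sampling rate (Hz)
FRAME = 480   # one 10 ms stimulus repetition per frame (samples)

t = np.arange(FRAME) / FS
# Assumed near-ultrasonic linear chirp (17-20 kHz), repeated continuously
stimulus = chirp(t, f0=17_000, f1=20_000, t1=t[-1], method="linear")

def echo_profiles(mic_signal: np.ndarray) -> np.ndarray:
    """Cross-correlate each frame of the recording with the known chirp.

    Each row is one frame's echo profile; changes between rows capture
    changes of the in-ear acoustic channel, e.g. due to jaw movement.
    """
    n_frames = len(mic_signal) // FRAME
    frames = mic_signal[: n_frames * FRAME].reshape(n_frames, FRAME)
    return np.stack([correlate(f, stimulus, mode="same") for f in frames])

# Placeholder input; a real recording would come from the in-ear microphone
rng = np.random.default_rng(0)
fake_recording = rng.normal(scale=0.01, size=FS)   # 1 s of noise
print(echo_profiles(fake_recording).shape)         # (100, 480)
```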
Tasks
- Review relevant literature on silent speech and phoneme recognition, with a focus on earable-based systems. Design a data collection protocol that captures the necessary variety in speech for reliable phoneme-level modeling.
- Research and select an appropriate acoustic stimulus to be played from the OpenEarable 2.0 speakers for ultrasound-based sensing. Implement the stimulus playback functionality on OpenEarable 2.0 (see the stimulus sketch below this list).
- Design and conduct a user study to collect both silent and vocalized speech data. Ensure high-quality recordings suitable for training and evaluating your phoneme detection model.
- Develop a phoneme recognition pipeline based on the collected data. Implement phoneme-to-word reconstruction (see the reconstruction sketch below this list) and evaluate model performance on both seen and unseen vocabulary.
- Bonus: Implement a real-time silent speech recognition system directly on the OpenEarable 2.0.
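As a starting point for the stimulus task, the sketch below designs one possible stimulus in Python: a continuously repeated, near-ultrasonic FMCW chirp with short raised-cosine fades to avoid audible clicks, exported as a WAV file for playback tests. It reuses the 17-20 kHz band assumed in the Background sketch; the actual band, frame length, and level would have to be chosen based on the measured frequency response of the OpenEarable 2.0 speaker and microphone.

```python
# Hedged sketch of one candidate stimulus: a repeated near-ultrasonic FMCW chirp
# with short fades. All parameters are assumptions to be validated on hardware.
import numpy as np
from scipy.signal import chirp
from scipy.io import wavfile

FS = 48_000               # assumed playback sampling rate (Hz)
FRAME = 480               # 10 ms per chirp repetition
F0, F1 = 17_000, 20_000   # assumed usable near-ultrasonic band (Hz)

t = np.arange(FRAME) / FS
frame = chirp(t, f0=F0, f1=F1, t1=t[-1], method="linear")

# Short raised-cosine fades so the repeated frame stays click-free
fade = 24                                              # 0.5 ms fade-in/out
ramp = 0.5 * (1 - np.cos(np.linspace(0, np.pi, fade)))
window = np.ones(FRAME)
window[:fade], window[-fade:] = ramp, ramp[::-1]
frame *= window

# Tile to a 10 s stimulus and export as 16-bit WAV with 6 dB headroom
stimulus = np.tile(frame, 1000)
wavfile.write("stimulus_17k_20k.wav", FS, (0.5 * 32767 * stimulus).astype(np.int16))
```

On the device itself, the waveform would have to be stored or generated and played back via SapphireOS, which is outside the scope of this sketch.

For the phoneme-to-word reconstruction task, one minimal baseline is nearest-neighbour lookup in a pronunciation lexicon: the predicted phoneme sequence is compared against each word's pronunciation and the closest entry wins. The toy lexicon, ARPAbet-style symbols, and the reconstruct_word helper below are illustrative stand-ins; a real pipeline would use a full dictionary such as CMUdict and likely a stronger decoder (e.g., beam search with a language model).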
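```python
# Hypothetical sketch of phoneme-to-word reconstruction via lexicon lookup.
from difflib import SequenceMatcher

# Toy pronunciation lexicon (ARPAbet-style); a real system would use CMUdict.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "help":  ["HH", "EH", "L", "P"],
    "yes":   ["Y", "EH", "S"],
    "no":    ["N", "OW"],
    "stop":  ["S", "T", "AA", "P"],
}

def reconstruct_word(predicted: list[str]) -> str:
    """Return the lexicon word whose pronunciation best matches the prediction."""
    def score(word: str) -> float:
        return SequenceMatcher(None, predicted, LEXICON[word]).ratio()
    return max(LEXICON, key=score)

# A noisy prediction (last phoneme wrong) still maps to the closest entry.
print(reconstruct_word(["HH", "AH", "L", "UW"]))  # -> hello
```

Because reconstruction happens against the lexicon rather than a fixed training vocabulary, unseen words can be supported by extending the dictionary instead of retraining the phoneme model.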
Requirements
- Interest in Human-Computer Interaction (HCI) and the real-world application of new devices
- Good Python skills (for data analysis)
- Optional:
  - Good C/C++ and SapphireOS skills (for programming OpenEarable 2.0)
  - Flutter (for adapting the OpenWearables App)
If you are interested in this topic, please contact Jonas Hummel (hummel@teco.edu).