Publications

Audio wearable devices, or hearables, are becoming an increasingly popular consumer product. Some of these hearables contain an in-ear microphone to capture audio signals inside the user's occluded ear canal. The microphone is mainly used to pick up speech in noisy environments, but it can also capture other signals, such as nonverbal events that could be used to interact with the device or a computer. Teeth or tongue clicking could be used to interact with a device in a discreet manner, while coughing or throat-clearing sounds could be used to monitor the health of a user. In this paper, 10 human-produced nonverbal audio events are detected and classified in real time with a classifier using the Bag-of-Audio-Words algorithm. To build this algorithm, different clustering and classification methods are compared. Mel-Frequency Cepstral Coefficient features are used alongside Auditory-inspired Amplitude Modulation features and Per-Channel Energy Normalization features. To combine the different features, concatenation performance at the input level and at the histogram level is compared. The real-time detector is built using the detection-by-classification technique, classifying on a 400 ms window with 75% overlap. The detector is tested in a controlled noisy environment on 10 subjects. The classifier achieved a sensitivity of 81.5%, while the detector built on the same classifier achieved a sensitivity of 69.9% in a quiet environment.

Link
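
The Bag-of-Audio-Words pipeline described in the abstract above has three stages: cluster frame-level features into a codebook, represent each analysis window as a histogram of codeword counts, and classify the histogram. The sketch below is a minimal illustration of that pipeline, not the paper's implementation: only the 400 ms window with 75% overlap comes from the abstract, while the 13-coefficient MFCC front end, the 64-word codebook, the sample rate, and the librosa/scikit-learn choices are assumptions.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.svm import SVC

SR = 16000                 # sample rate (assumed)
WIN = int(0.400 * SR)      # 400 ms analysis window (from the abstract)
HOP = WIN // 4             # 75% overlap (from the abstract)
N_WORDS = 64               # codebook size (assumed)

def frame_features(y, sr=SR):
    """Frame-level MFCCs, one row per short-time frame."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def to_histogram(frames, codebook):
    """Quantize frames to their nearest codeword and count occurrences."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=N_WORDS).astype(float)
    return hist / max(hist.sum(), 1.0)     # normalized bag of audio words

def train(signals, labels):
    """Learn the codebook on all training frames, then an SVM on histograms."""
    all_frames = np.vstack([frame_features(y) for y in signals])
    codebook = KMeans(n_clusters=N_WORDS, n_init=10).fit(all_frames)
    X = np.array([to_histogram(frame_features(y), codebook) for y in signals])
    return codebook, SVC(kernel="rbf").fit(X, labels)

def detect(y, codebook, clf):
    """Detection by classification: classify each 400 ms window, 75% overlap."""
    events = []
    for start in range(0, len(y) - WIN + 1, HOP):
        hist = to_histogram(frame_features(y[start:start + WIN]), codebook)
        events.append((start / SR, clf.predict(hist[None, :])[0]))
    return events
```

K-means plus an RBF SVM is one common clustering/classification pairing for Bag-of-Audio-Words; the paper compares several such methods, and fusing features at the input level versus the histogram level amounts to concatenating either the frame vectors before quantization or the per-feature histograms afterward.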

To enhance the communication experience of workers equipped with hearing protection devices and radio communication in noisy environments, alternative methods of speech capture have been utilized. One such approach uses speech captured by a microphone in an occluded ear canal. Although high in signal-to-noise ratio, bone- and tissue-conducted speech has a limited bandwidth with a high-frequency roll-off at 2 kHz. In this paper, the potential of using various bandwidth extension techniques is investigated by studying the mutual information between the signals of three uniquely placed microphones: inside an occluded ear, outside the ear, and in front of the mouth. Using a Gaussian mixture model approach, the mutual information of the low- and high-band frequency ranges of the three microphone signals at varied levels of signal-to-noise ratio is measured. Results show that a speech signal with extended bandwidth and high signal-to-noise ratio may be achieved using the available microphone signals.

Link
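
One way to realize the Gaussian-mixture mutual-information measurement described above is to fit a mixture to the joint distribution of low-band and high-band feature vectors and estimate I(X;Y) by Monte Carlo as the data average of log p(x,y) − log p(x) − log p(y), exploiting the fact that the marginal of a full-covariance mixture is obtained by slicing its means and covariance blocks. The sketch below assumes scikit-learn's GaussianMixture; the mixture order and feature layout are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def marginal_gmm(gmm, idx):
    """Marginal of a full-covariance GMM over the dimensions in idx:
    slice the means and the covariance blocks component by component."""
    sub = GaussianMixture(n_components=gmm.n_components, covariance_type="full")
    sub.weights_ = gmm.weights_
    sub.means_ = gmm.means_[:, idx]
    sub.covariances_ = gmm.covariances_[
        np.ix_(np.arange(gmm.n_components), idx, idx)]
    # scikit-learn scores with the Cholesky factor of the precision matrices
    sub.precisions_cholesky_ = np.linalg.cholesky(
        np.linalg.inv(sub.covariances_))
    return sub

def mutual_information(X, Y, n_components=8):
    """Monte Carlo MI estimate: mean of log p(x,y) - log p(x) - log p(y)."""
    XY = np.hstack([X, Y])
    joint = GaussianMixture(n_components=n_components,
                            covariance_type="full").fit(XY)
    dx = X.shape[1]
    px = marginal_gmm(joint, np.arange(dx))
    py = marginal_gmm(joint, np.arange(dx, XY.shape[1]))
    return np.mean(joint.score_samples(XY)
                   - px.score_samples(X)
                   - py.score_samples(Y))   # in nats
```

Applied to aligned frames from two of the microphones, such an estimate indicates how much high-band information the high-SNR in-ear signal shares with the full-band reference, which is what motivates bandwidth extension.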

The accurate classification of nonverbal human-produced audio events opens the door to numerous applications beyond health monitoring. Voluntary events, such as tongue clicking and teeth chattering, may lead to a novel way of issuing silent interface commands. Involuntary events, such as coughing and clearing the throat, may advance the current state of the art in hearing health research. The challenge of such applications is the balance between the processing capabilities of a small intra-aural device and the accuracy of classification. In this pilot study, 10 nonverbal audio events are captured inside the ear canal blocked by an intra-aural device. The performance of three classifiers is investigated: Gaussian Mixture Model (GMM), Support Vector Machine, and Multi-Layer Perceptron. Each classifier is trained using three different feature vector structures constructed from the mel-frequency cepstral coefficients (MFCCs) and their derivatives. Fusion of the MFCCs with auditory-inspired amplitude modulation features (AAMF) is also investigated. Classification is compared between binaural and monaural training sets as well as between noisy and clean conditions. The highest accuracy, 75.45%, is achieved using the GMM classifier with the binaural MFCC+AAMF clean training set. An accuracy of 73.47% is achieved by training and testing the classifier on the combined binaural clean and noisy dataset.

Link
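
A standard way to build the GMM classifier investigated above is to fit one Gaussian mixture per event class on MFCC frames (optionally stacked with their first and second derivatives) and assign a test clip to the class whose mixture gives the highest average log-likelihood. The sketch below follows that classic pattern under assumed settings (librosa front end, 13 coefficients, diagonal covariances, 8 components); the AAMF fusion and the binaural feature structures from the paper are not shown.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_with_deltas(y, sr=16000, n_mfcc=13):
    """Frame-level MFCCs stacked with first and second derivatives."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(m)
    d2 = librosa.feature.delta(m, order=2)
    return np.vstack([m, d1, d2]).T          # shape: (frames, 3 * n_mfcc)

def train_gmms(signals_by_class, n_components=8):
    """Fit one GMM per nonverbal event class on all its training frames."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag")
                   .fit(np.vstack([mfcc_with_deltas(y) for y in sigs]))
            for label, sigs in signals_by_class.items()}

def classify(y, gmms):
    """Pick the class whose GMM gives the highest mean frame log-likelihood."""
    feats = mfcc_with_deltas(y)
    return max(gmms, key=lambda label: gmms[label].score(feats))
```

Per-class likelihood scoring keeps the decision rule cheap at test time, which matters for the abstract's stated constraint: the limited processing capability of a small intra-aural device.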

Objective: Speech production in noise with varying talker-to-listener distance has been well studied for the open-ear condition. However, occluding the ear canal affects the auditory feedback and causes deviations from the models presented for the open-ear condition. Communication is a main concern for people wearing hearing protection devices (HPDs). Although practical, radio communication is cumbersome, as it does not distinguish designated receivers. A smarter radio communication protocol must be developed to alleviate this problem, which requires modeling speech production in noise while wearing HPDs. Such a model opens the door to radio communication systems that distinguish receivers and offer more efficient communication between persons wearing HPDs. Design: This paper presents the results of a pilot study aimed at investigating the effects of occluding the ear on changes in voice level and fundamental frequency in noise and with varying talker-to-listener distance. Study sample: Twelve participants with a mean age of 28 years took part in this study. Results: Compared to existing data, results show a trend similar to the open-ear condition, with the exception of the occluded quiet condition. Conclusions: This implies that a model can be developed to better understand speech production for the occluded ear.

Link
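
The two quantities tracked in this study, voice level and fundamental frequency, can be extracted from a recording with standard signal processing: RMS energy in dB for level and, for instance, an autocorrelation peak search for F0. The sketch below is one such illustrative analysis; the frame length, F0 search range, and dB reference are assumptions, not the study's measurement protocol.

```python
import numpy as np

def frame_f0_and_level(frame, sr, f0_min=75.0, f0_max=400.0):
    """Return (f0_hz, level_db) for one speech frame.
    F0 from the autocorrelation peak within the allowed lag range;
    level as RMS in dB relative to full scale."""
    frame = frame - frame.mean()
    level_db = 20.0 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / f0_max), int(sr / f0_min)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag, level_db

def track(y, sr, frame_ms=40):
    """Track F0 and level over consecutive frames of a recording y."""
    n = int(sr * frame_ms / 1000)
    return [frame_f0_and_level(y[i:i + n], sr)
            for i in range(0, len(y) - n + 1, n)]
```

Tracking both measures across noise levels and talker-to-listener distances is what allows occluded-ear speech production to be compared against the existing open-ear models.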