Ai Feedback
Exact(5)
Therefore, our exemplar dictionary can be created by selecting individual frames of the modulation spectrum in a database of labelled speech.
In this paper we presented a novel noise-robust ASR system that uses the modulation spectrum in combination with a sparse coding approach for estimating state probabilities.
Subsequent developments towards harnessing the modulation spectrum in ASR have followed pretty much the same path, characterized by some form of additional processing applied to sequences of short-time spectral (or cepstral) features.
Still, most approaches in speech technology, whether it is speech recognition, speech synthesis, speaker recognition, or speech coding, seem to rely on impoverished representations of the modulation spectrum in the form of a sequence of short-time spectra, possibly extended with explicit information about the dynamic changes in the form of delta and delta-delta coefficients.
We believe that this discrepancy is related to the difference between sparse coding (reconstruction of an observed modulation spectrum in terms of a linear combination of exemplars), what it is that the Lasso solver does, and sparse classification (estimating the probability of the HMM state underlying the observed modulation spectrum), which is our final goal.
Similar(55)
For example, in [31], a content-based speech discrimination algorithm is designed to exploit the long-term information inherent in the modulation spectrum; and in [32], authors propose two segment-based features: the variance of the spectrum flux (VSF) and the variance of the zero crossing rate (VZCR).
In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means for obtaining state posterior probabilities.
In order to use information in modulation spectrum (at important frequencies starting from low range), a signal over relatively long-time scales has to be processed.
band spectrum 0 1, 0 5, and 1 5 kHz Alpha ratio (in dB), Hammarberg index (in dB) Others Equivalent sound level (in dB) Frequency with maximum amplitude in modulation spectrum of F 0 and loudness.
These cases will surely require additional processing, for example, aimed at tracking continuity in pitch, in addition to continuity in the modulation spectrum.
In addition, using the modulation spectrum increases the reconstruction errors in synthesized speech.
Write better and faster with AI suggestions while staying true to your unique style.
Since I tried Ludwig back in 2017, I have been constantly using it in both editing and translation. Ever since, I suggest it to my translators at ProSciEditing.

Justyna Jupowicz-Kozak
CEO of Professional Science Editing for Scientists @ prosciediting.com