Väldete analüüs sünteesi teel
Analysis of quantity degrees by synthesis

Author(s): Meelis Mihkla
Subject(s): Phonetics / Phonology, Computational linguistics, Methodology and research technology
Published by: SA Kultuurileht
Keywords: quantity degrees; analysis by synthesis; speech synthesis; machine learning

Summary/Abstract: The article is a first attempt to take a fresh look at quantity degree as a phonological category by means of synthesis and machine learning. The results of the analysis are compared with recorded natural speech and with synthetic speech produced by different synthesis methods. Self-training speech synthesizers based on hidden Markov models (HMM), deep neural networks (DNN) and recurrent neural networks (RNN) were used. The input for machine learning consisted of written Estonian texts (ca 1000 sentences) and audio recordings of these texts read by different informants. The self-training TTS systems contained no Estonian-specific module or analyser. To check pronunciation precision, the output of each synthesizer was subjected to a perception test. Overall pronunciation precision reached up to 80.6% for synthetic speech, vs. 94.4% for natural speech. As a second evaluation, nine acoustic parameters were analysed. The comparison of synthetic and natural speech confirmed that the main acoustic cue distinguishing the quantity degrees is the duration ratio between syllable onsets. Among the additional cues, the position of the pitch peak in the stressed syllable and the difference between the pitch maxima proved significant in both synthetic and natural speech. Synthetic speech showed less intra-degree variability than natural speech, especially in the difference between pitch maxima and in the duration ratio of syllable onsets. While the different synthesis methods produced relatively small differences in pronunciation precision (77.8–81.5%), the extent to which they made use of the relevant acoustic parameters varied considerably.
This indicates that while the duration ratio of syllable onsets remains the main characteristic of a quantity degree, machine-learning algorithms have relative freedom in choosing among the additional characteristics to achieve an acceptable audio result. The most frequent cause of pronunciation errors in words of different quantity degrees was a wrong duration ratio between syllable onsets, and this applied to all three quantity degrees. The other error-causing parameters affected only individual quantity degrees. In conclusion, although the present study did not reveal any new aspects of Estonian quantity, it serves well as a pilot study, indicating that analysis by synthesis is a promising method for testing various phonological categories and the phonological representation of speech.
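The two evaluation measures described above, the duration ratio between syllable onsets and the pronunciation precision from the perception test, plus a simple measure of intra-degree variability, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: it assumes the ratio is taken between the first onset-to-onset interval and the following one (ending at the next syllable onset or the word end), that precision is the share of stimuli perceived in their intended degree, and that variability is the sample standard deviation within each degree. The names `Stimulus`, `onset_duration_ratio`, `pronunciation_precision` and `intra_degree_sd` are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Stimulus:
    """One word token in a perception test (hypothetical data layout)."""
    intended: str   # quantity degree the speaker or synthesizer aimed at: "Q1", "Q2" or "Q3"
    perceived: str  # quantity degree the listener identified


def onset_duration_ratio(onsets: list[float], word_end: float) -> float:
    """Ratio of the first onset-to-onset interval to the following one.

    Assumption (not from the article): `onsets` are syllable onset times in
    seconds; the second interval runs from the second onset to the third
    onset or, in a disyllabic word, to the end of the word.
    """
    if len(onsets) < 2:
        raise ValueError("need at least two syllable onsets")
    first = onsets[1] - onsets[0]
    second_end = onsets[2] if len(onsets) > 2 else word_end
    second = second_end - onsets[1]
    if first <= 0 or second <= 0:
        raise ValueError("onset times must be strictly increasing")
    return first / second


def pronunciation_precision(stimuli: list[Stimulus]) -> float:
    """Percentage of stimuli perceived in their intended quantity degree."""
    if not stimuli:
        raise ValueError("no stimuli given")
    correct = sum(s.intended == s.perceived for s in stimuli)
    return 100.0 * correct / len(stimuli)


def intra_degree_sd(values: dict[str, list[float]]) -> dict[str, float]:
    """Sample standard deviation of one acoustic parameter within each degree.

    Only one possible variability measure (an assumption); degrees with fewer
    than two measurements are skipped.
    """
    return {degree: statistics.stdev(xs) for degree, xs in values.items() if len(xs) >= 2}


if __name__ == "__main__":
    print(onset_duration_ratio([0.0, 0.25], word_end=0.375))  # 2.0
    test = [Stimulus("Q1", "Q1"), Stimulus("Q2", "Q3"), Stimulus("Q3", "Q3"), Stimulus("Q2", "Q2")]
    print(pronunciation_precision(test))  # 75.0
```

Comparing `intra_degree_sd` outputs for natural and synthetic tokens of the same parameter is one way to express the finding that synthetic speech varies less within a degree; the article may use a different statistic.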

  • Issue Year: LXIII/2020
  • Issue No: 11
  • Page Range: 935-950
  • Page Count: 16
  • Language: Estonian