Kuidas õpetada kõnesüntesaatorile empaatiat? Emotsiooni automaatse tuvastuse võimalustest eestikeelses kirjalikus lauses sisalduva info põhjal
How to teach empathy to a speech synthesizer? On the possibility of identifying emotion solely from written Estonian sentences

Author(s): Ene Vainik
Subject(s): Language and Literature Studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: emotions; written text; speech synthesis; sentiment analysis; affective computing; Estonian

Summary/Abstract: There is a growing need for more natural-sounding synthetic Estonian speech. One measure is to teach the computer to apply different acoustic registers according to the emotion (joy, sadness, anger) or neutrality of the text. The present article examines how to detect emotion vs. neutrality relying solely on the information present in written Estonian text. The first part of the article provides a theoretical overview of how, in principle, emotions can be expressed in speech. Besides the ideational/referential mode (i.e. the use of a literal emotion term, such as sadness), there is a variety of linguistic cues of expressiveness to be detected. An overview of the literature on sentiment analysis and affective computing shows that the approaches vary in many respects. First, they differ in what is taken as the unit of analysis (text, passage, sentence, or clause). Second, they differ in what exactly is being looked for: particular emotions in terms of specific categories (e.g. fear, joy, sadness, anger) or more abstract dimensions (e.g. valence, intensity). Third, they differ in whether only authorial affect or also non-authorial affect is taken into account. Most approaches exploit lexical features and compare the items present in the text with an affective lexicon. The empirical part of the study presents the results of a statistical analysis of 361 Estonian sentences. The sentences were retrieved from the Estonian Emotional Speech Corpus, where 55 test subjects had evaluated their emotion (joy, sadness, anger) or neutrality. In brief, identifying the emotion of a sentence entirely out of context appears to have been a demanding task: there was little agreement among the evaluations.
The search for features that could serve the computer as cues for automatic emotion detection yielded a list of probabilistic tendencies. Features such as punctuation, sentence length (in characters), the part of speech of the first word, and negation were used. The main lexical means and some strategies of attributing emotion according to the evaluative value of the sentence are also described in some detail. The article concludes that enabling a speech synthesizer to imitate human emotions can be compared with modeling human empathy. This is not an easy task. None of the systems created for other languages can simply be adapted for Estonian: first, because emotions and their expression are to some extent culture-specific and, second, because the emotion detector should rely on an Estonian affect dictionary, which does not yet exist. Consequently, there is still a lot to do in the field.
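
As a rough illustration of the kind of surface features the abstract mentions (punctuation, sentence length in characters, negation), a minimal Python sketch might look as follows. It is not the article's method: the function name `surface_features`, the feature names, and the short negation word list are assumptions made for this example only. The part of speech of the first word is left out because it would require an Estonian morphological tagger; the first word itself is returned instead.

```python
import re

# Hypothetical, illustrative list of Estonian negation words.
# It is an assumption for this sketch, not taken from the article.
NEGATION_WORDS = {"ei", "mitte", "pole", "ära"}


def surface_features(sentence: str) -> dict:
    """Extract simple surface cues from a single written sentence."""
    text = sentence.strip()
    # \w is Unicode-aware in Python 3, so letters such as õ, ä, ö, ü are kept.
    tokens = re.findall(r"\w+", text.lower())
    return {
        "length_chars": len(text),
        "final_punctuation": text[-1] if text and text[-1] in ".!?" else "",
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        "first_word": tokens[0] if tokens else "",
        "has_negation": any(token in NEGATION_WORDS for token in tokens),
    }


if __name__ == "__main__":
    print(surface_features("Ma ei tea!"))
```

Features of this kind could at best support probabilistic tendencies of the sort the abstract reports; they are not a reliable emotion classifier on their own.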

  • Issue Year: 2010
  • Issue No: 6
  • Page Range: 327-347
  • Page Count: 20
  • Language: Estonian