Named Entity Annotation in the National Corpus of Polish

Author(s): Marta Chojnacka-Kuraś, Paweł Śliwiński, Anna Wesołek
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: nazwy własne; jednostki nazewnicze; anotacja; proper names; named entity; annotation

Summary/Abstract: The paper discusses the main principles and methodology of named entity annotation and provides examples of the descriptions adopted in the National Corpus of Polish. The Authors present the scope of the annotation, the division of the analyzed lexical items into categories and subcategories (personal names and their subtypes, geographical names and their subtypes, names of organizations and institutions, geopolitical names, and temporal expressions: date and time), and the principles for determining attributes for the particular categories. The subsequent sections elaborate on selected issues that the Authors found most interesting. These include, among others: the problem of embedded names (e.g. {ul. gen. de Gaulle’a} ‘Gen. de Gaulle Street’) and coordinated personal names (e.g. {Anna i Jan Dąbrowscy} ‘Anna and Jan Dąbrowscy’); the problem of metonymy and the context-dependent assignment of a name to a particular category (e.g. {Europa} ‘Europe’ as a geographical or a geopolitical name, {Unia Europejska} ‘the EU’ as a bloc of united countries or as a geopolitical organization); and the difficulty of establishing the bases of derived forms (relational adjectives, e.g. {amerykański} ‘American’, derived either from the name of the continent or from the name of the country, the USA). The last section deals with the arbitrary decisions made when the derivation base is ambiguous, and with cases in which a name has not been assigned to any base (e.g. names of members of nations, such as {Arab} ‘a citizen of any Middle Eastern or North African country’ and {Żyd} ‘Jew / Israeli’). The Authors also account for extralinguistic factors that make it difficult to annotate the names of some geopolitical units unequivocally, e.g. the ambiguous status of Kosovo, viewed either as a region of Serbia or as an independent state.

  • Issue Year: 2012
  • Issue No: 63
  • Page Range: 85-98
  • Page Count: 14
  • Language: English