Tweets as a Challenge for Automatic Linguistic Processing
The paper examines the specific features of written colloquial speech in tweets as a challenge for automatic linguistic analysis. Such analysis includes segmenting text into words; morphological analysis, which assigns parts of speech and related grammatical features; syntactic dependency parsing; named entity recognition for people, locations, and organizations; and handling abbreviations. The problems encountered include out-of-vocabulary words, word blending, and colloquial variants that have not been normalized. The survey covers 630 tweets discussing the 2014 crisis involving two banks in Bulgaria.