Identifying age groups of Twitter users based on the specific characteristics of textposts
Author(s): Krzysztof Najman, Kamila Migdał-Najman, Katarzyna Raca, Agata Majkowska
Subject(s): Media studies, Methodology and research technology
Published by: Główny Urząd Statystyczny
Keywords: Twitter; text mining; user age
Summary/Abstract: Textual data (textposts) account for a significant portion of all data posted on the Internet. One piece of information that researchers seek to obtain about the authors of textposts is their age, which is not always made public, yet is important from the point of view of marketing, social and economic research. Language research shows that representatives of different age groups tend to use distinct sets of vocabulary and grammatical forms. Presumably, textpost formatting and the level of correctness of the text itself may also differentiate user age groups. The aim of the research presented in this article is to use the elements typically eliminated from texts during text mining processes, such as emoticons, punctuation marks and words that are not content carriers (stopwords), to distinguish the age groups of the authors of Twitter (currently X) posts. The study analysed nearly 3 million tweets in English posted before July 2020. The research shows that these textpost elements differentiate the age groups only to a small extent. The youngest users stood out the most due to the specific language characteristics of their textposts.
Journal: Wiadomości Statystyczne. The Polish Statistician
- Issue Year: 69/2024
- Issue No: 10
- Page Range: 59-74
- Page Count: 16
- Language: English