Who really is a data scientist? Analysis of requirements for data centred roles job market and their future

Author(s): Piotr Kałużny, Klaudia Karpińska, Łukasz Krawiec
Subject(s): Economy, Supranational / Global Economy, Business Economy / Management, Socio-Economic Research
Published by: Wydawnictwo Uniwersytetu Ekonomicznego w Poznaniu
Keywords: data scientist; data engineer; data analyst; job offers; job postings; big data; data analysis; job market; data mining; natural language processing; text mining; education
Summary/Abstract: Data analysis and processing skills are currently required in a multitude of job offers and cover a wide variety of applications. Although mostly shaped by the development of new technologies, programming languages and libraries, they are a necessity in the world of digital economy and entrepreneurship. A multitude of reports by large consulting companies such as Deloitte predict a sharp increase in demand for data science and AI roles in the future, not only in the IT sector but across the entire economy. The following questions arise: “What skillset do these innovators who use artificial intelligence and advanced analytical skills have?”, “What skills and requirements truly make a data scientist, and are they any different from those of data analysts, data engineers or software developers and programmers?” and, moreover, “What is the demand for these specialists, and are university programs educating future specialists in this field, or are the skills too new and need to be taught solely by business practice?”. To answer these questions, this article applies Natural Language Processing (NLP) and machine learning techniques to extract from the offers, and characterize, the key skills important for data centred roles. The research was carried out on a preprocessed sample of 72 thousand job offers from the IT sector posted in 2019. A linear SVM classifier was applied to extract the most distinguishing technical skills and to assess the feasibility of automated classification of job postings, which resulted in precision and recall of about 85% for classifying data analyst, data scientist and data engineer roles and about 90% for classifying Python developer roles.

  • Page Range: 107-138
  • Page Count: 32
  • Publication Year: 2022
  • Language: English