Spatiotemporal and multilingual Semantic Machine Learning Analysis of Social Media Data for the recent protests in Europe – based on Twitter data –
Abstract
The perception of inherent tensions between justice and injustice (or the
disproportion of good and bad) often presses a group of people (or even the whole
society) to seek change concerning politics and power, for example in the form of
protests. In the last few decades, the advent and rapid expansion of internet-based
communication technologies transformed the way of seeking change through
connective action, where besides the two main elements—the people and their
intentions—the role of the information along with its spread and accessibility gained
more and more significance. Social media platforms such as Twitter are considered a
new mediator of collective action, in which various forms of civil movements unite
around public posts, often using a common hashtag, thereby strengthening the
movements.
The data-driven analytical approach, relying on social media posts and
activities, has many strengths—especially considering its high temporal resolution
and rapid user-response to certain news and information. Twitter data serves as a
unique and useful source of information for the analysis of civil movements, as the
analysis can reveal important spatiotemporal and sentiment patterns,
which may also help to understand protest escalation over space and time.
The investigation of social media in the case of events such as the murder of
Ján Kuciak or the protests in Belarus seems an important tool to track and
understand the immediate reaction of people, unlike any other method or source
of information.
The methodological workflow developed in this doctoral research combines
time series clustering with semantic topic modeling and sentiment analysis,
performed on georeferenced social media data, which provides multi-modal insights
into the public’s reactions to a specific political event. The proposed approach
includes multi-lingual corpus translation, as well as location and sentiment
extraction, using machine-learning topic modeling methods to reveal the hidden
interests and motivators of collective action. Through this, the approach has a
distinct advantage over prior investigations, which either focused primarily on
hashtag activism (ignoring the spatial dimension) or, on the contrary, used only
location-specific hashtags. Moreover, by applying machine learning algorithms and
techniques that are almost entirely automatable, the analysis can cover a much wider
range of input data than existing studies in which researchers evaluate posts
solely manually. Overall, with this mixed-method approach, this work overcomes the
limitations of contemporary research on social movements, which mainly focuses on
one language and a restricted area.
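To make the time series clustering step of the workflow more concrete, the following minimal sketch groups locations by the shape of their daily tweet volume. It is an illustration under stated assumptions, not the implementation used in this research: the abstract does not name the clustering algorithm, so plain k-means on z-normalized daily counts serves as a stand-in, and the function names, place names, and counts are hypothetical toy data. Topic modeling and sentiment analysis are not shown.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from math import sqrt


def daily_series(tweets, start, end):
    """Count tweets per day and location between start and end (inclusive)."""
    days = [start + timedelta(days=d) for d in range((end - start).days + 1)]
    counts = defaultdict(Counter)
    for location, day in tweets:
        if start <= day <= end:
            counts[location][day] += 1
    return {loc: [c[d] for d in days] for loc, c in counts.items()}


def z_normalize(series):
    """Rescale a series to zero mean and unit variance so only its shape matters."""
    mean = sum(series) / len(series)
    std = sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std if std else 0.0 for x in series]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means (a stand-in algorithm) with deterministic farthest-point seeding."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: (location, day) pairs, not taken from the real datasets.
d1, d2, d3 = date(2018, 2, 26), date(2018, 2, 27), date(2018, 2, 28)
tweets = (
    [("Bratislava", d1)] * 3 + [("Bratislava", d2)] * 1
    + [("Kosice", d1)] * 6 + [("Kosice", d2)] * 2
    + [("Prague", d2)] * 1 + [("Prague", d3)] * 3
    + [("Vienna", d2)] * 2 + [("Vienna", d3)] * 6
)

series = daily_series(tweets, d1, d3)
locations = list(series)
labels = kmeans([z_normalize(series[loc]) for loc in locations], k=2)
clusters = dict(zip(locations, labels))
print(clusters)
```

In this toy example the two locations with an early peak fall into one cluster and the two with a late peak into the other, regardless of their absolute tweet volume. The z-normalization is what makes the grouping depend on the temporal pattern rather than on the size of the location.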
The social media data analyzed in this dissertation were obtained using the
Streaming Application Programming Interface (API) of Twitter, the US-based social
networking and microblogging service. The first dataset starts on the day of the
first official report of the murder of Ján Kuciak and ends on the day of the
earliest statement of the resignation of Prime Minister Robert Fico (26 February
and 15 March 2018, respectively). The second dataset starts on the day of the
Belarus presidential election and ends on the day of the formal inauguration of
the president of Belarus, Alexander Lukashenko (9 August and 23 September 2020,
respectively). Both datasets consist of the content of the tweets and additional
attributes such as user name, user location, and the timestamp when the tweet was
posted.
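The record structure and the two collection windows described above can be sketched as follows. This is a hedged illustration only: the class name, field names, dataset keys, and sample records are hypothetical, and treating the window boundaries as inclusive calendar days in UTC is an assumption that the text does not confirm.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Tweet:
    """One collected record; field names are illustrative, not the API's."""
    text: str
    user_name: str
    user_location: str
    created_at: datetime  # assumed to be timezone-aware


# Collection windows as stated in the text (assumed inclusive, in UTC).
WINDOWS = {
    "slovakia_2018": (date(2018, 2, 26), date(2018, 3, 15)),
    "belarus_2020": (date(2020, 8, 9), date(2020, 9, 23)),
}


def in_window(tweet, window):
    """Return True if the tweet was posted within the named collection window."""
    start, end = WINDOWS[window]
    posted = tweet.created_at.astimezone(timezone.utc).date()
    return start <= posted <= end


# Hypothetical sample records, not taken from the real datasets.
sample = [
    Tweet("example text", "user_a", "Bratislava",
          datetime(2018, 2, 26, 9, 0, tzinfo=timezone.utc)),
    Tweet("example text", "user_b", "Minsk",
          datetime(2020, 9, 24, 0, 5, tzinfo=timezone.utc)),
]

kept = [t for t in sample if in_window(t, "slovakia_2018")]
print([t.user_name for t in kept])
```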