Analyzing public opinion on political affairs has long attracted researchers’ attention. One popular application is analyzing the ideologies of ordinary citizens. Traditionally, researchers have collected public opinion through surveys and interviews, or estimated it from roll-call data. These approaches take a long time to reach a conclusion, and it is hard both to find representative subjects and to keep their opinions unaffected by the researchers who design the survey questions. Social media data mitigate some of these problems thanks to their unprecedented scale. However, we need to keep in mind that social media data might be systematically biased in other ways. For instance, people who never use social media cannot be included.
In recent years, social media platforms such as Twitter have played an increasingly important role in people’s lives. People express opinions, digest information, and interact on social media. These behaviors leave a massive amount of observable traces online, which we collect as our data. Focusing on political ideology analysis, we start by collecting a list of politicians who have verified accounts on Twitter. We then build our data sets from the Twitter accounts that are no more than one hop away from the politicians, that is, accounts that directly follow or are followed by the politicians. Although it has become much easier to collect massive amounts of data in an efficient and timely manner, social media data bring unique challenges. For example: (1) the data size is typically large; (2) ground-truth knowledge is seldom available for social media data, which leads to a lack of labels; (3) the data are multi-modal by nature, and the model must account for this.
If we view each user account as a node, the interactions between accounts can be modeled as links. Since accounts interact in multiple ways, the resulting graph structure is heterogeneous. If we view each account as an individual information source, we can represent its features by the collection of tweets it has posted. If we also consider temporal information, the whole system can be regarded as a multi-agent dynamic system. Based on these observations of the data, we pose the following research problems:
(1) Can we rely solely on the user behavior data, represented as heterogeneous types of links in the graph structure, to reveal the users’ ideologies?
(2) Can we rely solely on the text information from tweets to reveal the users’ political polarities?
(3) By learning from historical social media data, containing text, link, and temporal information, can we predict future trends?
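The graph view behind these problems can be illustrated with a minimal sketch, assuming a plain in-memory structure. The class `HeteroGraph`, the helper `one_hop_accounts`, the relation names, and the account names are all hypothetical and chosen for illustration; they are not the data pipeline used in this work.

```python
from collections import defaultdict


class HeteroGraph:
    """Directed graph whose edges are grouped by relation type."""

    def __init__(self):
        # relation name -> set of (source, target) account pairs
        self.edges = defaultdict(set)

    def add_edge(self, relation, source, target):
        self.edges[relation].add((source, target))

    def neighbors(self, account):
        """Accounts linked to `account` in either direction, by any relation."""
        found = set()
        for pairs in self.edges.values():
            for source, target in pairs:
                if source == account:
                    found.add(target)
                elif target == account:
                    found.add(source)
        return found


def one_hop_accounts(graph, politicians):
    """Politicians plus every account at most one hop away from them."""
    accounts = set(politicians)
    for politician in politicians:
        accounts |= graph.neighbors(politician)
    return accounts


# Toy example with made-up account names and relation types.
graph = HeteroGraph()
graph.add_edge("follow", "user_a", "senator_x")   # user_a follows senator_x
graph.add_edge("follow", "senator_x", "user_b")   # senator_x follows user_b
graph.add_edge("retweet", "user_c", "user_a")     # user_c is two hops away
print(sorted(one_hop_accounts(graph, ["senator_x"])))
```

Keeping one edge set per relation type preserves the heterogeneity of the links, so a downstream model can treat each relation separately or merge them as needed.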
To answer the first research problem, we proposed TIMME (Twitter Ideology-detection via Multitask Multi-relational Embedding), a model that makes effective use of the heterogeneous types of relations. The challenges come from (1) the extreme sparsity of the labels, (2) the incompleteness of the input features, and (3) the heterogeneous types of links. Overall, TIMME outperforms the state-of-the-art models for ideology detection on Twitter. In principle, it could be extended to other data sets and applied to tasks other than ideology detection.
Our work on the second problem resulted in an embedding approach called PEM (Polarity-Aware Embeddings Using Multi-Task Learning). Ideological divisions in the United States have become increasingly prominent in daily communication and have been studied extensively. We propose to quantify political polarity in social-media text data using a polarity-aware method of learning word representations. By learning a word embedding with an explicit polarity dimension, one can infer the polarity of a post and therefore of the social-media account that produced it. Decomposing a traditional embedding into a polarity-neutral semantic component and a polarity-aware component is a major challenge. The extreme sparsity of labels is another key challenge. Our experimental results demonstrate that our model can successfully learn high-quality polarity-aware embeddings.
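How an explicit polarity dimension could be used at inference time can be sketched as follows. This is only an illustration under stated assumptions: the last vector entry is taken to be the polarity dimension, scores are aggregated by simple averaging, and the toy vectors and the function names `post_polarity` and `account_polarity` are hypothetical. It does not reproduce how PEM learns or aggregates its embeddings.

```python
def post_polarity(tokens, embeddings):
    """Mean polarity of the known words in one post (0.0 if none are known).

    Assumption: each vector is [semantic components..., polarity], so the
    last entry is the polarity-aware dimension.
    """
    scores = [embeddings[token][-1] for token in tokens if token in embeddings]
    return sum(scores) / len(scores) if scores else 0.0


def account_polarity(posts, embeddings):
    """Mean of the per-post polarities for one account (0.0 if no posts)."""
    if not posts:
        return 0.0
    return sum(post_polarity(post, embeddings) for post in posts) / len(posts)


# Made-up vectors: two semantic entries followed by one polarity entry.
toy_embeddings = {
    "tax": [0.1, 0.2, 0.5],
    "care": [0.3, 0.1, -0.5],
}
toy_posts = [["tax", "cut"], ["tax", "care"]]
print(account_polarity(toy_posts, toy_embeddings))
```

Words missing from the vocabulary are skipped rather than treated as neutral, so a post made up of unseen words does not pull the account score toward zero more than an empty post would.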
The third research question is answered by our next project, a social dynamic system model that captures the updating patterns in real-world social network data sets. We faced challenges on both the data side and the model side. On the data side, there are no publicly available data sets of real-world social network observations. On the model side, modeling continuous time is inherently harder than modeling discrete time. After we decided to follow the Neural-ODE framework, other challenges arose, such as the data size and the selection of an appropriate decoder task. Our experimental results show that the framework we propose is capable of learning the dynamic changes in social network data sets. Our framework can also be used to verify how well an opinion dynamic model captures the changes in a specific real-world data set under a given task.
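The continuous-time view can be illustrated with a small sketch in which the state of all agents evolves according to dx/dt = f(x). Here f is a hand-written consensus rule, dx_i/dt = sum_j w_ij (x_j - x_i), standing in for the learned network of a Neural-ODE model, and the solver is a fixed-step Euler method. The function names, weights, and initial opinions are assumptions made for illustration, not the model proposed in this work.

```python
def consensus_derivative(opinions, weights):
    """dx_i/dt = sum_j w_ij * (x_j - x_i): each agent drifts toward its neighbors."""
    n = len(opinions)
    return [
        sum(weights[i][j] * (opinions[j] - opinions[i]) for j in range(n))
        for i in range(n)
    ]


def euler_integrate(derivative, state, t_end, dt):
    """Fixed-step Euler solver: x(t + dt) = x(t) + dt * f(x(t))."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        gradient = derivative(state)
        state = [x + dt * g for x, g in zip(state, gradient)]
    return state


# Two agents with opposite opinions and symmetric influence (toy values).
toy_weights = [[0.0, 1.0], [1.0, 0.0]]
final_opinions = euler_integrate(
    lambda s: consensus_derivative(s, toy_weights),
    [0.0, 1.0],
    t_end=1.0,
    dt=0.1,
)
print(final_opinions)
```

Because the derivative function is passed in as an argument, a different opinion dynamic rule can be swapped in and compared on the same observations without changing the solver.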