Discussion
Our findings suggest that Twitter is a useful tool for capturing key diabetes-related topics and the emotions associated with them. We observed considerable support and solidarity within the diabetes online community, with numerous tweets expressing joy and love. In contrast, Twitter users expressed fear, anger and sadness related to insulin pricing and diabetes-related complications and comorbidities, as well as considerable frustration about the inability of people to distinguish between type 1 and type 2 diabetes.
To our knowledge, this is the first study to use social media data to capture information regarding key diabetes-related concerns. We are, therefore, unable to directly compare our results with those of other studies. Nonetheless, the importance of understanding emotions and self-control (regulation of thoughts, emotions and behavior) for health outcomes in people with diabetes has been documented in several other studies.27 Hagger et al reviewed diabetes distress among adolescents with type 1 diabetes and found that a substantial proportion experienced elevated diabetes distress, which is often associated with suboptimal glycemic control.28 Richman et al also suggested that positive emotions such as hope and curiosity may play a protective role in the development of disease.29 Ogbera et al showed that higher levels of emotional distress are associated with poor self-care in type 2 diabetes.30 Another study conducted by Iturralde et al showed that anxiety is highly comorbid with depression among individuals with type 2 diabetes.4 Our study aligns with these results, showing that emotions and diabetes distress are frequent topics of concern for people with diabetes or people talking about diabetes on social media.
Similar to Nguyen et al, who used geolocated Twitter data to show that individuals living in zip codes with high percentages of happy and physically active tweets had lower obesity prevalence, we have shown gradients between diabetes-related topics on Twitter and the household income level of the city in which the tweets were geolocated.31
Insulin pricing
We found that insulin pricing was a major concern among tweets shared in the USA (18% of all tweets were related to insulin pricing) and was associated with both positive (joy, love) and negative (sadness, anger, fear) emotions. People frequently shared their frustration with insulin prices, access to insulin and the need to identify sources of insulin, including ‘glucose guardians’ or donations, all of which represent major obstacles for people with diabetes.32,33 Positive emotions were present in expressions of solidarity within the community in the fight for affordable insulin. Topics addressing insulin pricing were more frequent in cities with high mean incomes. This does not necessarily indicate that people living in cities with a high mean household income feel more concerned about insulin prices, but rather that they probably have a greater ability to tweet about this issue. A large number of tweets geolocated in cities with a high mean household income included the hashtag ‘#insulin4all’, a campaign that unites the diabetes community around access to treatment for everyone.34
It is known that there are key challenges to global and fair access to insulin.35 With respect to insulin pricing, we are the first to demonstrate and quantify, in a large sample of people with or talking about diabetes, the extent of the crisis in the USA based on social media data. In addition, we have been able to highlight the different emotions and fears associated with the crisis around insulin pricing.
Strengths and limitations
This study has numerous strengths. First, a major advantage of using social media data is that information is expressed spontaneously, on a large scale, and in real time, in what can be considered an open digital space with a flat role hierarchy for information sharing and the development of online communities. This potentially minimizes biases such as the responder bias that would be observed in traditional and observational studies. We evaluated tweets related to diabetes from a large number of people with a large variability in their profiles. The methodologies developed in this study present an innovative way to concentrate on relevant (personal, emotional) geolocated tweets (USA), and to identify topics of interest and the emotions shared within those topics. This approach is able to capture trends in the online diabetes community as well as socioeconomic factors that can be associated with social media data at the ecological level. This new way of capturing data supplements the detection of topics that are less medically oriented.
There are, however, several limitations to consider. First, diabetes-related concerns expressed on Twitter may not be representative of all people with diabetes. However, it has been previously suggested that this limitation can be partially offset by the large variability in social media profiles, a key strength in digital epidemiology.36 While we did observe large variability in our Twitter profiles, we found an over-representation of people with type 1 diabetes and of women in our study when compared with the known diabetes epidemiology literature.37 The greater representation of type 1 diabetes may be explained by the younger demographics of Twitter users.38 Alternatively, type 1 diabetes may involve more complex care, more devices, more challenging medication and more frustrations to report on Twitter as compared with type 2 diabetes. Regardless, our results should be interpreted in the context of the Twitter population only. Second, our filter classifiers (personal content, jokes) and our gender and type of diabetes classifiers are not perfectly accurate, which means that we cannot guarantee that 100% of tweets were posted by actual people with diabetes, and it was often impossible to define the sex or type of diabetes. Third, we were unable to account for several clinical and environmental factors that may help to tease out these associations. The label provided by the researchers for each topic is not exclusive. By refining the tweets in each topic, more subtopics could be defined, which could be a future direction to investigate. Fourth, the geolocation of tweets was partially based on locations the users provided, which might not be their true location. Fifth, emotion detection is still a challenge in the machine learning field due to the occurrence of sarcasm and irony, and it remains an open research question. Last, causal inference between the mean household income per city and the topics of interest of people residing in the corresponding city cannot be made, as it is subject to the ecological fallacy.
A future direction for our work is to extend our analyses to include more countries and languages.