Global diabetes burden: analysis of regional differences to improve diabetes care
  1. Charline Bour1,2,
  2. Adrian Ahne3,
  3. Gloria Aguayo1,
  4. Aurélie Fischer1,
  5. David Marcic4,
  6. Philippe Kayser4,
  7. Guy Fagherazzi1
  1. 1Department of Precision Health, Deep Digital Phenotyping Research Unit, Luxembourg Institute of Health, Strassen, Luxembourg
  2. 2Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  3. 3Center for Research in Epidemiology and Population Health (CESP), INSERM, Villejuif (Paris), Île-de-France, France
  4. 4Department of Precision Health, Data Integration and Analysis Unit, Luxembourg Institute of Health, Strassen, Luxembourg
  1. Correspondence to Dr Guy Fagherazzi; Guy.Fagherazzi{at}


Introduction The current evaluation processes of the burden of diabetes are incomplete and subject to bias. This study aimed to identify regional differences in the diabetes burden on a universal level from the perspective of people with diabetes.

Research design and methods We developed a worldwide online diabetes observatory based on 34 million diabetes-related tweets from 172 countries covering 41 languages, spanning from 2017 to 2021. After translating all tweets to English, we used machine learning algorithms to remove institutional tweets and jokes, geolocate users, identify topics of interest and quantify associated sentiments and emotions across the seven World Bank regions.

Results We identified four topics of interest for people with diabetes (PWD) in the Middle East and North Africa and another 18 topics in North America. Topics related to glycemic control and food are shared among six regions of the world. These topics were mainly associated with sadness (35% and 39% on average compared with levels of sadness in other topics). We also revealed several region-specific concerns (eg, insulin pricing in North America or the burden of daily diabetes management in Europe and Central Asia).

Conclusions The needs and concerns of PWD vary significantly worldwide, and the burden of diabetes is perceived differently. Our results will support better integration of these regional differences into diabetes programs to improve patient-centric diabetes research and care, focused on the most relevant concerns to enhance personalized medicine and self-management of PWD.

  • patient-centered care
  • patient reported outcome measures
  • algorithms
  • population health

Data availability statement

Data are available on reasonable request. According to the Twitter API, tweets cannot be shared but tweets' IDs can be provided on request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


  • Twitter data can be a useful resource to monitor key concerns of people with diabetes, complementary to what can be achieved with questionnaires in clinical studies.


  • This study included a worldwide analysis of a dataset of 34 million tweets from 172 countries to detect the most important topics of interest of people with diabetes and to study their differences across the seven World Bank regions.

  • We have identified universal topics of concern. The concerns related to glycemic control and food are common to seven and six regions of the world, respectively.

  • Other topics were found to be more important in some specific regions, such as insulin pricing in North America or the burden of daily diabetes management in Europe and Central Asia.


  • Our results can support the development of tailored diabetes programs at the regional level to focus on the most important concerns and thus to enhance personalized medicine and self-management of people with diabetes.


The term ‘burden of disease’ describes the overall consequences (loss of health, social aspects, costs to society, death) caused by diseases, injuries and risk factors worldwide and is often measured using quality-adjusted life years (QALYs) or disability-adjusted life years (DALYs).1–3 However, QALYs and DALYs prevent us from understanding the drivers of the diabetes burden, such as the role of diabetes distress or the quality of care. Diabetes distress refers to the emotional distress linked to living with diabetes and its day-to-day management, as well as to worrying about complications.4 It has been shown that one in four people with type 1 diabetes and one in five people with type 2 diabetes have high levels of diabetes distress.5 Emotional distress is associated with diabetes self-management and glycemic control issues.6

The design of patient-centered instruments has helped measure the quality of care for PWD. Many of these have additional subscales or their evaluation aspects overlap.6 These gaps in the assessment methods of the quality of care for PWD need to be identified. The most important factors must be prioritized and become objectives to address. As the priorities of a person with diabetes in the USA may differ vastly from those of a PWD in Western Europe, the Middle East or South Asia, determining the regional objectives is necessary to improve the lives of PWD. It is crucial to understand the regional differences in how the diabetes burden is perceived to integrate them into future diabetes programs. These could then address the most relevant local factors of diabetes burden.

One international source of data that captures the viewpoint of people with diabetes is Twitter. With more than 130 million users in 2019, it proved to be compatible with health research in various ways, but mainly to collect a considerable volume of data for public health surveillance, early event detection, outbreak prediction and analysis of a population’s sentiments and emotions.7–12 Sentiment analysis aims to recognize polarity in texts (positivity, negativity or neutrality), while emotion analysis determines the emotional state of an individual (anger, fear). Several diabetes communities have developed on Twitter, where users can share their experiences, ask for advice or chat. They can be found with relevant hashtags (#dsma: Diabetes Social Media Advocacy, #gbdoc: UK Diabetes Online Community). It is thus possible to access large quantities of diabetes-related data from individuals and communities of PWD on Twitter. Social media data enables a better understanding of the principal daily concerns and associated emotions related to diabetes, diabetes management, diabetes distress or diabetes burden.13 More broadly, social media data may provide insights into how concerns differ between countries. As Twitter is embraced globally by numerous people and does not rely on predefined questions like evaluation scales, collecting and analyzing tweets can be considered an innovative and complementary way to understand PWD’s feelings and concerns about their diabetes. Ahne et al14 previously showed that such analysis could be efficient in identifying primary concerns in the USA.

Because precision health starts by contextualizing the needs of the patients, we have tested the hypothesis that it is feasible to use a reproducible approach to analyze online data to better understand the determinants of diabetes burden and to identify regional differences that will serve to design more patient-centered diabetes programs in the future.10

Research design and methods

Data collection

Tweets are public by default and can be collected using the Twitter Application Programming Interface, which provides access to 1% of all Twitter data in real time based on keywords. To collect diabetes-related tweets, we defined a list of 272 diabetes-related keywords such as diabetes, insulin and blood glucose in 30 different languages (online supplemental appendix 1). Overall, the collection includes 34 million tweets published between May 2017 and April 2021. The data collected for this study only includes publicly posted tweets.

The first step consisted of deleting duplicates and retweets to keep unique tweets and quote retweets (a retweet with an added comment). Second, non-English tweets were translated into English. Third, two classifiers were applied to keep only tweets with personal, non-joke or non-ironic content from users sharing diabetes-related information about themselves or relatives. The workflow can be seen in figure 1.

Figure 1

Workflow showing the data preprocessing and analysis. Blue boxes correspond to steps where machine learning methods apply.
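The first preprocessing step (dropping duplicates and plain retweets while keeping quote retweets) can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary keys `text`, `is_retweet` and `is_quote` are hypothetical field names, and treating two tweets as duplicates when their lowercased, whitespace-normalized text matches is an assumption.

```python
def keep_unique_tweets(tweets):
    """Drop plain retweets and duplicate texts; keep quote retweets.

    Each tweet is a dict with the hypothetical keys 'text', 'is_retweet'
    and 'is_quote' (assumed names, not the Twitter API schema).
    """
    seen_texts = set()
    kept = []
    for tweet in tweets:
        # A quote retweet carries an added comment, so it is kept.
        is_plain_retweet = tweet.get("is_retweet", False) and not tweet.get("is_quote", False)
        if is_plain_retweet:
            continue
        # Assumption: duplicates are detected on normalized text.
        key = " ".join(tweet["text"].lower().split())
        if key in seen_texts:
            continue
        seen_texts.add(key)
        kept.append(tweet)
    return kept
```

The translation and classification steps that follow would then operate only on the tweets returned by this filter.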


A tweet object provides meta-data, including information about the user account and location. The users provide their geographical area via an entry in their public profile. The precision in their description may vary. After applying the process described in online supplemental appendix 2, tweets were separated into the following seven regions: North America, East Asia and Pacific, Europe and Central Asia, Latin America and the Caribbean, the Middle East and North Africa, South Asia, and Sub-Saharan Africa. These regions comply with the ‘World Bank Country and Lending Groups’ classification from The World Bank Group.15

Sentiment analysis

We used the Valence Aware Dictionary and sEntiment Reasoner (VADER) to assess whether there was a positive or negative sentiment within a tweet.16 The primary metric used for the sentiment analysis was the compound score (polarity), a unidimensional and normalized measure of sentiment between −1 and +1.
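As an illustration of how a compound score can be turned into a sentiment label and a regional average, the sketch below assumes cut-offs of ±0.05, in line with the thresholds reported in the Results (positive above 0.05, neutral between −0.05 and 0.05). The function names are hypothetical, and the handling of scores exactly at the cut-off is an assumption.

```python
def sentiment_label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to 'positive', 'negative' or 'neutral'."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie between -1 and +1")
    if compound >= threshold:   # assumption: the cut-off itself counts as positive
        return "positive"
    if compound <= -threshold:  # assumption: the cut-off itself counts as negative
        return "negative"
    return "neutral"


def average_polarity(compound_scores):
    """Average compound score over a non-empty collection of tweets."""
    scores = list(compound_scores)
    return sum(scores) / len(scores)
```

Applied per region, `average_polarity` would yield one score per region that can itself be labeled with `sentiment_label`.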

Topic extraction

We applied a k-means algorithm to the tweets in each region and gave each cluster a label according to the 20 closest tweets to the topic center and the most frequent words (top words) in the cluster.17 18
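The labeling step can be sketched as follows, assuming each tweet has already been embedded as a numeric vector and assigned to a cluster. How the tweets were vectorized is not described here, so the two-dimensional vectors, the small stop-word list and all function names are hypothetical.

```python
import math
from collections import Counter


def centroid(vectors):
    """Mean vector of a cluster (the k-means cluster center)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def closest_to_center(texts, vectors, n=20):
    """Return the n tweets whose vectors lie closest to the cluster center."""
    center = centroid(vectors)
    ranked = sorted(zip(texts, vectors), key=lambda pair: math.dist(pair[1], center))
    return [text for text, _ in ranked[:n]]


def top_words(texts, n=10, stopwords=frozenset({"the", "a", "my", "is", "to", "and", "of", "i"})):
    """Most frequent words in a cluster, ignoring a small illustrative stop-word list."""
    counts = Counter(
        word for text in texts for word in text.lower().split() if word not in stopwords
    )
    return [word for word, _ in counts.most_common(n)]


# Usage on a toy cluster of three tweets with 2-dimensional vectors.
cluster_texts = ["insulin price too high", "insulin cost", "my cat"]
cluster_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
representative = closest_to_center(cluster_texts, cluster_vectors, n=2)
frequent = top_words(cluster_texts, n=1)
```

A human reader would then inspect the representative tweets and frequent words to choose a label for the cluster.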

Emotion analysis

To determine the predominant emotion in each tweet, a classifier was developed based on texts focusing on four emotions: fear, anger, joy and sadness.19 We applied this classifier to all tweets to predict the probability of a tweet belonging to each of the four emotions.
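Given per-tweet probabilities for the four emotions, the predominant emotion and the average probabilities per topic can be derived as in the sketch below. The data layout (a dict of probabilities per tweet, paired with a topic label) and the function names are assumptions made for illustration.

```python
EMOTIONS = ("fear", "anger", "joy", "sadness")


def predominant_emotion(probabilities):
    """Emotion with the highest predicted probability for one tweet."""
    return max(EMOTIONS, key=lambda emotion: probabilities[emotion])


def mean_emotion_by_topic(rows):
    """Average emotion probabilities per topic.

    rows: iterable of (topic, probabilities) pairs, where probabilities
    maps each of the four emotions to a value between 0 and 1.
    """
    sums, counts = {}, {}
    for topic, probabilities in rows:
        topic_sums = sums.setdefault(topic, {emotion: 0.0 for emotion in EMOTIONS})
        for emotion in EMOTIONS:
            topic_sums[emotion] += probabilities[emotion]
        counts[topic] = counts.get(topic, 0) + 1
    return {
        topic: {emotion: total / counts[topic] for emotion, total in topic_sums.items()}
        for topic, topic_sums in sums.items()
    }
```

The per-topic averages are the kind of quantity that can then be compared with the average over all other topics.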

Every algorithm used for this study is available on Github: More details about the methodology can be found in online supplemental appendix 2.

Role of the funding source

The content of this publication is solely the author’s responsibility and does not necessarily represent the official views of the funders.


Spatial distribution of diabetes-related tweets

After preprocessing, we included 820 615 geolocated tweets in this study. Tweets were distributed as follows: 568 020 from North America (n=69.2%, three countries included), 176 124 from Europe and Central Asia (n=21.5%, 49 countries included), 31 426 from East Asia and Pacific (n=3.8%, 27 countries included), 20 465 from Sub-Saharan Africa (n=2.5%, 36 countries included), 15 935 from South Asia (n=1.9%, eight countries included), 4554 from Latin America and the Caribbean (n=0.6%, 29 countries included) and 4091 from the Middle East and North Africa (n=0.5%, 20 countries included). Figure 2 displays the distribution of tweets in each region.

Figure 2

Map showing the distribution of diabetes-related tweets according to the region (n=820 615).
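The regional shares reported above follow directly from the tweet counts. The short check below recomputes them from the figures given in the text; only the variable names are introduced here.

```python
# Geolocated tweet counts per World Bank region, as reported in the text.
tweets_by_region = {
    "North America": 568020,
    "Europe and Central Asia": 176124,
    "East Asia and Pacific": 31426,
    "Sub-Saharan Africa": 20465,
    "South Asia": 15935,
    "Latin America and the Caribbean": 4554,
    "Middle East and North Africa": 4091,
}

total_tweets = sum(tweets_by_region.values())

# Share of each region as a percentage, rounded to one decimal place.
share_by_region = {
    region: round(100 * count / total_tweets, 1)
    for region, count in tweets_by_region.items()
}

for region, share in share_by_region.items():
    print(f"{region}: {share}%")
```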

Topics of interest

Among all tweets, 269 323 (32.8%) were predicted as posted by men, 311 343 (37.9%) by women, and 239 949 (29.2%) by users of unknown sex; 254 564 (31%) were from people with type 1 diabetes, 94 948 (11.6%) from people with type 2 diabetes and 471 203 (57.4%) from people whose diabetes type was impossible to predict. Women were over-represented in East Asia and Pacific, Europe and Central Asia, Middle East and North Africa, and North America. Men were over-represented in Latin America and the Caribbean, and South Asia. In all regions, tweets identified as type 1 diabetes-related were predominant.

We identified 4 topics of interest for people with diabetes from the Middle East and North Africa, 6 for South Asia, 8 for East Asia and Pacific, 7 for Latin America and the Caribbean, 10 for Europe and Central Asia, 14 for Sub-Saharan Africa and 18 for North America. They are further described below for each region and in online supplemental appendix 3. ‘Glycemic Control’ was a topic found in all regions. Six of the seven regions shared the topics ‘Family and relatives’ and ‘Food’, whereas ‘Insulin’ was shared by five regions. Four regions had common topics related to ‘Comorbidities’. The significance of differences in emotion percentages between topics within each region was determined using a Student’s t-test. All p values shown are two tailed.
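For readers unfamiliar with the test, the sketch below computes a pooled-variance (Student's) two-sample t statistic and its degrees of freedom. It is an illustration under assumptions: the text does not specify the exact samples being compared, so the inputs are taken here to be per-tweet emotion probabilities in one topic versus all other topics, and the function name is hypothetical. The two-tailed p value would then be read from the t distribution, for example with `scipy.stats.ttest_ind`, which performs the same pooled-variance test by default.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    degrees_of_freedom = n_a + n_b - 2
    # Pooled estimate of the common variance (assumes equal variances).
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / degrees_of_freedom
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t_statistic = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_statistic, degrees_of_freedom
```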

Overall, South Asia had the most positive diabetes-related tweets and the highest average polarity score, while Latin America and the Caribbean had the most negative ones and the lowest score (table 1). Of the 820 615 included tweets, 356 683 (43.5%) were identified as positive and 308 811 (37.6%) as negative. South Asia and Europe and Central Asia had a higher proportion of positive tweets (47.6% and 46%, respectively). Latin America and the Caribbean, and North America had a higher proportion of negative tweets (38.2% and 38.5%, respectively). The average sentiment scores were all slightly positive, ranging from 0.01887 (Latin America and the Caribbean) to 0.10376 (South Asia). Most regions had a positive score (greater than 0.05). In contrast, Latin America and the Caribbean, and North America had a neutral score (between −0.05 and 0.05), as these regions had a higher proportion of tweets with negative sentiment scores.

Table 1

Average sentiment score and distribution of sentiment scores

East Asia and Pacific

Topics in which users shared support and advice, such as ‘Type 1 diabetes communities’ and ‘Glycemic control’, were associated with higher rates of joy (respectively 48% and 39% compared with 31.4% and 31.6% on average in all other topics, p<0.001) but also with higher rates of fear (respectively 16.9% and 14.6% compared with 12.1% and 12.3% on average in all other topics, p<0.001) due to frequent fears about the future. ‘Insulin affordability’ was associated with a higher rate of anger (28% compared with 16.6% on average in all other topics, p<0.001) because users were reacting to the huge insulin pricing gap between the USA and East Asia and Pacific.20 ‘Diabetes-related complications and family history’ was associated with a higher probability of sadness (45.8% compared with 38.1% on average in all other topics, p<0.001).

Europe and Central Asia

The two topics dealing with insulin (‘Insulin access’ and ‘Insulin and insulin supplies’) were associated with a higher probability of anger (respectively 28.6% and 26.2% compared with 15.87% and 16.3% on average in all other topics, p<0.001). Topics discussing relatives’ life with diabetes and complications (‘Diabetes-related complications and family history’ and ‘Life changes since diagnosis’) were associated with sadness (respectively 45.6% and 43% compared with 35.6% and 35.9% on average in all other topics, p<0.001). Topics ‘Daily management of diabetes’ and ‘Type 1 diabetes communities’ were mostly associated with joy (respectively 43.7% and 50.4% compared with 32% and 33% on average in all other topics, p<0.001).

Latin America and the Caribbean

Similar to Europe and Central Asia, the topic ‘Insulin issues’ was associated with a higher probability of anger (28.7% compared with 15.6% on average in all other topics, p<0.001). Topics in which users shared love and advice (‘Love and support’ and ‘Glycemic control’) were associated with a higher probability of joy (respectively 46.02% and 37.9% compared with 29% and 29.1% on average in all other topics, p<0.001). Finally, topics dealing with relatives’ health complications and life with diabetes (‘Complications and comorbidities’ and ‘Experiences from relatives living with diabetes’) were associated with a higher probability of sadness (respectively 47.9% and 47.8% compared with 42.7% and 40.4% on average in all other topics, p<0.001).

Middle East and North Africa

The topic ‘Insulin and insulin supplies’ was associated with a higher probability of anger (28.8% compared with 15.6% on average in all other topics, p<0.001). In this topic, users were reacting to the difficulty of self-managing insulin and insulin supplies. However, sadness was the main identified emotion in all topics (39% on average).

North America

The five topics dealing with insulin pricing and affordability (‘Inability to afford insulin’, ‘Consequences of insulin unaffordability’, ‘Insulin prices increase’, ‘Insulin pricing including insurance’ and ‘Costs implied by diabetes management’) were associated with a higher probability of anger (between 20.1% and 32.7% compared with 17.9% to 18.8% on average in all other topics, p<0.001). Most topics were associated with a higher probability of sadness (41% on average), except ‘Type 1 diabetes communities’, ‘Glucose tests’ and ‘Sharing daily life’, which were associated with a higher probability of joy (respectively 46.1%, 48.7%, and 42.5% compared with 29.8%, 29.6% and 28.9% on average in all other topics, p<0.001).

South Asia

The highest average of anger was associated with the topic ‘Insulin use’ (25.4% compared with 13.9% on average in all other topics, p<0.001). ‘Food habits’ was associated with joy (39.01% compared with 30.2% on average in all other topics, p<0.001), while all other topics were mainly dominated by high rates of sadness (more than 40%).

Sub-Saharan Africa

The topic ‘Insulin’ was associated with anger (22.5% compared with 15.3% on average in all other topics, p<0.001) because of users’ angry reactions to diabetes misunderstanding and struggles to get insulin. The topic dealing with ‘Glucose guardian’ was dominated by joy (39.3% compared with 28.9% on average in all other topics, p<0.001) as users were thanking others for their help or shared excellent glucose levels. In comparison, all other topics were dominated by sadness (between 37% and 46.02%).

Details about the average probabilities of sentiment distribution are available in online supplemental appendix 3.


In this study, we used worldwide social media data to better assess the global diabetes burden, from the perspective of PWD, and to study regional differences, which will serve to design more patient-centered diabetes programs. Social media data provide direct access to individual points of view and experiences of PWD, which can improve our understanding of how diabetes impacts their daily lives.

We have shown that some concerns are universal and shared by different online communities of PWD, while others are region-specific (eg, North America, which has five insulin-related topics). We found that matters related to food, glycemic control, family and relatives, insulin and comorbidities were shared by at least four of the seven regions. Tweets in which users shared their concerns and experiences about their relatives’ diabetes, family health history and comorbidities were associated with higher rates of sadness (47.2% of all related clusters and regions combined compared with 38.7% on average). On the contrary, most joyful tweets referred to users sharing advice and motivation, and supporting and encouraging their peers (37.7% of all related clusters and all regions combined compared with 31.1% on average). We also observed that 5 out of the 18 topics of interest in North America were related to insulin pricing, unaffordability and the consequences of such pricing on physical and mental health. Overall, these tweets correspond to 18.95% (n=101 019) of all tweets originating from the USA (n=532 981).21 Additionally, these topics were associated with higher rates of anger (28.04% compared with 19.2% on average in the USA and 19.1% in North America). Meanwhile, users from other regions (Europe and Central Asia, East Asia and Pacific) were sympathetic to patients from the USA, sharing their disgust and misunderstanding of the insulin pricing gap between their region and the USA. These results from North America are consistent with the previous work from Ahne et al,14 who showed that insulin pricing is a central concern among PWD on Twitter in the USA.

To our knowledge, no previous study has relied on such an extensive international database of posts from PWD to describe the diabetes burden. Our approach is more inclusive than those relying on questionnaires, such as patient-reported outcome measures or patient-reported experience measures scales with predefined items. We monitored key diabetes-related concerns of PWD and quantified the associated emotions in different communities around the world. We have observed an elevated global burden of diabetes, with regional specificities that need to be taken into account more diligently.22 Diabetes-related distress is present in every diabetes community and is sometimes under-researched, such as in Sub-Saharan Africa, and social media can help overcome these concerns.23 Özcan et al24 studied people with type 2 diabetes from different ethnicities in the Netherlands and showed that ethnicity is independently associated with high diabetes distress. However, Gariepy et al25 showed that diabetes distress in people with type 2 diabetes potentially varies according to some geographical and sociodemographic factors (such as social and physical order or cultural and social environment), which reinforces our hypothesis to compare diabetes burden determinants in different regions of the world. Besides, patients’ state of mind heavily influences their self-management habits. Richman et al showed that positive emotions were associated with overall better health status, whereas Coccaro et al suggested that diabetes distress is associated with negative emotions and the regulation of emotions.26 27 Thus, as recommended by Kalra et al,28 tackling patients’ intellectual and emotional needs would be one solution to overcome the psychological barrier to adherence and self-care. Our findings corroborate earlier research, indicating that diabetes burden is a common issue discussed on social media in all different regions of the world and at different levels of severity.
These findings also suggest that diabetes self-management is one of the biggest concerns, as PWD from the seven World Bank Regions shared concerns regarding glycemic control and food. Moreover, concerns at the regional level were identified, such as insulin pricing in North America or the fear of complications and comorbidities in Latin America and the Caribbean. This discovery highlights the need to develop new global methodologies to tackle universal concerns regarding self-care and focus on more specific ones at a regional or country level to improve PWD experiences and deal with their outcomes.

This study has several limitations. First, the list of the diabetes-related keywords we used to collect the tweets may have been incomplete. This list was created by translating an original list of English keywords, and we may have missed specific local diabetes-related keywords and associated issues in some countries. Second, some language-specific subtleties may have been lost, as translating non-English tweets to English may obscure the original meaning. Third, although this study assesses the diabetes burden on a global level, we did not manage to recover data from every country. However, this is the most comprehensive analysis on an international scale to date. Fourth, a bias in the geolocation analysis might exist, as the location is self-reported by users. We manually excluded areas that appeared to be fake. Some tweets were located in China, where Twitter is blocked. However, many people in China still access Twitter, for instance through a VPN, which may explain why some users report China as their location.29 Furthermore, the geographical coordinates provided by a tweet’s metadata were identified as being, by default, in the center of the country. As a result, the distribution map of the tweets shows geographical markers that are not necessarily in populated areas. Fifth, the precision of the different classifiers we used was not perfect. An additional limitation is that our results are based on subjective statements from people using social media and do not represent all PWD. Finally, due to the prevalence of sarcasm and irony on social media and the fact that we sought to identify key emotions in every tweet, we cannot ensure that all emotions were correctly identified, despite our efforts to remove jokes and irony.

In this work, we demonstrated that the global needs and concerns of PWD varied vastly based on region and that the diabetes burden was perceived differently, despite some shared concerns. Our results suggest a necessity to improve the integration of these regional and global factors into future diabetes programs to enhance patient-centric diabetes research and care from the perspective of people with diabetes. This will contribute to improving the personalization of diabetes care and self-management.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.


  • Contributors GF, as the guarantor, accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. The authors’ contributions were as follows: GF designed the research; CB and GF conducted the research; AA and CB collected the data; CB, GF, and AA labeled the data; CB and GF analyzed data; CB and GF interpreted the data; CB and GF drafted the article; GF, AF, DM, PK, GA and AA revised the manuscript critically.

  • Funding This work was supported by the MSDAvenir Foundation (World Diabetes DIstress Study) and the Luxembourg Institute of Health.

  • Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.