Statistics, Science, Random Ramblings

A blog mostly about data and R

Analysing data on suicide

Posted at — Nov 11, 2019

This analysis covers suicide rates from 1985 to 2016 in about 100 countries, with additional information on age, gender and the economic status of each country. The data is available from Kaggle.

This post is a slightly shortened version of the full analysis, which can be found in the repository on GitLab.

The data

Data cleaning and augmentation

library(tidyverse)  # attaches dplyr (%>%, select, mutate) and ggplot2
suicide <- readRDS("suicide.Rds")

As always, some data cleaning and supplementing was done before the analysis. The cleaning was minor and included:

  • Cleaning of column names.
  • Removal of the HDI (human development index) column as it was mostly missing.
  • Removal of countries with fewer than half of the possible number of data points for a country.
  • Removal of the year 2016. It was the most recent year and was present for only a few countries, which made the data noisy.
  • Addition of information on the continent a country belongs to.
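
The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the code from the repository: the toy values, the definition of `possible` and the hard-coded `continents` lookup are all assumptions made for the example.

```r
library(dplyr)

# Toy stand-in for the raw data (invented values):
# country A is complete, country B has records for only one of four years.
raw <- bind_rows(
  expand.grid(country = "A", year = 2013:2016,
              sex = c("female", "male"), stringsAsFactors = FALSE),
  expand.grid(country = "B", year = 2013L,
              sex = c("female", "male"), stringsAsFactors = FALSE)
) %>%
  mutate(hdi = NA_real_)

# Assumed definition of the possible number of data points per country:
# every year combined with every sex (plus every age group in the real data).
possible <- n_distinct(raw$year) * n_distinct(raw$sex)

# Hypothetical lookup table giving the continent of each country.
continents <- tibble(country = c("A", "B"),
                     continent = c("Europe", "Asia"))

cleaned <- raw %>%
  select(-hdi) %>%                       # drop the mostly missing HDI column
  group_by(country) %>%
  filter(n() >= possible / 2) %>%        # keep countries with at least half the possible rows
  ungroup() %>%
  filter(year != 2016) %>%               # drop the sparsely covered year 2016
  left_join(continents, by = "country")  # add the continent

cleaned
```

In this toy example country B is dropped because it has only two of eight possible rows, and the 2016 rows of country A are removed afterwards.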

The final data looks like this:

## # A tibble: 25,752 x 12
##    country  year sex   age   suicides_no population suicides_100k_p…
##    <chr>   <dbl> <chr> <chr>       <dbl>      <dbl>            <dbl>
##  1 Albania  1987 male  15-2…          21     312900             6.71
##  2 Albania  1987 male  35-5…          16     308000             5.19
##  3 Albania  1987 fema… 15-2…          14     289700             4.83
##  4 Albania  1987 male  75+ …           1      21800             4.59
##  5 Albania  1987 male  25-3…           9     274300             3.28
##  6 Albania  1987 fema… 75+ …           1      35600             2.81
##  7 Albania  1987 fema… 35-5…           6     278800             2.15
##  8 Albania  1987 fema… 25-3…           4     257200             1.56
##  9 Albania  1987 male  55-7…           1     137500             0.73
## 10 Albania  1987 fema… 5-14…           0     311000             0   
## # … with 25,742 more rows, and 5 more variables: country_year <chr>,
## #   gdp_for_year <dbl>, gdp_per_capita <dbl>, generation <chr>,
## #   continent <chr>
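
The rate column (its name is truncated in the printout) appears to be the number of suicides divided by the population, scaled to 100,000 people. That reading is an assumption, but a quick check with values copied from the first four rows above is consistent with it:

```r
# Values copied from the first four rows of the printed tibble
suicides_no <- c(21, 16, 14, 1)
population  <- c(312900, 308000, 289700, 21800)

# Suicides per 100,000 people, rounded to two decimals
rate <- round(suicides_no / population * 1e5, 2)
rate
# agrees with the printed values 6.71, 5.19, 4.83 and 4.59
```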

Map of countries with available data

First, let’s see for which countries data is available.

data_countries <- suicide %>% 
  select(country) %>% 
  distinct() %>% 
  mutate(country = countrycode::countrycode(
    country, origin = "", destination = "")) %>% 
  pull(country)  # plain character vector, as needed for the %in% check below

world_map <- rnaturalearth::ne_countries(returnclass = "sf") %>% 
  select(admin, geometry) %>% 
  mutate(admin_iso = countrycode::countrycode(
    admin, origin = "", destination = "")) %>% 
  # Kosovo becomes NA (likely because it is not universally recognised as independent)
  mutate(admin_iso = tidyr::replace_na(admin_iso, "Kosovo")) %>% 
  mutate(in_data = ifelse(admin_iso %in% data_countries, 
                          "available", "not available"))

ggplot(world_map) + 
  aes(fill = in_data) + 
  geom_sf(colour = "black", size = .1) + 
  scale_fill_manual(values = c("available" = "maroon4", 
                               "not available" = "white")) +
  coord_sf(datum = NA) + 
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "Countries with available data", fill = "")

The map clearly shows that no data is available for many countries in Africa and Asia. Consequently, the analysis below cannot identify truly global patterns, and the findings should be generalised with caution.
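
The coverage gap can also be put into numbers by counting the countries with data per continent. Here is a small sketch on a made-up stand-in for `suicide`: the rows are illustrative rather than the real data, and only the column names `country` and `continent` are taken from the printout above.

```r
library(dplyr)

# Illustrative rows only; the real data has one row per
# country, year, sex and age group.
toy <- tibble(
  country   = c("Albania", "Albania", "Austria", "Japan", "Mauritius"),
  continent = c("Europe", "Europe", "Europe", "Asia", "Africa")
)

coverage <- toy %>%
  distinct(country, continent) %>%                     # one row per country
  count(continent, name = "n_countries", sort = TRUE)  # countries per continent

coverage
```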


The available data is somewhat limited, and suicide is a rather complex issue. In-depth conclusions on suicide cannot be drawn from this data, but some patterns can be identified:

  • In all countries for which data is available, men commit suicide more often than women. The size of the gap varies, but not a single country shows the opposite pattern. This fits a public health situation in which mental health care for men is less than optimal, as expressing emotions and mental health issues is particularly stigmatised for men (while, at the same time, physical health problems in women are often not taken seriously).
  • Older people commit suicide more often than younger people, and suicide rates in children are close to zero. The much higher rates among older people are likely due, to some extent, to severe disease being more common in old age and to mental health issues specific to old age.
  • Trends in suicide rates vary heavily between countries. This is not too surprising, as problems in society and public health are often country-specific. On the positive side, there are more countries with strongly declining suicide rates than with strongly increasing rates. Of the twelve countries with the most strongly decreasing suicide rates in the data, ten are located in Europe, but only two European countries are among the twelve with the most strongly increasing rates.
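
For illustration, the gender comparison and the per-country trends described above could be computed along these lines. This is a sketch on invented toy numbers, not the analysis from the repository. Two choices are assumptions: summing suicides and population before computing a rate, and using the slope of a linear regression as the trend measure. The column names follow the printout earlier in the post.

```r
library(dplyr)

# Invented toy data: two countries, two sexes, three years
toy <- tibble(
  country     = rep(c("A", "B"), each = 6),
  year        = rep(rep(2000:2002, each = 2), times = 2),
  sex         = rep(c("female", "male"), times = 6),
  suicides_no = c(10, 30, 9, 27, 8, 24,   # country A: declining
                  5, 10, 6, 12, 7, 14),   # country B: increasing
  population  = 1e5
)

# Rate per 100k by country and sex: sum the counts first, then divide
by_sex <- toy %>%
  group_by(country, sex) %>%
  summarise(rate = sum(suicides_no) / sum(population) * 1e5,
            .groups = "drop")

# Male-to-female rate ratio per country (> 1 means a higher rate in men)
sex_ratio <- by_sex %>%
  group_by(country) %>%
  summarise(male_to_female = rate[sex == "male"] / rate[sex == "female"])

# Trend per country: slope of a linear regression of the yearly rate on year
trends <- toy %>%
  group_by(country, year) %>%
  summarise(rate = sum(suicides_no) / sum(population) * 1e5,
            .groups = "drop") %>%
  group_by(country) %>%
  summarise(slope = coef(lm(rate ~ year))[["year"]])

sex_ratio
trends
```

A negative slope marks a declining rate and a positive slope an increasing one. Ranking countries by slope would give lists like the "twelve most decreasing" and "twelve most increasing" mentioned above, although the measure used in the full analysis may differ.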