3 min read

Computing population centers

In demographics, the centre of population (or population center, or population centroid) of a region is a geographical point that describes a centrepoint of the region’s population. See here for more detail. The figure below shows for example the change of the location of the population center of the USA from 1790 to 2010:

US Mean Center of Population 1790-2010

In this post we show how we can use population density raster data from the WorldPop project to calculate population centers of given polygons such as those provided by the GADM project. We will use Vietnam as a case study, first calculating the population center of the province of Hanoi and comparing it to the geometric barycenter. Then, we’ll introduce a pipeline that allows to calculate the population centers of all the provinces of Vietnam in one go.

The WorldPop data can be accessed directly from the worldpopVN package. The GADM data will be downloaded using the gadm() function of the sptools package.

# install.packages(c("devtools", "sp", "raster", "rgeos", "magrittr", "dplyr", "purrr"))
# devtools::install_github("choisy/worldpopVN")
# devtools::install_github("choisy/sptools")
library(sp)       # sp methods
library(raster)   # raster methods
library(magrittr) # %>% 

The WorldPop data:

wp <- worldpopVN::getpop()

Which gives:

plot(wp)

The provinces polygons from GADM:

gadm <- sptools::gadm("vietnam", "sp", 1)
## 
##  Do you want to download it in your internal library ?   (yes/ no) ?
## 
## Selection: 
## 
##  Do you want to save the map in another location  (yes/ no (default))
## 
## Selection:

Which gives:

plot(gadm)

Let’s crop the raster data along the polygon of the province of Hanoi:

hanoi_polygon <- subset(gadm, VARNAME_1 == "Ha Noi")
hanoi <- sptools::crop_on_poly(wp, hanoi_polygon)

Which gives:

plot(hanoi)

From that, calculating the population center needs 2 steps:

  • extracting the population values in each pixel and transforming them into weights (i.e. between 0 and 1 and summing to 1);
  • extracting the coordinates of the center of each pixel and weight-summing the longitude and latitude values with the weights calculated in the first step.

Before these 2 steps, the raster need to be converted to a SpatialPointsDataFrame object:

hanoi2 <- raster::rasterToPoints(hanoi, spatial = TRUE)

The first step is:

values <- hanoi2@data[[1]]
weights <- values / sum(values)

The second step is:

pop_center <- hanoi2 %>%
  sp::coordinates() %>%
  as.data.frame() %>%
  lapply(. %>%
           `*`(weights) %>%
            sum()) %>%
  unlist()

For comparison, let’s compute the simple geometric barycenter from the polygon of Hanoi:

barycenter <- hanoi_polygon %>% 
  rgeos::gCentroid() %>% 
  sp::coordinates()

Let’s now compare the locations of the population center and the geometric barycenter:

plot(hanoi)
points(pop_center[1], pop_center[2], pch = 1) # circle
points(barycenter[1], barycenter[2], pch = 2) # triangle

A pipeline to calculate the population centers of all the provinces

Let’s now imagine that we want to calculate the coordinates of the population centers of all the provinces of the gadm object. First, we’d need to include the 2 steps of the centroid calculation into a function:

weighted_centroid <- function(spptdf) {
  require(sp)
  require(magrittr)
  weights <- spptdf %>% # step 1
    slot("data") %>% 
    {. / sum(.)}
  spptdf %>%            # step 2
    coordinates() %>%
    as.data.frame() %>%
    lapply(. %>%
             `*`(weights) %>%
             sum()) %>%
    unlist()
}

Then, we can run the following pipeline. It takes about 10’ on a MacBook Pro, the vast majority of that time being for the crop_on_poly() call.

pop_centers <- gadm %>%
  sptools::as_list() %>% 
  parallel::mclapply(sptools::crop_on_poly, rstr = wp) %>% 
  lapply(raster::rasterToPoints, spatial = TRUE) %>% 
  lapply(weighted_centroid) %>%
  purrr::reduce(dplyr::bind_rows)