10 min read

Computing elevation by province

Introduction

Here we show how to compute 2 different measures of elevation by province. The first one is the simple mean of the elevation measures inside the polygon of the province. The second one weights the mean by the local population density. For that, we use the polygons of the country and the provinces from GADM, the SRTM raster elevation data from CGIAR and raster population density data from WorldPop.

Packages

library(sp)
library(raster)

Simple mean

Let’s downlaod the elevation data (751.5 MB) from SRTM:

download.file("http://srtm.csi.cgiar.org/wp-content/uploads/files/srtm_30x30/TIFF/N00E090.zip", "srtm.zip")

Let’s unzip it (making a file of 2.4 GB):

unzip("srtm.zip")

And load it into R:

srtm <- raster("cut_n00e090.tif")

which gives

srtm
## class      : RasterLayer 
## dimensions : 36023, 36023, 1297656529  (nrow, ncol, ncell)
## resolution : 0.0008333334, 0.0008333334  (x, y)
## extent     : 89.99042, 120.0096, -0.009583358, 30.00958  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : /Users/choisy/Dropbox/aaa/www/blog/content/post/cut_n00e090.tif 
## names      : cut_n00e090 
## values     : -32768, 32767  (min, max)

Note that the CRS is not defined… However, from here we can see that “The data is projected in a Geographic (Lat/Long) projection, with the WGS84 horizontal datum and the EGM96 vertical datum.” Let’s thus define this projection:

crs(srtm) <- CRS("+init=EPSG:4326")

Let’s check:

srtm
## class      : RasterLayer 
## dimensions : 36023, 36023, 1297656529  (nrow, ncol, ncell)
## resolution : 0.0008333334, 0.0008333334  (x, y)
## extent     : 89.99042, 120.0096, -0.009583358, 30.00958  (xmin, xmax, ymin, ymax)
## crs        : +init=EPSG:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/choisy/Dropbox/aaa/www/blog/content/post/cut_n00e090.tif 
## names      : cut_n00e090 
## values     : -32768, 32767  (min, max)

We’re good! This file is quite big and we need the data only for Vietnam. We thus want to crop it. For that we need the polygon of Vietnam that we can download from GADM:

download.file("https://biogeo.ucdavis.edu/data/gadm3.6/Rsp/gadm36_VNM_0_sp.rds", "country.rds")

Lets load it:

country <- readRDS("country.rds")

Which gives:

country
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : 102.1446, 109.4692, 8.381355, 23.39269  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 2
## names       : GID_0,  NAME_0 
## value       :   VNM, Vietnam

We can see that this time the CRS is correctly defined and that it is the same projection as for the SRTM raster file. Let’s plot the elevation data together with the country polygon:

plot(srtm)
plot(country, add = TRUE)

Let’s now use the country polygon to crop the raster file. Note that this can be done only because the 2 objects have the same projection. If it were not the case, we would need to reproject one of them according to the projection of the other, using the sp::spTranform() function. It takes about 15’’.

srtm_cropped <- crop(srtm, country)

Let’s write the output to disk:

writeRaster(srtm_cropped, "srtm_cropped.tif")

Let’s reload it:

srtm_cropped <- raster("srtm_cropped.tif")

Let’s see the result:

plot(srtm_cropped)
plot(country, add = TRUE)

We can see that the cropped_version of the GeoTiff object is reduced from 2.4 GB down to 194.6 MB:

file.size(filename(srtm)) / 2^30
## [1] 2.417611

and:

file.size(filename(srtm_cropped)) / 2^20
## [1] 194.6239

Noticing that in this rectangular, we acutally need only a small portion of the pixels, we can go a step further in reducing the data by applying a mask. For that, we first need to create a mask by basically rasterizing the polygon using the same grid as the raster object. Note however that this step takes about 26’, essentially because the size of the polygon is big. Given that, we’ll see at the end of this section that this step is actually not necessary.

country_mask <- rasterize(country, srtm_cropped)

Now that the mask is done, we can use it on the cropped raster:

srtm_masked <- mask(srtm_cropped, country_mask)

Let’s save to disk and reload:

writeRaster(srtm_masked, "srtm_masked.tif")
srtm_masked <- raster("srtm_masked.tif")

We can see that masking allows to reduce the file size from 195 MB to 86 MB:

file.size(filename(srtm_masked)) / 2^20
## [1] 86.16997

Let’s see what it looks like:

plot(srtm_masked)
plot(country, add = TRUE)

Great! Next step now, in order to compute a value of elevation per province, is to get the polygons of the provinces:

download.file("https://biogeo.ucdavis.edu/data/gadm3.6/Rsp/gadm36_VNM_1_sp.rds", "provinces.rds")

Let’s load it:

provinces <- readRDS("provinces.rds")

Which gives:

provinces
## class       : SpatialPolygonsDataFrame 
## features    : 63 
## extent      : 102.1446, 109.4692, 8.381355, 23.39269  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 10
## names       : GID_0,  NAME_0,   GID_1,   NAME_1, VARNAME_1, NL_NAME_1,     TYPE_1, ENGTYPE_1, CC_1, HASC_1 
## min values  :   VNM, Vietnam, VNM.1_1, An Giang,  An Giang,        NA, Thành phố ,      City,   NA,  VN.AG 
## max values  :   VNM, Vietnam, VNM.9_1,  Yên Bái,   Yen Bai,        NA,       Tỉnh,  Province,   NA,  VN.YB

We can again verify that the projection is the same as the raster and the country polygon. Let’s see what it looks like:

plot(provinces)

What we want to do now is, for each province, extracting the elevation data from the raster file that are inside the polygon of the province and average them. This is what the following function does:

mean_ele <- function(x) {
  require(magrittr)
  prov <- provinces[x, ]
  rast <- crop(srtm_masked, prov) # cropping before the mask
  mask(rast, rasterize(prov, rast)) %>% 
    values() %>% 
    mean(na.rm = TRUE)
}

Note interestingly that a crop before the mask speeds up the masking by 6. Interestingly too, this mean_ele() function is about twice as fast as the use of the extract() function. Compare for example

system.time(opt1 <- sapply(1:3, mean_ele))
##    user  system elapsed 
##  37.431   0.508  39.234

with

system.time(opt2 <- sapply(extract(srtm_masked, provinces[1:3, ]), mean, na.rm = TRUE))
##    user  system elapsed 
##  41.841   7.282  49.593

The reason for that is not clear to me. Maybe there are a number of checks in extract() function that substantially slow everything down. Let’s check that it provides the same results however:

opt1 - opt2
## [1] 0 0 0

Let’s also see what is the benefit of working on strm_masked instead of strm_cropped:

mean_ele2 <- function(x) {
  require(magrittr)
  prov <- provinces[x, ]
  rast <- crop(srtm_cropped, prov)
  mask(rast, rasterize(prov, rast)) %>% 
    values() %>% 
    mean(na.rm = TRUE)
}
system.time(opt3 <- lapply(1:3, mean_ele2))
##    user  system elapsed 
##  26.341   0.263  27.171

So, it means that the only advantage of the srtm_masked over srtm_cropped is that it’s about twice as light on the disk but it doesn’t make the calculations based on these rasters any faster. Good to know! Let’s now compute this for all the provinces, which takes about 10’:

elevations1 <- sapply(seq_along(provinces), mean_ele)

Weighting by population densities

Now, let’s consider an alternative way of computing these elevations by province, weighting them by the local population density. We can retrieve the local population density for Vietnam as a raster file from WorldPop:

download.file("ftp://ftp.worldpop.org.uk/GIS/Population/Global_2000_2020/2010/VNM/vnm_ppp_2010.tif", "worldpop.tif")

Let’s load it:

worldpop <- raster("worldpop.tif")

Good news here is that the raster is already masked:

plot(worldpop)

It also has the same projection than the other objects we’ve been dealing with so far:

worldpop
## class      : RasterLayer 
## dimensions : 17796, 8789, 156409044  (nrow, ncol, ncell)
## resolution : 0.0008333333, 0.0008333333  (x, y)
## extent     : 102.1454, 109.4696, 8.562917, 23.39292  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/choisy/Dropbox/aaa/www/blog/content/post/worldpop.tif 
## names      : worldpop 
## values     : 0.02010591, 568.1513  (min, max)

Of note too, it has the same resolution as srtm_masked, but not quite the same grid definition though:

srtm_masked
## class      : RasterLayer 
## dimensions : 18014, 8790, 158343060  (nrow, ncol, ncell)
## resolution : 0.0008333334, 0.0008333334  (x, y)
## extent     : 102.1446, 109.4696, 8.38125, 23.39292  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/choisy/Dropbox/aaa/www/blog/content/post/srtm_masked.tif 
## names      : srtm_masked 
## values     : -129, 3106  (min, max)

In order to be able to weigth the elevation data from srtm_masked by the population density data from worldpop we need to ensure that the two rasters use the same grid. This implies resampling one of the raster on the grid of the other one. Let’s resample srtm_masked on worldpop. It takes about 2’:

srtm_masked2 <- resample(srtm_masked, worldpop)

We can check that it worked:

worldpop
## class      : RasterLayer 
## dimensions : 17796, 8789, 156409044  (nrow, ncol, ncell)
## resolution : 0.0008333333, 0.0008333333  (x, y)
## extent     : 102.1454, 109.4696, 8.562917, 23.39292  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/choisy/Dropbox/aaa/www/blog/content/post/worldpop.tif 
## names      : worldpop 
## values     : 0.02010591, 568.1513  (min, max)

and:

srtm_masked2
## class      : RasterLayer 
## dimensions : 17796, 8789, 156409044  (nrow, ncol, ncell)
## resolution : 0.0008333333, 0.0008333333  (x, y)
## extent     : 102.1454, 109.4696, 8.562917, 23.39292  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /private/var/folders/0z/99v2vkd57yd4h_wwxbzqm2lm0000gn/T/RtmpKLImwn/raster/r_tmp_2019-10-24_182415_9898_99280.grd 
## names      : srtm_masked 
## values     : -128.7624, 3105.94  (min, max)

Below is a new version of the mean_ele() function that performs the population weighting:

mean_ele3 <- function(x) {
  prov <- provinces[x, ]
  srtm <- crop(srtm_masked2, prov)
  wpop <- crop(worldpop, prov)
  prov_mask <- rasterize(prov, srtm)
  srtm_val <- values(mask(srtm, prov_mask))
  wpop_val <- values(mask(wpop, prov_mask))
  weights <- wpop_val / sum(wpop_val, na.rm = TRUE)
  sum(weights * srtm_val, na.rm = TRUE)
}

Let’s apply it to all the provinces, it takes about 16’:

elevations2 <- sapply(seq_along(provinces), mean_ele3)

Let’s compare the 2 measures of elevation by province:

(elevations <- data.frame(province = provinces$VARNAME_1, elevations1, elevations2))
##             province elevations1 elevations2
## 1           An Giang    8.898235    6.408292
## 2           Bac Lieu    1.695055    1.881781
## 3          Bac Giang  106.352667   29.775390
## 4            Bac Kan  490.829030  394.819273
## 5           Bac Ninh    6.222558    7.070005
## 6            Ben Tre    2.076570    2.810856
## 7  Ba Ria - Vung Tau   58.798530   30.709794
## 8          Binh Dinh  264.440442   51.091846
## 9         Binh Duong   38.621079   25.638468
## 10        Binh Phuoc  176.327987  145.018136
## 11        Binh Thuan  217.787158   72.584621
## 12           Can Tho    2.663051    3.144337
## 13            Ca Mau    2.131124    2.131788
## 14          Cao Bang  617.552520  514.365906
## 15           Dak Lak  502.929377  494.978901
## 16          Dak Nong  635.457842  597.524631
## 17          Dong Nai  102.931787   71.755471
## 18         Dong Thap    3.443112    4.442910
## 19           Da Nang  276.254180   26.729688
## 20         Dien Bien  898.531873  734.174963
## 21           Gia Lai  526.389724  505.270508
## 22         Hai Duong   10.678064    5.502737
## 23         Hai Phong   13.913417    5.799682
## 24         Hau Giang    1.672961    1.840450
## 25       Ho Chi Minh    5.020083    6.362934
## 26          Ha Giang  718.742222  639.124498
## 27            Ha Noi   25.105430   12.541751
## 28            Ha Nam   21.396139   10.562716
## 29           Ha Tinh  173.831383   35.958710
## 30          Hoa Binh  315.397034  174.539607
## 31          Hung Yen    5.293893    5.748083
## 32         Khanh Hoa  381.844209   68.901831
## 33        Kien Giang   10.164205    6.650253
## 34           Kon Tum  887.268424  693.869140
## 35          Lang Son  349.812444  285.813686
## 36          Lai Chau 1000.487861  804.993577
## 37          Lam Dong  911.213147  944.541232
## 38           Lao Cai  828.443694  506.560942
## 39           Long An    2.094010    2.222340
## 40          Nam Dinh    2.727538    3.115513
## 41           Nghe An  393.868204   81.151246
## 42         Ninh Binh   49.014519   13.659144
## 43        Ninh Thuan  354.701355   70.047739
## 44           Phu Tho  146.158709   55.094982
## 45           Phu Yen  258.861542   68.853622
## 46        Quang Binh  287.294469   75.554779
## 47         Quang Nam  464.724461   84.292007
## 48        Quang Ngai  283.743576   56.648166
## 49        Quang Ninh  168.250187   52.426892
## 50         Quang Tri  256.356000   82.139390
## 51         Soc Trang    1.734239    2.184876
## 52            Son La  828.559643  680.606294
## 53          Tay Ninh   23.078405   16.178220
## 54    Thua Thien Hue  301.140207   51.183379
## 55         Thai Binh    2.502572    2.757502
## 56       Thai Nguyen  162.562823   68.150190
## 57         Thanh Hoa  255.976636   51.144353
## 58        Tien Giang    2.361345    2.928570
## 59          Tra Vinh    2.026229    2.653617
## 60       Tuyen Quang  284.681424  133.301032
## 61         Vinh Long    1.778635    2.182872
## 62         Vinh Phuc   96.040181   28.194070
## 63           Yen Bai  666.156834  314.113652

Let’s visualize this:

plot(elevations2 ~ elevations1, elevations, xlab = "weighted by population density", ylab = "non weighted", col = "blue")
abline(0, 1)

This figure shows clearly that the population tends to gather in the lower parts of the province.