How to get Arabic names for Mauritania regions

R, HDX, GIS

Published: October 1, 2018

On HDX, you can download and use the administrative boundaries of Mauritania, with one caveat: the names of the administrative divisions appear only in English, translated from Arabic. For some analyses, it is useful to have the Arabic names in the same table. In this post, we scrape a table containing the Arabic names from a website, then join it to our administrative boundaries data. We will need the rhdx package (not yet on CRAN) and the following packages:

library(tidyverse)
library(sf)
library(purrr)
library(rvest)
library(stringdist)
library(httr)
library(rhdx) ## remotes::install_gitlab("dickoa/rhdx")

We can use rhdx::pull_dataset to read the Mauritania administrative boundaries dataset in R and use rhdx::get_resources to list available resources (aka files).

pull_dataset("cod-ab-mrt") %>%
  get_resources() %>%
  as_tibble() %>%
  slice_head(n = 3)
# A tibble: 3 × 5
  resource_id                  resou…¹ resou…² resou…³ resource  
  <chr>                        <chr>   <chr>   <chr>   <list>    
1 cab34844-8dd1-4a1b-af55-db5… MRT_Ad… xlsx    https:… <HDXResrc>
2 dacb6ad2-13b6-4f14-b1e9-44b… mrt_ad… shp     https:… <HDXResrc>
3 1e7d4873-1151-4d83-a38d-11f… mrt_ad… emf     https:… <HDXResrc>
# … with abbreviated variable names ¹​resource_name,
#   ²​resource_format, ³​resource_url

The output shows that the second resource is the shapefile, which contains the regions layer.

mrt_adm1 <- pull_dataset("cod-ab-mrt") %>%
  get_resource(2) %>%
  read_resource(layer = "mrt_admbnda_adm1_gov_20200801")
glimpse(mrt_adm1)
Rows: 13
Columns: 13
$ Shape_Leng <dbl> 22.8673631, 12.5340794, 8.6261225, 13.117716…
$ Shape_Area <dbl> 19.40005835, 3.04749757, 2.82872414, 3.24448…
$ ADM1_EN    <chr> "Adrar", "Assaba", "Brakna", "Dakhlet-Nouadh…
$ ADM1_PCODE <chr> "MR01", "MR02", "MR03", "MR04", "MR05", "MR0…
$ ADM1_REF   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ ADM1ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ ADM1ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ ADM0_EN    <chr> "Mauritania", "Mauritania", "Mauritania", "M…
$ ADM0_PCODE <chr> "MR", "MR", "MR", "MR", "MR", "MR", "MR", "M…
$ date       <date> 2020-06-12, 2020-06-12, 2020-06-12, 2020-06…
$ validOn    <date> 2020-07-31, 2020-07-31, 2020-07-31, 2020-07…
$ validTo    <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ geometry   <POLYGON [°]> POLYGON ((-6.3422 22.8704, ..., POLY…

The Arabic names are not available in this data. We can visualize the names that are available using ggplot2 and sf.

mrt_adm1 %>%
  ggplot() +
  geom_sf() +
  geom_sf_label(aes(label = ADM1_EN)) +
  theme_minimal()

A map of Mauritania regions with their labels in English

We need the Arabic names, and this Wikipedia page has a table with the region names in Arabic and English. We can use the rvest R package to scrape the table and map it to our geospatial layer.

url <- "https://en.wikipedia.org/wiki/Regions_of_Mauritania"

## Read the page, keep the first "wikitable" and rename the two name columns
arabic_adm1 <- url |>
  read_html() |>
  html_elements("table.wikitable") |>
  html_table() |>
  first() |>
  select(ADM1_EN = Name, ADM1_AR = `Native name`)
glimpse(arabic_adm1)
Rows: 15
Columns: 2
$ ADM1_EN <chr> "Adrar", "Assaba", "Brakna", "Dakhlet Nouadhibo…
$ ADM1_AR <chr> "أدرار", "لعصابة", "لبراكنة", "داخلة نواذيبو", …
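
To see what the CSS selector and html_table() do without touching the network, here is a minimal sketch on an inline HTML fragment. The fragment is made up for illustration and only mimics the two columns used above; rvest::minimal_html() builds a small document from a string.

```r
library(rvest)
library(dplyr)

## Hypothetical stand-in for the Wikipedia page: one table with class "wikitable"
page <- minimal_html('
  <table class="wikitable">
    <tr><th>Name</th><th>Native name</th></tr>
    <tr><td>Adrar</td><td>أدرار</td></tr>
    <tr><td>Assaba</td><td>لعصابة</td></tr>
  </table>
')

## html_element() returns the first match, html_table() turns it into a tibble
toy_adm1 <- page |>
  html_element("table.wikitable") |>
  html_table() |>
  select(ADM1_EN = Name, ADM1_AR = `Native name`)

toy_adm1
```

The header cells become the column names, which is why the columns can be renamed with select() right away.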

The scraped table contains the Arabic names (ADM1_AR), and we now need to join it to our boundaries data. However, because the spelling in the ADM1_EN column differs between the two tables, we need approximate matching (stringdist::amatch).

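## For each scraped name, position of the closest boundary name (NA if none within maxDist)
ind <- amatch(arabic_adm1$ADM1_EN, mrt_adm1$ADM1_EN, maxDist = 4)
## Overwrite the scraped spelling with the spelling used in the boundaries data
arabic_adm1$ADM1_EN <- mrt_adm1$ADM1_EN[ind]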
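
A small sketch of how amatch behaves with maxDist = 4 may help here. The spellings below are illustrative only, not the full lists from either table; the distance is the default one used by stringdist (optimal string alignment).

```r
library(stringdist)

## Illustrative spellings only, not the full lists from either table
scraped    <- c("Adrar", "Dakhlet Nouadhibou", "Nouakchott-Nord")
boundaries <- c("Adrar", "Dakhlet-Nouadhibou", "Nouakchott")

## Position in `boundaries` of the closest name within 4 edits, NA otherwise
ind <- amatch(scraped, boundaries, maxDist = 4)
ind
## [1]  1  2 NA

## The pairwise distances explain the result
stringdist("Dakhlet Nouadhibou", "Dakhlet-Nouadhibou")
## [1] 1
stringdist("Nouakchott-Nord", "Nouakchott")
## [1] 5
```

A name that is more than maxDist edits away from every candidate gets NA, so indexing with ind leaves it without a matching key.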

Nouakchott is missing because it was divided into three regions (North, South and West). Since most of the regions are matched, we can join the two tables and check the final result on a map.

## Join on the harmonized English name; stating `by` avoids the join message
final <- left_join(mrt_adm1,
                   select(arabic_adm1, ADM1_EN, ADM1_AR),
                   by = "ADM1_EN")

ggplot(final) +
  geom_sf() +
  geom_sf_label(aes(label = ADM1_AR)) +
  theme_minimal()

A map of Mauritania regions with their labels in Arabic
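
Besides the map, the regions left without an Arabic name can be listed directly. The sketch below uses toy tables with illustrative values in place of the real data; the same filter could, in principle, be applied to the joined object.

```r
library(dplyr)
library(tibble)

## Toy stand-ins for the boundaries and the scraped names (illustrative values)
boundaries <- tibble(ADM1_EN = c("Adrar", "Assaba", "Nouakchott"))
scraped    <- tibble(ADM1_EN = c("Adrar", "Assaba"),
                     ADM1_AR = c("أدرار", "لعصابة"))

joined <- left_join(boundaries, scraped, by = "ADM1_EN")

## Regions that did not receive an Arabic name
missing_ar <- filter(joined, is.na(ADM1_AR))
missing_ar$ADM1_EN
## [1] "Nouakchott"
```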

Session info for this analysis.

devtools::session_info()
─ Session info ────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 Patched (2022-11-12 r83340)
 os       Arch Linux
 system   x86_64, linux-gnu
 ui       X11
 language en_US.UTF-8
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       UTC
 date     2022-12-28
 pandoc   2.19.2 @ /usr/bin/ (via rmarkdown)

─ Packages ────────────────────────────────────────────────────
 package       * version    date (UTC) lib source
 abind           1.4-5      2016-07-21 [1] CRAN (R 4.2.2)
 assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.2.2)
 backports       1.4.1      2021-12-13 [1] CRAN (R 4.2.2)
 base64enc       0.1-3      2015-07-28 [1] CRAN (R 4.2.2)
 broom           1.0.2      2022-12-15 [1] CRAN (R 4.2.2)
 cachem          1.0.6      2021-08-19 [1] CRAN (R 4.2.2)
 callr           3.7.3      2022-11-02 [1] CRAN (R 4.2.2)
 cellranger      1.1.0      2016-07-27 [1] CRAN (R 4.2.2)
 class           7.3-20     2022-01-16 [1] CRAN (R 4.2.2)
 classInt        0.4-8      2022-09-29 [1] CRAN (R 4.2.2)
 cli             3.5.0      2022-12-20 [1] CRAN (R 4.2.2)
 colorspace      2.0-3      2022-02-21 [1] CRAN (R 4.2.2)
 crayon          1.5.2      2022-09-29 [1] CRAN (R 4.2.2)
 crul            1.3        2022-09-03 [1] CRAN (R 4.2.2)
 curl            4.3.3      2022-10-06 [1] CRAN (R 4.2.2)
 DBI             1.1.3      2022-06-18 [1] CRAN (R 4.2.2)
 dbplyr          2.2.1      2022-06-27 [1] CRAN (R 4.2.2)
 devtools        2.4.5      2022-10-11 [1] CRAN (R 4.2.2)
 digest          0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
 dplyr         * 1.0.10     2022-09-01 [1] CRAN (R 4.2.2)
 e1071           1.7-12     2022-10-24 [1] CRAN (R 4.2.2)
 ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.2.2)
 evaluate        0.19       2022-12-13 [1] CRAN (R 4.2.2)
 fansi           1.0.3      2022-03-24 [1] CRAN (R 4.2.2)
 farver          2.1.1      2022-07-06 [1] CRAN (R 4.2.2)
 fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.2.2)
 forcats       * 0.5.2      2022-08-19 [1] CRAN (R 4.2.2)
 fs              1.5.2      2021-12-08 [1] CRAN (R 4.2.2)
 gargle          1.2.1      2022-09-08 [1] CRAN (R 4.2.2)
 generics        0.1.3      2022-07-05 [1] CRAN (R 4.2.2)
 ggplot2       * 3.4.0      2022-11-04 [1] CRAN (R 4.2.2)
 glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.2)
 googledrive     2.0.0      2021-07-08 [1] CRAN (R 4.2.2)
 googlesheets4   1.0.1      2022-08-13 [1] CRAN (R 4.2.2)
 gtable          0.3.1      2022-09-01 [1] CRAN (R 4.2.2)
 haven           2.5.1      2022-08-22 [1] CRAN (R 4.2.2)
 hms             1.1.2      2022-08-19 [1] CRAN (R 4.2.2)
 hoardr          0.5.2      2018-12-02 [1] CRAN (R 4.2.2)
 htmltools       0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
 htmlwidgets     1.6.0      2022-12-15 [1] CRAN (R 4.2.2)
 httpcode        0.3.0      2020-04-10 [1] CRAN (R 4.2.2)
 httpuv          1.6.7      2022-12-14 [1] CRAN (R 4.2.2)
 httr          * 1.4.4      2022-08-17 [1] CRAN (R 4.2.2)
 jsonlite        1.8.4      2022-12-06 [1] CRAN (R 4.2.2)
 KernSmooth      2.23-20    2021-05-03 [1] CRAN (R 4.2.2)
 knitr           1.41       2022-11-18 [1] CRAN (R 4.2.2)
 later           1.3.0      2021-08-18 [1] CRAN (R 4.2.2)
 lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
 lubridate       1.9.0      2022-11-06 [1] CRAN (R 4.2.2)
 lwgeom          0.2-10     2022-11-19 [1] CRAN (R 4.2.2)
 magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.2.2)
 memoise         2.0.1      2021-11-26 [1] CRAN (R 4.2.2)
 mime            0.12       2021-09-28 [1] CRAN (R 4.2.2)
 miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 4.2.2)
 modelr          0.1.10     2022-11-11 [1] CRAN (R 4.2.2)
 munsell         0.5.0      2018-06-12 [1] CRAN (R 4.2.2)
 pillar          1.8.1      2022-08-19 [1] CRAN (R 4.2.2)
 pkgbuild        1.4.0      2022-11-27 [1] CRAN (R 4.2.2)
 pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.2)
 pkgload         1.3.2      2022-11-16 [1] CRAN (R 4.2.2)
 prettyunits     1.1.1      2020-01-24 [1] CRAN (R 4.2.2)
 processx        3.8.0      2022-10-26 [1] CRAN (R 4.2.2)
 profvis         0.3.7      2020-11-02 [1] CRAN (R 4.2.2)
 promises        1.2.0.1    2021-02-11 [1] CRAN (R 4.2.2)
 proxy           0.4-27     2022-06-09 [1] CRAN (R 4.2.2)
 ps              1.7.2      2022-10-26 [1] CRAN (R 4.2.2)
 purrr         * 1.0.0      2022-12-20 [1] CRAN (R 4.2.2)
 R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.2)
 rappdirs        0.3.3      2021-01-31 [1] CRAN (R 4.2.2)
 Rcpp            1.0.9      2022-07-08 [1] CRAN (R 4.2.2)
 readr         * 2.1.3      2022-10-01 [1] CRAN (R 4.2.2)
 readxl          1.4.1      2022-08-17 [1] CRAN (R 4.2.2)
 remotes         2.4.2      2021-11-30 [1] CRAN (R 4.2.2)
 reprex          2.0.2      2022-08-17 [1] CRAN (R 4.2.2)
 rhdx          * 0.1.0.9000 2022-11-03 [1] gitlab (dickoa/rhdx@c443336)
 rlang           1.0.6      2022-09-24 [1] CRAN (R 4.2.2)
 rmarkdown       2.19       2022-12-15 [1] CRAN (R 4.2.2)
 rvest         * 1.0.3      2022-08-19 [1] CRAN (R 4.2.2)
 scales          1.2.1      2022-08-20 [1] CRAN (R 4.2.2)
 sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.2)
 sf            * 1.0-9      2022-11-08 [1] CRAN (R 4.2.2)
 shiny           1.7.4      2022-12-15 [1] CRAN (R 4.2.2)
 stars           0.6-0      2022-11-21 [1] CRAN (R 4.2.2)
 stringdist    * 0.9.10     2022-11-07 [1] CRAN (R 4.2.2)
 stringi         1.7.8      2022-07-11 [1] CRAN (R 4.2.2)
 stringr       * 1.5.0      2022-12-02 [1] CRAN (R 4.2.2)
 tibble        * 3.1.8      2022-07-22 [1] CRAN (R 4.2.2)
 tidyr         * 1.2.1      2022-09-08 [1] CRAN (R 4.2.2)
 tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
 tidyverse     * 1.3.2      2022-07-18 [1] CRAN (R 4.2.2)
 timechange      0.1.1      2022-11-04 [1] CRAN (R 4.2.2)
 triebeard       0.3.0      2016-08-04 [1] CRAN (R 4.2.2)
 tzdb            0.3.0      2022-03-28 [1] CRAN (R 4.2.2)
 units           0.8-1      2022-12-10 [1] CRAN (R 4.2.2)
 urlchecker      1.0.1      2021-11-30 [1] CRAN (R 4.2.2)
 urltools        1.7.3      2019-04-14 [1] CRAN (R 4.2.2)
 usethis         2.1.6      2022-05-25 [1] CRAN (R 4.2.2)
 utf8            1.2.2      2021-07-24 [1] CRAN (R 4.2.2)
 vctrs           0.5.1      2022-11-16 [1] CRAN (R 4.2.2)
 withr           2.5.0      2022-03-03 [1] CRAN (R 4.2.2)
 xfun            0.36       2022-12-21 [1] CRAN (R 4.2.2)
 xml2            1.3.3      2021-11-30 [1] CRAN (R 4.2.2)
 xtable          1.8-4      2019-04-21 [1] CRAN (R 4.2.2)
 yaml            2.3.6      2022-10-18 [1] CRAN (R 4.2.2)

 [1] /usr/lib/R/library

───────────────────────────────────────────────────────────────