In-Class Exercise 10

In this exercise, we will recap on the use of geocoding and learn how to work with Open Government Data.

Author

Teng Kok Wai (Walter)

Published

November 4, 2024

Modified

November 8, 2024

1 Exercise Reference

2 Learning Outcome

In this exercise, we will recap on the use of geocoding and learn how to work with Open Government Data.

3 Import the R Packages

pacman::p_load(tidyverse, sf, tmap, httr, performance)

4 Geocoding

The HDB resale data can be downloaded from here. The dataset contains resale flat prices based on registration date from Jan 2017 to Sep 2024.

The code below reads the raw CSV file containing the resale flat data and filters it to include only records from January 2023 to September 2024.

resale <- read_csv("data/raw_data/resale.csv") %>%
  filter(month >= "2023-01" & month <= "2024-09")

The following code tidies the data by creating new columns: - address: Combines block and street_name to form a complete address. - remaining_lease_yr: Extracts the remaining lease years as an integer. - remaining_lease_mth: Extracts the remaining lease months as an integer.

resale_tidy <- resale %>%
  mutate(address = paste(block,street_name)) %>%
  mutate(remaining_lease_yr = as.integer(
    str_sub(remaining_lease, 0, 2)))%>%
  mutate(remaining_lease_mth = as.integer(
    str_sub(remaining_lease, 9, 11)))

Next, we filter the tidy dataset to include only records from September 2024 and take the first 10 addresses as an example.

resale_selected <- resale_tidy %>%
  filter(month == "2024-09")

first_10_resale <- head(resale_selected, 10)

Then, we generate a sorted list of unique addresses from the filtered dataset. This will be used to retrieve geographical coordinates.

add_list <- sort(unique(first_10_resale$address))
add_list
[1] "174 ANG MO KIO AVE 4"  "223 ANG MO KIO AVE 1"  "225 ANG MO KIO AVE 1"
[4] "308A ANG MO KIO AVE 1" "308B ANG MO KIO AVE 1" "314 ANG MO KIO AVE 3"
[7] "320 ANG MO KIO AVE 1"  "321 ANG MO KIO AVE 1"  "331 ANG MO KIO AVE 1" 

In the code below, we will perform geocoding and save the output into 2 data frames, found and not_found.

found data frames contains data from successful API calls and not_found contains data with api errors etc. These data require additional care and manual geocoding may be required.

url <- "https://onemap.gov.sg/api/common/elastic/search"
found <- data.frame()
not_found <- data.frame()

for (address in add_list){
  query <- list('searchVal'=address, 'returnGeom'='Y', 
                'getAddrDetails'='Y', 'pageNum'='1')
  res <- GET(url, query=query)
  
  if ((content(res)$found)!=0){
    tmp_df <- data.frame(content(res))[4:13]
    tmp_df$address<- address
    found <- rbind(found, tmp_df)

  } else {
    not_found <- rbind(not_found, data.frame(address = address))
  }
  
}
found
                         results.SEARCHVAL results.BLK_NO   results.ROAD_NAME
1                        KEBUN BARU LINK 1            174 ANG MO KIO AVENUE 4
2 223 ANG MO KIO AVENUE 1 SINGAPORE 560223            223 ANG MO KIO AVENUE 1
3 225 ANG MO KIO AVENUE 1 SINGAPORE 560225            225 ANG MO KIO AVENUE 1
4                          TECK GHEE VISTA           308A ANG MO KIO AVENUE 1
5                          TECK GHEE VISTA           308B ANG MO KIO AVENUE 1
6                      TECK GHEE EVERGREEN            314 ANG MO KIO AVENUE 3
7     STAR LEARNERS @ ANG MO KIO PTE. LTD.            320 ANG MO KIO AVENUE 1
8      NEIGHBOURHOOD POLICE POST TECK GHEE            321 ANG MO KIO AVENUE 1
9                           MY FIRST SKOOL            331 ANG MO KIO AVENUE 1
                      results.BUILDING
1                    KEBUN BARU LINK 1
2                                  NIL
3                                  NIL
4                      TECK GHEE VISTA
5                      TECK GHEE VISTA
6                  TECK GHEE EVERGREEN
7 STAR LEARNERS @ ANG MO KIO PTE. LTD.
8  NEIGHBOURHOOD POLICE POST TECK GHEE
9                       MY FIRST SKOOL
                                                                results.ADDRESS
1                    174 ANG MO KIO AVENUE 4 KEBUN BARU LINK 1 SINGAPORE 560174
2                                      223 ANG MO KIO AVENUE 1 SINGAPORE 560223
3                                      225 ANG MO KIO AVENUE 1 SINGAPORE 560225
4                     308A ANG MO KIO AVENUE 1 TECK GHEE VISTA SINGAPORE 561308
5                     308B ANG MO KIO AVENUE 1 TECK GHEE VISTA SINGAPORE 562308
6                  314 ANG MO KIO AVENUE 3 TECK GHEE EVERGREEN SINGAPORE 560314
7 320 ANG MO KIO AVENUE 1 STAR LEARNERS @ ANG MO KIO PTE. LTD. SINGAPORE 560320
8  321 ANG MO KIO AVENUE 1 NEIGHBOURHOOD POLICE POST TECK GHEE SINGAPORE 560321
9                       331 ANG MO KIO AVENUE 1 MY FIRST SKOOL SINGAPORE 560331
  results.POSTAL        results.X        results.Y results.LATITUDE
1         560174 28478.5794445509 39676.8076131288 1.37509746867904
2         560223 28534.6432265872 38676.1728609148 1.36604808445916
3         560225  28537.680043661 38825.2326317504 1.36739612776859
4         561308 29198.1455183588 38613.7381223254 1.36548342757106
5         562308 29248.5946499519 38589.7120627463  1.3652661423815
6         560314 29865.9980458226 38695.9702712912 1.36622707120636
7         560320  29676.907292584 38625.7954039071 1.36559244608528
8         560321 29712.2350843353 38575.9067748357 1.36514126911853
9         560331 29941.7457938343 38240.8809963278 1.36211140145298
  results.LONGITUDE               address
1   103.83761896123  174 ANG MO KIO AVE 4
2  103.838122716883  223 ANG MO KIO AVE 1
3  103.838150007464  225 ANG MO KIO AVE 1
4  103.844084739929 308A ANG MO KIO AVE 1
5  103.844538059044 308B ANG MO KIO AVE 1
6  103.850085858983  314 ANG MO KIO AVE 3
7  103.848386744168  320 ANG MO KIO AVE 1
8    103.8487041858  321 ANG MO KIO AVE 1
9   103.85076647513  331 ANG MO KIO AVE 1
not_found
data frame with 0 columns and 0 rows

In this example, we get all successful calls and 0 failures when performing geocoding.

Next, we tidy the field names.

colnames(found)
 [1] "results.SEARCHVAL" "results.BLK_NO"    "results.ROAD_NAME"
 [4] "results.BUILDING"  "results.ADDRESS"   "results.POSTAL"
 [7] "results.X"         "results.Y"         "results.LATITUDE"
[10] "results.LONGITUDE" "address"          
found_filtered <- found %>%
  select(results.BLK_NO, results.ROAD_NAME, results.POSTAL, results.X, results.Y, address) %>%
  rename(
    POSTAL = results.POSTAL,
    XCOORD = results.X,
    YCOORD = results.Y,
    BLK_NO = results.BLK_NO,
    ROAD_NAME = results.ROAD_NAME
  )
found_filtered 
  BLK_NO           ROAD_NAME POSTAL           XCOORD           YCOORD
1    174 ANG MO KIO AVENUE 4 560174 28478.5794445509 39676.8076131288
2    223 ANG MO KIO AVENUE 1 560223 28534.6432265872 38676.1728609148
3    225 ANG MO KIO AVENUE 1 560225  28537.680043661 38825.2326317504
4   308A ANG MO KIO AVENUE 1 561308 29198.1455183588 38613.7381223254
5   308B ANG MO KIO AVENUE 1 562308 29248.5946499519 38589.7120627463
6    314 ANG MO KIO AVENUE 3 560314 29865.9980458226 38695.9702712912
7    320 ANG MO KIO AVENUE 1 560320  29676.907292584 38625.7954039071
8    321 ANG MO KIO AVENUE 1 560321 29712.2350843353 38575.9067748357
9    331 ANG MO KIO AVENUE 1 560331 29941.7457938343 38240.8809963278
                address
1  174 ANG MO KIO AVE 4
2  223 ANG MO KIO AVE 1
3  225 ANG MO KIO AVE 1
4 308A ANG MO KIO AVE 1
5 308B ANG MO KIO AVE 1
6  314 ANG MO KIO AVE 3
7  320 ANG MO KIO AVE 1
8  321 ANG MO KIO AVE 1
9  331 ANG MO KIO AVE 1

Next, we join resale with found to form resale_geocoded. Then we convert this tibble dataframe to sf point feature data frame.

# we need to add an address column for joining
first_10_resale <- first_10_resale %>%
  mutate(address = paste(block, street_name))
resale_geocoded = left_join(
  first_10_resale, found_filtered, 
  by = c('address' = 'address'))

resale_geocoded
# A tibble: 10 × 19
   month   town       flat_type block street_name    storey_range floor_area_sqm
   <chr>   <chr>      <chr>     <chr> <chr>          <chr>                 <dbl>
 1 2024-09 ANG MO KIO 2 ROOM    314   ANG MO KIO AV… 01 TO 03                 44
 2 2024-09 ANG MO KIO 2 ROOM    174   ANG MO KIO AV… 10 TO 12                 45
 3 2024-09 ANG MO KIO 2 ROOM    174   ANG MO KIO AV… 04 TO 06                 45
 4 2024-09 ANG MO KIO 3 ROOM    223   ANG MO KIO AV… 10 TO 12                 82
 5 2024-09 ANG MO KIO 3 ROOM    320   ANG MO KIO AV… 04 TO 06                 73
 6 2024-09 ANG MO KIO 3 ROOM    331   ANG MO KIO AV… 01 TO 03                 68
 7 2024-09 ANG MO KIO 3 ROOM    308A  ANG MO KIO AV… 16 TO 18                 70
 8 2024-09 ANG MO KIO 3 ROOM    321   ANG MO KIO AV… 07 TO 09                 88
 9 2024-09 ANG MO KIO 3 ROOM    308B  ANG MO KIO AV… 22 TO 24                 70
10 2024-09 ANG MO KIO 3 ROOM    225   ANG MO KIO AV… 01 TO 03                 67
# ℹ 12 more variables: flat_model <chr>, lease_commence_date <dbl>,
#   remaining_lease <chr>, resale_price <dbl>, address <chr>,
#   remaining_lease_yr <int>, remaining_lease_mth <int>, BLK_NO <chr>,
#   ROAD_NAME <chr>, POSTAL <chr>, XCOORD <chr>, YCOORD <chr>

To convert to sf:

resale_geocoded_sf <- st_as_sf(resale_geocoded, 
                            coords = c("XCOORD",
                                       "YCOORD"),
                            crs=3414)

Next we check for overlapping point features.

overlapping_points <- resale_geocoded_sf %>%
  mutate(overlap = lengths(st_equals(., .)) > 1)

In the code below, st_jitter() of sf package is used to move the point features by 5m to avoid overlapping point features.

resale_geocoded_sf <- resale_geocoded_sf %>%
  st_jitter(amount = 5)

resale_geocoded_sf
Simple feature collection with 10 features and 17 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28475.59 ymin: 38239.9 xmax: 29942.83 ymax: 39681.36
Projected CRS: SVY21 / Singapore TM
# A tibble: 10 × 18
   month   town       flat_type block street_name    storey_range floor_area_sqm
 * <chr>   <chr>      <chr>     <chr> <chr>          <chr>                 <dbl>
 1 2024-09 ANG MO KIO 2 ROOM    314   ANG MO KIO AV… 01 TO 03                 44
 2 2024-09 ANG MO KIO 2 ROOM    174   ANG MO KIO AV… 10 TO 12                 45
 3 2024-09 ANG MO KIO 2 ROOM    174   ANG MO KIO AV… 04 TO 06                 45
 4 2024-09 ANG MO KIO 3 ROOM    223   ANG MO KIO AV… 10 TO 12                 82
 5 2024-09 ANG MO KIO 3 ROOM    320   ANG MO KIO AV… 04 TO 06                 73
 6 2024-09 ANG MO KIO 3 ROOM    331   ANG MO KIO AV… 01 TO 03                 68
 7 2024-09 ANG MO KIO 3 ROOM    308A  ANG MO KIO AV… 16 TO 18                 70
 8 2024-09 ANG MO KIO 3 ROOM    321   ANG MO KIO AV… 07 TO 09                 88
 9 2024-09 ANG MO KIO 3 ROOM    308B  ANG MO KIO AV… 22 TO 24                 70
10 2024-09 ANG MO KIO 3 ROOM    225   ANG MO KIO AV… 01 TO 03                 67
# ℹ 11 more variables: flat_model <chr>, lease_commence_date <dbl>,
#   remaining_lease <chr>, resale_price <dbl>, address <chr>,
#   remaining_lease_yr <int>, remaining_lease_mth <int>, BLK_NO <chr>,
#   ROAD_NAME <chr>, POSTAL <chr>, geometry <POINT [m]>