pacman::p_load(tidyverse, sf, tmap, httr, performance)
In-Class Exercise 10
In this exercise, we will recap on the use of geocoding and learn how to work with Open Government Data.
1 Exercise Reference
2 Learning Outcome
In this exercise, we will recap on the use of geocoding and learn how to work with Open Government Data.
3 Import the R Packages
4 Geocoding
The HDB resale data can be downloaded from here. The dataset contains resale flat prices based on registration date from Jan 2017 to Sep 2024.
The code below reads the raw CSV file containing the resale flat data and filters it to include only records from January 2023 to September 2024.
resale <- read_csv("data/raw_data/resale.csv") %>%
filter(month >= "2023-01" & month <= "2024-09")
The following code tidies the data by creating new columns: -
address: Combines block and
street_name to form a complete address. -
remaining_lease_yr: Extracts the remaining lease years
as an integer. - remaining_lease_mth: Extracts the
remaining lease months as an integer.
resale_tidy <- resale %>%
mutate(address = paste(block,street_name)) %>%
mutate(remaining_lease_yr = as.integer(
str_sub(remaining_lease, 0, 2)))%>%
mutate(remaining_lease_mth = as.integer(
str_sub(remaining_lease, 9, 11)))
Next, we filter the tidy dataset to include only records from September 2024 and take the first 10 addresses as an example.
Then, we generate a sorted list of unique addresses from the filtered dataset. This will be used to retrieve geographical coordinates.
[1] "174 ANG MO KIO AVE 4" "223 ANG MO KIO AVE 1" "225 ANG MO KIO AVE 1"
[4] "308A ANG MO KIO AVE 1" "308B ANG MO KIO AVE 1" "314 ANG MO KIO AVE 3"
[7] "320 ANG MO KIO AVE 1" "321 ANG MO KIO AVE 1" "331 ANG MO KIO AVE 1"
In the code below, we will perform geocoding and save the output
into 2 data frames, found and not_found.
found data frames contains data from successful API
calls and not_found contains data with api errors etc.
These data require additional care and manual geocoding may be
required.
url <- "https://onemap.gov.sg/api/common/elastic/search"
found <- data.frame()
not_found <- data.frame()
for (address in add_list){
query <- list('searchVal'=address, 'returnGeom'='Y',
'getAddrDetails'='Y', 'pageNum'='1')
res <- GET(url, query=query)
if ((content(res)$found)!=0){
tmp_df <- data.frame(content(res))[4:13]
tmp_df$address<- address
found <- rbind(found, tmp_df)
} else {
not_found <- rbind(not_found, data.frame(address = address))
}
}
found
results.SEARCHVAL results.BLK_NO results.ROAD_NAME
1 KEBUN BARU LINK 1 174 ANG MO KIO AVENUE 4
2 223 ANG MO KIO AVENUE 1 SINGAPORE 560223 223 ANG MO KIO AVENUE 1
3 225 ANG MO KIO AVENUE 1 SINGAPORE 560225 225 ANG MO KIO AVENUE 1
4 TECK GHEE VISTA 308A ANG MO KIO AVENUE 1
5 TECK GHEE VISTA 308B ANG MO KIO AVENUE 1
6 TECK GHEE EVERGREEN 314 ANG MO KIO AVENUE 3
7 STAR LEARNERS @ ANG MO KIO PTE. LTD. 320 ANG MO KIO AVENUE 1
8 NEIGHBOURHOOD POLICE POST TECK GHEE 321 ANG MO KIO AVENUE 1
9 MY FIRST SKOOL 331 ANG MO KIO AVENUE 1
results.BUILDING
1 KEBUN BARU LINK 1
2 NIL
3 NIL
4 TECK GHEE VISTA
5 TECK GHEE VISTA
6 TECK GHEE EVERGREEN
7 STAR LEARNERS @ ANG MO KIO PTE. LTD.
8 NEIGHBOURHOOD POLICE POST TECK GHEE
9 MY FIRST SKOOL
results.ADDRESS
1 174 ANG MO KIO AVENUE 4 KEBUN BARU LINK 1 SINGAPORE 560174
2 223 ANG MO KIO AVENUE 1 SINGAPORE 560223
3 225 ANG MO KIO AVENUE 1 SINGAPORE 560225
4 308A ANG MO KIO AVENUE 1 TECK GHEE VISTA SINGAPORE 561308
5 308B ANG MO KIO AVENUE 1 TECK GHEE VISTA SINGAPORE 562308
6 314 ANG MO KIO AVENUE 3 TECK GHEE EVERGREEN SINGAPORE 560314
7 320 ANG MO KIO AVENUE 1 STAR LEARNERS @ ANG MO KIO PTE. LTD. SINGAPORE 560320
8 321 ANG MO KIO AVENUE 1 NEIGHBOURHOOD POLICE POST TECK GHEE SINGAPORE 560321
9 331 ANG MO KIO AVENUE 1 MY FIRST SKOOL SINGAPORE 560331
results.POSTAL results.X results.Y results.LATITUDE
1 560174 28478.5794445509 39676.8076131288 1.37509746867904
2 560223 28534.6432265872 38676.1728609148 1.36604808445916
3 560225 28537.680043661 38825.2326317504 1.36739612776859
4 561308 29198.1455183588 38613.7381223254 1.36548342757106
5 562308 29248.5946499519 38589.7120627463 1.3652661423815
6 560314 29865.9980458226 38695.9702712912 1.36622707120636
7 560320 29676.907292584 38625.7954039071 1.36559244608528
8 560321 29712.2350843353 38575.9067748357 1.36514126911853
9 560331 29941.7457938343 38240.8809963278 1.36211140145298
results.LONGITUDE address
1 103.83761896123 174 ANG MO KIO AVE 4
2 103.838122716883 223 ANG MO KIO AVE 1
3 103.838150007464 225 ANG MO KIO AVE 1
4 103.844084739929 308A ANG MO KIO AVE 1
5 103.844538059044 308B ANG MO KIO AVE 1
6 103.850085858983 314 ANG MO KIO AVE 3
7 103.848386744168 320 ANG MO KIO AVE 1
8 103.8487041858 321 ANG MO KIO AVE 1
9 103.85076647513 331 ANG MO KIO AVE 1
not_found
data frame with 0 columns and 0 rows
In this example, we get all successful calls and 0 failures when performing geocoding.
Next, we tidy the field names.
colnames(found)
[1] "results.SEARCHVAL" "results.BLK_NO" "results.ROAD_NAME"
[4] "results.BUILDING" "results.ADDRESS" "results.POSTAL"
[7] "results.X" "results.Y" "results.LATITUDE"
[10] "results.LONGITUDE" "address"
found_filtered <- found %>%
select(results.BLK_NO, results.ROAD_NAME, results.POSTAL, results.X, results.Y, address) %>%
rename(
POSTAL = results.POSTAL,
XCOORD = results.X,
YCOORD = results.Y,
BLK_NO = results.BLK_NO,
ROAD_NAME = results.ROAD_NAME
)
found_filtered
BLK_NO ROAD_NAME POSTAL XCOORD YCOORD
1 174 ANG MO KIO AVENUE 4 560174 28478.5794445509 39676.8076131288
2 223 ANG MO KIO AVENUE 1 560223 28534.6432265872 38676.1728609148
3 225 ANG MO KIO AVENUE 1 560225 28537.680043661 38825.2326317504
4 308A ANG MO KIO AVENUE 1 561308 29198.1455183588 38613.7381223254
5 308B ANG MO KIO AVENUE 1 562308 29248.5946499519 38589.7120627463
6 314 ANG MO KIO AVENUE 3 560314 29865.9980458226 38695.9702712912
7 320 ANG MO KIO AVENUE 1 560320 29676.907292584 38625.7954039071
8 321 ANG MO KIO AVENUE 1 560321 29712.2350843353 38575.9067748357
9 331 ANG MO KIO AVENUE 1 560331 29941.7457938343 38240.8809963278
address
1 174 ANG MO KIO AVE 4
2 223 ANG MO KIO AVE 1
3 225 ANG MO KIO AVE 1
4 308A ANG MO KIO AVE 1
5 308B ANG MO KIO AVE 1
6 314 ANG MO KIO AVE 3
7 320 ANG MO KIO AVE 1
8 321 ANG MO KIO AVE 1
9 331 ANG MO KIO AVE 1
Next, we join resale with found to form
resale_geocoded. Then we convert this tibble dataframe
to sf point feature data frame.
# we need to add an address column for joining
first_10_resale <- first_10_resale %>%
mutate(address = paste(block, street_name))
resale_geocoded = left_join(
first_10_resale, found_filtered,
by = c('address' = 'address'))
resale_geocoded
# A tibble: 10 × 19
month town flat_type block street_name storey_range floor_area_sqm
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2024-09 ANG MO KIO 2 ROOM 314 ANG MO KIO AV… 01 TO 03 44
2 2024-09 ANG MO KIO 2 ROOM 174 ANG MO KIO AV… 10 TO 12 45
3 2024-09 ANG MO KIO 2 ROOM 174 ANG MO KIO AV… 04 TO 06 45
4 2024-09 ANG MO KIO 3 ROOM 223 ANG MO KIO AV… 10 TO 12 82
5 2024-09 ANG MO KIO 3 ROOM 320 ANG MO KIO AV… 04 TO 06 73
6 2024-09 ANG MO KIO 3 ROOM 331 ANG MO KIO AV… 01 TO 03 68
7 2024-09 ANG MO KIO 3 ROOM 308A ANG MO KIO AV… 16 TO 18 70
8 2024-09 ANG MO KIO 3 ROOM 321 ANG MO KIO AV… 07 TO 09 88
9 2024-09 ANG MO KIO 3 ROOM 308B ANG MO KIO AV… 22 TO 24 70
10 2024-09 ANG MO KIO 3 ROOM 225 ANG MO KIO AV… 01 TO 03 67
# ℹ 12 more variables: flat_model <chr>, lease_commence_date <dbl>,
# remaining_lease <chr>, resale_price <dbl>, address <chr>,
# remaining_lease_yr <int>, remaining_lease_mth <int>, BLK_NO <chr>,
# ROAD_NAME <chr>, POSTAL <chr>, XCOORD <chr>, YCOORD <chr>
To convert to sf:
resale_geocoded_sf <- st_as_sf(resale_geocoded,
coords = c("XCOORD",
"YCOORD"),
crs=3414)
Next we check for overlapping point features.
overlapping_points <- resale_geocoded_sf %>%
mutate(overlap = lengths(st_equals(., .)) > 1)
In the code below, st_jitter() of sf package is used to
move the point features by 5m to avoid overlapping point features.
resale_geocoded_sf <- resale_geocoded_sf %>%
st_jitter(amount = 5)
resale_geocoded_sf
Simple feature collection with 10 features and 17 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 28475.59 ymin: 38239.9 xmax: 29942.83 ymax: 39681.36
Projected CRS: SVY21 / Singapore TM
# A tibble: 10 × 18
month town flat_type block street_name storey_range floor_area_sqm
* <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2024-09 ANG MO KIO 2 ROOM 314 ANG MO KIO AV… 01 TO 03 44
2 2024-09 ANG MO KIO 2 ROOM 174 ANG MO KIO AV… 10 TO 12 45
3 2024-09 ANG MO KIO 2 ROOM 174 ANG MO KIO AV… 04 TO 06 45
4 2024-09 ANG MO KIO 3 ROOM 223 ANG MO KIO AV… 10 TO 12 82
5 2024-09 ANG MO KIO 3 ROOM 320 ANG MO KIO AV… 04 TO 06 73
6 2024-09 ANG MO KIO 3 ROOM 331 ANG MO KIO AV… 01 TO 03 68
7 2024-09 ANG MO KIO 3 ROOM 308A ANG MO KIO AV… 16 TO 18 70
8 2024-09 ANG MO KIO 3 ROOM 321 ANG MO KIO AV… 07 TO 09 88
9 2024-09 ANG MO KIO 3 ROOM 308B ANG MO KIO AV… 22 TO 24 70
10 2024-09 ANG MO KIO 3 ROOM 225 ANG MO KIO AV… 01 TO 03 67
# ℹ 11 more variables: flat_model <chr>, lease_commence_date <dbl>,
# remaining_lease <chr>, resale_price <dbl>, address <chr>,
# remaining_lease_yr <int>, remaining_lease_mth <int>, BLK_NO <chr>,
# ROAD_NAME <chr>, POSTAL <chr>, geometry <POINT [m]>