Convert string data into data frame

You can use scan with a little gsub:

matrix(scan(text = gsub("[()]", "", coordinates), sep = ","), 
       ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))
# Read 12 items
#            Lat     Long
# [1,] -79.43592 43.68015
# [2,] -79.43492 43.68037
# [3,] -79.43395 43.68058
# [4,] -79.43388 43.68059
# [5,] -79.43282 43.68081
# [6,] -79.43270 43.68080

The precision is still there; the matrix display just rounds to 7 significant digits by default.
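To check this, here is a small sketch that prints the same kind of matrix with more significant digits. It assumes a shortened `coordinates` string holding only the first two pairs, and the name `m` is just for illustration:

```r
# Assumed sample input: the first two pairs of the original string
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# Same approach as above; quiet = TRUE suppresses the "Read N items" message
m <- matrix(scan(text = gsub("[()]", "", coordinates), sep = ",", quiet = TRUE),
            ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))

# Ask print() for more significant digits than the default of 7
print(m, digits = 15)

# Compare a stored value against the literal it was parsed from
abs(m[1, "Lat"] - (-79.43591570873059)) < 1e-12
```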

Two clear advantages:

  • Fast.
  • Handles a multi-element "coordinates" vector (e.g. coordinates <- rep(coordinates, 10) as input).
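As a rough illustration of the second point, the sketch below feeds a two-element character vector through the same call. The vector `coords_vec` and its shortened values are assumptions made up for this example:

```r
# Hypothetical multi-element input: two strings holding three pairs in total
coords_vec <- c("(-79.4359, 43.6801), (-79.4349, 43.6803)",
                "(-79.4339, 43.6805)")

# gsub() is vectorised, and scan(text = ...) reads the elements as successive lines
m2 <- matrix(scan(text = gsub("[()]", "", coords_vec), sep = ",", quiet = TRUE),
             ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))

# One row per coordinate pair, regardless of which element it came from
dim(m2)
```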

Here's another option:

library(data.table)
fread(gsub("[()]", "", gsub("), (", "\n", toString(coordinates), fixed = TRUE)), header = FALSE)

The toString(coordinates) is for cases when length(coordinates) > 1. You could also use fread(text = gsub(...), ...) and skip using toString. I'm not sure of the advantages or limitations of either approach.


We can use str_extract_all from stringr

library(stringr)

df <- data.frame(Latitude = str_extract_all(coordinates, "(?<=\\()-\\d+\\.\\d+")[[1]], 
      Longitude = str_extract_all(coordinates, "(?<=,\\s)\\d+\\.\\d+(?=\\))")[[1]])
df
#            Latitude          Longitude
#1 -79.43591570873059  43.68015339477487
#2 -79.43491506339724  43.68036886994886
#3 -79.43394727223847 43.680578504490335
#4 -79.43388162422195  43.68058996121469
#5 -79.43281544978878 43.680808044458765
#6  -79.4326971769691  43.68079658822322

The Latitude pattern matches the negative decimal number that follows an opening parenthesis, whereas the Longitude pattern matches the decimal number between the comma-plus-space and the closing parenthesis.

Or, without regex lookahead and lookbehind, capture both numbers in a single pattern using str_match_all:

df <- data.frame(str_match_all(coordinates, 
                        "\\((-\\d+\\.\\d+),\\s(\\d+\\.\\d+)\\)")[[1]][, c(2, 3)])

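Both versions return character columns. To convert them to their respective types, you could use type.convert (recent R versions warn if as.is is not given for a data frame):

df <- type.convert(df, as.is = TRUE)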

Here is a base R option:

coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"
coordinates <- gsub("^\\(|\\)$", "", coordinates)
x <- strsplit(coordinates, "\\), \\(")[[1]]
df <- data.frame(lat=sub(",.*$", "", x), lng=sub("^.*, ", "", x), stringsAsFactors=FALSE)
df

The strategy here is to first strip the leading and trailing parentheses, then split the string on \), \( to get a character vector with one latitude/longitude pair per element. Finally, we split each pair at the comma to build the data frame, whose columns are still character:

                 lat                lng
1 -79.43591570873059  43.68015339477487
2 -79.43491506339724  43.68036886994886
3 -79.43394727223847 43.680578504490335
4 -79.43388162422195  43.68058996121469
5 -79.43281544978878 43.680808044458765
6  -79.4326971769691 43.68079658822322
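If numeric columns are needed, one possible follow-up is to convert them with as.numeric. This is a minimal sketch that repeats the steps above on an assumed two-pair input:

```r
# Assumed sample input: the first two pairs of the original string
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# Strip the outer parentheses, then split into one "lat, lng" string per pair
x <- strsplit(gsub("^\\(|\\)$", "", coordinates), "\\), \\(")[[1]]
df <- data.frame(lat = sub(",.*$", "", x), lng = sub("^.*, ", "", x),
                 stringsAsFactors = FALSE)

# Convert every column from character to numeric, keeping the data frame shape
df[] <- lapply(df, as.numeric)
str(df)
```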
