Split & extract part of string (between a "." and digit) in R

You can delimit the string with a regex and then split the delimited string to get your result:

# "\\." matches the literal dot after the leading number (a bare "." would match any character)
delimitedString = gsub( "^([0-9]+)\\. (.*) ([0-9.]+)$", "\\1,\\2,\\3", companies )

do.call( 'rbind', strsplit(split = ",", x = delimitedString) )
#      [,1]  [,2]                   [,3]  
#[1,] "612" "Grt. Am. Mgt. & Inv." "7.33"
#[2,] "77"  "Wickes"               "4.61"
#[3,] "265" "Wang Labs"            "8.75"
#[4,] "9"   "CrossLand Savings"    "6.32"
#[5,] "228" "JPS Textile Group"    "2.00" 

Regex explanation:

  • ^[0-9]+ : one or more digits (0 to 9) at the beginning (i.e. ^) of the string
  • .* : greedy match of any characters; in the case above, everything between the first and the last space
  • [0-9.]+$ : one or more digits or dots at the end (i.e. $) of the string
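To see what each group captures, a minimal sketch can apply the pattern to a single string. The input x is an assumption reconstructed from the first row of the output above, the variable names are only illustrative, and the dot after the leading number is escaped so that it matches a literal period.

```r
# Hypothetical input, reconstructed from the first row of the output above
x <- "612. Grt. Am. Mgt. & Inv. 7.33"

# Three capture groups; "\\." matches the literal dot after the number
pattern <- "^([0-9]+)\\. (.*) ([0-9.]+)$"

# Replace the whole string with a single backreference to isolate each group
num     <- sub(pattern, "\\1", x)  # leading digits
company <- sub(pattern, "\\2", x)  # greedy middle part
value   <- sub(pattern, "\\3", x)  # trailing digits and dots

print(c(num, company, value))
```

Because .* is greedy, it stops only at the last space, so the dots and spaces inside the company name stay in the second group.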

Parentheses indicate the parts of the string I want to capture with the regex. Once captured, those substrings are pasted back together, delimited by commas (the "\\1,\\2,\\3" replacement). Finally, we split each string at the commas with the strsplit function and bind the resulting pieces into rows with do.call and rbind. Note that this assumes the company names themselves contain no commas.
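As a variation on the approach above (not the method used in this answer), the capture groups can also be read directly with base R's regexec and regmatches. This skips the temporary comma delimiter, which would break if a company name contained a comma. The companies vector below is an assumption reconstructed from the output shown earlier, and the column names are only illustrative.

```r
# Hypothetical input vector, reconstructed from the output shown in the answer
companies <- c("612. Grt. Am. Mgt. & Inv. 7.33",
               "77. Wickes 4.61",
               "265. Wang Labs 8.75",
               "9. CrossLand Savings 6.32",
               "228. JPS Textile Group 2.00")

pattern <- "^([0-9]+)\\. (.*) ([0-9.]+)$"

# regexec() locates the full match and each capture group;
# regmatches() extracts them as one character vector per input string
matches <- regmatches(companies, regexec(pattern, companies))

# Drop the first element (the full match) and bind the groups row-wise
parts <- do.call(rbind, lapply(matches, function(m) m[-1]))

# Illustrative column names; convert the numeric columns to proper types
result <- data.frame(id      = as.integer(parts[, 1]),
                     company = parts[, 2],
                     value   = as.numeric(parts[, 3]),
                     stringsAsFactors = FALSE)
print(result)
```

This sketch assumes every string matches the pattern; a non-matching string would yield an empty vector and should be filtered out before binding.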

Tags:

Regex

R

Stringr