Using urltools::url_parse with UTF-8 domains

I could reproduce the issue. I could convert the column domain to UTF-8 by reading it with readr::parse_character and latin1 encoding:

library(urltools)
library(tidyverse)

url <- "www.cordes-tiefkühlprodukte.de"

parts <- 
  url_parse(url) %>% 
  mutate(domain = parse_character(domain, locale = locale(encoding = "latin1")))

parts

  scheme                         domain port path parameter fragment
1   <NA> www.cordes-tiefkühlprodukte.de <NA> <NA>      <NA>     <NA>

I guess that the encoding you have to specify (here latin1) depends only on your locale and not on the url's special characters, but I'm not 100% sure about that.

Tags:

R

Url Parsing