How to visit a link inside an email using Capybara

You can use Nokogiri to parse the email body and find the link you want to click.

Imagine you want to click a link labeled "Change my password":

email = ActionMailer::Base.deliveries.last
html = Nokogiri::HTML(email.html_part.body.to_s)
target_url = html.at("a:contains('Change my password')")['href']
visit target_url

I think this is more semantic and robust than using regular expressions. For example, it still works if the email contains many links.


If you're using or willing to use the capybara-email gem, there's now a simpler way of doing this. Let's say you've generated an email to [email protected], which contains the link 'fancy link'.

Then you can just do this in your test suite:

open_email('[email protected]') # Sets current_email to the latest email sent to that address
current_email.click_link 'fancy link'

In your test, trigger the sending of the email in whatever way your application requires. Once the email is sent, use a regular expression to extract the URL of the link from the email body (note that this is reliable only for an email that contains a single link, since the regex returns the first match), and then visit the path from that URL with Capybara to continue with your test:

path_regex = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/

email = ActionMailer::Base.deliveries.last
path = email.body.match(path_regex)[1]
visit(path)
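To see the single-link caveat in action, here is a minimal sketch that runs the same regex against plain strings instead of a real email. The helper name first_link_path and the sample HTML bodies are hypothetical, made up for illustration; in a real test the string would come from the email body.

```ruby
# Same regex as above, applied to plain strings so it can run without Rails.
PATH_REGEX = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/

# Hypothetical helper: returns the path of the first link, or nil if none matches.
def first_link_path(body)
  match = body.match(PATH_REGEX)
  match && match[1]
end

# Made-up email bodies for illustration only.
single = '<p>Hello</p><a href="https://www.example.com/password/edit?token=abc123">Change my password</a>'
double = '<a href="http://www.example.com/unsubscribe">Unsubscribe</a> ' + single

puts first_link_path(single)                  # /password/edit?token=abc123
puts first_link_path(double)                  # /unsubscribe
puts first_link_path('no links here').inspect # nil
```

With two links in the body, the regex picks up whichever comes first, which may not be the one you meant to visit.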


Regular expression explained

A regular expression (regex) itself is demarcated by forward slashes, and this regex in particular consists of three groups, each demarcated by pairs of parentheses. The first and third groups both begin with ?:, indicating that they are non-capturing groups, while the second is a capturing group (no ?:). I will explain the significance of this distinction below.

The first group, (?:"https?\:\/\/.*?), is a:

  • non-capturing group, ?:
  • that matches a single double quote, "
    • we match a quote since we anticipate the URL to be in the href="..." attribute of a link tag
  • followed by the string http
  • optionally followed by a lowercase s, s?
    • the question mark makes the preceding match, in this case s, optional
  • followed by a colon and two forward slashes, \:\/\/
    • note the backslashes, which are used to escape characters that otherwise have a special meaning in a regex
  • followed by a wildcard, .*?, which will match any character any number of times up until the next match in the regex is reached
    • the period, or wildcard, matches any character (except a newline, by default)
    • the asterisk, *, repeats the preceding match zero or more times, depending on the successive match that follows
    • the question mark makes this a lazy match, meaning the wildcard will match as few characters as possible while still allowing the next match in the regex to be satisfied
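The difference between a lazy and a greedy wildcard is easiest to see side by side. The following is a small sketch using a made-up link; the method names host_lazy and host_greedy and the constant SAMPLE_LINK are hypothetical, and the regexes are simplified versions of the first group with a capturing group added around the wildcard so we can inspect what it consumed.

```ruby
# Made-up link markup for illustration only.
SAMPLE_LINK = '<a href="http://www.example.com/a/b">one</a>'

# Lazy wildcard: (.*?) stops at the first forward slash after the scheme.
def host_lazy(html)
  html[/"https?:\/\/(.*?)\//, 1]
end

# Greedy wildcard: (.*) runs to the last forward slash in the string.
def host_greedy(html)
  html[/"https?:\/\/(.*)\//, 1]
end

puts host_lazy(SAMPLE_LINK)   # www.example.com
puts host_greedy(SAMPLE_LINK) # www.example.com/a/b">one<
```

The greedy version swallows the rest of the link and stops only at the slash in the closing tag, which is why the regex above uses .*? in both places.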

The second group, (\/.*?) is a capturing group that:

  • matches a single forward slash, \/
    • this will match the first forward slash after the host portion of the URL (e.g. the slash at the end of http://www.example.com/) since the slashes in http:// were already matched by the first group
  • followed by another lazy wildcard, .*?

The third group, (?:"), is:

  • another non-capturing group, ?:
  • that matches a single double quote, "

And thus, our second group will match the portion of the URL starting with the forward slash after the host and going up to, but not including, the double quote at the end of our href="...".

When we call the match method using our regex, it returns an instance of MatchData, which behaves much like an array. The element at index 0 is a string containing the entire matched string (from all of the groups in the regex), while elements at subsequent indices contain only the portions of the string matched by the regex's capturing groups (only our second group, in this case). Thus, to get the corresponding match of our second group—which is the path we want to visit using Capybara—we grab the element at index 1.
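As a rough illustration of why the non-capturing groups matter, the sketch below compares the regex above with a variant in which all three groups capture. The constant names and the sample body are hypothetical; only the first regex comes from the answer.

```ruby
# The regex from above: only the middle group captures.
NON_CAPTURING = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/
# Hypothetical variant for comparison: all three groups capture.
ALL_CAPTURING = /("https?\:\/\/.*?)(\/.*?)(")/

# Made-up email body for illustration only.
body = '<a href="https://www.example.com/users/confirm">Confirm</a>'

m = body.match(NON_CAPTURING)
p m[0]        # "\"https://www.example.com/users/confirm\""
p m[1]        # "/users/confirm"
p m.captures  # ["/users/confirm"]

n = body.match(ALL_CAPTURING)
p n.captures  # ["\"https://www.example.com", "/users/confirm", "\""]
p n[2]        # "/users/confirm"
```

With non-capturing groups the path is always at index 1; in the all-capturing variant it moves to index 2, and the indices would shift again if another group were added.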