regexp for finding everything between <a> and </a> tags

The standard disclaimer applies: Parsing HTML with regular expressions is not ideal. Success depends on the well-formedness of the input on a character-by-character level. If you cannot guarantee this, the regex will fail to do the Right Thing at some point.

Having said that:

<a\b[^>]*>(.*?)</a>   // match group one will contain the link text
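For what it's worth, here is a minimal sketch of using that pattern from PHP with preg_match_all. The sample markup, the `~` delimiters and the `i`/`s` modifiers are my own assumptions, not part of the pattern above; adjust them to your input.

```php
<?php
// Sample input; the markup and URLs are made up for illustration.
$html = '<p><a href="http://example.com/one">First</a> and '
      . '<a class="x" href="/two">Second <b>link</b></a></p>';

// "~" as delimiter avoids having to escape the "/" in "</a>".
// i: match tag names case-insensitively; s: let "." span newlines in the link text.
$pattern = '~<a\b[^>]*>(.*?)</a>~is';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds match group one (the link text) for every <a> element found.
$linkTexts = $matches[1];
print_r($linkTexts);
```

Note that group one is the raw inner markup, so nested tags such as `<b>` come back as-is, and the caveat about well-formed input still applies.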

I'm a big fan of regexes, but this is not the right place to use them.

Use a real HTML parser.

  • Your code will be clearer
  • It will be more likely to work

I Googled for a PHP HTML parser, and found this one.

If you know you're working with XHTML, then you could use PHP's standard XML parser.
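As one possible way to follow this advice, here is a rough sketch using PHP's bundled DOM extension (DOMDocument). This is not the parser linked above. It assumes the DOM extension is enabled in your build, and the sample markup is invented for illustration.

```php
<?php
// Sample input; made up for illustration.
$html = '<p><a href="http://www.stackoverflow.com">Go to stackoverflow.com</a> '
      . 'or <a href="/help">get <b>help</b></a></p>';

// Collect libxml warnings internally instead of emitting them on sloppy markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <a> element and pull out its href attribute and text content.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => $a->textContent, // text of the element with nested tags stripped
    ];
}
print_r($links);
```

The parser takes care of attribute order, quoting and nested tags, which is exactly where a regex tends to go wrong.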


<a\s+(.*?)>(.*?)</a>

<a href="http://www.stackoverflow.com">Go to stackoverflow.com</a>

$1 = href="http://www.stackoverflow.com"

$2 = Go to stackoverflow.com
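One thing to watch with this kind of pattern is greedy `.*` versus lazy `.*?`. The sketch below is my own comparison, not part of the original answer: it runs a greedy and a lazy variant against a line containing two links (the second link is made up).

```php
<?php
// Two links on one line; the second one is invented for the comparison.
$html = '<a href="http://www.stackoverflow.com">Go to stackoverflow.com</a> '
      . '<a href="/help">Help</a>';

// Greedy: each ".*" grabs as much as it can and only backtracks when forced to.
$greedy = '~<a\s*(.*)>(.*)</a>~';
// Lazy: each ".*?" stops at the first position that lets the rest of the pattern match.
$lazy   = '~<a\s+(.*?)>(.*?)</a>~';

preg_match($greedy, $html, $g);
preg_match($lazy, $html, $l);

// With the greedy variant, $g[1] swallows the whole first link plus the start
// of the second, and $g[2] ends up as the text of the last link.
var_dump($g[1], $g[2]);

// With the lazy variant, $l[1] is the attribute string of the first link
// and $l[2] is its text.
var_dump($l[1], $l[2]);
```

Neither variant copes with a `>` inside an attribute value, which is one more reason to prefer a parser for anything beyond input you control.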

I answered a similar question, about stripping everything except <a> tags, here

Tags:

PHP

Regex