Python BeautifulSoup Extract specific URLs

You can match multiple aspects, including using a regular expression for the attribute value:

import re

# r'' keeps the backslashes literal; ^ anchors the match to the start of the value
soup.find_all('a', href=re.compile(r'^http://www\.iwashere\.com/'))

which matches (for your example):

[<a href="">next</a>, <a href="">next</a>]

that is, any <a> tag with an href attribute whose value starts with the string http://www.iwashere.com/.

You can loop over the results and pick out just the href attribute:

>>> for elem in soup.find_all('a', href=re.compile(r'^http://www\.iwashere\.com/')):
...     print(elem['href'])
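Here is a minimal self-contained sketch of the above, assuming Python 3 and bs4 with the built-in html.parser. The HTML snippet is a made-up sample. BeautifulSoup applies a compiled pattern with .search(), so the pattern can match anywhere in the value. Only the leading ^ limits it to values that start with the prefix, as the comparison with the unanchored pattern shows:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page; substitute your own HTML.
html = """
<p>
  <a href="http://www.iwashere.com/page1">next</a>
  <a href="http://www.iwashere.com/page2">next</a>
  <a href="http://example.com/http://www.iwashere.com/">elsewhere</a>
  <a href="/local/path">local</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchored: only href values that *start* with the prefix.
anchored = re.compile(r"^http://www\.iwashere\.com/")
hrefs = [a["href"] for a in soup.find_all("a", href=anchored)]

# Unanchored: .search() also finds the prefix in the middle of a value.
unanchored = re.compile(r"http://www\.iwashere\.com/")
loose = [a["href"] for a in soup.find_all("a", href=unanchored)]

for href in hrefs:
    print(href)
```

The unanchored pattern also picks up the third link because the prefix appears inside its value.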

To match all relative paths instead, use a negative look-ahead assertion that checks that the value does not start with a scheme (e.g. http: or mailto:) or with a double slash (//hostname/path); any value that passes that test must be a relative path:

soup.find_all('a', href=re.compile(r'^(?!(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//))'))
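To check which values the look-ahead pattern accepts, you can test it directly with the re module on some made-up href values. This is the same compiled object you would pass as href= to find_all(). It is a rough check, not a full URL parser. For example, a value like page:1.html would be treated as having a scheme.

```python
import re

# The negative look-ahead pattern from above: it succeeds (with an empty match
# at position 0) only when the value does not begin with "scheme:" or "//".
relative = re.compile(r'^(?!(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//))')

# Hypothetical sample href values.
samples = [
    "http://www.iwashere.com/",
    "mailto:someone@example.com",
    "//cdn.example.com/lib.js",
    "/about",
    "docs/index.html",
    "../up.html",
    "#top",
]

relative_hrefs = [s for s in samples if relative.search(s)]
print(relative_hrefs)
```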

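If you're using BeautifulSoup 4.0.0 or greater, you can also use a CSS attribute selector, where ^= means "value starts with":

soup.select('a[href^="http://www.iwashere.com/"]')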
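Here is a runnable sketch of the selector approach. It assumes a recent bs4, where select() is backed by the soupsieve package, and uses a made-up HTML sample. The match is a plain string prefix test, so an https:// link does not match an http:// prefix:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page.
html = """
<a href="http://www.iwashere.com/page1">next</a>
<a href="http://www.iwashere.com/page2">next</a>
<a href="https://www.iwashere.com/secure">secure</a>
<a href="/local/path">local</a>
"""

soup = BeautifulSoup(html, "html.parser")

# [href^="..."] selects elements whose href value starts with the given string.
links = soup.select('a[href^="http://www.iwashere.com/"]')
hrefs = [a["href"] for a in links]
print(hrefs)
```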