How to get the base domain name from a URL using PHP?

top-level domains and second-level domains may be 2 characters long but a registered subdomain must be at least 3 characters long.

EDIT: because of pjv's comment, i learned that Australian domain names are an exception, because they use 5 second-level domains that look like TLDs (com, net, org, asn, id). example: somedomain.com.au. i'm guessing com.au is a nationally controlled domain name that many registrants share. so, technically, "com.au" would still be the "base domain", but that's not useful.

EDIT: there are 47,952 possible three-letter domain names (pattern: [a-z0-9][a-z0-9-][a-z0-9], case-insensitive, or 36 * 37 * 36). combined with just 8 of the most common TLDs (com, org, etc) we have 383,616 possibilities -- without even adding in the entire scope of TLDs. 1-letter and 2-letter domain names still exist, but are not valid going forward.
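
The arithmetic above can be checked with a few lines of PHP. This is only a sketch of the counting argument; the variable names are made up for illustration, and the figure of 8 TLDs is taken from the text rather than from any official list.

```php
<?php
// First and last characters: a letter or a digit (36 choices each).
// Middle character: a letter, a digit, or a hyphen (37 choices).
$edge   = 26 + 10;
$middle = $edge + 1;
$threeLetterNames = $edge * $middle * $edge;

// Assumption for illustration: eight common TLDs, as in the text.
$tldCount = 8;
$combined = $threeLetterNames * $tldCount;

echo $threeLetterNames, "\n"; // 47952
echo $combined, "\n";         // 383616
```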

in google.com -- "google" is a subdomain of "com"

in google.co.uk -- "google" is a subdomain of "co", which in turn is a subdomain of "uk". here "co" is really a second-level domain, even though "co" is also a valid top-level domain in its own right

in www.google.com -- "www" is a subdomain of "google" which is a subdomain of "com"

"co.uk" is NOT a valid host because it contains no registered domain name, only a second-level domain and a TLD

going with that assumption this function will return the proper "basedomain" in almost all cases, without requiring a "url map".

if you happen to be one of the rare cases, perhaps you can modify this to fulfill particular needs...

EDIT: you must pass the domain string as a URL with its protocol (http://, ftp://, etc) or parse_url() will not find a host in it (unless you want to modify the code to behave differently)

function basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
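    // reset() takes its argument by reference, so store the slice in a variable first
    $sld = array_slice( $parts, -2, 1 );
    // keep 3 parts when the second-to-last part is 2 characters long (ex: "co" in google.co.uk)
    $slice = ( strlen( reset( $sld ) ) == 2 ) && ( count( $parts ) > 2 ) ? 3 : 2;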
    return implode( '.', array_slice( $parts, ( 0 - $slice ), $slice ) );
}
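
To make the behavior concrete, here is a small sketch that runs the same logic over a few sample URLs. The function is repeated under the made-up name basedomain_demo() only so the snippet runs on its own, and the sample URLs are chosen for illustration. It shows both the cases the heuristic handles and two it gets wrong.

```php
<?php
// Same logic as basedomain(), renamed so this snippet is self-contained.
function basedomain_demo( $str = '' )
{
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return null;
    $parts = explode( '.', $url['host'] );
    $sld = array_slice( $parts, -2, 1 );
    $slice = ( strlen( reset( $sld ) ) == 2 && count( $parts ) > 2 ) ? 3 : 2;
    return implode( '.', array_slice( $parts, -$slice ) );
}

var_dump( basedomain_demo( 'http://www.google.com' ) );    // string "google.com"
var_dump( basedomain_demo( 'http://www.google.co.uk' ) );  // string "google.co.uk"
var_dump( basedomain_demo( 'http://somedomain.com.au' ) ); // string "com.au" (3-letter SLD is missed)
var_dump( basedomain_demo( 'http://www.ab.com' ) );        // string "www.ab.com" (2-letter name is misread)
var_dump( basedomain_demo( 'google.com' ) );               // NULL (no protocol, so no host)
```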

if you need to be accurate use fopen or curl to open this URL: http://data.iana.org/TLD/tlds-alpha-by-domain.txt

then read the lines into an array and use that to compare the domain parts
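
A rough sketch of that comparison is below. The helper names parse_tld_list() and has_known_tld() are made up for illustration, and the sample text is a hand-written stand-in for the real file, which lists one uppercase TLD per line after a comment line starting with "#".

```php
<?php
// Turn the text of the IANA list into a lookup table of lowercase TLDs.
function parse_tld_list( $text )
{
    $tlds = array();
    foreach ( preg_split( '/\R/', $text ) as $line ) {
        $line = strtolower( trim( $line ) );
        // skip blank lines and the "# Version ..." comment line
        if ( $line === '' || $line[0] === '#' ) continue;
        $tlds[$line] = true;
    }
    return $tlds;
}

// Report whether the last part of a host name is a known TLD.
function has_known_tld( $host, array $tlds )
{
    $parts = explode( '.', strtolower( $host ) );
    return isset( $tlds[ end( $parts ) ] );
}

// Hypothetical excerpt; the real file is much longer.
$sample = "# Version 0000000000, sample data\nAU\nCOM\nORG\nUK\n";
$tlds = parse_tld_list( $sample );

var_dump( has_known_tld( 'www.google.co.uk', $tlds ) ); // bool(true)
var_dump( has_known_tld( 'example.invalid', $tlds ) );  // bool(false)
```

To use the live list, pass the result of file_get_contents() on the URL above to parse_tld_list() (this needs allow_url_fopen enabled). Note that the list only contains top-level domains, so it can confirm "uk" but says nothing about "co.uk".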

EDIT: to allow for Australian domains:

function au_basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    // reset() takes its argument by reference, so store the slice in a variable first
    $sld = array_slice( $parts, -2, 1 );
    $slice = ( strlen( reset( $sld ) ) == 2 ) && ( count( $parts ) > 2 ) ? 3 : 2;
    if ( preg_match( '/\.(com|net|asn|org|id)\.au$/i', $url['host'] ) ) $slice = 3;
    return implode( '.', array_slice( $parts, ( 0 - $slice ), $slice ) );
}
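
The only new piece in the Australian variant is the regular expression, so here is a quick sketch of which hosts it matches. The host names are made up for illustration.

```php
<?php
// Same pattern as in au_basedomain().
$pattern = '/\.(com|net|asn|org|id)\.au$/i';

$hosts = array( 'somedomain.com.au', 'www.somedomain.ID.AU', 'somedomain.au', 'com.au' );

$results = array();
foreach ( $hosts as $host ) {
    // preg_match() returns 1 on a match, 0 otherwise
    $results[$host] = ( preg_match( $pattern, $host ) === 1 );
    echo $host, ' => ', $results[$host] ? 'keep 3 parts' : 'default rule', "\n";
}
// somedomain.com.au    => keep 3 parts
// www.somedomain.ID.AU => keep 3 parts
// somedomain.au        => default rule
// com.au               => default rule
```

A bare "com.au" is not matched because the pattern requires a dot before the second-level domain, which fits the idea that "com.au" on its own is not a useful base domain.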

IMPORTANT ADDITIONAL NOTES: I don't use this function to validate domains. It is generic code I only use to extract the base domain for the server it is running on from the global $_SERVER['SERVER_NAME'] for use within various internal scripts. Considering I have only ever worked on sites within the US, I have never encountered the Australian variants that pjv asked about. It is handy for internal use, but it is a long way from a complete domain validation process. If you are trying to use it that way, I recommend against it, because there are too many ways for it to accept invalid domains.


You could do this:

$urlData = parse_url($url);

$host = $urlData['host'];

** Update **

The best way I can think of is to have a mapping of all the TLDs that you want to handle, since certain TLDs can be tricky (co.uk).

// you can add more to it if you want
$urlMap = array('com', 'co.uk');

$host = "";
$url = "http://www.google.co.uk";

$urlData = parse_url($url);
$hostData = explode('.', $urlData['host']);
$hostData = array_reverse($hostData);

// isset() guards against hosts with too few parts (e.g. "co.uk" or "localhost")
if(isset($hostData[2]) && in_array($hostData[1] . '.' . $hostData[0], $urlMap, true)) {
  $host = $hostData[2] . '.' . $hostData[1] . '.' . $hostData[0];
} elseif(isset($hostData[1]) && in_array($hostData[0], $urlMap, true)) {
  $host = $hostData[1] . '.' . $hostData[0];
}

echo $host;
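
If you need this in more than one place, the same idea could be wrapped in a function. This is only a sketch: host_from_map() is a made-up name, the suffix list is whatever you choose to pass in, and it returns an empty string when nothing in the map matches.

```php
<?php
// Hypothetical helper: return the base domain of $url using a list of known suffixes.
function host_from_map( $url, array $suffixes )
{
    $host = parse_url( $url, PHP_URL_HOST );
    if ( ! is_string( $host ) || $host === '' ) return '';
    $labels = explode( '.', strtolower( $host ) );

    // Try the two-part suffix first so "co.uk" is checked before "uk".
    // At least one label must remain in front of the suffix.
    for ( $n = min( 2, count( $labels ) - 1 ); $n >= 1; $n-- ) {
        $suffix = implode( '.', array_slice( $labels, -$n ) );
        if ( in_array( $suffix, $suffixes, true ) ) {
            return implode( '.', array_slice( $labels, -( $n + 1 ) ) );
        }
    }
    return '';
}

$urlMap = array( 'com', 'co.uk' );

echo host_from_map( 'http://www.google.co.uk', $urlMap ), "\n"; // google.co.uk
echo host_from_map( 'http://www.google.com', $urlMap ), "\n";   // google.com
var_dump( host_from_map( 'http://co.uk', $urlMap ) );           // string(0) ""
```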

Try using: http://php.net/manual/en/function.parse-url.php. Something like this should work:

$urlParts = parse_url($yourUrl);
$hostParts = explode('.', $urlParts['host']);
$hostParts = array_reverse($hostParts);
$host = $hostParts[1] . '.' . $hostParts[0];
