Why is the hostname declared invalid when creating a URI?

The bug is not in Java but in the name of the host, since an underscore is not a valid character in a hostname. Although underscores are widely (and incorrectly) used in hostnames, Java refuses to treat such names as hostnames.


Underscores are not supported in hostnames (they remain legal elsewhere in a URI, for example in the path).

While a hostname may not contain other characters, such as the underscore character (_), other DNS names may contain the underscore.[5][6] This restriction was lifted by RFC 2181, Section 11. Systems such as DomainKeys and service records use the underscore as a means to assure that their special character is not confused with hostnames. For example, _http._sctp.www.example.com specifies a service pointer for an SCTP-capable webserver host (www) in the domain example.com. Notwithstanding the standard, Chrome, Firefox, Internet Explorer, Edge and Safari allow underscores in hostnames, although cookies in IE do not work correctly if any part of the hostname contains an underscore character

Wikipedia

From the Javadocs:

public URI(String str) throws URISyntaxException

Throws: URISyntaxException - If the given string violates RFC 2396, as augmented by the above deviations

Javadocs
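Note that the constructor only promises an exception for strings that violate RFC 2396, and an underscore host does not: RFC 2396 allows "_" in a registry-based authority, just not in a hostname. A minimal sketch to see the difference (UriConstructorDemo and describe are made-up names for illustration): the underscore URI is accepted with a null host, while a space, which is illegal in any authority, makes the constructor throw.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriConstructorDemo {
    // Hypothetical helper: reports how new URI(String) handled the input.
    static String describe(String s) {
        try {
            URI uri = new URI(s);
            return "parsed: authority=" + uri.getAuthority() + ", host=" + uri.getHost();
        } catch (URISyntaxException e) {
            return "rejected: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // '_' is legal in a registry-based authority: no exception, but no host either.
        System.out.println(describe("https://5-12-145-35_s-81:8080"));
        // A space is not legal anywhere in an authority, so the constructor throws.
        System.out.println(describe("https://5-12-145-35 s-81:8080"));
    }
}
```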

(Hacky) solution:

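    import java.lang.reflect.Field;
    import java.net.URI;

    // The enclosing method must declare throws ReflectiveOperationException.
    URI url = URI.create("https://5-12-145-35_s-81:8080");

    System.out.println(url.getHost()); // null

    if (url.getHost() == null) {
        // Overwrite URI's private "host" field. On JDK 16+ this needs
        // --add-opens java.base/java.net=ALL-UNNAMED, or setAccessible
        // throws InaccessibleObjectException.
        final Field hostField = URI.class.getDeclaredField("host");
        hostField.setAccessible(true);
        hostField.set(url, "5-12-145-35_s-81");
        // Only the host is patched: getPort() still returns -1.
    }
    System.out.println(url.getHost()); // 5-12-145-35_s-81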

This has been reported as a JDK bug.
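If reflection is not an option, a less invasive workaround is to read the raw authority yourself whenever getHost() returns null. A minimal sketch, assuming the authority is just host[:port] with no user-info ("user@") part; hostOf and portOf are hypothetical helpers, not part of java.net.URI:

```java
import java.net.URI;

public class AuthorityFallback {
    // Hypothetical helper: the parsed host if URI found one, otherwise the
    // text before the last ':' of the raw (registry-based) authority.
    // Assumes no user-info in the authority.
    static String hostOf(URI uri) {
        if (uri.getHost() != null) {
            return uri.getHost();
        }
        String authority = uri.getAuthority();
        if (authority == null) {
            return null;
        }
        int colon = authority.lastIndexOf(':');
        return colon >= 0 ? authority.substring(0, colon) : authority;
    }

    // Same fallback for the port; returns -1 when there is no numeric port.
    static int portOf(URI uri) {
        if (uri.getHost() != null) {
            return uri.getPort();
        }
        String authority = uri.getAuthority();
        if (authority == null) {
            return -1;
        }
        int colon = authority.lastIndexOf(':');
        if (colon < 0) {
            return -1;
        }
        try {
            return Integer.parseInt(authority.substring(colon + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://5-12-145-35_s-81:8080");
        System.out.println(hostOf(uri)); // 5-12-145-35_s-81
        System.out.println(portOf(uri)); // 8080
    }
}
```

The trade-off is that the URI object itself is left untouched, so nothing depends on JDK internals; the price is that the caller must use these helpers instead of getHost() and getPort().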


A host name must match the following syntax:

hostname      = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

As you can see, apart from letters and digits only . and - are allowed; _ is not.
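To make the grammar concrete, here is a rough regex transcription of the three rules above. It is only an illustration: it is not the JDK's own parser, and it ignores IPv4 and IPv6 literals, which URI handles separately. HostnameGrammar and isValidHostname are hypothetical names.

```java
import java.util.regex.Pattern;

public class HostnameGrammar {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAIN_LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOP_LABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME = Pattern.compile(
            DOMAIN_LABEL + "\\.?" + "|(?:" + DOMAIN_LABEL + "\\.)+" + TOP_LABEL + "\\.?");

    // True if the whole string matches the hostname rule.
    static boolean isValidHostname(String s) {
        return HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("example.com"));       // true
        System.out.println(isValidHostname("5-12-145-35-s-81"));  // true (single domainlabel)
        System.out.println(isValidHostname("5-12-145-35_s-81"));  // false ('_' is not allowed)
        System.out.println(isValidHostname("host-.example.com")); // false (label ends with '-')
    }
}
```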


You then say that //5-12-145-35_s-81:443 is accepted, and it is, but not as a host name.

To see how that pans out:

URI uriBadHost = URI.create("//5-12-145-35_s-81:443");
System.out.println("uri = " + uriBadHost);
System.out.println("  authority = " + uriBadHost.getAuthority());
System.out.println("  host = " + uriBadHost.getHost());
System.out.println("  port = " + uriBadHost.getPort());
URI uriGoodHost = URI.create("//example.com:443");
System.out.println("uri = " + uriGoodHost);
System.out.println("  authority = " + uriGoodHost.getAuthority());
System.out.println("  host = " + uriGoodHost.getHost());
System.out.println("  port = " + uriGoodHost.getPort());

Output

uri = //5-12-145-35_s-81:443
  authority = 5-12-145-35_s-81:443
  host = null
  port = -1
uri = //example.com:443
  authority = example.com:443
  host = example.com
  port = 443

As you can see, when the authority contains a valid host name, the host and port are parsed; when it does not, the authority is treated as free-form (registry-based) text and is not parsed any further.
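If you would rather get an error than a silently null host, URI.parseServerAuthority() forces the authority to be parsed as host[:port] and throws URISyntaxException when it cannot. A small sketch (ServerAuthorityCheck and serverAuthorityProblem are hypothetical names):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ServerAuthorityCheck {
    // Returns null if the authority parses as a server-based host[:port],
    // otherwise the reason reported by the parser.
    static String serverAuthorityProblem(String s) {
        try {
            URI.create(s).parseServerAuthority();
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        System.out.println(serverAuthorityProblem("//example.com:443"));       // null
        System.out.println(serverAuthorityProblem("//5-12-145-35_s-81:443")); // why the host was rejected
    }
}
```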


UPDATE

From a comment:

System.out.println(new URI(null, null, "/5-12-145-35_s-81", 443, null, null, null)) outputs: ///5-12-145-35_s-81:443. I'm giving it as the hostname.

The URI constructor you're calling is a convenience method: it simply builds a full URI string and then parses that.

Passing "5-12-145-35_s-81", 443 becomes //5-12-145-35_s-81:443.
Passing "/5-12-145-35_s-81", 443 becomes ///5-12-145-35_s-81:443.

In the first case, it's a host and port, and it fails to parse (the constructor throws URISyntaxException).
In the second, the authority part is empty, and /5-12-145-35_s-81:443 is the path.

URI uri1 = new URI(null, null, "/5-12-145-35_s-81", 443, null, null, null);
System.out.println("uri = " + uri1);
System.out.println("  authority = " + uri1.getAuthority());
System.out.println("  host = " + uri1.getHost());
System.out.println("  port = " + uri1.getPort());
System.out.println("  path = " + uri1.getPath());

Output

uri = ///5-12-145-35_s-81:443
  authority = null
  host = null
  port = -1
  path = /5-12-145-35_s-81:443
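Both variants from the comment can be run side by side with a small sketch (SevenArgConstructorDemo and build are hypothetical names). Per its Javadoc, this constructor parses the string it builds and then calls parseServerAuthority() on the result, which is why the underscore host throws instead of quietly producing a null host:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SevenArgConstructorDemo {
    // Builds a URI from only a host and a port, reporting the result or the parse error.
    static String build(String host, int port) {
        try {
            URI uri = new URI(null, null, host, port, null, null, null);
            return "ok: " + uri + " (host=" + uri.getHost() + ", path=" + uri.getPath() + ")";
        } catch (URISyntaxException e) {
            return "error: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // Builds //5-12-145-35_s-81:443, which must parse as a server authority, so this fails.
        System.out.println(build("5-12-145-35_s-81", 443));
        // Leading '/': the authority is empty and everything ends up in the path.
        System.out.println(build("/5-12-145-35_s-81", 443));
    }
}
```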
