How do I get a URL over HTTP with netcat?

The headers in an HTTP request must use CRLF (Windows) line endings. (See Wikipedia or RFC 2616.) Many servers support LF (Unix) line endings as an extension, but not this one.

In addition, HTTP 1.1 requires a Host: header line, as Warren Young pointed out. (See Wikipedia or RFC 2616).

echo -e "GET http://www.yellowpages.com.eg/Mjg3NF9VUkxfMTEwX2h0dHA6Ly93d3cubG90dXMtYWlyLmNvbV8=/Lotus-Air/profile.html HTTP/1.1\r\nHost: www.yellowpages.com.eg\r\n\r\n" | nc www.yellowpages.com.eg 80

or more legibly

sed $'s/$/\r/' <<EOF | nc www.yellowpages.com.eg 80
GET http://www.yellowpages.com.eg/Mjg3NF9VUkxfMTEwX2h0dHA6Ly93d3cubG90dXMtYWlyLmNvbV8=/Lotus-Air/profile.html HTTP/1.1
Host: www.yellowpages.com.eg

EOF
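If you want to check the line endings before pointing nc at a real server, something like the following may help. It is only a sketch: to_crlf is a made-up helper name, www.example.com is a placeholder host, and the function is a plain POSIX sh variant of the sed trick above for shells that lack the $'...' quoting.

```shell
# to_crlf: read LF-terminated text on stdin, write it with CRLF line endings.
# (Hypothetical helper name; a POSIX sh variant of the sed $'...' command.)
to_crlf() {
    cr=$(printf '\r')      # a literal carriage return
    sed "s/\$/$cr/"        # append it to the end of every line
}

# Dump the bytes instead of sending them: every line, including the
# blank one that ends the headers, should show up as \r \n.
to_crlf <<EOF | od -c
GET / HTTP/1.1
Host: www.example.com

EOF
```

Once the dump looks right, replace od -c with nc and the host you actually want.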

But why not use wget or curl, which will construct a valid request without sweating and still allow you to specify custom headers if necessary?


You need to include the domain name in your GET request. You have told nc the domain name you are connecting to, so it knows where to find the server, but nc doesn't pass that name on to the server. If the server is hosting multiple domains, it will not know which one to send you. The request you are passing with echo should include the full domain, like this:

echo "GET http://domain.tld/path" | nc domain.tld 80
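To make it concrete that the host name is needed twice (once for nc, once inside the request), here is a rough sketch that splits a URL into the two parts. split_url is a hypothetical helper, and it assumes a plain http://host/path URL with no port, user info or query handling.

```shell
# split_url URL: print the host and the path of an http:// URL on two lines.
# (Hypothetical helper; only handles plain http://host/path URLs.)
split_url() {
    rest=${1#http://}              # strip the scheme
    host=${rest%%/*}               # everything before the first slash
    case $rest in
        */*) path=/${rest#*/} ;;   # everything from the first slash on
        *)   path=/ ;;             # bare host: ask for the root
    esac
    printf '%s\n%s\n' "$host" "$path"
}

# The first line is what nc needs, and the server needs to hear it too.
split_url http://domain.tld/path
```

The host on the first line is the name you would give to nc, and the same name has to appear in the request itself, either in the full URL as above or in a Host header.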

Note that you can also drop the options to your echo and the escaped newline at the end. If you pass -n to suppress echo's natural tendency to add a newline (-e only enables backslash escapes such as \n), you end up adding the newline back yourself.

Edit 1: Is there some reason you are not using a normal download tool like curl that can handle all the header possibilities and give you useful output? Do you really need to handle the header chat yourself? curl http://domain.tld/path should give you much more reliable output because the programmers have already worked through all the possibilities for you.

Edit 2: See Warren's answer for information about the protocol specification. TL;DR: If you specify 1.1, you then have to comply with that protocol. If you specify 1.0, you can usually make the request as above.

To make a request with echo and netcat using HTTP/1.1, try this (the -e is needed here so that echo turns \r\n into real CRLF line endings):

echo -e "GET http://domain.tld/path HTTP/1.1\r\nHost: domain.tld\r\n\r\n" | nc domain.tld 80

HTTP 1.1 requires that you send at least a Host header in GET requests. That is, the minimum legal request looks like this:

GET http://www.example.com/noise/and/junk HTTP/1.1
Host: www.example.com

(Plus an additional CRLF to terminate the header section, of course.)

There may be HTTP servers that will cope with a request claiming to require HTTP 1.1 but that doesn't include a Host header, but your server is correct to reject such a request.

Host is an HTTP 1.1 extension which was needed to support name-based virtual hosting. If the site you're trying to access has dedicated servers (or at least, a dedicated IP), you can safely drop back to HTTP 1.0, which lets you make a single-line HTTP request:

GET http://www.example.com/noise/and/junk HTTP/1.0
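The two minimal requests can be put side by side in a small sketch. minimal_request is a made-up helper name, and printf is used instead of echo only because it writes the \r\n bytes the same way in every shell.

```shell
# minimal_request VERSION HOST TARGET: print the smallest GET request
# for the given HTTP version. (Hypothetical helper, not part of nc.)
minimal_request() {
    if [ "$1" = "1.0" ]; then
        # HTTP 1.0: the request line plus the blank line is enough.
        printf 'GET %s HTTP/1.0\r\n\r\n' "$3"
    else
        # HTTP 1.1: a Host header is mandatory.
        printf 'GET %s HTTP/1.1\r\nHost: %s\r\n\r\n' "$3" "$2"
    fi
}

# Show the bytes of both variants rather than sending them anywhere.
minimal_request 1.0 www.example.com http://www.example.com/noise/and/junk | od -c
minimal_request 1.1 www.example.com http://www.example.com/noise/and/junk | od -c
```

To send either one for real, pipe it into nc www.example.com 80 instead of od -c.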
