Why would I use Wget instead of a browser?

Typically you would never use it "instead of a browser". Browsers render HTML, make links clickable (as opposed to having to copy each URL into another wget command manually), and so on. There's essentially no upside to using wget interactively as a human. If you are concerned about privacy, there are plenty of ways to clean up a browser (or you could use a less featureful browser like Lynx, if you really want to go barebones without destroying all semblance of a human user interface).

Wget is primarily used when you want a quick, cheap, scriptable/command-line way of downloading files. So, for example, you can put wget in a script to download a web page that gets updated with new data frequently, which is something a browser can't really be used for. You can use wget's various options to crawl and automatically save a website, which most browsers can't do, at least not without extensions.
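For example, here's a minimal sketch of that scripted use case: a small shell script, run from cron, that fetches a frequently updated page and keeps a timestamped copy. The URL and paths are placeholders, not anything from the original answer.

```
#!/bin/sh
# Hypothetical example: fetch a data file that is updated regularly
# and keep a timestamped snapshot. Run it from cron, e.g. hourly.
URL="https://example.com/data/latest.csv"   # placeholder URL
OUTDIR="$HOME/data-snapshots"

mkdir -p "$OUTDIR"
# -q: quiet output, -O: write to an explicit filename
wget -q -O "$OUTDIR/latest-$(date +%Y%m%d-%H%M).csv" "$URL"
```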

In short, browsers are applications for humans looking at the internet, wget is a tool for machines and power users moving data over HTTP. Very similar in what they do (pull files from websites) but entirely different in their use.

Regarding what servers "see" when you fetch things with wget: all HTTP clients (browsers, wget, curl, and similar applications) transmit what's called a "User-Agent", which is just a string that describes the client (or, these days, what browser features it claims to have). Servers can use it to show different content depending on the user's browser (e.g. Google tries not to advertise Chrome to people already using Chrome). Some sites try to block power-user shenanigans by blocking wget's user agent string, but you can just send a Chrome user agent string to get around that. More often it's simply used for statistics, so site operators know how popular different browsers are and which ones to test against most thoroughly.
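As a sketch, overriding the user agent with wget's --user-agent option looks like this. The Chrome string below is only an illustrative placeholder and goes stale quickly.

```
# Send a browser-like User-Agent instead of wget's default
# ("Wget/<version>"). The exact string here is a placeholder.
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36" \
     https://example.com/page.html
```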

If you use wget's crawling functions, the server will see many rapid requests arriving in a mostly alphabetical order, which is a dead giveaway that you're scraping the site; it looks entirely different from a human's browsing. With a person clicking around in a browser, every page request is followed by requests for all the images on that page, then there's some delay, then a request for another, essentially random page (or possibly a string of pages with a clear purpose).


As others have mentioned, wget has the benefit of not being bundled with add-ons, cookies, and cache, which makes it potentially more stable and secure. But browsers and wget actually have very different normal uses.

wget is a command-line utility meant to retrieve content, not to present it. It can fetch anything over FTP, HTTP, and HTTPS, regardless of file type (HTML, images, binaries, etc.).
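A couple of minimal examples of that basic retrieval (the URLs are placeholders):

```
# Download a single file over HTTPS into the current directory
wget https://example.com/files/archive.tar.gz

# Same, but choose the output filename explicitly
wget -O report.pdf https://example.com/reports/2024.pdf

# Fetch a file from an FTP server
wget ftp://ftp.example.com/pub/readme.txt
```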

For a single request, the only difference the server will see is a different user agent string, unless you use wget's --user-agent option to send a browser's string instead. If you do, the server won't see any obvious difference.


Two main cases where wget makes more sense than a browser:

1) downloads initiated by a script rather than by a human being

2) downloading whole sites (or fragments of sites) rather than separate pages. (Wget can automatically follow links; see the sketch below.)
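A hedged sketch of the second case, using wget's standard recursion options (the URL is a placeholder):

```
# Mirror part of a site: follow links two levels deep, grab the images
# and CSS each page needs, rewrite links for local viewing, and don't
# ascend above the starting directory.
wget --recursive --level=2 --page-requisites --convert-links \
     --no-parent https://example.com/docs/
```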

Wget has command-line options to control what the server sees and what it can infer, including arbitrary delays between page requests. But if the server has an anti-bot policy, you can waste a lot of time and traffic before you get an acceptable result.
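For instance, the delay-related options referred to above look roughly like this (placeholder URL again):

```
# Be less conspicuous and less abusive while crawling:
# wait ~2 seconds between requests (randomized) and throttle bandwidth.
# Combine with --user-agent (shown earlier) to look more like a browser.
wget --recursive --wait=2 --random-wait --limit-rate=200k \
     https://example.com/section/
```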