Format of cookies when using wget?

wget uses the Netscape cookies.txt format, as stated in its man page. That format is described as follows:

The layout of Netscape's cookies.txt file is such that each line contains one name-value pair. An example cookies.txt file may have an entry that looks like this:

.netscape.com	TRUE	/	FALSE	946684799	NETSCAPE_ID	100103

Each line represents a single piece of stored information. A tab is inserted between each of the fields.

From left-to-right, here is what each field represents:

domain - The domain that created AND that can read the variable.

flag - A TRUE/FALSE value indicating if all machines within a given domain can access the variable. This value is set automatically by the browser, depending on the value you set for domain.

path - The path within the domain that the variable is valid for.

secure - A TRUE/FALSE value indicating if a secure connection with the domain is needed to access the variable.

expiration - The UNIX time that the variable will expire on. UNIX time is defined as the number of seconds since Jan 1, 1970 00:00:00 GMT.

name - The name of the variable.

value - The value of the variable.

(From "The Unofficial Cookie FAQ", edited for clarity)
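
As a rough illustration of the field order described above, a single line can be split on tabs in a POSIX shell. This is only a sketch: the variable names are made up for the example, and it assumes no field is empty, since the shell collapses consecutive tabs when splitting.

```shell
# Sample entry from the text above, built with real tab separators.
line=$(printf '.netscape.com\tTRUE\t/\tFALSE\t946684799\tNETSCAPE_ID\t100103')

# Split on tabs only. The variable names are illustrative, not part of the format.
tab=$(printf '\t')
IFS="$tab" read -r domain flag path secure expiration name value <<EOF
$line
EOF

# Show a few of the parsed fields.
printf 'domain=%s path=%s secure=%s expires=%s %s=%s\n' \
  "$domain" "$path" "$secure" "$expiration" "$name" "$value"
```

In a real file, lines starting with # and blank lines would need to be skipped before splitting.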


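One way to get cookies for wget is to let wget save them itself, using the --save-cookies option together with --keep-session-cookies (without the latter, session cookies are not written to the file).

For example: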

wget --keep-session-cookies --save-cookies cookies.txt "http://MYSITE/?__login=USER&__password=PASS"

The ?__login and ?__password parameters depend on the web site you're trying to mirror, so you might have to look at how its authentication form works.

Then you can use:

wget --mirror --load-cookies cookies.txt http://MYSITE/

Each data line in a Netscape cookies file follows the format above, but the complete file format also requires a header line. Without a header line like this, you won't be able to read the file in with HTTP::Cookies::Netscape:

# Netscape HTTP Cookie File

or this:

# HTTP Cookie File
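
Putting the header and the data-line format together, a cookie file could be written by hand along these lines. This is a minimal sketch: the file name, the .example.com domain, the cookie name and value, and the expiration time are all placeholders. The expiration is set far in the future because an already expired entry would most likely be ignored when the file is loaded.

```shell
# Write a minimal cookie file: header line first, then one tab-separated entry.
# All values below are placeholders for illustration.
cookie_file=cookies.txt
{
  printf '# Netscape HTTP Cookie File\n'
  # domain, flag, path, secure, expiration, name, value
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    .example.com TRUE / FALSE 2147483647 SESSION_ID abc123
} > "$cookie_file"

# The file can then be passed to wget as shown earlier, for example:
#   wget --mirror --load-cookies cookies.txt http://MYSITE/
```

Using printf with \t avoids the common problem of an editor silently turning tabs into spaces.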
