Can I stop wget creating duplicates?

I suggest you use the -N option.

-N
--timestamping
    Turn on time-stamping.

It enables time-stamping, which re-downloads the file only if it's newer on the server than the downloaded version.

$ wget -N https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png’
...

$ wget -N https://cdn.sstatic.net/askubuntu/img/logo.png
...
Server file no newer than local file ‘logo.png’ -- not retrieving.

Caveat (from αғsнιη's comment)

If the server is not configured properly, it may always report that the file is new and -N will always re-download the file. In this case, -nc is probably a better option.
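
One way to see whether -N has a chance of working is to look for a Last-Modified header in the server's response. The sketch below is only an illustration: has_last_modified is a hypothetical helper name, the canned headers are made up, and it assumes that a missing Last-Modified header is the kind of misconfiguration meant above, which the comment does not spell out.

```shell
#!/bin/sh
# Reads HTTP response headers on stdin and succeeds if a
# Last-Modified header is present (hypothetical helper, not part of wget).
has_last_modified() {
    grep -qi '^[[:space:]]*Last-Modified:'
}

# Real use (needs network access; -S prints the headers on stderr):
#   wget -S --spider "$url" 2>&1 | has_last_modified && echo "-N can work here"

# Offline demonstration with canned, made-up headers:
printf '  HTTP/1.1 200 OK\n  Last-Modified: Tue, 01 Jan 2019 00:00:00 GMT\n' \
    | has_last_modified && with_header=yes || with_header=no
printf '  HTTP/1.1 200 OK\n  Content-Type: image/png\n' \
    | has_last_modified && without_header=yes || without_header=no

echo "with header: $with_header, without header: $without_header"
```

If the header is absent, falling back to -nc as suggested above is probably the simpler route.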


Yes, it's the -c option.

--continue
    Continue getting a partially-downloaded file.  This is useful when you want to
    finish up a download started by a previous instance of Wget, or by another
    program.

If the file is the same, the second download attempt will stop.

$ wget -c https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png’
...

$ wget -c https://cdn.sstatic.net/askubuntu/img/logo.png
...
The file is already fully retrieved; nothing to do.

Caveats (from jofel's comments)

If the file has changed on the server, the -c option can give incorrect results.

With -c, wget simply asks the server for any data beyond the end of the already downloaded file, nothing else. It does not check whether the part of the file that is already downloaded has changed. Thus, you could end up with a corrupted file that is a mixture of the old and the new file.
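
The mixing can be reproduced without any network access. The following is a rough simulation, not wget itself: the file names server_v1, server_v2 and local are made up, and tail stands in for "ask the server for the bytes after what I already have".

```shell
#!/bin/sh
# Simulate a resumed download against a file that changed on the server.
workdir=$(mktemp -d)
printf 'old-old-' > "$workdir/server_v1"       # 8 bytes: what the server had at first
printf 'NEW-NEW-NEW-' > "$workdir/server_v2"   # 12 bytes: what the server has now

cp "$workdir/server_v1" "$workdir/local"       # first download finished against v1
have=$(wc -c < "$workdir/local")               # bytes already on disk

# A resume only requests the bytes after offset $have and appends them;
# the first $have bytes of the local file are never compared with the server.
tail -c +"$((have + 1))" "$workdir/server_v2" >> "$workdir/local"

result=$(cat "$workdir/local")
echo "$result"    # old prefix followed by the new tail
rm -rf "$workdir"
```

The resulting file has the right length but matches neither version on the server, which is the corruption described above.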


Local test

You can test this by running a simple local web server as follows (thanks to @roadmr's answer):

Open a terminal window and type:

cd /path/to/parent-download-dir/
python3 -m http.server    # on Python 2: python -m SimpleHTTPServer

Now open another terminal and run:

wget -c http://localhost:8000/filename-to-download

Note that filename-to-download is the file located in /path/to/parent-download-dir/ that we want to download.

Now if you run the wget command several times, you will see:

The file is already fully retrieved; nothing to do.

OK, now go to the /path/to/parent-download-dir/ directory and change the source file; for example, if it is a text file, append an extra line and save it. Now run wget -c ... again. You will see that the file is downloaded again, even though you had already downloaded it.

Reason: why is it downloaded again?

Only because the file on the server is now larger than the previously downloaded copy; nothing else is checked.


There is also another wget option, -nc:

--no-clobber
   If a file is downloaded more than once in the same directory, Wget's behavior
   depends on a few options, including -nc.  In certain cases, the local file will
   be clobbered, or overwritten, upon repeated download.  In other cases it will be
   preserved.

When the -nc option is specified, Wget refuses to download a file again if a file with the same name already exists in the target directory. It will only download it once you rename or remove the local file.

$ wget -nc https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png’
...

$ wget -nc https://cdn.sstatic.net/askubuntu/img/logo.png
File ‘logo.png’ already there; not retrieving.

This option is often the safest choice. I recommend -nc over -c or -N when the local file must be left alone, because those two options can modify or overwrite a local file that has the same name as the download.

Caveat (from jofel's comment)

The -nc option does not update the file if it has changed on the server. If you know the file will change, the -N option is preferable. If you know the file will not change (or you don't care) then -nc is ok.
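
In effect, -nc behaves like an existence check in front of the download, which is why it can never notice a change on the server. Here is a minimal sketch of that idea; fetch_if_absent and download are hypothetical names, the URL is a placeholder, and download is an offline stand-in so the example runs without a network.

```shell
#!/bin/sh
# Offline stand-in for the real transfer; for real use, replace the body
# with something like: wget -O "$2" "$1"
download() {
    printf 'contents of %s\n' "$1" > "$2"
}

# fetch_if_absent URL FILE: skip the download when FILE already exists.
# The server is never contacted in that case, so remote changes go unseen.
fetch_if_absent() {
    if [ -e "$2" ]; then
        echo "File '$2' already there; not retrieving."
        return 0
    fi
    download "$1" "$2"
}

workdir=$(mktemp -d)
target="$workdir/logo.png"
first=$(fetch_if_absent "http://example.invalid/logo.png" "$target")
second=$(fetch_if_absent "http://example.invalid/logo.png" "$target")
echo "first run printed:  '$first'"
echo "second run printed: '$second'"
rm -rf "$workdir"
```

The first call creates the file silently; the second only prints the message and leaves the file untouched.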

Tags:

Wget

Duplicate