How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?

Solution:

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/

Explanation:

  • It will download all files and subfolders in ddd directory
  • -r : recursively
  • -np : not going to upper directories, like ccc/…
  • -nH : not saving files to hostname folder
  • --cut-dirs=3 : but saving it to ddd by omitting first 3 folders aaa, bbb, ccc
  • -R index.html : excluding index.html files

Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/


I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag (see image).

Also found that the -no-parent flag is important, othewise it will try to download everything.

enter image description here enter image description here


wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/

From man wget

‘-r’ ‘--recursive’ Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.

‘-np’ ‘--no-parent’ Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.

‘-nH’ ‘--no-host-directories’ Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.

‘--cut-dirs=number’ Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.

Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs’ option works.

No options -> ftp.xemacs.org/pub/xemacs/ -nH -> pub/xemacs/ -nH --cut-dirs=1 -> xemacs/ -nH --cut-dirs=2 -> .

--cut-dirs=1 -> ftp.xemacs.org/xemacs/ ... If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.