Wednesday, July 1, 2020

recursive download of an Apache-listed directory structure via wget

Sometimes a web server presents a directory listing such as

<parent dir>
directory
file1
file2

and so on. In that case it's desirable to download the entire structure, recursing into each subdirectory and reproducing the same layout locally.

This example, taken from the Stack Overflow answer linked below, shows how to reproduce that structure locally.


https://stackoverflow.com/questions/23446635/how-to-download-http-directory-with-all-files-and-sub-directories-as-they-appear

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
  • It will download all files and subfolders in the ddd directory
  • -r : recurse into subdirectories
  • -np (--no-parent) : do not ascend to parent directories, like ccc/…
  • -nH (--no-host-directories) : do not save files under a folder named after the host
  • --cut-dirs=3 : save under ddd by omitting the first 3 folders aaa, bbb, ccc
  • -R index.html (--reject) : exclude index.html files
Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
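
To make the effect of -nH and --cut-dirs concrete, here is a minimal shell sketch that predicts where a given URL would land locally. The function name local_path is hypothetical (it is not part of wget), and the sketch assumes a plain URL with no query string; it only mimics the path trimming described above and does not download anything.

```shell
#!/bin/sh
# Sketch: predict the local path for a URL under "wget -nH --cut-dirs=N".
# local_path is a hypothetical helper for illustration, not a wget feature.
local_path() {
  url=$1
  cut=$2
  path=${url#*://}        # drop the scheme: hostname/aaa/bbb/ccc/ddd/file1
  path=${path#*/}         # drop the host, as -nH does: aaa/bbb/ccc/ddd/file1
  file=${path##*/}        # last component: file1
  dirs=${path%"$file"}    # directory part: aaa/bbb/ccc/ddd/
  # Drop up to $cut leading directory components, as --cut-dirs does.
  while [ "$cut" -gt 0 ] && [ -n "$dirs" ]; do
    dirs=${dirs#*/}
    cut=$((cut - 1))
  done
  printf '%s%s\n' "$dirs" "$file"
}

local_path http://hostname/aaa/bbb/ccc/ddd/file1 3       # prints ddd/file1
local_path http://hostname/aaa/bbb/ccc/ddd/sub/file2 3   # prints ddd/sub/file2
```

With --cut-dirs=4 the same first URL would map to just file1, so the files would land directly in the current directory.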

wget – recursively download all files from certain directory listed by apache

Case: recursively download all the files in the ‘ddd’ folder at the URL ‘http://hostname/aaa/bbb/ccc/ddd/’
Solution:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
It will download all files and subfolders in ddd directory:
recursively (-r),
not going to upper directories, like ccc/… (-np),
not saving files to hostname folder (-nH),
but to ddd by omitting first 3 folders aaa, bbb, ccc (--cut-dirs=3),
excluding index.html files (-R index.html)
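
The --cut-dirs value depends on how deep the target folder sits, so it has to be recounted for every URL. As a rough sketch, the count can be derived from the URL itself. The helper names cut_dirs_for and mirror_cmd are hypothetical, and the script only prints the wget command (a dry run) rather than running it. It assumes the URL points at a directory and that the last folder should be kept as the local top-level folder.

```shell
#!/bin/sh
# Sketch: derive --cut-dirs from a directory URL so the last folder is kept.
# cut_dirs_for and mirror_cmd are hypothetical helpers for illustration.
cut_dirs_for() {
  path=${1#*://}                 # drop the scheme
  case $path in
    */*) path=${path#*/} ;;      # drop the host: aaa/bbb/ccc/ddd/
    *)   path= ;;                # URL has no path at all
  esac
  path=${path%/}                 # drop a trailing slash: aaa/bbb/ccc/ddd
  n=0
  # Each remaining slash separates one leading folder to cut.
  while [ "${path#*/}" != "$path" ]; do
    path=${path#*/}
    n=$((n + 1))
  done
  echo "$n"
}

# Print the command instead of running it, so it can be checked first.
mirror_cmd() {
  printf 'wget -r -np -nH --cut-dirs=%s -R index.html %s\n' \
    "$(cut_dirs_for "$1")" "$1"
}

mirror_cmd http://hostname/aaa/bbb/ccc/ddd/
# prints: wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
```

Printing the command first is a deliberate choice here: a recursive download can be large, so it seems safer to inspect the generated line before pasting it into a terminal.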

******************************
A GUI version for Windows (VisualWget):
https://sites.google.com/site/visualwget/a-download-manager-gui-based-on-wget-for-windows


--30--