Using wget to recursively fetch an image directory

This might be off-topic, but since there are lots of geeks here, I'll try my luck.
What I am trying to do is simple: use wget to fetch a directory on a website, including its sub-directories. For example, headline.nycweb.io is a Joomla website, and under its document root there is an "images" directory containing lots of images. I want to use wget to fetch the whole "images" directory and its contents to my own server.

I have read this SO post: https://stackoverflow.com/questions/273743/using-wget-to-recursively-fetch-a-directory-with-arbitrary-files-in-it/273776#273776
but when I tried
wget --recursive --no-parent -e robots=off http://headline.nycweb.io/images/
I only ever get an index.html file. What did I do wrong? Is what I am trying to do possible at all?

By the way, I have total control over both the source website and the destination server.

1 Reply

Never tried to do this with wget before, but I thought I'd take a look to try and get the ball rolling.

I did a little searching, and for a second I thought you might want to try adding --reject "index.html*" to your wget command before the download URL. On closer inspection, though, that would only keep the index.html files out of the download; it would not bring in the other files that are meant to be there.

Maybe something in the Apache configuration is blocking access to the directories? I get 403 Forbidden when I request a directory, for instance headline.nycweb.io/images/Demo, but not when I go directly to a file such as http://headline.nycweb.io/images/Demo/blog/business9.jpg

$ curl -ILl headline.nycweb.io/images/Demo
HTTP/1.1 301 Moved Permanently
Date: Wed, 03 Jul 2019 19:51:27 GMT
Server: Apache/2.4.18 (Ubuntu)
Location: http://headline.nycweb.io/images/Demo/
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 403 Forbidden
Date: Wed, 03 Jul 2019 19:51:27 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Type: text/html; charset=iso-8859-1
$ curl -ILl headline.nycweb.io/images/Demo/blog/business9.jpg
HTTP/1.1 200 OK
Date: Wed, 03 Jul 2019 19:52:15 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Thu, 29 Sep 2016 12:45:30 GMT
ETag: "520e-53da4da4d8e80"
Accept-Ranges: bytes
Content-Length: 21006
Content-Type: image/jpeg

So maybe wget is bailing when it tries to read the subdirectories under images? Might be worth looking at the permissions on those dirs.
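
If it helps, here is a rough sketch for checking several directory URLs in one go instead of running curl by hand each time. The function names (explain_status, status_of, check_urls) are made up for this example, and the explanations it prints are only my reading of what each status means for a recursive wget, which has to find links in an HTML page before it can follow them.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of wget or curl.

# Map an HTTP status code to what it probably means for wget --recursive.
explain_status() {
    case "$1" in
        200) echo "ok: listing or file is readable" ;;
        301|302) echo "redirect: retry with a trailing slash" ;;
        403) echo "forbidden: no listing, wget cannot discover links here" ;;
        404) echo "not found" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Request headers only, follow redirects, print the final status code.
status_of() {
    curl -s -o /dev/null -I -L -w '%{http_code}' "$1"
}

# Usage: check_urls URL...
# Prints one line per URL: status code, URL, explanation.
check_urls() {
    for url in "$@"; do
        code=$(status_of "$url")
        printf '%s %s %s\n' "$code" "$url" "$(explain_status "$code")"
    done
}
```

After sourcing the file, you could run something like check_urls http://headline.nycweb.io/images/ http://headline.nycweb.io/images/Demo/ and see at a glance which directories refuse a listing.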
