no clues in access log / grab images
I suspect that images from my web pages are being grabbed in some irregular way for post-processing and use on other pages.
In the access log I see regular requests for those images, but I cannot see the method (log attached below). I wonder what system they may be using.
I suspect a former client, so I will just invoice them, and if that does not work I will ban their IP. Any legal advice in case they refuse payment?
89.31.97.111 - - [17/Apr/2010:10:17:01 +0000] "GET /snow/figures/snowmap72h.png HTTP/1.0" 200 492177 "-" "-"
89.31.97.111 - - [17/Apr/2010:10:18:02 +0000] "HEAD /snow/figures/snowmap48h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [17/Apr/2010:10:18:02 +0000] "GET /snow/figures/snowmap48h.png HTTP/1.0" 200 491843 "-" "-"
89.31.97.111 - - [18/Apr/2010:06:10:02 +0000] "HEAD /snow/figures/snowmap72h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:06:10:03 +0000] "GET /snow/figures/snowmap72h.png HTTP/1.0" 200 496274 "-" "-"
89.31.97.111 - - [18/Apr/2010:06:16:02 +0000] "HEAD /snow/figures/snowmap48h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:06:16:02 +0000] "GET /snow/figures/snowmap48h.png HTTP/1.0" 200 484551 "-" "-"
89.31.97.111 - - [18/Apr/2010:06:19:02 +0000] "HEAD /snow/figures/snowmap24h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:06:19:02 +0000] "GET /snow/figures/snowmap24h.png HTTP/1.0" 200 491152 "-" "-"
89.31.97.111 - - [18/Apr/2010:10:11:02 +0000] "HEAD /snow/figures/snowmap24h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:10:11:02 +0000] "GET /snow/figures/snowmap24h.png HTTP/1.0" 200 491152 "-" "-"
89.31.97.111 - - [18/Apr/2010:10:14:02 +0000] "HEAD /snow/figures/snowmap48h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:10:14:07 +0000] "GET /snow/figures/snowmap48h.png HTTP/1.0" 200 484551 "-" "-"
89.31.97.111 - - [18/Apr/2010:10:17:01 +0000] "HEAD /snow/figures/snowmap72h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:10:17:01 +0000] "GET /snow/figures/snowmap72h.png HTTP/1.0" 200 496274 "-" "-"
89.31.97.111 - - [18/Apr/2010:10:18:01 +0000] "HEAD /snow/figures/snowmap48h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:10:18:01 +0000] "GET /snow/figures/snowmap48h.png HTTP/1.0" 200 484551 "-" "-"
The images are daily snow maps that I am producing from meteoexploration
Thanks a lot for any clues.
9 Replies
You have NO evidence, yet you're willing to invoice a client and then take them to court if they don't pay.
I'm guessing you used to work for the RIAA or MPAA - right?
Just set up rewrite rules to block remote image access, also known as hot-linking.
Thank you for your effort in replying to my query.
The question about legal advice is just a little addition at the end, if that hurt your sensibilities I am quite happy to remove it. That is the reason I didn't elaborate on the additional evidence I have, which is quite substantial.
My main concern is to find out which system they are using to grab the images so that I can ban their IPs. Since the access log shows a "GET" for the correct images but no method, I was wondering how they grab them.
Cheers
P.S.
@obs:
Just set up rewrite rules to block remote image access, also known as hot-linking.
It is not hot-linking: they manipulate the image and mask the copyright notice with their own logo. What I was charging them was only a fraction of the cost of producing the maps, including server costs, which makes the whole thing rather miserable on their side.
@patagon:
My main concern is to find out which system they are using to grab the images so that I can ban their IPs. Since the access log shows a "GET" for the correct images but no method, I was wondering how they grab them.
"GET" is the method - unless you mean some other use of that term than in the HTTP protocol.
If you are referring to the fact that you don't see any agent information or other data such as the referrer, that just means the client used to issue the request didn't bother to include those headers. That's pretty trivial to do with most tools (wget, curl, etc.) or libraries, and such headers aren't officially required.
– David
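To see how often an address fetches images without sending those headers, the access log can be scanned directly. The following is only a minimal sketch: it assumes the log uses the combined format shown in the first post, and the names `LOG_RE`, `SAMPLE` and `anonymous_fetches` are made up for this illustration.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Four lines copied from the log in the first post.
SAMPLE = """\
89.31.97.111 - - [18/Apr/2010:06:10:02 +0000] "HEAD /snow/figures/snowmap72h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:06:10:03 +0000] "GET /snow/figures/snowmap72h.png HTTP/1.0" 200 496274 "-" "-"
89.31.97.111 - - [18/Apr/2010:06:16:02 +0000] "HEAD /snow/figures/snowmap48h.png HTTP/1.1" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
89.31.97.111 - - [18/Apr/2010:06:16:02 +0000] "GET /snow/figures/snowmap48h.png HTTP/1.0" 200 484551 "-" "-"
"""


def anonymous_fetches(lines):
    """Count GET requests per IP that carry neither a referer nor a user agent."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the assumed format
        # Apache writes "-" when the header was absent from the request.
        if m["method"] == "GET" and m["referer"] == "-" and m["agent"] == "-":
            counts[m["ip"]] += 1
    return counts


counts = anonymous_fetches(SAMPLE.splitlines())
for ip, n in counts.most_common():
    print(ip, n)
```

On a real log, the lines would come from the log file instead of `SAMPLE`; the path to that file depends on the server setup.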
That's the IP it's coming from; it's in the Netherlands.
Put this as a rewrite rule
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|png)$ - [F]
It'll block all referrers that don't come from your site, including direct requests, for all gif, jpg and png files.
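The effect of that rule pair can be checked offline before it goes into the server config. This is a rough Python emulation, not Apache itself: `REFERER_OK`, `IMAGE_PATH` and `is_forbidden` are hypothetical names, `yourdomain.com` is the placeholder from the rule, and `someblog.example` is an invented third-party site.

```python
import re

# Python counterparts of the two patterns in the rule above.
# [NC] on the RewriteCond corresponds to re.IGNORECASE.
REFERER_OK = re.compile(r"^http://(www\.)?yourdomain\.com/.*$", re.IGNORECASE)
IMAGE_PATH = re.compile(r"\.(gif|jpg|png)$")


def is_forbidden(path, referer):
    """Return True when the emulated rule pair would answer 403 Forbidden.

    A request without a Referer header is modelled as an empty string,
    which does not match the allowed pattern and is therefore refused.
    """
    return bool(IMAGE_PATH.search(path)) and not REFERER_OK.match(referer or "")


cases = [
    ("/snow/figures/snowmap24h.png", "http://www.yourdomain.com/snow/"),
    ("/snow/figures/snowmap24h.png", ""),  # direct request or script
    ("/snow/figures/snowmap24h.png", "http://someblog.example/post"),
    ("/index.html", ""),  # not an image, the rule does not apply
]
for path, referer in cases:
    print(path, repr(referer), "403" if is_forbidden(path, referer) else "ok")
```

The third case is the side effect to keep in mind: any other site embedding the images is refused as well, not only the scripted fetches.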
Silly me, I was assuming that curl and wget would leave a bigger "fingerprint".
Things are clearer now.
@obs:
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|png)$ - [F]
Thanks obs. I am happy to allow people to use up to three of my forecasts in their web pages, blogs etc. Many are doing so, and that rule would block all of them. The IP and corresponding whois info is another bit of evidence (they are not very clever)
@patagon:
Silly me, I was assuming that curl and wget would leave a bigger "fingerprint".
They typically will by default (e.g., wget by default uses an agent string like wget/), but changing the agent is just a command line option away. E.g., using -U "" with wget will stop the inclusion of the agent header.
– David
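The same point holds for HTTP libraries. As a small sketch, Python's standard `http.client` sends no User-Agent or Referer header unless the caller adds one. The throwaway local server and the names `RecordingHandler`, `seen_headers` and `fetch` exist only for this demonstration; the image path and agent string are taken from the log in the first post.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_headers = []  # one dict of request headers per request received


class RecordingHandler(BaseHTTPRequestHandler):
    """Tiny local server that records the headers each client sends."""

    def do_GET(self):
        seen_headers.append({k.lower(): v for k, v in self.headers.items()})
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]


def fetch(headers):
    """Issue one GET against the local server with the given extra headers."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/snow/figures/snowmap24h.png", headers=headers)
    conn.getresponse().read()
    conn.close()


fetch({})  # no agent, no referer: would be logged as "-" "-"
fetch({"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"})

server.shutdown()
server.server_close()

for headers in seen_headers:
    print(headers.get("user-agent", "-"), "|", headers.get("referer", "-"))
```

So neither a missing agent string nor a browser-like one says much about the tool that actually made the request.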
Put a small "warning caption" on each photo (e.g. "this photo contains a digital watermark with my copyright info"), which links to a page detailing what a digital watermark is, what copyright is, and what you intend to do to people who steal your work.