nginx-cache-purge bash script
I'm testing nginx-cache-purge and noticed that grep searches the entire content of each file. Since the cache key is on the second line, I would like to limit what grep searches.
The line I would like to change is:
find $2 -maxdepth 1 -type d | xargs -P $max_parallel -n 1 grep -Rl "$1" | sort -u
Searching on Google I found that the head -n command can help me with this, but I can't get it to work.
Can someone help me with this?
Thanks
4 Replies
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match.
POSIX does not require this, though, so other implementations of grep may (although it would be dumb for them to do so) continue searching the entire file:
-l (The letter ell.) Write only the names of files containing
selected lines to standard output. Pathnames shall be written
once per file searched. If the standard input is searched, a
pathname of "(standard input)" shall be written, in the POSIX
locale. In other locales, "standard input" may be replaced by
something more appropriate in those locales.
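If you want to see the GNU behaviour for yourself, a quick throwaway test (the file name and contents below are made up) contrasts an option that can stop early with one that has to read the whole file:

# Build a large file whose key sits on line 2 (illustrative only)
{ echo "header"; echo "KEY: /some/url"; seq 1 5000000; } > /tmp/testfile

time grep -l "KEY" /tmp/testfile   # -l can stop at the first match: returns almost instantly
time grep -c "KEY" /tmp/testfile   # -c has to scan every line to count matches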
Yes, it stops when it finds a match, but we have two problems:
1. If you have a link inside the body that matches the URL you want to remove from the cache, that page will be removed as well.
2. If we have a cache of 1000 files with 100 lines each and only 10 files match our criteria, grep will need to search 99,020 lines (990 non-matching files × 100 lines plus 10 matching files × 2 lines) instead of 2,000 (1000 files × 2 lines), if I'm not wrong.
In the first case I can use a regular expression to match the KEY word on the second line of the file, but I can't stop grep from searching the entire body of all the other files.
Thanks
@nfn:
In the first case I can use a regular expression to
match the KEY word on the second line of the file, but I can't stop grep from searching the entire body of all the other files.
You simply need to change the one-liner into a for loop:
for s in `find $2 -maxdepth 1 -type d | xargs -P $max_parallel -n 1`
do
    # $s = the filename, so...
    # head -2 $s | tail -1 = only the second line of the file
    # test your grep against that
    # if true, output the filename like: echo $s
done | sort -u
I don't see the point of using sort -u, though, since you are only going to see each filename once.
Test script that I tried this out with (I don't do bash scripting much):
#!/bin/bash
# Look at all the Apache combined log files
# and output the files which have "cox" in the 2nd line
for i in `find /www/logs -name '*.combined'`
do
    if head -2 "$i" | tail -1 | grep -Eq "cox"
    then
        echo "$i"
    fi
done
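Adapting that idea back to the original one-liner keeps the parallelism of the xargs call while only reading the first two lines of each file. The variable names ($1, $2, $max_parallel) follow the script above, but the find expression and the per-file shell are a sketch, not something tested against nginx-cache-purge itself:

# Sketch: check only line 2 of each cache file, in parallel.
# $1 = pattern, $2 = cache directory, $max_parallel as in the script.
find "$2" -type f -print0 | \
    xargs -0 -P "$max_parallel" -n 1 sh -c \
        'if head -2 "$2" | tail -1 | grep -q "$1"; then echo "$2"; fi' _ "$1" | \
    sort -u

Spawning a shell per file has its own cost, so for very large caches the grep -l approach may still win; it's a trade-off worth timing on real data.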
@nfn:
1. If you have a link inside the body that matches the URL you want to remove from the cache, that page will be removed as well.
You can change the invocation of nginx_cache_purge_item in the last line of the script to modify the regex. For example, if you know the item you are searching for is preceded by "foo" and followed by "bar" on the same line, you could change it to:
nginx_cache_purge_item "foo.*$1.*bar" $2
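If the key line in your cache files starts with "KEY:" (the usual layout for nginx proxy-cache entries, though check one of your own files to confirm), you could also anchor the pattern to that prefix so that links in the page body don't match:

nginx_cache_purge_item "^KEY:.*$1" $2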
@nfn:
2. If we have a cache of 1000 files with 100 lines each and only 10 files match our criteria, grep will need to search 99,020 lines instead of 2,000, if I'm not wrong.
I'm interpreting this to mean you have filenames in a specific format? For example, you want to search files whose names start with "cache-" but no others? In that case, you could modify the get_cache_files function like so:
function get_cache_files() {
    local max_parallel=${3-16}
    find $2 -type f -name "cache-*" | \
        xargs -P $max_parallel -n 100 grep -l "$1" | sort -u
} # get_cache_files
This uses find to select the files to be scanned, rather than grep's recursive search.
Instead of kicking off 16 grep processes, each searching a particular directory, it starts up to 16 grep processes, each given a list of 100 filenames to check. If there are more than 1600 files, a new grep process will be started once one of the 16 completes. You may need to tweak the value of $max_parallel or the value of the xargs -n option to make it run faster. (I deleted the comments since the changes really invalidate them.)
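For reference, a call would look something like this (the pattern and cache directory are placeholders):

# Hypothetical invocation: pattern, cache directory, optional parallelism
get_cache_files "example.com/some/page" /var/cache/nginx 16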