nginx-cache-purge bash script

Hi,

I'm testing nginx-cache-purge and noticed that grep searches the whole content of each file. Since the cache key is on the second line, I would like to limit how much grep searches:

https://github.com/perusio/nginx-cache-purge

The line I would like to change is:

find $2 -maxdepth 1 -type d | xargs -P $max_parallel -n 1 grep -Rl "$1" | sort -u

Searching on Google I found that the head -n command can help me with this, but I can't get it to work.
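
On a single file, the idea would be something like this (just a sketch; the path and KEY are placeholders):

head -n 2 /path/to/cache/file | tail -n 1 | grep "KEY"

but I don't see how to fit that into the find/xargs line above.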

Can someone help me with this?

Thanks

4 Replies

The script already uses the -l option to grep. For GNU grep, this stops after the first match:

-l, --files-with-matches
       Suppress normal output; instead print the  name  of  each  input
       file  from  which  output would normally have been printed.  The
       scanning will stop on the first match.

POSIX does not require this, though, so other implementations of grep may (although it would be dumb for them to do so) continue searching the entire file:

-l     (The  letter  ell.)  Write  only  the  names of files containing
       selected lines to standard output. Pathnames  shall  be  written
       once  per  file  searched.  If the standard input is searched, a
       pathname of "(standard input)" shall be written,  in  the  POSIX
       locale.  In  other  locales, "standard input" may be replaced by
       something more appropriate in those locales.
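
For example (the filenames here are only placeholders), something like

grep -l "KEY" file1 file2 file3

prints just the names of the files that contain KEY, and with GNU grep each file is only read up to its first matching line.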

Hi,

Yes, it stops when it finds a match, but there are two problems:

1. If you have a link inside the body that matches the URL you want to remove from the cache, that page will be removed as well.

2. If we have a cache with 1000 files of 100 lines each and only 10 files match our criteria, grep will need to search 99,020 lines instead of 2,000, if I'm not wrong.

In the first case I can use a regular expression to match the KEY word on the second line of the file, but I can't stop grep from searching the bodies of all the other files (see the sketch below for the kind of restriction I mean).
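
Just to illustrate, something along these lines would only ever look at line 2 of each file and then move on (a sketch only, assuming an awk that supports nextfile such as GNU awk; the path and KEY are placeholders and this isn't wired into the script):

# print the name of every cache file whose second line contains the key,
# reading at most two lines per file
find /path/to/cache/dir -type f | xargs awk -v key="KEY" 'FNR == 2 { if (index($0, key)) print FILENAME; nextfile }'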

Thanks

I don't use nginx and I have not seen one of its cache files, but based on your statement…
@nfn:

In the first case I can use a regular expression to match the KEY word on the second line of the file, but I can't stop grep from searching the bodies of all the other files.
you simply need to change the one-liner into a for loop…

for s in `find $2 -maxdepth 1 -type d | xargs -P $max_parallel -n 1`
do 
  # $s = the filename so...
  # head -2 $s | tail -1 = only the second line of info from the file
  # test your grep against that
  # if true, output the filename like:  echo $s
done | sort -u

I don't see the point of using sort -u, though, since you are only going to see each filename once.

Test script that I used to try this out (I don't do bash scripting much):

#!/bin/bash
# Look at all the apache combine log files
# and output the files which have cox in the 2nd line
for i in `find /www/logs -name '*.combined'`
do
   if head -2 $i | tail -1 | grep -Eq "cox"
   then
       echo $i
   fi
done

@nfn:

1. If you have a link inside the body that matches the url you want to remove from the cache, that page will be removed.

You can change the invocation of nginx_cache_purge_item in the last line of the script to modify the regex. For example, if you know the item you are searching for is preceded by "foo" and followed by "bar" on the same line, you could change it to:

nginx_cache_purge_item "foo.*$1.*bar" $2
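
For instance, and this is only an assumption about the file format, if the key line in your cache files literally starts with "KEY:" (with the key on its own line near the top, as you describe), anchoring on that prefix would keep links in the body from matching:

nginx_cache_purge_item "^KEY:.*$1" $2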

@nfn:

2. If we have a cache with 1000 files of 100 lines each and only 10 files match our criteria, grep will need to search 99,020 lines instead of 2,000, if I'm not wrong.

I'm interpreting this to mean you have filenames in a specific format? For example, you want to search files whose names start with "cache-" but no others? In that case, you could modify the get_cache_files function like so:

function get_cache_files() {
    local max_parallel=${3-16}
    find $2 -type f -name "cache-*" | \
        xargs -P $max_parallel -n 100 grep -l "$1" | sort -u
} # get_cache_files

This uses find to select the files to be scanned, rather than grep's recursive search.

Instead of kicking off 16 grep processes, each searching a particular directory, it starts up to 16 grep processes, each given a list of 100 filenames to check. If there are more than 1600 files, a new grep process will be started once one of the 16 completes. You may need to tweak the value of $max_parallel or the value of the xargs -n option to make it run faster. (I deleted the comments since the changes really invalidate them.)
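
As a usage sketch (the key and the cache path here are made up), the modified function is still called with the search pattern and the cache directory, with an optional third argument for the parallelism:

get_cache_files "example.com/some/page" /var/cache/nginx 16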
