MRTG and I/O Limiter

I wrote a script to monitor /proc/io_status (the I/O limiter) that plugs into MRTG.

Script and sample MRTG config

My MRTG graphs, including I/O limiter stuff

12 Replies

I checked out your script and installed mrtg just to try it. It's pretty cool, thanks for posting.

One question. When I use a lot of I/O, my token count sometimes goes negative. Does MRTG not like this? It seems not to record negative values, or values above MaxBytes. There are blank spots in the graph, and the minimum recorded value field still shows the old results from before the count went negative. If MRTG is rejecting negative values, is there a way to override that?

I didn't think that token values could go negative. I'm not denying that it happens; I'm just saying that a negative token value doesn't make sense to me.

Anyhow, in that case, the script I wrote doesn't handle negative integers. I posted an updated version that does, though, so now you can see if MRTG also has problems with negative numbers.

As for MaxBytes, that's its purpose: any value above MaxBytes is ignored as garbage. If you're getting values above MaxBytes, you need to increase it.

The sample config (and the script, for that matter) is just something I threw together in five minutes for my mostly-idle Linode 64. It's possible that a larger Linode gets a larger bucket, in which case MaxBytes for the tokens would have to be increased.

Would anyone with a larger Linode care to chime in? Take a look at /proc/io_status and post what your token_max is, along with the size of your Linode.

io_tokens will go negative, indicating that you're being limited. It should never go more negative than token_refill.
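A minimal sketch of that rule in shell, for anyone who wants a quick check outside of MRTG. It assumes /proc/io_status is a single line of space-separated key=value pairs (the layout implied by the scripts in this thread); the helper names `field` and `limiter_state` and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the I/O limiter is active, based on the sign of io_tokens.
# Assumption: the status line looks like "io_count=.. io_rate=.. io_tokens=.. ...".

# Print the value of key $1 from status line $2.
field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Print "limited" when io_tokens is negative, otherwise "ok".
limiter_state() {
    tokens=$(field io_tokens "$1")
    if [ "$tokens" -lt 0 ]; then
        echo limited
    else
        echo ok
    fi
}

# Made-up sample line; on a Linode you would pass "$(cat /proc/io_status)" instead.
sample='io_count=1200 io_rate=35 io_tokens=-150 token_refill=512 token_max=400000'
limiter_state "$sample"
```

Looking fields up by name rather than by position keeps the check working if the order of the fields ever changes.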

All of the values should be the same, regardless of Linode size. This is something I've debated making different for each Linode plan – but the real point of the limiter isn't to provide better/worse performance for each plan, it's to protect a single Linode from eating up all the I/O on the host.

-Chris

I think I understand now.

I looked it up, and MRTG can't graph negative values. I threw a quick test in the script to check for negative io_tokens values and output 0 instead. Anyone who cares should probably get the latest version.
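For illustration, the clamping step could look roughly like this in shell. This is a sketch rather than the posted script: the function name `clamp_to_zero` and the readings are invented, and the two-line output simply mirrors the counter-then-tokens order that the Perl snippet later in this thread prints.

```shell
#!/bin/sh
# Sketch: clamp a possibly negative token count to zero before handing it to MRTG.

# Print $1 unchanged if it is zero or positive, otherwise print 0.
clamp_to_zero() {
    if [ "$1" -lt 0 ]; then
        echo 0
    else
        echo "$1"
    fi
}

# Made-up readings, as they might be parsed from /proc/io_status.
io_count=1200
io_tokens=-150

# Two lines of output: the I/O counter first, then the clamped token count.
printf '%d\n%d\n' "$io_count" "$(clamp_to_zero "$io_tokens")"
```

The cost of clamping is that the graph can no longer show how far below zero the bucket went, only that it was empty.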

I changed the script slightly to report "tokens consumed" instead of tokens available. This solves the negative number problem for me. It parses token_max, subtracts io_tokens, and reports the difference.

So when io_tokens<0 it reports >400K consumed tokens. Changes:

# Two matches in case the order changes someday
$tokendata =~ /io_count=(\d+)/;
$io_count = $1;
$tokendata =~ /io_tokens=(-?\d+)/;
$io_tokens = $1;
$tokendata =~ /token_max=(\d+)/;
$token_max = $1;

$tokens_consumed = $token_max - $io_tokens;

printf("%d\n%d\n", $io_count, $tokens_consumed);

Nice. Don't forget to increase MaxBytes to token_max + token_refill.
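That ceiling follows from two statements earlier in the thread: consumed = token_max - io_tokens, and io_tokens should never drop below -token_refill, so consumed should never exceed token_max + token_refill. A rough sketch that computes the number to put in the MRTG config is below; the function name `max_consumed` and the sample values are hypothetical, and the key=value layout of the status line is assumed from the scripts here.

```shell
#!/bin/sh
# Sketch: compute the largest "tokens consumed" value the modified script can report.

# Print token_max + token_refill for the status line given as $1.
max_consumed() {
    printf '%s\n' "$1" | awk '{
        # Index every key=value pair by its key so field order does not matter.
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            v[kv[1]] = kv[2]
        }
        print v["token_max"] + v["token_refill"]
    }'
}

# Made-up sample line; on a Linode you would pass "$(cat /proc/io_status)" instead.
sample='io_count=1200 io_rate=35 io_tokens=-150 token_refill=512 token_max=400000'
echo "MaxBytes should be at least $(max_consumed "$sample")"
```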

Does anyone else have a copy of these MRTG scripts? The original poster's domain is down and none of the links in this thread are working right now.

I'm specifically going to adapt the scripts to work on my system under Munin. Having never done MRTG stuff before, I assume it's reasonably straightforward …

Thanks!

@bji:

Does anyone else have a copy of these MRTG scripts? The original poster's domain is down and none of the links in this thread are working right now.

They seem to be working for me …

-Chris

You were right, the site became accessible to me later that day. Must have been a transient network problem somewhere out there on the 'net.

Anyway, I wrote a couple of very simple munin scripts to track I/O token usage. They're very small so I'm putting them "in-line" here:

bji$ cat /etc/munin/plugins/io_tokens_rate 
#!/bin/sh

if [ "$1" = "config" ]; then
    echo "graph_title I/O token consumption rate"
    echo "graph_vlabel tokens per second"
    echo "io_tokens_rate.label rate"
    echo "io_tokens_refill.label refill"
    exit 0
fi

IO_STATUS=`cat /proc/io_status`

# IO_RATE is the rate at which tokens are being used
IO_RATE=`echo "$IO_STATUS" | awk '{ print $2; }' | cut -d '=' -f 2`

# TOKENS_REFILL is the refill rate - when IO_RATE is higher than this, we're
# using tokens at a rate faster than they are being refilled, and our
# io_tokens will start to decrease
TOKENS_REFILL=`echo "$IO_STATUS" | awk '{ print $4; }' | cut -d '=' -f 2`

echo "io_tokens_rate.value $IO_RATE"
echo "io_tokens_refill.value $TOKENS_REFILL"

bji$ cat /etc/munin/plugins/io_tokens_pct
#!/bin/sh

if [ "$1" = "config" ]; then
    echo "graph_title I/O token usage (in %)"
    echo "graph_vlabel %"
    echo "graph_args --base 1000 --rigid --lower-limit 0 --upper-limit 101"
    echo "graph_scale no"
    echo "io_tokens_pct.label rate"
    echo "io_tokens_pct.min 0"
    echo "io_tokens_pct.max 101"
    echo "io_tokens_pct.warning :80"
    echo "io_tokens_pct.critical :100"
    exit 0
fi

IO_STATUS=`cat /proc/io_status`

# ((TOKEN_MAX - IO_TOKENS) / TOKEN_MAX) is the "token usage percentage"
# When this gets to 100%, limiting occurs
IO_TOKENS=`echo "$IO_STATUS" | awk '{ print $3; }' | cut -d '=' -f 2`
TOKEN_MAX=`echo "$IO_STATUS" | awk '{ print $5; }' | cut -d '=' -f 2`
TOKEN_DEBT_PCT=$(( (TOKEN_MAX - IO_TOKENS) * 100 / TOKEN_MAX ))

echo "io_tokens_pct.value $TOKEN_DEBT_PCT"

These scripts are probably not the most efficient since they fire up two programs (awk and cut) just to extract one value from the /proc/io_status line. But I'm not the best awk/sed/etc programmer in the world and I just did something simple. If anyone would like to submit a more efficient version of these scripts, that would be most welcome.

Anyway, the first script graphs the I/O token consumption rate as the actual io_rate value directly from /proc/io_status. This value ranges from 0 to somewhere in the thousands as far as I can tell. I'm not sure what the upper bound is, but Munin/RRDtool autoscaling does a good job of keeping the graph in line. The script also graphs a line indicating the token_refill rate, so that you can easily tell when your io_rate has gone above the refill rate (and thus you are losing tokens).

The second script graphs the I/O tokens currently consumed as a percentage. Basically, io_tokens and token_max are used to determine how many of your available tokens you have currently consumed. When this value gets to 100%, your Linode will start being I/O limited and will stay limited until the value drops below 100%. 101% is the maximum possible value here (the reason is a little complicated, but it is explained elsewhere in this thread).

@bji:

These scripts are probably not the most efficient since they fire up two programs (awk and cut) just to extract one value from the /proc/io_status line. But I'm not the best awk/sed/etc programmer in the world and I just did something simple. If anyone would like to submit a more efficient version of these scripts, that would be most welcome.

Well, since you asked :-). Once you've fired up awk, you might as well do all the calcs in it:

awk '{split($3,iotok,"=");
          split($5,tokmax,"=");
          tdp=((tokmax[2]-iotok[2])*100)/tokmax[2];
          print "io_tokens_pct.value " tdp}'  < /proc/io_status

(The split() function puts entries into arrays, which is why later we use 'tokmax[2]' and 'iotok[2]'.) Note that I've not actually tried this on a Linode; tweak as desired.

Actually, it's probably not a significant efficiency issue, as you're not going to be running the script once a second, right? I mostly posted this to remind people that awk is a fairly useful mini-language, which I forget from time to time.
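If avoiding extra processes ever did matter, another option is to skip awk and cut entirely and let the shell do the splitting. The sketch below assumes the field order used by the scripts in this thread (io_count, io_rate, io_tokens, token_refill, token_max); the function name `tokens_pct` and the sample values are invented for the example.

```shell
#!/bin/sh
# Sketch: compute the token usage percentage using only shell built-ins.
# Assumption: io_tokens is the 3rd field and token_max the 5th, as in the scripts above.

# Print the usage percentage for the status line given as $1.
tokens_pct() {
    # Word-split the line into the function's positional parameters $1..$5.
    set -- $1
    io_tokens=${3#*=}   # strip everything up to and including "="
    token_max=${5#*=}
    echo $(( (token_max - io_tokens) * 100 / token_max ))
}

# Made-up sample line; on a Linode you would pass "$(cat /proc/io_status)" instead.
sample='io_count=1200 io_rate=35 io_tokens=300000 token_refill=512 token_max=400000'
echo "io_tokens_pct.value $(tokens_pct "$sample")"
```

Note that shell arithmetic is integer-only, so the result is truncated, whereas the awk version prints fractional percentages.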

I have adapted bji's io_tokens_rate and io_tokens_pct munin scripts, with the aid of SteveG's awk example, to create a comprehensive io_status munin script. For those who are familiar with warewolf's website http://ratemylinode.com/, it produces a graph similar to his token usage graphs. The script is available at http://movealong.org/~inkblot/iostatus, and included here:

#!/bin/sh

if [ "$1" = "config" ]; then
    # The graph_category value and the draw styles for token_refill and
    # token_max did not survive in this copy of the post, so those lines
    # are omitted and Munin's defaults apply.
    cat <<EOF
graph_title Linode I/O limiter status
graph_vlabel tokens or tps
graph_args --logarithmic
token_count.label count
token_count.draw AREA
io_rate.label rate
io_rate.draw LINE2
token_refill.label refill
token_max.label max
EOF
    exit 0
fi

# Fields of /proc/io_status: $2 = io_rate, $3 = io_tokens,
# $4 = token_refill, $5 = token_max
awk '{
    split($2, iorate, "=");
    split($3, tokencount, "=");
    split($4, tokenrefill, "=");
    split($5, tokenmax, "=");
    print "token_count.value " tokencount[2];
    if (iorate[2] >= 0) print "io_rate.value " iorate[2];
    else print "io_rate.value 0";
    print "token_refill.value " tokenrefill[2];
    print "token_max.value " tokenmax[2];
}' < /proc/io_status

To use, simply copy this script into /etc/munin/plugins and restart munin-node.

I've fixed my bugs. Here's the end result, and probably final revision:

#!/bin/sh

if [ "$1" = "config" ]; then
    # Label and info text is reconstructed from a damaged copy of the post.
    # The graph_category value and the draw styles for token_refill and
    # token_max did not survive, so those lines are omitted.
    cat <<EOF
graph_title Linode I/O limiter status
graph_vlabel tokens or tokens/\${graph_period}
graph_args --logarithmic
token_count.label count
token_count.draw AREA
token_count.info number of accumulated tokens
io_rate.label rate
io_rate.draw LINE2
io_rate.type DERIVE
io_rate.info token usage rate
token_refill.label refill
token_refill.type ABSOLUTE
token_refill.info token replacement rate
token_max.label max
token_max.info maximum number of tokens
EOF
    exit 0
fi

# Fields of /proc/io_status: $1 = io_count, $3 = io_tokens,
# $4 = token_refill, $5 = token_max
awk '{
    split($1, iocount, "=");
    split($3, tokencount, "=");
    split($4, tokenrefill, "=");
    split($5, tokenmax, "=");
    # ABSOLUTE values are divided by the time between polls, so scale the
    # refill rate by 300 seconds (the usual 5-minute Munin interval).
    tokenrefill[2] *= 300;
    # Never report a negative token count.
    if (tokencount[2] >= 0) print "token_count.value " tokencount[2];
    else print "token_count.value 0";
    # io_count is a running counter; the DERIVE type turns it into a rate.
    print "io_rate.value " iocount[2];
    print "token_refill.value " tokenrefill[2];
    print "token_max.value " tokenmax[2];
}' < /proc/io_status
