My Block Storage Volume becomes unresponsive occasionally. What could be causing this?
The main symptom is that the volume becomes unresponsive. I can't even run ls on it. Today the condition has continued for 20 hours, so it doesn't appear to be something I can wait out.
So far the only solution I have found is to reboot the linode.
To be fair, I am not 100% sure that this is a Volumes issue. It could be some kind of weird btrfs bug (although commands like btrfs fi show and btrfs fi usage /backup continue to work).
The main symptoms are an unresponsive volume which I can't umount, and a high iowait% in top (consistently 23-25% regardless of actual activity on the server). When I run iostat -x --human 1, I see zero activity on sdc (the volume in question).
I am only using about 33GB out of 100. There is plenty of space (and it is truly unallocated).
I am using LUKS, so unfortunately that's another variable.
1 Reply
The best way forward here is to try and run some diagnostic tests on your Linode. Some of my favorites are listed below:
System monitoring tools
lsof
lsof will tell you which processes are currently accessing any file or directory you specify, and can be run on the mount point of your Block Storage Volume to see what is holding it open. Here's what it looks like when I examine an example directory (/backup/) while running a fio test on that directory:
you@yourserver$ lsof /backup/
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 20518 root cwd DIR 8,32 4096 2 /backup
fio 21495 root cwd DIR 8,32 4096 2 /backup
fio 21497 root cwd DIR 8,32 4096 2 /backup
fio 21497 root 3u REG 8,32 4294967296 12 /backup/test
If you run this command while your Volume is unresponsive, it can identify the processes that are locking up your system.
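Note that lsof can itself hang if the mount is completely stuck, so it's worth wrapping it in timeout (a minimal sketch, assuming your Volume is mounted at /backup):
you@yourserver$ timeout 10 lsof /backup/
If it times out with no output, the mount itself is likely hung rather than a single process holding it open.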
iostat
This command will show you the CPU usage and input/output statistics for your system. It's useful for identifying whether your CPU is waiting on disk I/O, as well as other CPU states. It will also tell you statistics about your disk read and write operations. You can run it multiple times in sequence (iostat 1 10 will print a report every second, 10 times), which can give you a good picture of your system. Here's the output of iostat run while doing a lot of write operations. Notice that the first output looks a little different; your first iostat report shows your overall system stats since your last boot, while the following reports show snapshots of the moment they run.
you@yourserver$ iostat 1 5
avg-cpu: %user %nice %system %iowait %steal %idle
1.85 0.00 1.25 0.02 0.00 96.88
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.68 0.36 24.31 1937557 132473748
sdb 0.01 0.03 0.10 169824 525096
sdc 0.05 6.02 4.35 32772882 23723928
avg-cpu: %user %nice %system %iowait %steal %idle
4.21 0.00 5.26 90.53 0.00 0.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 582.11 0.00 501221.05 0 476160
avg-cpu: %user %nice %system %iowait %steal %idle
6.12 0.00 7.14 86.73 0.00 0.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 660.20 0.00 574693.88 0 563200
avg-cpu: %user %nice %system %iowait %steal %idle
5.32 0.00 3.19 91.49 0.00 0.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 418.09 0.00 359489.36 0 337920
avg-cpu: %user %nice %system %iowait %steal %idle
5.26 0.00 5.26 89.47 0.00 0.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 718.95 0.00 630012.63 0 598512
This can indicate how much I/O wait you are dealing with, and whether that wait coincides with heavy disk activity on your Volume (as it does in the output above). It can also identify whether CPU steal is affecting your performance.
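Since you already mentioned iostat -x --human, it's worth adding that the extended output includes per-device latency and utilization columns (such as %util), which makes a stalled device easier to spot. A quick sketch, assuming your Volume still shows up as sdc:
# Extended per-device statistics, one report per second, 10 reports
you@yourserver$ iostat -x --human 1 10
High %iowait combined with zero tps and zero %util on sdc would suggest requests are stuck waiting above the device rather than the device being busy.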
ps
Running ps in a loop allows you to poll your system periodically for processes stuck in the "D" state, a state of uninterruptible sleep. Processes in the "D" state are waiting on system resources (usually either CPU threads or read/write cycles on your disk) before they can execute. Unlike regular sleeping processes, processes in uninterruptible sleep cannot be terminated normally; the only way to deal with them is either to reboot or to wait for the resources they need to free up. While waiting, they can still block I/O operations and can impact your system even during times of apparently low activity.
You can check for "D" state processes with this command. Every 2 seconds (10 times in total), it runs ps to look for processes that are in "D" state (waiting on I/O) and prints the name and process ID of those processes:
you@yourserver$ for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "-"; sleep 2; done
-
D 2064 [process name]
-
D 2064 [process name]
-
D 2064 [process name]
-
D 2064 [process name]
-
This can identify any processes that may be causing issues on your Volume. You can read more about D state processes here:
What is a "D" state process and how to check it?
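If you do catch a process stuck in "D" state, a couple of follow-up checks can show what it's blocked on (both generally need root; PID 2064 is just the example from the output above):
# Hung-task warnings in the kernel log, if hung task detection is enabled
you@yourserver$ dmesg -T | grep -i "blocked for more than"
# Kernel stack of the stuck process, which often names the filesystem or
# block layer function it is waiting in
you@yourserver$ cat /proc/2064/stack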
Performance benchmarking with fio
It's also useful to see what kind of disk write speeds you are getting. These tests are best run when your system is no longer unresponsive (so you may have to reboot).
A good tool for the job is fio. It is designed to be a highly customizable I/O testing tool. It can be optimized to simulate a specific workload, with the aim of replicating your application's disk reads and writes as closely as possible. Directions on using fio are in the fio documentation, but here are some test cases that can get you started:
Performance benchmark for lots of small I/O operations (block size of 4 KB):
# Random write test
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
# Random Read test
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread --ramp_time=4
Performance benchmark for large file transfers (block size of 4 MB):
# Sequential write test
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=write --ramp_time=4
# Sequential Read test
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=4G --readwrite=read --ramp_time=4
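One practical note: each of these commands creates a 4 GB file named test in the current directory, so run them from the Volume's mount point and remove the file when you're done (assuming your Volume is mounted at /backup):
you@yourserver$ cd /backup
you@yourserver$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
you@yourserver$ rm test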
fio will output the average, minimum, and maximum read or write speeds, as well as a ton of other statistics.
Using fio, you can get an idea of how your Volume is performing relative to its expected behavior. If you see any unusual fio output, feel free to reach out to Linode Support to ensure that your Volume is working as expected.
Known Issues with btrfs
While I am no btrfs expert, a bit of research revealed that at least on some kernels, btrfs can be very slow when you are using a lot of btrfs subvolumes. This only applies if you are using the 4.14.x, 4.9.x, or 4.4.x kernel, so this may not apply here. If you want to reach out to the btrfs community itself, you can do so through the btrfs Mailing List or perhaps even submit a btrfs Bug Report. If any Linode Community members have any other insights about btrfs, please weigh in.
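If you want to check whether that scenario could apply to you, two quick commands will show your kernel version and how many subvolumes the filesystem contains (assuming the Volume is mounted at /backup; the second command may need root):
you@yourserver$ uname -r
you@yourserver$ btrfs subvolume list /backup | wc -l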