4096 server performs much worse than a 2048?
To my surprise, the 2048 performed significantly better in terms of latency: average response times were about 60% higher on the 4096 Linode, with much higher variation. The 4096 performed better in terms of throughput since it could run more Passenger workers, but I'm more concerned about latency. The parts of the Rails application I'm benchmarking are mainly CPU-bound, so I'm guessing this is due to the noisy neighbor problem (i.e., the host the 4096 is on has a lot more tenants using the CPU than the 2048's host). To check, I ran "sysbench --test=cpu" on both Linodes, and the results seem to confirm my suspicions:
4096 Linode:
[root@web masonm]# sysbench --test=cpu --cpu-max-prime=100000 --num-threads=2 run
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 2
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 100000
Test execution summary:
total time: 266.7814s
total number of events: 10000
total time taken by event execution: 533.4837
per-request statistics:
min: 33.73ms
avg: 53.35ms
max: 202.86ms
approx. 95 percentile: 89.52ms
Threads fairness:
events (avg/stddev): 5000.0000/1.00
execution time (avg/stddev): 266.7419/0.01
2048 Linode:
[root@web shared]# sysbench --test=cpu --cpu-max-prime=100000 --num-threads=2 run
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 2
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 100000
Test execution summary:
total time: 141.9363s
total number of events: 10000
total time taken by event execution: 283.8482
per-request statistics:
min: 27.67ms
avg: 28.38ms
max: 32.22ms
approx. 95 percentile: 29.39ms
Threads fairness:
events (avg/stddev): 5000.0000/0.00
execution time (avg/stddev): 141.9241/0.00
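To put numbers on the gap, here is a small Python sketch that takes the per-request figures from the two sysbench runs above and computes the 4096/2048 ratio for each one, plus a crude max/min spread as a jitter indicator. The dictionary layout and the helper names compare and spread are illustrative choices, not anything sysbench itself provides.

```python
# Figures copied from the two sysbench runs above (total in seconds, the rest in ms).
runs = {
    "4096": {"total_s": 266.7814, "min": 33.73, "avg": 53.35, "max": 202.86, "p95": 89.52},
    "2048": {"total_s": 141.9363, "min": 27.67, "avg": 28.38, "max": 32.22, "p95": 29.39},
}

def compare(slow: dict, fast: dict) -> dict:
    """Ratio of each metric, slow host divided by fast host, rounded to 2 places."""
    return {k: round(slow[k] / fast[k], 2) for k in slow}

def spread(run: dict) -> float:
    """max/min per-request latency: a crude measure of jitter within one run."""
    return round(run["max"] / run["min"], 2)

ratios = compare(runs["4096"], runs["2048"])
for metric, r in ratios.items():
    print(f"{metric:>8}: {r}x")
print("spread 4096:", spread(runs["4096"]), "| spread 2048:", spread(runs["2048"]))
```

Run as-is, it shows the 4096 taking about 1.88x as long on average but about 3.05x as long at the 95th percentile, and its max/min spread is about 6.01 versus 1.16 on the 2048, which lines up with the "much higher variation" described above.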
Is there something else that can explain this? The next thing I'm going to try is switching datacenters, but it feels like I'm missing something.
EDIT: I got the plans wrong in my original message: the server I said was a 8192 is actually a 4096, and the 4096 is actually a 2048. Sorry for any confusion!
@Main Street James:
Have you asked Linode support about this?
No. I figured I'd ask here in case I was doing something stupid so I wouldn't bother support unnecessarily.
@masonm:
sysbench --test=cpu --cpu-max-prime=100000 --num-threads=2 run
Perhaps the difference is due to the fact that your benchmark only uses 2 threads. Likewise, your Rails app probably uses only one thread per request.
A lot of things could be different between the host that houses your 2GB Linode and the one that houses your 4GB Linode. One of the possibilities is that the 4GB host has a larger number of slower CPUs.
This would slow down single-threaded CPU-bound apps, but the total amount of CPU that is shared among the tenants would be similar, and there would be half as many tenants on average. (The fact that you only see 8 cores in both cases is irrelevant because you're never supposed to max out all the cores.)
Linode has gone through many generations of servers, so I wouldn't be surprised if this were the case. And of course there could be noisy neighbors as you said.
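As hybinet brought up, the two servers may have different model CPUs. You can check that yourself with cat /proc/cpuinfo.
To see if noisy neighbors are limiting your CPU resources, run something CPU-intensive, open top or htop, and see if the "st" (steal) percentage is high.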
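If you'd rather log steal than eyeball top, here is a minimal Python sketch. It assumes a Linux guest whose aggregate "cpu" line in /proc/stat lists user, nice, system, idle, iowait, irq, softirq and steal as its first eight fields (the layout described in proc(5)); read_cpu_line, steal_percent and sample_steal are hypothetical helper names, not part of top, htop or sysbench.

```python
import time

def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu ' line from /proc/stat (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate 'cpu' line in " + path)

def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    # First 8 fields after "cpu": user nice system idle iowait irq softirq steal.
    # guest/guest_nice (if present) are already counted in user/nice, so skip them.
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    delta = [new - old for new, old in zip(a, b)]
    total = sum(delta)
    return 100.0 * delta[7] / total if total else 0.0

def sample_steal(interval: float = 5.0) -> float:
    """Sample /proc/stat twice, `interval` seconds apart, and return steal %."""
    first = read_cpu_line()
    time.sleep(interval)
    return steal_percent(first, read_cpu_line())
```

Start the CPU-heavy benchmark in one shell, then call print(f"{sample_steal():.1f}%") from another; this reports the same kind of figure as the "st" column in top, averaged over the sampling window.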
@mnordhoff:
As hybinet brought up, the two servers may have different model CPUs. You can check that yourself with cat /proc/cpuinfo.
Yes, I meant to include that in my original post. The 4096 has a Xeon E5-2670, while the 2048 has a Xeon E5-2680 v2. From some Googling, it looks like the E5-2680 v2 is a bit faster, but not nearly enough to account for the differences I'm seeing.
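Since /proc/cpuinfo repeats the same block once per logical CPU, it can be handy to collapse it to a count per model. A small Python sketch, assuming the usual "model name : ..." line format seen on x86 Linux guests; what a VM reports there is whatever the hypervisor exposes, and the helper names are made up for this example:

```python
from collections import Counter

def cpu_models(text: str) -> Counter:
    """Count logical CPUs per 'model name' entry in /proc/cpuinfo-style text."""
    models = Counter()
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models

def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the raw cpuinfo text (Linux only)."""
    with open(path) as f:
        return f.read()
```

On each Linode, print(cpu_models(read_cpuinfo())) gives one line per distinct model along with how many vCPUs report it.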
> To see if noisy neighbors are limiting your CPU resources, run something CPU-intensive, open top or htop, and see if the "st" (steal) percentage is high.
Cool, I didn't know about the steal percentage metric. I ran sysbench again while monitoring the steal percentage on both hosts. On the 2048 it never went above 1%, while on the 4096 it fluctuated widely from ~6% to ~70%. That pretty much cinches it. I'm going to file a support ticket to have the 4096 moved to California and hope I have better neighbors this time.
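As a rough sanity check on whether steal alone could explain the sysbench gap, assume (a simplification) that a CPU-bound thread is always runnable and loses a fraction s of wall-clock time to steal, so wall time ≈ CPU time / (1 - s). Inverting that gives the average steal needed to produce a given slowdown. The function name and the cpu_speed_ratio knob are illustrative, not taken from any tool.

```python
def implied_steal(slowdown: float, cpu_speed_ratio: float = 1.0) -> float:
    """Average steal fraction that would explain `slowdown` on its own.

    Assumed model: wall_time = cpu_time / (1 - s) for an always-runnable,
    CPU-bound thread. `cpu_speed_ratio` first divides out any per-core speed
    advantage of the faster host (1.0 means identical CPUs).
    """
    residual = slowdown / cpu_speed_ratio
    if residual <= 1.0:
        return 0.0  # nothing left for steal to explain
    return 1.0 - 1.0 / residual

# Total times of the 4096 and 2048 sysbench runs above.
print(round(implied_steal(266.7814 / 141.9363), 2))
```

For those two runs it prints 0.47, i.e. roughly 47% average steal, which falls inside the ~6% to ~70% range observed on the 4096. Any per-core speed advantage of the 2048's CPU, passed as cpu_speed_ratio, would lower that estimate.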
Thanks for your help hybinet and mnordhoff!
Benchmarks seem to indicate a ~10% performance improvement from the newer architecture; on top of that you've got an ~8% improvement in clock speed and a 25% improvement in core count. Compounded (1.10 × 1.08 × 1.25 ≈ 1.485), that should produce a ~48% performance improvement, which would seem to account for a good chunk of the difference you're seeing.
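The ~48% figure comes from multiplying the three factors rather than adding them (adding would give 43%). The sketch below reproduces that arithmetic using only the percentages quoted in the reply. If, as hybinet suggested, each request effectively runs on a single thread, only the architecture and clock-speed factors would apply to per-request latency, since extra cores mainly raise aggregate throughput.

```python
from math import prod

# Estimated speedups of the E5-2680 v2 host over the E5-2670 host, as quoted above.
factors = {"architecture": 1.10, "clock speed": 1.08, "core count": 1.25}

overall = prod(factors.values())                              # compounded, not summed
per_core = factors["architecture"] * factors["clock speed"]   # what one thread would see

print(f"overall:  {overall:.3f}x")   # aggregate throughput estimate
print(f"per core: {per_core:.3f}x")  # single-threaded latency estimate
```

This prints 1.485x overall and 1.188x per core.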