New processors at Linode?

I noticed that a new linode I set up today uses a quad Intel Xeon L5520. Slightly older Linode hosts use the L5420.

Some quick comparisons:

| Processor | L5420 | L5520 |
|---|---|---|
| Clock | 2.5GHz | 2.27GHz |
| Cache | 6MB | 8MB |
| Hyperthreading | No | Yes |

I have checked dmesg, but there is no sign of hyperthreading support, or 8 CPUs (4 cores + hyperthreading).
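If you want to repeat the check, /proc/cpuinfo is usually more direct than dmesg on x86 Linux. When a package reports more `siblings` (logical CPUs) than `cpu cores` (physical cores), hyperthreading is visible to that kernel. The following is a minimal Python sketch. It assumes those standard x86 fields are present. Inside a Xen guest they reflect what the hypervisor chooses to expose, which is not necessarily what the host hardware has.

```python
from pathlib import Path


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo text into one dict per logical processor."""
    blocks = []
    for chunk in text.strip().split("\n\n"):
        entry = {}
        for line in chunk.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                entry[key.strip()] = value.strip()
        if entry:
            blocks.append(entry)
    return blocks


def ht_visible(text: str) -> bool:
    """True if any processor reports more logical siblings than physical cores."""
    for cpu in parse_cpuinfo(text):
        siblings = cpu.get("siblings")
        cores = cpu.get("cpu cores")
        if siblings and cores and int(siblings) > int(cores):
            return True
    return False


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    if info.exists():
        text = info.read_text()
        print("logical CPUs:", len(parse_cpuinfo(text)))
        print("hyperthreading visible:", ht_visible(text))
```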

Is there still a security issue with Hyperthreading? Has it been disabled? Or does dom0 have access to hyperthreading which is hidden from the user kernels?

I have performed a quick-and-dirty `time dd if=/dev/urandom of=/dev/null bs=1K count=1000` test to determine relative single-threaded speed. (This ignores many important issues, such as memory bandwidth, but it is a quick and easy comparative test of raw CPU muscle.) Hyperthreading would probably not improve the score on this test.
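Here is a rough Python analogue of the dd test for anyone who wants to script it. It is a sketch, not an equivalent: `os.urandom` may use a different kernel interface than reading /dev/urandom through dd. Only compare numbers produced by the same script on different hosts.

```python
import os
import time


def urandom_throughput(block_size: int = 1024, count: int = 1000) -> float:
    """Return MB/s (10**6 bytes per second) for `count` reads of `block_size` random bytes."""
    start = time.perf_counter()
    total = 0
    for _ in range(count):
        total += len(os.urandom(block_size))
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6


if __name__ == "__main__":
    # Average several runs, as in the original test, to smooth out noise.
    runs = [urandom_throughput() for _ in range(5)]
    print(f"mean {sum(runs) / len(runs):.2f} MB/s")
```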

The L5420 generates 6.15MB/s of pseudorandom data, compared with 5.75MB/s for the L5520, in a Linode. This is an average, with very little deviation.

So the L5420 has a 10% higher clock speed and, in this test, runs a single thread 7% faster than the L5520 in a Linode.
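Both percentages follow from the figures quoted above. This small Python check uses only those numbers:

```python
# Figures quoted above: clock speeds in GHz, dd throughput in MB/s.
clock = {"L5420": 2.5, "L5520": 2.27}
dd_rate = {"L5420": 6.15, "L5520": 5.75}


def pct_advantage(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return (a / b - 1) * 100


clock_gap = pct_advantage(clock["L5420"], clock["L5520"])
speed_gap = pct_advantage(dd_rate["L5420"], dd_rate["L5520"])
print(f"clock advantage: {clock_gap:.1f}%")
print(f"dd advantage:    {speed_gap:.1f}%")
```

The gap between the two (about 10% versus about 7%) is the part the cache discussion further down tries to explain.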

I'd expect the test to get pretty much all the CPU it wants, as both hosts are just about idle beforehand.

I'm not suggesting that the L5520 should have HT enabled if it poses a security vulnerability, but without HT, would it be fair to expect the L5520 to underperform an L5420, or are there other tests and considerations which make the L5520 a better choice? … or have Intel made a retrograde step … at least for our needs?

15 Replies

Virtual machines have historically had some problems with hyperthreading and performance, but those may have been worked around with scheduler tweaks (VMware has done so, I believe; I dunno about Xen). caker & crew will probably need to do some testing before they go and enable it on new hardware.

That aside, the 5520 itself is probably always going to give slightly lower or roughly equal raw single-threaded CPU performance, but you're likely to get better scaling and multitasking, with or without HT enabled, thanks to QPI and the integrated memory controllers.

ETA: Forgot to mention the changes to the cache. It hasn't just been increased by 2MB. Each core now has its own separate 256KB L2 cache in addition to the shared 8MB L3 cache. This is also likely to help with multitasking.

The increased per-CPU cache would likely reduce the time a processing core is idle, waiting for un-cached data, but so will hyperthreading. Hyperthreading technology uses one physical processing core but with two sets of registers. Processing can continue on a second thread, with the second set of registers, whilst the first thread is waiting for un-cached data.
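The stall-hiding idea can be shown with a deliberately idealised Python model. It assumes each thread alternates a fixed number of compute cycles with a fixed number of cycles waiting on memory, and that the core switches between register sets at no cost. Real HT also shares execution units and caches between the two threads, so real-world gains are much smaller than this model suggests.

```python
def core_utilisation(compute: float, stall: float, threads: int) -> float:
    """Fraction of cycles the core's execution units are busy.

    Idealised assumption: each thread alternates `compute` busy cycles with
    `stall` cycles waiting on uncached data, and the core can run another
    thread's instructions during any stall at no switching cost.
    """
    if compute <= 0 or stall < 0 or threads < 1:
        raise ValueError("need compute > 0, stall >= 0, threads >= 1")
    single = compute / (compute + stall)
    return min(1.0, threads * single)


# A thread that waits on memory as long as it computes keeps the core half busy;
# in this ideal model a second hardware thread fills the gaps completely.
print(core_utilisation(10, 10, 1))
print(core_utilisation(10, 10, 2))
```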

The degree to which these architecture improvements help real-life situations depends on too many factors to give a definite answer, but field tests have shown general rules of thumb.

In general, with multithreaded applications and a slow-ish memory subsystem (as on the original P4s), hyperthreading will give about a 30% improvement. With a very fast memory subsystem and a more effective cache system, this may be closer to 20 or 25%.

The additional cache may explain why my tests showed a 7% rather than a 10% difference on these hosts.

Given that the 5520 starts 7% behind the 5420 without hyperthreading, enabling hyperthreading (assuming a 25% gain) would likely make the 5520 about 16.25% faster than the 5420 overall in a multi-threaded environment (0.93 × 1.25 = 1.1625).
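Here is the arithmetic behind that estimate. Both inputs are assumptions. The 0.93 comes from the single-threaded dd result, and the 25% HT gain is picked from within the 20-30% rule of thumb, not measured on these hosts.

```python
# Assumed inputs: the L5520 without HT runs at 0.93x an L5420 (from the dd test),
# and enabling HT adds 25% throughput in a multi-threaded workload.
l5520_no_ht = 0.93
ht_gain = 1.25

relative = l5520_no_ht * ht_gain  # L5520 with HT, relative to an L5420
print(f"L5520 with HT vs L5420: {(relative - 1) * 100:.2f}% faster")
```

With a 20% gain instead, the same formula gives 0.93 × 1.20 = 1.116, so the conclusion is fairly sensitive to the assumed HT benefit.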

An interesting point, and a possible bug, concerns the Linode statistics. If total machine performance is 16.25% higher, and the scheduler and statistics system see 8 CPUs rather than 4, the statistics system will give inconsistent results.

The stats would likely give results as though there were 8 × 1.45GHz processors on each machine, instead of 4 × 2.5GHz processors.

So each Linode would appear to use 72% more of a host CPU on an L5520 (with HT enabled) than it would on an L5420 (2.5 / 1.45 ≈ 1.72), even though it was using the same proportion of the available processing power. If stats are to stand comparison between 5420 hosts and 5520 HT hosts, they would likely need calibration.
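This Python sketch reproduces the 1.45GHz and 72% figures. It makes the same assumptions as the estimate above: the L5520 runs at 0.93x an L5420 without HT, HT adds 25%, and the total is spread evenly across 8 logical CPUs. All speeds are expressed in L5420-equivalent GHz.

```python
# Assumptions: an L5420 host is 4 x 2.5GHz; an HT-enabled L5520 host delivers
# 0.93 * 1.25 of that total, split evenly across 8 logical CPUs.
L5420_CPUS, L5420_GHZ = 4, 2.5
HT_CPUS = 8

total = L5420_CPUS * L5420_GHZ * 0.93 * 1.25  # L5420-equivalent GHz for the host
per_logical = total / HT_CPUS                 # speed of one logical CPU
inflation = L5420_GHZ / per_logical - 1       # extra "CPU %" the same work shows

print(f"each logical CPU ~ {per_logical:.2f} GHz (L5420-equivalent)")
print(f"same work reads ~ {inflation:.0%} higher on a per-CPU graph")
```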

Theoretically, yes, hyperthreading can increase performance. In practice, on multi-physical-CPU boxes, hyperthreading-ignorant scheduling algorithms often leave some physical CPUs entirely idle while others are unnecessarily loaded with two threads, where threads < vcores-1 but threads >= pcores. The result is an effective performance decrease.

The question is whether Xen has adapted its scheduling algorithm accordingly. If so, great. If not, you don't want HT enabled; it will only hurt.

Good point. Imagine two threads with affinity to the two virtual (HT) processors of one core, while the other 3 cores sit idle.
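To make the placement problem concrete, here is a toy Python model of one 4-core host with two HT siblings per core. It compares an HT-ignorant scheduler, which fills logical CPUs in numeric order, with one that spreads threads across physical cores first. The topology and the adjacent sibling numbering are assumptions for illustration; Linux frequently numbers siblings as n and n + cores instead. Neither function describes how Xen's scheduler actually works.

```python
CORES = 4
SIBLINGS = 2  # logical (HT) CPUs per physical core


def core_of(cpu: int) -> int:
    """Physical core hosting a logical CPU, assuming siblings are numbered adjacently."""
    return cpu // SIBLINGS


def naive_placement(threads: int) -> list[int]:
    """HT-ignorant scheduler: fill logical CPUs in numeric order."""
    return list(range(threads))


def ht_aware_placement(threads: int) -> list[int]:
    """HT-aware scheduler: one thread per physical core before doubling up."""
    order = [core * SIBLINGS + s for s in range(SIBLINGS) for core in range(CORES)]
    return order[:threads]


def idle_cores(placement: list[int]) -> int:
    """Physical cores with no thread on any of their siblings."""
    return CORES - len({core_of(cpu) for cpu in placement})


for n in (2, 4):
    print(n, "threads -> idle cores: naive", idle_cores(naive_placement(n)),
          "| HT-aware", idle_cores(ht_aware_placement(n)))
```

With two threads, the naive placement reproduces the case above: both threads land on core 0 and three cores sit idle. The HT-aware order leaves only two cores idle.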

It appears the cache layout is rather more complex on the 5520. The 5520 actually appears to have less cache than the 5420. The bulk cache (8MB) on the 5520 is L3, whereas the 5420's cache is L2, arranged as 2x6MB (12MB). It muddies the waters somewhat.

http://processorfinder.intel.com/Details.aspx?sSpec=SLBBR

http://processorfinder.intel.com/details.aspx?sSpec=SLBFA

Until I know that the Linode stats system ignores HT virtual cores, and that the scheduler can effectively manage multiple processors with HT enabled, I'd prefer the older L5420-based hosts.

It is interesting to consider that Intel released HT in '04-'05, then omitted Hyperthreading from processors released in '06, '07 and early '08.

Hyperthreading increases core size by around 5%. If the technology works, then it is well worth the silicon real estate to deploy.

According to sources such as:

http://news.zdnet.co.uk/hardware/0,1000000091,39237341,00.htm

server admins have, as a matter of course, disabled hyperthreading, as it has often hurt rather than helped performance.

I therefore question Intel's wisdom in releasing their newest processors with HT when they need HT just to match the performance of their predecessors.

From my reading, I understand that HT has been discredited and disproven in servers. Perhaps Intel's newest implementation works where previous implementations did not. However, I feel it is very much Intel's responsibility to ensure, and prove, that their current implementation of HT will work in their customers' appliances.

I wonder whether it will be possible to convince Intel that they ought to be working with Xen to ensure their re-release of HT technology works efficiently with Xen-virtualised servers. In other words, with Intel's help, Xen could release a new Dom0 kernel, based on the latest Linux kernel, which properly manages Hyperthreading and can give a guaranteed performance boost.

After all, the server processor market is very competitive, and I am sure AMD will be delighted to pick up the market share of those customers who cannot or do not want to use Intel's HT technology.

On tests with gzip, the 2.27GHz L5520 performs around 4-6% better than the 2.5GHz L5420. The ratio holds whether single- or multi-threaded.

`time cat input | gzip >/dev/null`

`time cat input | gzip | gunzip | gzip >/dev/null`

These tests are a very small window on the range of demands a CPU may serve.

I believe the Nehalem L5520 has some strengths, but might not set the world alight. The L5520 without HT is perhaps a peer with the L5420. A little less than the 5420 for bit shifting, but much faster for floating point.

I don't know if it is purely down to the CPU, but I have also noticed that benchmarks are a bit lower on a new L5520-based linode.

If I recall correctly, each host system has 8 cores, with each linode having access to 4 of them. What I was wondering is whether a single L5520 counts as 8 cores (with hyperthreading) or whether there are 2 x L5520 processors, i.e. does each Linode get 4 real or virtual cores?

Btw, are all new linodes (in Newark) being deployed on L5520 systems?

> If I recall correctly, each host system has 8 cores, with each linode having access to 4 of them.

Where do you recall that from? Who said it?

@Nick Hill:

> > If I recall correctly, each host system has 8 cores, with each linode having access to 4 of them.
>
> Where do you recall that from? Who said it?

Various places. Linode uses dual-CPU boxes.

The 5520 is a quad-core part, so that's eight real cores or sixteen hyperthreaded cores per system.

I'd imagine (wild guess) that the performance is better without hyperthreading, and that it's turned off, especially since each node sees only 4 CPUs… Well, I sure wouldn't want to write a scheduler that takes that into account with hyperthreading, and my guess is that nobody else did (at least not well) either.

@Xan:

> The 5520 is a quad-core part, so that's eight real cores or sixteen hyperthreaded cores per system.
>
> I'd imagine (wild guess) that the performance is better without hyperthreading, and that it's turned off. Especially since each node sees only 4 CPUs…

I would guess that is currently the case too, but it would be nice to get some confirmation. As the OP mentions, the lower benchmarks are probably just down to the L5520's lower clock speed and the L5520 seems something of a backward step in performance without its Hyperthreading. Perhaps there are other factors that make the L5520 a good choice, like power consumption?

Does anyone know what effect enabling Hyperthreading and giving each linode access to 8 HT cores (across 4 real cores) would have on performance? I imagine users whose servers are busy enough to take advantage of the extra cores would see a boost in performance, but what effect would it have on those for whom 4 cores are enough (the majority, I guess)? Would HT worsen performance for them? If so, I guess it is best left disabled.

In general, servers perform worse with Hyperthreading. This is because most servers don't have enough processes simultaneously requesting CPU to warrant the decreased speed per virtual core.

In general, virtual machines perform worse the more CPUs you allocate to them. This is because in order to give any CPU time to the VM, as many physical CPUs as are allocated to it must be free. That is, if your VM has 4 virtual CPUs, then at least four physical CPUs must be idle for your VM to be context-switched in, even if your application is only requesting one virtual CPU.

@BarkerJr:

> That is, if your VM has 4 virtual CPUs, then at least four physical CPUs must be idle for your VM to be context-switched in, even if your application is only requesting one virtual CPU.

There is no correlation of host CPUs to virtual CPUs in Xen. You can have a Xen instance with 100s of virtual CPUs, despite the hardware only having four physical CPUs. I don't have a full understanding of how Xen schedules CPU time, but I'm fairly sure what you just said is not accurate.

-Chris

You're probably right. I'm drawing on my knowledge of VMware, which is not Xen.

@caker:

> You can have a Xen instance with 100s of virtual CPUs

Which Linode plan do I order for one of those? ;)
