processor affinity

Hello,

Just wondering whether restricting certain processes to specific CPU cores would improve overall performance. I am aware that most apps use only the 1st core… I issue a command like this:

/bin/taskset -c 3,4 /path/to/executable

to restrict my process to CPUs 3 and 4. The process could be a certain daemon/server, etc.
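(To double-check afterwards, something like this should report the process's current affinity; the PID here is just an example.)

:~$ taskset -cp 12345                  # 12345 is just an example PID
pid 12345's current affinity list: 3,4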

Will that be of any benefit? The reason I'm not sure is because I don't know how Xen virtualizes processor cores.

13 Replies

I don't know, but if so, I will change my Apache to use a different one and my MySQL to use yet another.

The possibilities are endless!!

I did notice my Apache choking on a gallery import at 100% CPU… this would be cool.

Maybe even run Apache off one core and its children off another.

Someone smarter than me, please reply! :D

@vindimy:

I am aware that most apps use only the 1st core…
No they don't. Most apps don't care (or know) what core they run on, and let the OS decide. In a dual-core system, if there are two processes running, the OS will schedule one on the first core and the other on the second. And the core a process runs on may change between time slices during the lifetime of a process.
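If you want to see this for yourself, a quick (rough) way is to start a couple of CPU hogs and look at the PSR column, which is the processor each one was last running on:

:~$ yes > /dev/null & yes > /dev/null &   # two CPU-bound processes
:~$ ps -o pid,psr,comm -C yes             # PSR = core each one was last seen on
:~$ kill %1 %2                            # clean up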

Processor affinity is typically used to help with L1 and L2 cache retention by trying to keep a process scheduled on a single core; that way there's a better chance that the process's code and data are already in the cache, avoiding the need to go to (slow) memory.
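For example, pinning a process and then watching where it actually runs might look something like this (the PID is hypothetical):

:~$ taskset -cp 1 1234                            # pin (hypothetical) PID 1234 to core 1
:~$ watch -n1 'ps -o pid,psr,pcpu,comm -p 1234'   # PSR should now stay at 1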

IMHO it doesn't really make much sense in a shared virtual machine environment, because other linodes can easily cause your process's pages to be flushed from the cache.

You may be right, for system processes, but what about the rest?

From http://www.codinghorror.com/blog/archives/000942.html:
> …only rendering and encoding tasks exploit parallelism enough to overcome the 25% speed deficit between the dual and quad core CPUs.

Although the article compares 2-core vs. 4-core CPUs, you can clearly see from the graphs that core usage is far from uniform across a processor. In all the CPU usage graphs, the 1st core is almost always 100% busy… the 4th is almost idle.

Another example to support my argument: on my dual-core machine, the 1st core's temperature is always 4-8 degrees higher than the 2nd core's. I think I applied the thermal compound correctly :roll:

So, according to you, if most processes don't know/care about which core they run on, does the OS simply prefer the 1st core? Or can it simply not predict the load / split the tasks into parallel chunks efficiently?

@vindimy:

You may be right, for system processes, but what about the rest?

From http://www.codinghorror.com/blog/archives/000942.html:
> …only rendering and encoding tasks exploit parallelism enough to overcome the 25% speed deficit between the dual and quad core CPUs.

I haven't read the article, but the word "parallelism" points to a potential area of confusion.

Let's say you want to add up the numbers 1 to 100.

The simple method would be 1+2+3+4+…+99+100. Now if you run this, it will all run on a single CPU. The other 3 CPUs will be totally idle, doing nothing.

Instead, let's do it as 4 sums:

1+..+25

26+…+50

51+…+75

76+…+100

And then add up the 4 results. This will now use all 4 CPUs and so run (almost) 4 times quicker because the 4 smaller sums can be run in parallel; it's making use of parallelism.
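A rough shell sketch of the same idea, assuming seq and awk are available; each chunk is summed by a background job, and the scheduler is free to put each job on a different core:

#!/bin/sh
# Sum 1..100 as four partial sums run in parallel, then add the partial results.
# (Sketch only; relies on seq and awk being installed.)
for range in "1 25" "26 50" "51 75" "76 100"; do
    set -- $range                                     # $1 = start, $2 = end of the chunk
    seq "$1" "$2" | awk '{s += $1} END {print s}' &   # one background job per chunk
done | awk '{total += $1} END {print total}'          # collect the partial sums; prints 5050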

The problem is that very very few programs are designed to make use of parallelism properly. Rendering is (nowadays) one of them; lots of work has been done on those algorithms. I believe Excel 2007 can do some small amount of parallelism. But most programs? Not really. For a lot of programs 1 really fast CPU is better than 4 slower CPUs simply because those programs can only use 1 CPU at a time.

This is actually a big issue in computing at the moment; CPUs are reaching a plateau on per-core performance. Instead of getting mega-GHz speeds, what we're seeing is more cores. So dual-core, quad-core, eight-core CPUs… computer scientists and programmers need to come up with better ways of parallelising their code, because the free ride of per-core speed increases is almost over. You'll hear lots more about "multi-threaded" applications in the future.

Interestingly, Unix itself is pretty good at multi-core work because it's designed to run lots of independent processes. For example, apache may fork off 100 separate httpd processes; each of those can run on a different CPU because there's little interaction between them. But that's fine for small tasks (like serving web pages, email handling, etc.); it still doesn't help with large, computationally intensive tasks.
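You can actually watch that happen; for example (the process name is an assumption - depending on the distro it may be httpd rather than apache2):

:~$ ps -o pid,psr,comm -C apache2    # PSR = CPU each worker was last scheduled on; 'apache2' may be 'httpd' on your distro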

Another take on this is the IBM "Cell" processor and modern graphics card GPU chips; these have lots of independent processing units. GPU and cell programming is "hot stuff".

(OK, I just took a quick glance at that article… yeah, seems to be talking about the same thing that I just wrote about).

As you can hopefully see, parallelism isn't really related to processor affinity. Processor affinity, as I described earlier, is about cache retention: it keeps a thread of execution (a process, in this case) accessing its data more efficiently by stopping it from swapping to other cores. It does nothing to solve the parallelism problem of how to use the other cores.

As for your heat measurements: it's very possible the OS is preferring the first core when it comes to scheduling. *shrug* Dunno; I've not looked into the kernel scheduler that much! Different OSes may have different performance characteristics.

Wow,

Thanks for such a thorough answer. So I guess it still makes sense to lock a process to a single core/CPU if it doesn't consume too many computing resources (as in my case: I'm running an ioquake3 server and not creating any extra threads there). At the very least to get the cache read/write improvements.
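Something along these lines, presumably (the binary path and config name are just placeholders for my setup):

:~$ taskset -c 1 /path/to/ioq3ded +exec server.cfg    # placeholder path/config; starts the server already pinned to core 1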

Maybe somebody knowledgeable about Xen could share with us how CPU scheduling works on a machine with 30+ VMs… I'm guessing they don't all just get dumped on the 1st core; there's some distribution. If so, then processor affinity doesn't matter much from inside a VM…

I'd be more comfortable letting the kernel handle all that. If it's been running a process on a core, it'll tend to keep the process on that core, for exactly those cache reasons. That's one of the basic tasks an OS handles. I'd say let it do its job.

@Xan:

I'd be more comfortable letting the kernel handle all that. If it's been running a process on a core, it'll tend to keep the process on that core, for exactly those cache reasons. That's one of the basic tasks an OS handles. I'd say let it do its job.
Exactly. Now there may be cases where the kernel gets it wrong or specialised use cases, but in general… let the OS schedule stuff as it sees fit.

In a Xen environment you don't even know that your virtual CPU (which is just a thread of execution to the master layer) will stay on the same physical CPU (especially with ring 0 traps), so there is (in my opinion) no point in using processor affinity inside your linode.

@sweh:

Another take on this is the IBM "Cell" processor and modern graphics card GPU chips; these have lots of independent processing units. GPU and cell programming is "hot stuff".
So I wrote that yesterday. And today I read this, which is a desktop supercomputer made out of GPU chips. Neat, huh?

This is a disturbing trend that I have been seeing: no one knows, and then someone responds in a way that seems to imply that the question should not be asked.

"The Kernel can handle it…let it alone…"

"You won't see any performance gain…so…"

"Only people who…need…"

All these answers I have seen. Everyone needs to remember that ALL questions deserve to be asked, even if a search engine should first be consulted, and even if there will be no real performance gain; some people just want to be able to do something or learn something.

Here is what I have been able to find so far:

There is a program installed in Ubuntu (default with the minimal install) called "taskset".

You SHOULD be able to run a command like taskset -c <cpu-list> -p <pid>, but both of my attempts failed:

:~$ taskset -c 2 -p 6721
sched_setaffinity: Invalid argument

:~$ taskset -c 0 -p 6721
execvp: No such file or directory
failed to execute -p

(Which is fine, since I don't think -c 0 is valid.)

I would really like the ability to automatically set a process to a core; even though the kernel is pretty good at it, some apps break the mold.
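For launching rather than re-pinning, the form from the first post should also work here, e.g. wrapping the daemon's start command (the path is just a placeholder):

taskset -c 1 /usr/local/bin/mydaemon    # placeholder path; starts the daemon already restricted to core 1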

I will attempt to update this as I research this more.

@routermods:

This is a disturbing trend that I have been seeing: no one knows, and then someone responds in a way that seems to imply that the question should not be asked.

"The Kernel can handle it…let it alone…"

"You won't see any performance gain…so…"

"Only people who…need…"

All these answers I have seen. Everyone needs to remember that ALL questions deserve to be asked, even if a search engine should first be consulted, and even if there will be no real performance gain; some people just want to be able to do something or learn something.
Yeah, I know what you mean. One of my biggest complaints with the Redhat crowd is "You don't need to compile the kernel, leave it alone". That along with other things is why I dislike Redhat or any distros based on it.

You should not need to set CPU affinity unless you have a specific reason to overrule kernel migration rules.

In Linux (since sometime in the 2.5 series), the scheduler has tried to lean towards keeping processes on their current processor when possible. This has a nice effect on performance - in particular, a process bouncing between CPUs results in a performance loss, as the process's dirty/locked cache lines have to be flushed and released so another processor can pick them up (and that processor will start from a cold cache), not to mention IPI costs, etc.

The particular implementation of this can result in processes seemingly clustering on certain CPUs if you have low load. In particular, Linux will only start migrating processes if one CPU is heavily loaded, and much more loaded than the other CPUs. At that point the migration threads are woken to actively move some processes to another CPU. However, since this does not happen unless the system is under heavy CPU load, newly created processes on a lightly loaded system will tend to remain on the CPU where they were created.
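If you want to see this on your own box, one rough way is to sample the PSR column for a process over time and watch whether it moves; kernels with scheduler debug info also expose a migration counter in /proc/<pid>/sched (the PID here is hypothetical):

:~$ for i in 1 2 3 4 5; do ps -o psr= -p 1234; sleep 1; done   # sample which core (hypothetical) PID 1234 is on
:~$ grep -i migrations /proc/1234/sched                        # migration counter, if your kernel exposes /proc/<pid>/sched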

Other considerations include the fact that the aforementioned cache costs apply to kernel data structures too, so forcing affinity needlessly can decrease performance because cache lines in the kernel's disk cache, etc., can bounce between CPUs. However, this is likely a rather small effect.

There is one possible benefit to forcing affinity. Even on a lightly loaded CPU, if one process starts its timeslice after previously being under an interactive workload (and so has been granted a long timeslice), decides to use the whole thing, and then partway through an event occurs that wakes another process, the latter process may need to wait for the former's timeslice to expire before it actually runs. With them on different CPUs, wakeup latency may be lower. However, in a virtualized environment this kind of overhead is likely to be lost in the noise of other linodes anyway.

Another legitimate use of forcing affinity is guaranteeing CPU time to a process. Simply give it affinity to one CPU, then deny use of that CPU to everything else. But, again, in a virtualized environment, 'guarantees' don't mean squat without hypervisor support, and usually setting niceness is enough.
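A rough sketch of that, using only taskset (PIDs and paths are examples): pin the important process to CPU 1 and keep the competing work you control on the other CPUs.

:~$ taskset -cp 1 $(pidof important_daemon)           # example daemon name; the important process gets CPU 1
:~$ taskset -c 0,2,3 nice -n 19 /path/to/batch_job    # placeholder path; competing work stays on the other CPUs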

In conclusion, setting affinity is something you should only do for a good reason - it tends not to help, and if the system /does/ come under heavy load at some point, then it's only going to restrict the scheduler's choice of where to put processes. Linux avoids needless migration for important performance reasons, and generally it's not wise to overrule it without specific evidence that it's not doing its job properly.

@routermods:

This is a disturbing trend that I have been seeing: no one knows, and then someone responds in a way that seems to imply that the question should not be asked.
> All these answers I have seen. Everyone needs to remember that ALL questions deserve to be asked, even if a search engine should first be consulted, and even if there will be no real performance gain; some people just want to be able to do something or learn something.

This, of course, is not what happened. The original question was whether there is a performance improvement to be obtained by using taskset. THAT question was answered for the general case, with an explanation of what the taskset command does. There then followed a discussion of what problem the OP thought he was trying to solve, and why the taskset command wasn't relevant to it.

Nowhere has anyone said "don't do that". What you've seen are recommendations; "in general let the OS schedule stuff" and "I'd say let it do its job" and similar.

If you want to play around and test stuff out, then that's fine. But it wasn't the question we were answering.

@marcus0263:

One of my biggest complaints with the Redhat crowd is "You don't need to compile the kernel, leave it alone". That along with other things is why I dislike Redhat or any distros based on it.

Which "Redhat crowd"? If you're dealing with with RedHat Enterprise Linux then compiling your own kernel negates support so you might as well use CentOS, which has documentation and processes on how to compile your own kernel. Similarly the Fedora project documents how to compile your own kernel.

Sometimes I think I must be living in a different world to others…
