Practical CFLAGS considerations: -Os, -O2, and -O3
Some people assume that -O3 will always yield faster code than -O2 or -Os. That assumption can end up wasting your time (and your CPU cycles). About the only thing -O3 reliably gives you is larger binaries, which increase the chances of both page faults and swapping, and that cost can outweigh any performance gains.
NOTE: Unlike swapping, page faults aren't reduced by installing more physical RAM. Only smaller binaries (or rearranging function locations within binaries) can reduce page faults. If I'm not mistaken, a page is only 4 KB (that's the page size in Windows XP).
Try benchmarking your most frequently used programs at the concurrency levels you encounter during normal use. You might be surprised to find that -Os gives you better performance than -O3, and sometimes even better than -O2, when you're on a Linode.
-Os = most optimizations present in -O2, plus size optimizations
slightly slower code, but smaller size benefits speed too
3 Replies
the rest of your analysis about binary size and code relocation affecting page faults is basically correct.
@inkblot:
actually, installing more ram does reduce the overall number of page faults. the reason is that with more ram available, the kernel's vm is able to keep more pages resident, rather than having to free them and then fault later if and when they are needed.
Keep in mind that page faults will happen regardless of physical RAM availability when new processes start up. So adding physical RAM on a system with sufficient RAM won't reduce these instances of page faults.
@inkblot:
i would bet that because you are using gentoo, you are accustomed to high page fault levels that don't noticeably change by adding ram.
Debian 3.1 is the only Linux distro I'm currently using. It looks like cdbs will make controlling CFLAGS across multiple Debian packages easier.