Anyone using pbzip2?
It is essentially a parallel bzip2 and produces bzip2-compatible files, so I suppose it should be much faster than gzip while producing smaller files.
It's how I back up my claims on IRC that I'm always able to get ~400% CPU when I want it
Although I'm sure pbzip2 works as intended, regular bzip2 would produce a slightly smaller archive, just in much more time.
I guess it would work best for larger archives (for sizes in gigabytes).
gzip --> pigz
bzip2 --> pbzip2
You're comparing gzip to pbzip2 for some reason. If you want to speed up gzip, use pigz. If you want to speed up bzip2, use pbzip2.
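Both are meant as drop-in replacements for their serial counterparts; a minimal comparison would be something like this (paths and thread counts are made up):

    # pigz: parallel gzip, writes ordinary .gz files
    tar -cf - /srv/data | pigz -p 4 > backup.tar.gz

    # pbzip2: parallel bzip2, writes ordinary .bz2 files
    tar -cf - /srv/data | pbzip2 -p4 -c > backup.tar.bz2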
@Guspaz:
But you can't, because "gz speed" would be the speed achieved with pigz (the parallel version of gzip). And pbzip2 will obviously be nowhere near as fast as pigz.
And pigz would still produce archives as large as gzip's. What I wanted was bzip2 filesizes at gzip speed or better.
The question remains whether I want the increased I/O when four processes start asking for disk access at once, and which is more important to me: the smaller archive or the extra I/O. No one can answer that for me but myself.
I have 1.3 GB of files to archive and ship out, compressed and encrypted, via FTP every morning at 5am local time, when the server is least loaded.
gzip takes approx. 56 seconds and produces an 800 MB archive
bzip2 takes a few minutes and produces a smaller file (I don't remember the exact figures)
pbzip2 takes 54 seconds and produces a 740 MB archive, but at 4 times the I/O of gzip, because I use 4 processes (one per core)
Now, if I used pigz, I am sure it would take much less than 50 seconds, but it would produce an 800 MB archive just like gzip, and peak the I/O four times higher.
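If anyone wants to reproduce numbers like these, a rough way to measure (the directory is a placeholder; adjust the core count to your box) would be:

    # time each compressor over the same tree
    time tar -cf - /srv/files | gzip -c       > test.tar.gz
    time tar -cf - /srv/files | pbzip2 -p4 -c > test.tar.bz2
    time tar -cf - /srv/files | pigz -p 4 -c  > test.tar.gz

Watching iostat -x 1 in another terminal while these run should show the I/O peaks I'm talking about.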
This is just a test, in preparation for rather larger archives (a few GB) at a rather higher load than the server currently sees, which will be needed once we start a new local service in January.
So I will need to balance between:
- a smaller or bigger I/O peak
- longer or shorter network hogging to get the backup across, which also means a smaller or larger archive to store on the backup server
- backing up locally first and then shipping the encrypted tarball away, or tarring, compressing, encrypting and shipping on the fly with no local files (and I have yet to test whether parallel compression works with data on stdin/stdout; see the sketch after this list)
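For the on-the-fly variant, this is roughly what I have in mind (host, credentials, key file and paths are all placeholders, and it assumes pbzip2 copes with piped input, which is exactly what I still need to verify):

    # tar -> parallel compress -> encrypt -> upload, with no local copy
    tar -cf - /srv/data \
      | pbzip2 -p4 -c \
      | gpg --batch --symmetric --cipher-algo AES256 --passphrase-file /etc/backup.pass \
      | curl -T - --user backup:secret ftp://backup.example.com/daily-$(date +%F).tar.bz2.gpg

curl -T - reads the upload body from stdin, so nothing ever touches the local disk.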
Sure, pigz will produce the archive faster, but I want the archives smaller. And in doing so, I want to see how much I/O and CPU it takes to produce them, and which is better: the longer but less taxing serial compression, or the quicker but more I/O-intensive parallel one. Since I still want the smaller archive, pbzip2 is better for me than pigz.
7zip's compression is a bit better than bzip2's, and it's usually much faster. Its RAM requirements are usually higher, though (that depends on the dictionary size, which affects compression).
It's also got decent support. The app itself is available for *nix, but also for other platforms like Windows, and WinRAR can also decompress 7zip archives.
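For anyone curious, a minimal p7zip invocation looks something like this (archive name and path are placeholders):

    # -mx=9: maximum compression; -mmt=on: use multiple threads where the codec allows it
    7z a -t7z -mx=9 -mmt=on backup.7z /srv/data

The dictionary size, which is what drives the RAM usage, can be tuned with -md (e.g. -md=64m).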
@Guspaz
I guess you're right. For more or less the same filesizes, shorter processing time can only mean lower I/O and smaller peaks.