info: using lzma compression
Hi there,
After seeing filename.tar.lzma tarballs for dnsmasq recently, I looked into using lzma here, with slackware-11.0. Firstly, why bother? Well, on a 2.3MB datafile lzma compresses a lot better than bzip2:

    -rw-r--r-- 1 grant wheel    3995 2008-07-22 10:12 ip2c-names
    -rw-r--r-- 1 grant wheel 2304896 2008-07-22 10:13 ip2c-data
    -rw-r--r-- 1 grant wheel  612749 2008-07-22 15:23 ip2c-database.tar.bz2
    -rw-r--r-- 1 grant wheel  293557 2008-07-22 15:35 ip2c-database.tar.lzma

Lzma compression comes from the Windows world's 7zip archiver. 7zip publishes an SDK under the LGPL. I downloaded the GPL'd unix source from:

    http://tukaani.org/lzma/
    http://tukaani.org/lzma/lzma-4.32.6.tar.gz

and had no problems compiling and installing the lzma utilities.

Next was to add lzma to tar. There are patches in the source tarball but they don't match the tar versions included with slack-11.0 or slack-12.1. Another wrinkle is that slackware has two versions of tar installed, one for pkgtools and the other for userspace (slackware-12.1):

    grant@pooh:~$ ls -l /bin/tar*
    -rwxr-xr-x 1 root root 233196 2006-12-14 16:37 /bin/tar*
    -rwxr-xr-x 1 root root 115036 2006-12-14 16:37 /bin/tar-1.13*
    lrwxrwxrwx 1 root root      3 2008-05-26 13:45 /bin/tar-1.16.1 -> tar*

Before patching tar myself, I checked for the latest version and found the latest tar-1.20 does support lzma, but not with a single letter option (-a) that the lzma utilities author used. I ran the usual ./configure; make; su; make install and let tar-1.20 install under /usr/local so it doesn't interfere with the slack tar; tar-1.20 is seen first on the $PATH. See http://www.gnu.org/software/tar/ for the latest tar source.
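For anyone who would rather stay with the stock Slackware tar, piping through the lzma utility is one way around patching or upgrading. What follows is only a minimal sketch: it assumes the lzma and lzcat commands from the lzma utilities are on the $PATH, and the function names tar_lzma_create and tar_lzma_extract are made up here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).
# Assumes `lzma` and `lzcat` from the lzma utilities are installed.

# Create an lzma-compressed tarball with a tar that has no lzma support:
# tar writes the archive to stdout and lzma compresses the stream.
# Note: the pipeline's exit status is lzma's, so a tar failure is not reported.
tar_lzma_create() {
    archive=$1
    shift
    tar cf - "$@" | lzma -c > "$archive"
}

# Extract an lzma-compressed tarball into the current directory.
tar_lzma_extract() {
    lzcat "$1" | tar xf -
}
```

Typical use would be along the lines of:

    tar_lzma_create ip2c-database.tar.lzma ip2c-data ip2c-names
    tar_lzma_extract ip2c-database.tar.lzma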
The new tar -a option compresses a file according to the target filename suffix:

    grant@deltree:~/ip2c$ time tar cvaf ip2c-database.tar.bz2 ip2c-data ip2c-names
    ip2c-data
    ip2c-names

    real    0m4.452s
    user    0m4.230s
    sys     0m0.130s

    grant@deltree:~/ip2c$ time tar cvaf ip2c-database.tar.lzma ip2c-data ip2c-names
    ip2c-data
    ip2c-names

    real    0m16.886s
    user    0m16.549s
    sys     0m0.253s

So you can see lzma takes much longer to compress the same files, but decompression is much faster (these times are on a 500MHz Celeron):

    grant@deltree:~/ip2c/xxx$ time bzcat ../ip2c-database.tar.bz2 |tar xv
    ip2c-data
    ip2c-names

    real    0m1.306s
    user    0m1.150s
    sys     0m0.153s

    grant@deltree:~/ip2c/xxx$ time lzcat ../ip2c-database.tar.lzma |tar xv
    ip2c-data
    ip2c-names

    real    0m0.484s
    user    0m0.347s
    sys     0m0.140s

Unfortunately there's no single letter option for tar's lzma decompress like tar xvjf for bzip2, and:

    tar xvf ../ip2c-database.tar.lzma --lzma

looks clumsier to me than the lzcat ... above.

The large datafile I'm compressing is very repetitive, with about 92k records like:

    117440512 134217727 US
    134217728 150994943 US
    150994944 167772159 US
    167772160 184549375 ZZ

The lzma web page claims: "Average compression ratio of LZMA is about 30% better than that of gzip, and 15% better than that of bzip2."

Grant.
--
http://bugsplatter.mine.nu/
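To put rough numbers on the listings above, here is a small awk sketch. The sizes and wall-clock times are copied from this post; the function name ip2c_figures is made up for illustration, and the "raw" size is just the sum of the two input files, ignoring tar's own header overhead.

```shell
#!/bin/sh
# Hypothetical helper: prints ratios derived from the figures quoted in this post.
ip2c_figures() {
    awk 'BEGIN {
        raw  = 3995 + 2304896   # ip2c-names + ip2c-data, bytes uncompressed
        bz2  = 612749           # ip2c-database.tar.bz2, bytes
        lzma = 293557           # ip2c-database.tar.lzma, bytes
        printf "bzip2 archive: %.1f%% of raw size\n", 100 * bz2 / raw
        printf "lzma archive:  %.1f%% of raw size\n", 100 * lzma / raw
        printf "lzma archive:  %.1f%% of bzip2 archive\n", 100 * lzma / bz2
        # "real" times in seconds from the tar cvaf and bzcat/lzcat runs
        printf "compress time, lzma/bzip2:   %.1fx\n", 16.886 / 4.452
        printf "decompress time, bzip2/lzma: %.1fx\n", 1.306 / 0.484
    }'
}

ip2c_figures
```

On this very repetitive file the lzma tarball comes out at a bit under half the size of the bzip2 one, which is a good deal more than the 15% average the web page quotes. Other data will give different figures.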