This test is designed to model the 'real-world' performance of lossless data compressors. The
test set contains a mix of file types, chosen with the question 'What do people use
archivers for the most?' in mind. The test set should contain data weighted (in both type and
proportion of files in the set) by how often normal users compress such files with compression
software. So, for example, there will be more .txt files than .ocx files
in the set (yes, this is arbitrary). The set contains hundreds of files and has a total size of over
300 MB. The idea behind a large collection is to filter out the 'noise': a compressor might perform badly
on one or two file types, but on a very large collection this will not hurt as much.
Some programs, like CCM and BZIP2, can only compress one file at a time. For these programs a
single TAR file is created containing all files. The files in this TAR file are ordered
alphabetically by suffix, then by name. Results of these compressors are marked with a 'Y' in the
tarred column.
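The TAR ordering described above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used for the test: the function names are hypothetical, and both the case-insensitive comparison and the flattening of directories (`arcname=f.name`) are assumptions the text does not confirm.

```python
import tarfile
from pathlib import Path


def suffix_then_name_key(path: Path) -> tuple[str, str]:
    """Sort key: file suffix first, then file name.

    Assumption: comparison is case-insensitive (not stated in the text).
    """
    return (path.suffix.lower(), path.name.lower())


def build_sorted_tar(files: list[Path], tar_path: Path) -> list[str]:
    """Write `files` into a plain TAR, ordered by suffix, then name.

    Returns the member names in the order they were written.
    """
    ordered = sorted(files, key=suffix_then_name_key)
    with tarfile.open(tar_path, "w") as tar:  # mode "w" = no compression
        for f in ordered:
            # Assumption: store files flat, without their directory part.
            tar.add(f, arcname=f.name)
    return [f.name for f in ordered]
```

The resulting single TAR file is what a one-file-at-a-time compressor would then be given as input.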
The test set consists of the following file types:
Filetype(s) | Description | % of total | # of files |
---|---|---|---|
TOC, MBX | Eudora mailboxes | 12.31 | 16 |
EXE, DLL, OCX, DRV | Executables | 10.99 | 35 |
TXT, RTF, DIC, LNG | Text files in several languages | 10.21 | 41 |
BMP, TIFF | Bitmaps/TIF images | 7.88 | 15 |
LOG | Log files | 6.34 | 6 |
HTM, PHP | HTML files | 6.13 | 19 |
DOC | MS Word files | 6.08 | 30 |
C, CPP, PAS, DCU | Source Code | 6.00 | 235 |
MDB, CSV | Databases | 4.26 | 7 |
HLP | Windows Help files | 4.23 | 7 |
CBF, CBG | Precompressed chess-databases | 3.55 | 2 |
WAV | Wave soundfiles | 3.45 | 9 |
XLS | XLS Spreadsheets | 2.41 | 16 |
PDF | Adobe Acrobat document | 1.59 | 6 |
TTF | True Type Fonts | 1.15 | 15 |
DEF | Virus definition files | 1.10 | 3 |
JPG, GIF | Image files | 0.53 | 9 |
CHM | Precompressed help files | 0.49 | 2 |
INI, INF | INI files | 0.42 | 10 |
Others | DAT,JAR,M3D,SYS,PPT,MAP,WP,RLL,RIB.. | 10.88 | 27 |
Since this is supposed to be a 'real-world' test, I will not look for the best possible
(command-line or GUI) switch combination for optimal compression, but will only test a limited set of
settings, as 'normal users' would. For 7-Zip, for example, this means I will use the GUI and select the
Ultra compression setting (which can easily be beaten with some good command-line switches); WinRAR will
be tested with maximum dictionary size and solid archiving; and so on. Programs are allowed to use a maximum of
800 MB of memory and must finish the compression stage within 12 hours. The compressed size must be 50% or less
of the original size for a program to be listed on MFC.
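The three listing limits stated above (memory, compression time, compressed size) can be written down as a small check. This is a minimal sketch: the function name, parameter names and units are hypothetical, and only the three threshold values come from the text.

```python
MAX_MEMORY_MB = 800          # maximum memory a program may use
MAX_COMPRESSION_HOURS = 12   # limit for the compression stage
MAX_SIZE_RATIO = 0.50        # compressed size must be <= 50% of the original


def is_listable(peak_memory_mb: float,
                compression_hours: float,
                compressed_size: int,
                original_size: int) -> bool:
    """Return True if a result meets all three limits described in the text."""
    return (
        peak_memory_mb <= MAX_MEMORY_MB
        and compression_hours <= MAX_COMPRESSION_HOURS
        and compressed_size <= MAX_SIZE_RATIO * original_size
    )
```

For instance, a run that stays within the memory and time limits but only shrinks the set to 60% of its original size would not be listed.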
For my single-file tests I got lots of requests to add the compression time to the tables. I
did not do this, for the reasons stated in the single-file summary, but I am planning to
measure compression times for this multiple-file test! I also decided to make this test set
'non-public', so it is harder for developers to tune their programs towards this specific test.
I think this is the fairest way to get 'real-life' performance tests.
Scoring system: The program yielding the smallest compressed size is considered the best program. The
most efficient (read: useful) program is determined by multiplying the total compression + decompression
time (in seconds) by 2 raised to a power that depends on how much larger the archive is than the
smallest measured archive. The lower the score, the better. The basic idea is that compressor X has the
same efficiency as compressor Y if X is twice as fast as Y and the resulting archive of X
is 10% larger than that of Y. (Special thanks to Uwe Herklotz for getting this formula right.)
score_X = POWER(2; ((size_X / size_TOP) - 1) / 0,1) * time_X
with
Symbol | Meaning |
---|---|
score_X | efficiency score for a certain compressor X |
time_X | time elapsed by compressor X (comp + decomp time) |
size_X | archive size achieved with compressor X |
size_TOP | archive size by top archiver (smallest benchmark result) |
Here "0,1" (0.1 written with a decimal comma) represents 10%, and the power of 2 ensures that for each 10% worse result (compared with the top) the time is doubled, so any archiver (except the top compressor) gets a penalty on its time. The score of the top compressor is always equal to its time value.
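The scoring formula can be checked with a short Python sketch. The function name and the sample numbers are made up for illustration; only the formula itself comes from the text above.

```python
def efficiency_score(size_x: float, size_top: float, time_x: float) -> float:
    """Efficiency score: total time, doubled for every 10% of extra archive size.

    size_x   -- archive size achieved with compressor X
    size_top -- smallest archive size in the benchmark
    time_x   -- compression + decompression time of X, in seconds
    """
    if size_top <= 0:
        raise ValueError("size_top must be positive")
    return 2 ** (((size_x / size_top) - 1) / 0.1) * time_x


# Hypothetical numbers: Y is the top compressor, X is twice as fast
# but produces an archive that is 10% larger.
score_y = efficiency_score(size_x=100.0, size_top=100.0, time_x=100.0)
score_x = efficiency_score(size_x=110.0, size_top=100.0, time_x=50.0)
print(score_y, score_x)  # the two scores agree up to floating-point rounding
```

With these inputs both compressors end up with (approximately) the same score, which matches the stated idea that twice the speed offsets a 10% larger archive. The top compressor's score equals its time because its exponent is zero.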