function [RL, the_entropy] = huffmanlength(x); %HUFFMANLENGTH find length (in bits) of huffman_compress(x(:)). %Let x be a data vector with integer entries %Then y = huffmanlength(x) is the length of the % compressed data (in bits) % were x(:) ever to be encoded using any Huffman code for x. % (Note that "compressed data" does not % count the fixed-size header, % including Kraft vector and size(x), % that is essential to be able to decode that data). % % Usage: % plaintext = [ 11 11 11 22 22 33 33 4 5 11 ] % L = huffmanlength( plaintext ) % returns L = 22, % meaning that when one compresses that datavector, % the compressed data will be 22 bits long. % % [L, the_entropy] = huffmanlength( plaintext ) % also calculates the entropy of the given data (in bits), % in this case the_entropy = 2.1219. % It is always true that % the_entropy <= L/length(plaintext(:)) < 1+the_entropy. % % See also ENTROPY, HUFFMAN_COMPRESS. % by John C. Kieffer % http://www.ee.umn.edu/users/kieffer/programs.html % documented in % ftp://oz.ee.umn.edu/users/kieffer/seminar/notes3.pdf % more documentation by David Cary 1999-06-24 F=frequency(x(:)); % get non-zero frequencies if( 12, % merge 2 least-frequent nodes: % requires 1 bit for each time either node appears in file % to discriminate between the 2 branches % that lead from the merged node. S=S+q(1)+q(2); q=[q(1)+q(2); q(3:N)]; q=sort(q); N=N-1; % until only 2 nodes left. end % now q vector has only 2 items. % ASSERT( q(1)+q(2) == sum(F) ) % We require 1 bit for each item in the file % to discriminate between the 2 main branches % from the root node of the Huffman tree. RL=S+sum(F); else, % Special case: there is only 1 unique symbol, % repeated any number of times. % Since "how many times" is remembered elsewhere % in the file header, % and "which symbol" % can be figured out by % the one symbol in the Kraft vector that is % nonzero, % the huffman compressed_data in this case is []. RL = 0; end; % ASSERT( the_entropy <= compression_rate ) % ASSERT( compression_rate < 1 + the_entropy ) the_entropy = entropy(x(:)); compression_rate = RL./length(x(:)); if( and(... (the_entropy <= compression_rate),... (compression_rate < 1 + the_entropy)... ) ), else RL the_entropy compression_rate disp('Sorry. I thought this was mathematically impossible. ') error('Please email d.cary@ieee.org with some clues.'); end; % end huffmanlength.m