function y = sorted_kraft(the_histogram) % Kraft_vector = sorted_kraft(frequencies) % generates a sorted standard[*] % Kraft vector from a list of letter frequencies. % The output Kraft vector tells the Huffman coder % how many bits to use to encode each unique symbol % in the plaintext. % Symbols that don't occur (any zeros in the frequency list) % are ignored, and not assigned a length. % % Usage: % plaintext = double(imread('cameraman.tif')); % plaintext = [5 5 5 5 4 4 3 3 2 1]; % frequencies = frequency(plaintext(:)); % Kraft_vector = sorted_kraft( frequencies )' % codebook = compactcodeP(Kraft_vector); % print_bitstrings( codebook ) % returns % Kraft_vector = [ 2 2 2 3 3 ], % meaning that when one compresses that datavector, % the most-frequently-used symbol use a 2 bit code, % the least-frequently-used symbol will use a 3 bit code, % etc. % and then print_bitstrings says % 01 % 10 % 11 % 000 % 001 % [*] Given a particular piece of plaintext, % often there are several other % Kraft vectors that create Huffman codes. % For the above example, Kraft_vector = [ 1, 2, 3, 4, 4 ] % and Kraft_vector = [ 1 3 3 3 3 ] % generates very different-looking Huffman codebooks. % (All Huffman codebooks compress that plaintext % into exactly the same number of bits). % % Huffman, David. "A method for the construction of minimum redundancy codes." % Proc. IRE, vol. 40, pp. 1098-1101, 1952. % % See also KRAFT, HUFFMAN_COMPRESS, HUFFMANLENGTH, FREQUENCY, COMPACTCODEP, PREFIX_ENCODE. % Change log: % 1999-07-12:DAV: It already *was* generating standard Huffman codes. Duh. % 1999-06-25:DAV: modified to generate "standard" Huffman codes. % 1999-06-24:DAV: David Cary wrote more documentation % ???:JCK: originally "huffmancode.m" by John C. Kieffer % by John C. Kieffer % http://www.ee.umn.edu/users/kieffer/programs.html % documented in % ftp://oz.ee.umn.edu/users/kieffer/seminar/notes3.pdf % ftp://oz.ee.umn.edu/users/kieffer/seminar/notes3.ps % ftp://oz.ee.umn.edu/users/kieffer/slides/slide31.ps % A algorithm that calculates the Kraft vectors in-place % (which should be much faster) was discovered by % Alistair Moffat, alistair@cs.mu.oz.au, % Jyrki Katajainen, jyrki@diku.dk % November 1996. % http://www.cs.mu.oz.au/~alistair/inplace.c % sort the histogram from least-frequent to most-frequent, % and trim "zero" frequencies (items that never occur). F = sort(the_histogram( find(the_histogram) )); if( isequal([],F) ), % avoid crashing when asked to compress nothing. % This ends up returning y = 0. F = [0]; end; % unique_symbols == how many distinct symbols occur at least once in the plaintext. unique_symbols=length(F); % maximum possible size of M M=zeros(unique_symbols,unique_symbols+1); % Initial "used" width of M M=zeros(unique_symbols,2); M(:,1)=F(:); % Each row of M represents a node in the tree: % the frequency of that node (M(j,1)) % followed by the Kraft vector % of all the leaves growing from that node (M(j,2:m)). % followed by (a) trailing zero(es). % The goal is to merge nodes % ("prune the list") until only 1 row remains, % the root of the tree. % The Kraft vector of that root node % is output. % Initially, we start with a list of leaves, % which have a Kraft vector of [0]. % In the 1st iteration, % we merge 2 leaves together, % creating a node where the decision to % go to the left or right node is 1 bit, % a Kraft vector of [1 1]. % Each time we create a new node % and "prune" the 2 nodes that grow from it, % the decision about whether to take the % left branch or the right branch % adds 1 bit to every leaf that grows from the 2 nodes % (that now grow from this new node). R=unique_symbols; while R > 1 % Pick 2 least-frequency nodes. % If there is any ambiguity % (several nodes with the same frequency), % how should we break the tie ? % If several nodes have the same frequency, % then we prefer the % least-recently-created node. % This is the same as % preferring the node with least depth. % The theorem claims this % minimizes the % depth of the Huffman tree % (minimizes the length of the longest codeword). % (The "depth" of a node is % the maximum number of bits % needed to get to the furthest leaf, i.e., % the maximum % value of that node's Kraft vector, % which should be in the 2nd column). % We don't *really* need to sort them. % In Matlab, the simplest way to do this % is to put the most-recently-created node % at the bottom of M % and sort by frequency (in the 1st column). % Rows of M that have the *same* % frequency will still be in the same order % after the sort (hopefully). % Theorem: % If there is some ambiguity in which 2 nodes % are the "least-frequency" nodes, % then preferring the least-recently-created node(s) % will generate a minimum-depth Huffman tree. % % Proof: % ??? % The deepest levels of the Huffman tree % are always created first. if(1), % The matrix is sorted by least-frequently-used first. % which should be identical to column 2: depth. % (Note that we never explicitly sort() anything). v=M(:,1); [dummy, t1] = min(v); v(t1) = +inf; [dummy, t2] = min(v); j1=min(t1,t2); j2=max(t1,t2); else, % profile says this is slightly slower. % The matrix is sorted primarily by column 1: lowest-frequency-first. % The matrix is sorted secondarily by least-frequently-used first, % which should be identical to column 2: depth. M = sortrows(M,1); j1 = 1; j2 = 2; end; % ASSERT( picking least-recently-created is the same as picking least-depth ) if(1) % find *all* the (ambiguous) lowest-frequency nodes if( M(j1,1) < M(j2,1) ) I = find( M(j2,1) == M(:,1) ); else I = find( M(j1,1) == M(:,1) ); % This includes M(j2,1) when it is not the unique lowest-frequency node. end; % check to make sure they are already sorted % least-depth-first: for k = 2:length(I), % ASSERT( M(k-1,2 ) <= M(k,2) ); if( M(I(k-1),2 ) <= M(I(k),2) ), else, M(I(k-1),2) M(I(k),2) M(I,:) k R error(' Sorry. I thought that was impossible. ') end; end; end; % Join node j1 to j2 new_freq = M(j1,1) + M(j2,1); % Copy the Kraft vectors from the least-common rows. % Check maximum depth (in column 2) % and make sure it is the first in the new Kraft vector. if( M(j1,2) < M(j2,2) ), new_k = M([j2 j1], 2:end); else, new_k = M([j1 j2], 2:end); end; % Eliminate extraneous zero padding. % Kraft vectors never have zeros, % except for leaves, which have a Kraft vector of [0]. new_k = new_k(find(new_k))'; % squeeze out all zeros if( 0 == M(j1,2) ) new_k = [new_k 0]; % leaf - put zero back in end; if( 0 == M(j2,2) ) new_k = [new_k 0]; % leaf - put zero back in end; % Create a new, merged node. % The frequency of this new node (new_node(1)) is the sum of % all its leaves. % This merge adds 1 bit in depth to all the % leaves that grow from it. % The depth of this new node (bottom_node(2)) is the maximum depth of % any of its leaves. % bottom_node=[r1(1)+r2(1) v1+1 v2+1 zeros(1,unique_symbols+2-m1-m2)]; bottom_node=[new_freq new_k+1 0]; % ASSERT( new_node(2) == max( new_node(2:unique_symbols+1) ) ); % prune the 2 old branches, % replacing them with the new node, % which goes at the bottom of M. % This ensures % that we preferentially pick the least-depth node. % if( size(M,2) < length(bottom_node) ) % new_node is too wide ! So expand width of M M(1,length(bottom_node)) = 0; end; M(j1,1:length(bottom_node)) = bottom_node; s1=1:j1-1; s2=j1+1:j2-1; s3=j2+1:R; remaining_rows = [s1 s2 s3 j1]; M = M(remaining_rows,:); % reduce M by 1 row, and put new node at bottom. R=R-1; end %z=M(1,2:unique_symbols+1); % ASSERT(size(M) == [1, unique_symbols+2]); % ASSERT( 0 == M(1,end) ) z=M(1,2:end-1); % ASSERT( 0 < min(z) ); y=sort(z); % Sean Danaher University of Northumbria at Newcastle UK 98/6/4 % ftp://ftp.mathworks.com/pub/contrib/v5/misc/huffman.m % http://www.mathworks.com/ftp/miscv5.shtml % wrote a perhaps easier-to-understand version, % that does basically what sorted_kraft and compactcodeP % do, all in one file. It uses nested cell structures. % Some time ago, Dr. Andreas Geißler % posted some Huffman compression/decompression code. % % http://www.dejanews.com/[ST_rn=ps]/getdoc.xp?AN=244193771 % http://www.dejanews.com/[ST_rn=ps]/getdoc.xp?AN=242541946 % %Subject: Re: Huffman compression of a matrix % Date: 28 May 1997 00:00:00 GMT % From: M-Bär % Organization: Glücks-Bärchi % Newsgroups: comp.soft-sys.matlab % %Hello, % my Name is % Dr. Andreas Geißler % Inst. für Hochfrequenztechnik % Technische Hochschule Darmstadt % 64283 Darmstadt % Germany %... %Subject: Re: Huffman compression of a matrix % Date: 27 May 1997 00:00:00 GMT % From: M-Bär % Organization: Glücks-Bärchi % Newsgroups: comp.soft-sys.matlab %... %MALEBAUM.M paints the Huffman-Codetree in a matlab-figure. %The procedure malebaum.m works well for Histogramms up to 32 elements. %... % [LH,TH]=MALEBAUM(H) % Darstellung des Huffman-Baumes % %Subject: Here it is: HUFFMAN-Coding für Matlab % Date: 20 May 1997 00:00:00 GMT % From: Klaas Klever % Organization: Murks gmbH % Newsgroups: comp.soft-sys.matlab % end sorted_kraft.m