Andrew wrote:

> For instance, a 32-bit signal at 96 kHz is more than necessary to fully 
> represent an analog audio signal with no loss.  So, if we sample 1 s of
> that audio, we use up 4 bytes * 1 channel * 96000 Hz = 384,000 bytes/sec.
> 
> Now, if through wavelet analysis, it's found that the signal can be 
> represented by the following sinusoids superposed:
> 
> 3 sin (.5t - .2)
> -2 sin (1.3t + .4)
> 4 sin (-2.5t - .83)
> 
> For that one second of audio, those sinusoids accurately represent the 
> sampled data.  Now, sending the data that represents those sinusoids is as
> easy as sending a 32-bit IEEE floating-point number for each of the 3 
> parameters per sinusoid, so, for simple sinusoids, that relatively simple
> signal can be represented by only 9 parameters * 4 bytes = 36 bytes for an
> accurate representation of 384,000 bytes worth of sample.  Nyquist's law
> effectively restricts sampled audio to 1/2 the sample rate, giving a 
> certain minimum amount of information necessary for transfer to transmit
> that signal from one point to the other.  This sort of compression breaks
> the bounds of Nyquist's law in transferring, though it still limits the
> actual sampling of the audio.
> 
> Am I misinterpreting the technique of wavelet compression, such that the
> model and calculations which I've provided here are inaccurate, baseless,
> or just plain BS?  If so, how does wavelet compression actually achieve 
> what it does?
> 
I should preface my remarks with the disclaimer that I am just a humble (no,
sorry, arrogant) programmer, and no mathematician, so this is
seat-of-the-pants.

1) Nyquist provides the criterion for sampling to PERFECTLY reconstruct the
sampled signal within a specified bandwidth. It doesn't say anything about
what you can do with the samples once you've got them.

2) Your description of analysing a signal into various sinewave components
is entirely plausible, but it looks to me more like Fourier (FFT) analysis,
not wavelet.
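(For what it's worth, the byte arithmetic in your example does check out; a
quick sanity check in Python, using only the figures from your post:)

```python
# Sanity check of the figures quoted above (the poster's numbers, not mine):
# raw PCM cost vs. three sinusoids at three parameters each.
bytes_per_sample = 4            # 32-bit samples
sample_rate = 96000             # Hz
channels = 1
raw = bytes_per_sample * channels * sample_rate   # one second of audio
params = 3 * 3 * bytes_per_sample                 # 3 sinusoids x 3 floats
print(raw, params, raw // params)  # 384000 36 10666
```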

3) The wavelet transform acts like a pair of high and low pass filters. You
take a set of N samples, and apply the transform, which involves a lot of
multiplies and adds of adjacent sample values, and you end up with N/2 low
pass samples and N/2 high pass samples. The transform is then applied
repeatedly to the low pass sample output of the last stage, giving smaller
and smaller sample sets until you reach some point that depends on the
lowest frequency you are interested in (I guess).
The first high-pass block contains data about the highest frequencies in the
sample data, the next one somewhat lower frequencies, and so on down to the
final low pass block, which describes the lowest frequencies. I have no idea
how or why this actually works, I just bang the rocks together, and out come
the numbers.
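To make (3) concrete, here is a sketch in Python of that repeated split. I've
used the Haar wavelet purely because it is the simplest filter pair; real
codecs (the ADV601 included) use longer filters, so treat this as
illustration only.

```python
import math

def haar_step(samples):
    """One level: N samples -> N/2 low-pass + N/2 high-pass."""
    low, high = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        low.append((a + b) / math.sqrt(2))    # sums: low frequencies
        high.append((a - b) / math.sqrt(2))   # differences: high frequencies
    return low, high

def analyse(samples, levels):
    """Apply the step repeatedly to the low-pass output, as described above."""
    bands = []
    low = list(samples)
    for _ in range(levels):
        low, high = haar_step(low)
        bands.append(high)        # first band = highest frequencies
    bands.append(low)             # final low-pass block
    return bands

# 8 samples, 3 levels -> high bands of 4, 2, 1 coefficients plus 1 low-pass
bands = analyse([1, 2, 3, 4, 5, 6, 7, 8], 3)
print([len(b) for b in bands])   # [4, 2, 1, 1]
```

Note the coefficient count equals the sample count: it is a different
representation, not yet a smaller one.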

At this point we have a different representation of the sample data,
partitioned into frequency bands, but no compression has taken place. You
can recombine the blocks with the reverse transform and reconstruct the
original samples almost perfectly (limited by the accuracy of your
floating-point arithmetic hardware).
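A single-level sketch of that round trip, again with the Haar pair as a
stand-in: the forward step and its inverse cancel exactly, so the only error
left is floating-point rounding.

```python
import math

def forward(samples):
    s2 = math.sqrt(2)
    low  = [(samples[i] + samples[i+1]) / s2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i+1]) / s2 for i in range(0, len(samples), 2)]
    return low, high

def inverse(low, high):
    s2 = math.sqrt(2)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s2)   # recover even-numbered sample
        out.append((l - h) / s2)   # recover odd-numbered sample
    return out

x = [3.0, 1.0, -2.0, 4.0]
low, high = forward(x)
y = inverse(low, high)
print(max(abs(a - b) for a, b in zip(x, y)))  # tiny: rounding error only
```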

The compression relies on the fact that our logarithmic response senses are
not too fussy about absolute levels, particularly at higher frequencies.
(Note: My experience is with video compression, but I suspect audio will be
similar). So, if you take the wavelet sample data for some frequency band,
and divide each number by (say) 128, each original 16-bit sample now
occupies only 9 bits (128 = 2^7, so the division throws away 7 bits).
Moreover, you tend to find a lot of zeroes, or small numbers, which can be
represented by a small number of bits using a Huffman coding scheme. The
process of dividing each sample value by some number is called quantisation,
and is generally applied more heavily to the higher frequencies, where the
senses have a lower resolution.
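A sketch of that quantisation step, on some made-up high-band coefficients
(the divisor 128 is just the example figure above):

```python
# Hypothetical high-band wavelet output; the values are invented to show the
# typical shape: a couple of large coefficients, the rest near zero.
coeffs = [517, -12, 3, 0, -1, 260, 7, 0]
q = 128                                   # the example divisor from above
quantised = [int(c / q) for c in coeffs]  # divide, truncating toward zero
print(quantised)                          # [4, 0, 0, 0, 0, 2, 0, 0]
print(quantised.count(0), "of", len(coeffs), "coefficients are now zero")
```

Those runs of zeroes are exactly what the Huffman coder thrives on.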

The compressed data is Huffman-decoded, and then the sample values for each
frequency block are multiplied by the divisors used at compression to give
something like the original wavelet transform output, but with a loss of
resolution of the amplitudes of the frequency components. The reverse
wavelet transform is then applied to reconstitute a signal which is
hopefully not too dissimilar (perceptually) to the original.
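Putting the decode side together with the forward transform gives the whole
lossy round trip; a single-level Haar sketch (my own toy numbers, and 8.0 as
an arbitrary quantiser step):

```python
import math

def haar_fwd(x):
    s2 = math.sqrt(2)
    return ([(x[i] + x[i+1]) / s2 for i in range(0, len(x), 2)],
            [(x[i] - x[i+1]) / s2 for i in range(0, len(x), 2)])

def haar_inv(low, high):
    s2 = math.sqrt(2)
    out = []
    for l, h in zip(low, high):
        out += [(l + h) / s2, (l - h) / s2]
    return out

x = [100.0, 98.0, -50.0, -49.0, 3.0, 2.0, 0.0, 1.0]
low, high = haar_fwd(x)

q = 8.0                                   # quantiser step for the high band
high_q = [round(h / q) for h in high]     # encoder: divide and round
high_dq = [h * q for h in high_q]         # decoder: multiply back

y = haar_inv(low, high_dq)
err = max(abs(a - b) for a, b in zip(x, y))
print(high_q)   # small integers, cheap to entropy-code
print(err)      # bounded by the quantiser step, not zero
```

The reconstruction error is bounded by the step size, which is the knob the
codec turns to trade quality against bit rate.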

>  ... do you have any good references
> for the algorithms and mathematics behind it all?
> 
I quickly get lost in the mathematics, and have gleaned most of my knowledge
from the data sheets for the Analog Devices ADV601 Video Codec.
http://www.analog.com/publications/magazines/Dialogue/30-2/wavelet.html
This reference (Figure 5) shows how the high (spatial) frequencies tend to
be mostly zero.

I apologise if this is a bit dense. Perhaps we should take the subject off
the list?

simon