Andrew wrote:

> For instance, a 32-bit signal at 96kHz is more than necessary to fully
> represent an analog audio signal with no loss. So, if we sample 1s of
> that audio, we use up 4 bytes * 1 channel * 96000 Hz = 384000 bytes/sec.
>
> Now, if through wavelet analysis, it's found that the signal can be
> represented by the following sinusoids superposed:
>
> 3 sin (.5t - .2)
> -2 sin (1.3t + .4)
> 4 sin (-2.5t - .83)
>
> For that one second of audio, those sinusoids accurately represent the
> sampled data. Now, sending the data that represents those sinusoids is as
> easy as sending a 32-bit IEEE floating-point number for each of the 3
> parameters per sinusoid, so --- for simple sinusoids, that relatively
> simple signal can be represented by only 9 parameters * 4 bytes = 36
> bytes for an accurate representation of 384,000 bytes worth of samples.
> Nyquist's law effectively restricts sampled audio to frequencies below
> 1/2 the sample rate, giving a certain minimum amount of information
> necessary to transmit that signal from one point to the other. This sort
> of compression breaks the bounds of Nyquist's law in transferring, though
> it still limits the actual sampling of the audio.
>
> Am I misinterpreting the technique of wavelet compression, such that the
> model and calculations which I've provided here are inaccurate, baseless,
> or just plain BS? If so, how does wavelet compression actually achieve
> what it does?

I should preface my remarks with the disclaimer that I am just a humble (no, sorry, arrogant) programmer, and no mathematician, so this is seat-of-the-pants.

1) Nyquist provides the criterion for sampling to PERFECTLY reconstruct the sampled signal within a specified bandwidth. It doesn't say anything about what you can do with the samples once you've got them.

2) Your description of analysing a signal into various sinewave components is entirely plausible, but looks to me more like FFT analysis, not wavelet.
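Andrew's byte arithmetic checks out, whatever you call the analysis. A quick sketch (the three sinusoids are his; everything else is just bookkeeping, assuming 32-bit samples throughout):

```python
import math

fs = 96000                          # sample rate in Hz, one channel

# Andrew's three sinusoids: (amplitude, angular frequency, phase).
params = [(3.0, 0.5, -0.2), (-2.0, 1.3, 0.4), (4.0, -2.5, -0.83)]

# Synthesise one second of the superposed signal, one sample per tick.
signal = [sum(a * math.sin(w * n / fs + p) for a, w, p in params)
          for n in range(fs)]

raw_bytes = len(signal) * 4         # 4 bytes per 32-bit sample
param_bytes = len(params) * 3 * 4   # 9 float32 parameters

print(raw_bytes, param_bytes)       # 384000 36
```

The catch, of course, is that real-world audio is not three stationary sinusoids, so the parametric representation only wins when the analysis finds structure to exploit.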
3) The wavelet transform acts like a pair of high- and low-pass filters. You take a set of N samples and apply the transform, which involves a lot of multiplies and adds of adjacent sample values, and you end up with N/2 low-pass samples and N/2 high-pass samples. The transform is then applied repeatedly to the low-pass output of the previous stage, giving smaller and smaller sample sets until you reach some point that depends on the lowest frequency you are interested in (I guess).

The first high-pass block contains data about the highest frequencies in the sample data, the next one somewhat lower frequencies, and so on down to the final low-pass block, which describes the lowest frequencies. I have no idea how or why this actually works; I just bang the rocks together, and out come the numbers.

At this point we have a different representation of the sample data, partitioned into frequency bands, but no compression has taken place. You can recombine the blocks with the reverse transform and reconstruct the original samples almost perfectly (limited by the accuracy of your floating-point arithmetic hardware).

The compression relies on the fact that our logarithmic-response senses are not too fussy about absolute levels, particularly at higher frequencies. (Note: my experience is with video compression, but I suspect audio will be similar.) So, if you take the wavelet sample data for some frequency band and divide each number by (say) 128, each original 16-bit sample now occupies only 9 bits. Moreover, you tend to find a lot of zeroes, or small numbers, which can be represented by a small number of bits using a Huffman coding scheme. The process of dividing each sample value by some number is called quantisation, and it is generally applied more heavily to the higher frequencies, where the senses have a lower resolution.
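The whole pipeline above can be sketched with the simplest possible wavelet, the Haar transform (averages and differences of adjacent samples). Real codecs like the ADV601 use longer filter kernels, and the uniform quantisation step here is a toy, so treat this purely as an illustration of the structure:

```python
import math

def haar_forward(samples):
    """One stage: N samples -> N/2 low-pass and N/2 high-pass samples."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / math.sqrt(2) for a, b in pairs]
    high = [(a - b) / math.sqrt(2) for a, b in pairs]
    return low, high

def haar_inverse(low, high):
    """Undo one stage: recombine the two half-size blocks."""
    out = []
    for l, h in zip(low, high):
        out += [(l + h) / math.sqrt(2), (l - h) / math.sqrt(2)]
    return out

samples = [5.0, 7.0, 6.0, 8.0, 4.0, 2.0, 3.0, 1.0]

# Repeatedly transform the low-pass output of the previous stage.
bands, low = [], samples
while len(low) > 1:
    low, high = haar_forward(low)
    bands.append(high)            # highest-frequency band first

# Quantise each high-pass band: divide by a step and round.
step = 4
quantised = [[round(h / step) for h in band] for band in bands]

# "Decompress": multiply back by the step, then inverse transform.
rec = low                         # lowest-frequency block, left unquantised
for band in reversed(quantised):
    rec = haar_inverse(rec, [q * step for q in band])

print([round(x, 1) for x in rec])  # a coarse approximation of the originals
```

With `step = 1` the round trip is nearly lossless; the coarse `step = 4` shows how quantisation trades amplitude resolution for lots of zeroes, which is exactly what the Huffman stage then exploits.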
To decompress, the data is Huffman decoded, and then the sample values for each frequency block are multiplied by the divisors used at compression, to give something like the original wavelet transform output, but with a loss of resolution in the amplitudes of the frequency components. The reverse wavelet transform is then applied to reconstitute a signal which is hopefully not too dissimilar (perceptually) to the original.

> ... do you have any good references
> for the algorithms and mathematics behind it all?

I quickly get lost in the mathematics, and have gleaned most of my knowledge from the data sheets for the Analog Devices ADV601 Video Codec.

http://www.analog.com/publications/magazines/Dialogue/30-2/wavelet.html

This reference (Figure 5) shows how the high (spatial) frequencies tend to be mostly zero.

I apologise if this is a bit dense. Perhaps we should take the subject off the list?

simon