Dear Riz,

You're right: entropy and information are different things. If you read Shannon's papers, he defines information in the context of a noisy channel. The channel reads in a message x, outputs y, and can be summarized by a conditional probability distribution p(y|x).
The Shannon information is what x tells us about y on average.

If we do not know x, the uncertainty about y is measured by the entropy

    S(y) = -\sum_y p(y) \log p(y),   with   p(y) = \sum_x p(y|x) p(x).

Large entropy = high uncertainty = broad distribution.

If we know x, the uncertainty about y is less than before (because p(y|x) is more peaked than p(y)) and is given by the entropy

    S(y|x) = -\sum_y p(y|x) \log p(y|x).

If x is drawn at random from p(x), the expected uncertainty in y is

    S1(y) = \sum_x p(x) S(y|x).

The difference

    I = S(y) - S1(y)

is the average reduction in uncertainty of y and is called the information in x about y. Indeed, if x and y are independent, then p(y|x) = p(y) and I = 0.

Bert Kappen

On Mon, 28 Oct 2002, Rizwan Choudrey wrote:

> Dear all,
>
> I wondered if anyone could help with a paradox at the heart of my
> understanding of entropy, information and pattern recognition.
>
> I understand an informative signal as one which contains patterns, as
> opposed to randomly distributed numbers, e.g. noise. Therefore, I
> equate information with structure in the signal's distribution. However,
> Shannon equates information with entropy, which is maximum when each
> symbol in the signal is equally likely as the next, i.e. a distribution
> with no `structure'. These views are contradictory.
>
> What am I missing in my understanding?
>
> Many thanks in advance,
> Riz
>
> Rizwan Choudrey
> Robotics Group
> Department of Engineering Science
> University of Oxford
> 07956 455380

Bert Kappen  SNN University of Nijmegen
tel: +31 24 3614241  fax: +31 24 3541435
URL: www.snn.kun.nl/~bert
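P.S. The quantities above can be checked numerically for a small discrete channel. Below is a minimal sketch (not from the original message; the function names and the example channel matrices are made up for illustration) that computes S(y), S1(y), and I = S(y) - S1(y), using natural logarithms:

```python
import numpy as np

def entropy(p):
    """Shannon entropy S(p) = -sum_i p_i log p_i in nats, with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def information(p_x, p_y_given_x):
    """Information in x about y: I = S(y) - sum_x p(x) S(y|x).

    p_y_given_x[i, j] = p(y=j | x=i), rows sum to 1.
    """
    p_x = np.asarray(p_x, dtype=float)
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    p_y = p_x @ p_y_given_x                       # p(y) = sum_x p(y|x) p(x)
    s_y = entropy(p_y)                            # S(y): uncertainty without x
    s1_y = np.sum(p_x * np.array([entropy(row) for row in p_y_given_x]))  # S1(y)
    return s_y - s1_y

# Noiseless channel y = x: knowing x removes all uncertainty, I = S(y) = log 2.
print(information([0.5, 0.5], [[1, 0], [0, 1]]))           # ~0.693 nats

# Independent channel p(y|x) = p(y) for every x: I = 0, as in the text.
print(information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))   # 0.0
```

The two printed cases bracket the behaviour: a deterministic channel gives the maximum I, and an x-independent channel gives I = 0.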
