Dear Riz,
You're right, entropy and information are different things. If you
read Shannon's papers, he defines information in the context of a noisy
channel. The channel reads in a message x and outputs y, and
can be summarized as a conditional probability distribution p(y|x).

The Shannon information is what x tells us about y on average.
If we do not know x, the uncertainty about y is measured by the
entropy 

S(y) = -\sum_y p(y) \log p(y), with p(y) = \sum_x p(y|x) p(x)

Large entropy = high uncertainty = broad distribution
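To make that concrete, here is a small sketch in Python (my own toy distributions, chosen for illustration) showing that a broad distribution has high entropy and a peaked one low entropy:

```python
import math

def entropy(p):
    """Shannon entropy in bits: S = -sum_i p_i log2 p_i (0 log 0 = 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

broad  = [0.25, 0.25, 0.25, 0.25]   # uniform over 4 symbols: maximal uncertainty
peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic: little uncertainty

print(entropy(broad))    # 2.0 bits (= log2 of 4 symbols)
print(entropy(peaked))   # much smaller, roughly 0.24 bits
```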

If we know x, the uncertainty about y is on average less than before
(because p(y|x) is typically more peaked than p(y)) and is given by the entropy

S(y|x) = -\sum_y p(y|x) \log p(y|x)

If x is random from p(x), then the expected uncertainty in y is

S1(y) = \sum_x p(x) S(y|x)

The difference 

I = S(y) - S1(y)

is the average reduction in uncertainty about y and is called the
information in x about y. Indeed, if x and y are independent,
p(y|x) = p(y), so S1(y) = S(y) and I = 0.
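The whole calculation can be sketched in a few lines of Python. The binary symmetric channel below is my own toy example (not one from Shannon's paper): input x is 0 or 1 with prior p(x), and the channel flips the bit with probability eps.

```python
import math

def entropy(p):
    """Shannon entropy in bits: S = -sum_i p_i log2 p_i (0 log 0 = 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

eps = 0.1                         # flip probability of the channel
p_x = [0.5, 0.5]                  # prior over inputs
p_y_given_x = [[1 - eps, eps],    # p(y|x=0)
               [eps, 1 - eps]]    # p(y|x=1)

# Marginal: p(y) = sum_x p(y|x) p(x)
p_y = [sum(p_y_given_x[x][y] * p_x[x] for x in range(2)) for y in range(2)]

S_y  = entropy(p_y)                                              # S(y)
S1_y = sum(p_x[x] * entropy(p_y_given_x[x]) for x in range(2))   # S1(y)
I    = S_y - S1_y   # information in x about y

print(S_y)    # 1.0 bit: without knowing x, y is uniform
print(I)      # roughly 0.53 bits: knowing x removes most of the uncertainty
```

Setting eps = 0.5 makes y independent of x, and I comes out 0, as in the independence argument above.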

Bert Kappen

On Mon, 28 Oct 2002, Rizwan Choudrey wrote:

> Dear all,
> 
> I wondered if anyone could help with a paradox at the heart of my
> understanding of entropy, information and pattern recognition.
> 
> I understand an informative signal as one which contains patterns, as
> opposed to randomly distributed numbers, e.g. noise. Therefore, I
> equate information with structure in the signal's distribution. However,
> Shannon equates information with entropy, which is maximum when each
> symbol in the signal is equally likely as the next, i.e. a distribution
> with no `structure'. These views are contradictory.
> 
> What am I missing in my understanding?
> 
> Many thanks in advance,
> Riz
> 
> 
> Rizwan Choudrey
> Robotics Group
> Department of Engineering Science
> University of Oxford
> 07956 455380
> 
> 

Bert Kappen             SNN           University of Nijmegen
tel: +31 24 3614241                      fax: +31 24 3541435
URL: www.snn.kun.nl/~bert

