On Mon, 20 Jun 2011 09:25:58 -0700, Jim Clark wrote:
>> Mike Palij wrote:
>>(4)  Somebody should mention that what John Kulig and Jim Clark
>>allude to below relates to the "central limit theorem".  The Wikipedia
>>entry on the normal distribution also covers this topic; see:
>> http://en.wikipedia.org/wiki/Normal_distribution#Central_limit_theorem  
>> and there is a separate entry on it (yadda-yadda) as well:
>>http://en.wikipedia.org/wiki/Central_limit_theorem  
>JC:
>The central limit theorem, which concerns sampling distribution of means based 
>on n observations, is one specific application of the fact that sums (means if 
>divided by n) of scores increasingly approximate a normal distribution, but as 
>John pointed out any score that is dependent on multiple independent 
>contributing factors will increasingly approximate a normal distribution.  

Actually, I prefer how Hays (1973) describes it on pages 309-310. Paraphrasing
his presentation:

Consider that one has a random variable Y which is the sum of two 
independent parts. This is represented in the equation:

(1) Y = TrueScore + Error

TrueScore is a constant but Error is a random variable that 
represents the additive effects of independent influences or "errors".
One can represent Error in the following equation:

Error = g[E1 + E2 + E3 + ... + EN]
where
g is a constant that reflects the "weight" of the error component,
E1 is the contribution of some Factor designated Factor01,
E2 is the contribution of some Factor designated Factor02,
and so on for each of the relevant error factors.

If all of the Es are dichotomous variables, then the Error can be
considered to reflect the number of "successes" in N independent
trials (i.e., does the Factor make a contribution or does it fail to
make a contribution).  If N is very large, then the distribution of Error
must approach a normal distribution.  If each factor has a probability
of making an contribution of p=.50, then the mean [i.e., E(Error)]
will be zero.  That is, E(Error)= 0.00 because positive and negative
errors cancel each other out in the long run.  If these conditions are
met, then the mean of the random variable Y will be the true 
score:

E(Y) = E(TrueScore) + E(Error) = TrueScore

In the example that Jim Clark provides below, one can think of IQ 
as being synonymous with Y.
NOTE:  making the errors dichotomous simplifies matters but the errors
can be defined in different ways. 
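Hays's additive model is easy to check by simulation. Here is a minimal
sketch; the constants (TRUE_SCORE, g, the number of factors) and the coding
of each dichotomous factor as +1 or -1 with p = .50 are my illustrative
assumptions, not taken from Hays:

```python
# Simulate Y = TrueScore + g*(E1 + ... + EN) with dichotomous errors.
import random
import statistics

random.seed(42)

TRUE_SCORE = 100.0   # the constant "TrueScore"
g = 0.5              # weight of the error component
N_FACTORS = 400      # number of independent dichotomous error factors
N_PEOPLE = 5_000     # number of simulated scores

def one_score():
    # Each factor contributes +1 or -1 with p = .50, so E(Error) = 0.
    error = g * sum(random.choice((-1, 1)) for _ in range(N_FACTORS))
    return TRUE_SCORE + error

scores = [one_score() for _ in range(N_PEOPLE)]

# E(Y) = TrueScore, because positive and negative errors cancel;
# the spread of the scores comes entirely from the error term.
print(statistics.mean(scores))
print(statistics.stdev(scores))
```

With 400 factors the histogram of the scores is already very close to
normal, and the mean sits at the true score, as the paraphrase above
predicts.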

>Hence, if IQs depend on multiple discrete observations the IQs of individuals 
>(not means of individual IQs) will be normally distributed.  

I think Jim might mean "multiple discrete influences" instead of observations.
With the definitions given above, a single IQ score will represent the sum
of influences (this is an assumption; errors can also combine multiplicatively)
and the distribution of IQs or any standardized test score will form a normal 
distribution if the number of influences is large and other conditions are met.

Quoting Hays on this issue:
|...[T]he same kind of reasoning is sometimes used to explain why distributions
|of natural traits, such as height, weight, and size of head, follow a more or
|less normal rule.  Here, the mean of some population is thought of as the
|"true" value, or the "norm".  However, associated with each individual
|is some departure from the norm, or error, representing the culmination
|of all of the billions of chance factors that operate on him, quite
|independently of other individuals.  Then by regarding these factors as
|generating a binomial distribution, we can deduce that the whole population
|should take on a form like the hypothetical normal distribution.  However,
|this is only a theory about how errors might operate and there is no reason
|at all why errors must behave in the simple additive way assumed here.
|If they do not, then the distribution need not be normal in form at all.
(Hays, 1973, p. 310).
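Hays's caveat about non-additive errors can also be illustrated by
simulation. In this sketch the chance factors multiply the score instead of
adding to it (the 0.98/1.02 factor values and the starting score are my
illustrative assumptions), and the result is a skewed, lognormal-like
distribution rather than a normal one:

```python
# If chance factors multiply rather than add, the resulting
# distribution is right-skewed (lognormal-like), not normal.
import random
import statistics

random.seed(1)

def multiplicative_score(n_factors=200):
    score = 100.0
    for _ in range(n_factors):
        # Each factor scales the score up or down by 2%.
        score *= random.choice((0.98, 1.02))
    return score

scores = [multiplicative_score() for _ in range(5_000)]

# A symmetric (normal) distribution has mean close to its median;
# a right-skewed one has mean noticeably above the median.
mean = statistics.mean(scores)
median = statistics.median(scores)
print(mean > median)
```

(The log of such a score is a sum of independent contributions, so it is
the *logarithm* that tends toward normality, which is exactly why the raw
distribution "need not be normal in form at all.")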

>The same holds for 
>any variable (score) with multiple contributing factors.  In the simulation, 
>for example, the central limit theorem would strictly apply if individuals had 
>dichotomous scores (e.g., dying or not, passing or not, ...) and the 
>distribution represented the sampling distribution of the dichotomous 
>observations for n individuals, either as sums as in the simulation or as 
>means 
>(equivalently proportions for dichotomous -0 1 scores) if divided by n.  If, 
>however, the dichotomous 0 1 numbers represent some underlying contributing 
>factor to the individual scores represented by the sums (or means), then the 
>results represent individual scores, which if averaged together for samples 
>would have a sampling distribution of the means of the scores for n 
>individuals.
>
>Perhaps just a subtle and esoteric distinction, but isn't that what academics 
>specialize in?

I often wonder what academics specialize in. 

-Mike Palij
New York University
m...@nyu.edu



