On: Sat, 8 Apr 2017 00:42:52 +0000, Lenore Frigo wrote:
For those of you who teach lower-division introduction to
research methods (or have an opinion on what we SHOULD
be teaching at that level):

In teaching students how to interpret statistical results,
such as a t-test, do you think it's important to have them
find the critical value on a table and proceed from there,
or just start with a "print out" of the results that would
already include the actual p value?

Though not immediately obvious, the question you ask
turns out to be far more complex than it appears because
of a variety of factors.  I try to identify some of these in
the points below:

(1)  The information that you have provided does not make
clear whether your students understand the concept of a
sampling distribution of a statistic and why one is looking
to determine whether the value of an obtained statistic is
consistent or not with the null hypothesis.  Presumably,
in the case of the t-test you are mostly concerned with
rejecting the null hypothesis (in contrast to situations where
you are doing a variation of goodness of fit between, say,
some criterion value and your obtained value, for example
comparing the mean IQ of a sample to a value of 100
[the value of the population mean]).  If you teach the
goodness-of-fit chi-square for a frequency distribution,
the emphasis is on NOT rejecting the null (which might
claim that the sample frequency distribution is consistent
with a normal distribution).
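A small sketch of that "hoping NOT to reject" situation, assuming scipy is available (the category counts are made-up data; for simplicity the null here is equal frequencies rather than normality):

```python
# Goodness-of-fit chi-square: here the researcher often hopes NOT to
# reject the null. Hypothetical counts for 6 categories (e.g., die rolls).
from scipy.stats import chisquare

observed = [18, 22, 20, 19, 21, 20]   # made-up observed frequencies
expected = [20] * 6                   # null hypothesis: equal frequencies

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), round(p, 3))
# p is well above .05, so we fail to reject: the observed frequencies
# are consistent with the hypothesized distribution.
```

The same logic extends to testing fit to a normal distribution, except that the expected frequencies come from normal-curve areas for each interval.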

(2) There are philosophical (metaphysical?) issues involved
as well, depending upon whether one considers oneself to
be a Fisherian/Neo-Fisherian when it comes to
statistical inference or one subscribes to the model promoted
by the Neyman-Pearson approach (which includes concepts
such as Type II errors, statistical power, and so on). As Gerd
Gigerenzer argued long ago, most psychologists were taught
a mash-up of these two inference frameworks, which has resulted
in neither being used correctly.  With respect to the use of a
table of critical values or probabilities (p-values), the position
to which one subscribes partly determines what one does.
Consider the following situations:

(a) A researcher who takes a Fisherian approach to statistical
testing will conduct their research and, after the data are in,
analyze them to determine whether test results are either consistent
with the null hypothesis (i.e., p(obtained test value) > .05) or
inconsistent with it (i.e., p(obtained test value) < .05). Here
a table or some other source of critical test values, or of the
probability of the obtained test result if the null hypothesis
is true, plays a critical role because one is in essence just
deciding whether the obtained value should lead one to reject
the null hypothesis or fail to reject it.  One might go on to do
some additional analyses but the most critical question that
one is interested in should have been answered.

(b) In the Neyman-Pearson approach, we need to distinguish
between whether one is an "a priorist" or a "post hoc" practitioner.
The distinction between the two is illustrated in the following:

(i) The proper way to do a statistical analysis in the Neyman-Pearson
framework requires one to identify what population distributions
are involved in the null and alternative hypotheses.  In addition,
if one is going to conduct a two-sample between-subjects
experiment (NOTE: the tense implies that all this is done before
any data are collected), and one will use an independent-groups
t-test as the tool of analysis, then there are the following
considerations or decisions one must make:

- What is the Type I error rate that one will use? Typically,
we use alpha = .05 but, if we expect to be doing several tests,
we may wish to use a Bonferroni correction so that the
overall Type I error rate is equal to .05; this means that
the individual tests will use an alpha < .05 (examination of
a table of Bonferroni t-test values corrected for the number
of tests done will identify the relevant critical values, but these
usually assume that one divides the Type I error equally
over each test, so, if one is doing 3 t-tests, the per-comparison
alpha will be .05/3 = 0.01667; however, a researcher can decide
to assign a Type I error rate = .02 to two tests and .01 to the
third test for whatever reason [researchers do the darnedest
things], but I have not seen any tables like this, which means
that one has to get the critical values either through hand calculation
or some software program if one really is going to focus on the
value of the test statistic instead of its p-value).
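The "software program" route is a one-liner these days. A sketch assuming scipy is available (df = 30 is an arbitrary illustration, and the unequal .02/.02/.01 split is the hypothetical case from above):

```python
# Per-comparison alpha under a Bonferroni correction, and the resulting
# two-tailed critical t-values. Assumes scipy; df = 30 chosen arbitrarily.
from scipy.stats import t

family_alpha = 0.05
n_tests = 3
per_comparison_alpha = family_alpha / n_tests   # .05 / 3 = 0.01667
df = 30

t_uncorrected = t.ppf(1 - family_alpha / 2, df)           # ~2.042
t_bonferroni = t.ppf(1 - per_comparison_alpha / 2, df)    # larger, stricter

# Unequal splits are legal too -- the per-test rates just sum to .05:
t_custom = [t.ppf(1 - a / 2, df) for a in (0.02, 0.02, 0.01)]
print(round(t_uncorrected, 3), round(t_bonferroni, 3))
```

No printed table is needed for the unequal split; each rate just gets its own `ppf` call.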

- What is the Effect Size (ES) that one is trying to detect?
The ES in the independent-groups t-test situation represents the
standardized distance between the population means of the
distributions specified in the alternative hypothesis.  This
difference, represented by delta in the population and d in the
sample, is required to figure out other aspects of the test but
it also represents what one knows about the phenomenon
being studied, especially at the level of population distributions
for the dependent variable that one is using.
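The sample estimate d is just the mean difference divided by the pooled standard deviation. A stdlib-only sketch (the two small samples are made-up numbers for illustration):

```python
# Pooled-SD Cohen's d for two independent samples: the sample
# estimate (d) of the population standardized difference (delta).
from statistics import mean, variance
from math import sqrt

def cohens_d(x, y):
    n1, n2 = len(x), len(y)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

group1 = [12, 14, 11, 15, 13]   # hypothetical scores
group2 = [10, 11, 9, 12, 10]
print(round(cohens_d(group1, group2), 2))   # -> 1.89
```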

- What is the sample size N that one needs to use?  In part,
this will be determined by: the magnitude of the ES, the level
of statistical power one wants, and practical considerations
such as whether one can get the appropriate number of subjects
or participants.  One would often love to have a "YUGE" ES
because that means that one will detect the difference even
with small samples at a reasonable level of statistical power.
Alas, as Jack Cohen showed time and again (e.g., see
his article in Psych Bull titled "A Power Primer"), one can
assume that there are potentially 3 levels of effect size (i.e.,
small, moderate, and large), and the smaller the ES, the
more subjects/participants one needs to use to achieve
a reasonable level of statistical power.

- What level of statistical power is one willing to accept? The
traditional level of statistical power that has been suggested
is .80, that is, there is an 80% chance that a false null hypothesis
will be rejected. Unfortunately, the complement of statistical
power is the Type II error rate: the probability that one will FAIL
to reject a false null hypothesis -- in this situation the Type II
error rate beta = .20, meaning that one will fail to reject a false
null hypothesis 20% of the time.  This raises the classic argument
of which is more important: Type I or Type II errors?  One really
should assign costs to each to determine what the cost-benefit
value is of using a particular Type II error rate (since Type I
error rates are fixed at .05 or less).
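How the four quantities hang together can be sketched with the noncentral t distribution, assuming scipy is available. The example reproduces the textbook result that about 64 subjects per group give power of roughly .80 for a moderate effect (d = 0.5) at two-tailed alpha = .05:

```python
# Power of an independent-groups t-test from alpha, ES (d), and n per
# group, via the noncentral t distribution. Assumes scipy.
from math import sqrt
from scipy.stats import t, nct

def power_two_sample(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)      # two-tailed critical value
    # Probability of landing in a rejection region when the
    # alternative (noncentral t) is actually true:
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)

# Moderate effect, 64 per group: power comes out at about .80.
print(round(power_two_sample(d=0.5, n_per_group=64), 3))
```

With N fixed, one can loop this function over candidate values of d to do exactly the "play around with values of ES and power" exercise described below.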

Now the 4 components above (Type I error rate alpha, ES,
sample size, and statistical power) should be decided upon
BEFORE ANY DATA ARE COLLECTED.  When I used to
serve as a statistical consultant for researchers writing grants to
federal funders, it was at a time when the feds required that
all research proposals HAD TO HAVE a power analysis in
them, which was determined by the values that one chose for
alpha, ES, and sample size.  The biggest problem researchers
had (outside of understanding what statistical power was) was
what the value of ES should be. Often they didn't have a clue
and I'd have to do a tutorial on how power, ES, and sample
size were interrelated, and the resolution often came down
to determining how many subjects/participants one could afford
to use in their study.  With sample size N fixed, one could
play around with values of ES and power to get the most
"socially acceptable" combination of values (i.e., moderate
ES, power above .80).

As mentioned earlier, all this is decided upon before
any data are collected; consequently, the researcher has
a pretty good idea of the critical values one needs to
transcend to claim statistical significance -- one won't
use a table in this situation because one would have
used software to do the power analysis and this can
provide critical values.

Of course, what I describe above, the "a priorist" version
of the Neyman-Pearson framework, is what has to be
done in the real world, especially when one is submitting
a grant proposal to a federal funder (though this may
change under the current administration).  It has been
my experience that few researchers and teachers know
enough about the relevant population distributions that
are involved in their research and the statistical analyses
that need to be done, so a lot of the writing that results
is more like speculative fiction (i.e., science fiction)
than actual rational analysis. I highlight the
research proposal to federal sources because the
funders really either (i) want this information, and/or
(ii) want to see if the researcher is capable of doing
a power analysis. Researchers doing their own
research without funding, I think, don't bother with
doing all this a priori work for a variety of reasons.
But if they do it, then the final use of a table of
critical values is kind of irrelevant.  I have the feeling
that researchers who teach and don't do the a priori
work will have to use a critical value table, partly
because that's what they do, and because that is
what most undergraduate psych statistics books do.

(3) The problem of doing a retrospective power analysis:
if a researcher doesn't do an a priori power analysis because
they simply don't know all that much about the situation in the
population(s) but is still a committed Neyman-Pearson
advocate, they might do a retrospective power analysis
after the data have been collected.  The data will provide
information about the obtained effect size, the level
of statistical power, and whether the sample size was
adequate. Unfortunately, a number of researchers/statisticians
do not believe that this is a valid form of analysis because
it is based on sample information instead of population
information and is subject to sampling error and capitalization
on chance (a search of Google Scholar on this topic will provide
a fair number of hits -- many people are often surprised to
learn this). If one is actually doing analyses by hand, then
tables will be critical and students should be taught how to
use them if they are to emulate the behavior of the researcher.
I will leave it to the reader to decide how reasonable doing
this is.

More below.

Currently I have them work with the table, but it seems
old-fashioned and unnecessarily cumbersome. On the
other hand, using the table forces them to perhaps have
a bit more conceptual understanding of what they are doing.

From what I have written above, it should be clear that
a researcher or student or anyone really needs to
know a lot about the population situations as well as the
sampling distributions relevant to the testing they want
to do.  If they don't have this knowledge, well, then they
are just engaging in ritualistic behavior.  Teaching then
becomes "we do things this way because this is what
we always do."

On a more practical level, if one is teaching APA style for
writing, then one has a decision to make.  If one is a
Fisherian, then the simple p < .05 or p > .05 is sufficient
because one is just reporting the basis for rejecting or
failing to reject the null hypothesis.  A table is adequate to make
this decision.  However, if one is a Neyman-Pearson
disciple and one wants to follow the APA committee on
statistical practice and reporting recommendations,
then one will want to report the EXACT p-value
associated with the test done, but a printed table usually
does not contain this information -- one cannot report that
a t-test result has a p = 0.036.  The mention of "printout"
suggests that some unspecified software package
may be used, but not specifying the package is a problem.
Consider: Microsoft Excel is almost universally available
and using the add-in "Analysis ToolPak" or some (more
accurate and powerful) third-party add-in will allow one to
do most of the statistical analyses covered in the "traditional"
psychological statistics textbook.  So, for the independent
groups t-test (either equal variances or unequal variances),
Excel provides the critical t-value for both 1-tailed and 2-tailed
tests as well as the obtained t-test p-value (a little redundant),
which makes the use of a table somewhat irrelevant (on
quizzes and exams, one can provide the appropriate critical
t-value in the question, so a t-table doesn't have to be
available).

If one is using SPSS, usually just the p-value of the obtained
statistic is provided so that one has to use the rule:

If p(obtained statistic) < .05, then reject the null hypothesis.

This is in contrast to the rule that is used if one is using a
table or is provided a critical value (assuming a 2-tailed
t-test):

If | obtained t-value | > | critical t-value |, then reject null hypothesis.

The vertical lines indicate that one should use the absolute value
of the statistic.
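The two rules are equivalent, which a sketch can confirm, assuming scipy is available (the two samples are made-up scores):

```python
# The p-value rule and the critical-value rule always agree:
# |obtained t| > critical t exactly when p < alpha. Assumes scipy.
from scipy.stats import t, ttest_ind

group1 = [12, 14, 11, 15, 13, 16, 12]   # hypothetical scores
group2 = [10, 11, 9, 12, 10, 11, 9]
alpha = 0.05

t_obt, p = ttest_ind(group1, group2)    # equal-variance independent t-test
df = len(group1) + len(group2) - 2
t_crit = t.ppf(1 - alpha / 2, df)       # two-tailed critical value

reject_by_p = p < alpha                 # SPSS-style rule
reject_by_table = abs(t_obt) > t_crit   # table-style rule
print(reject_by_p, reject_by_table)     # the two decisions always match
```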

Use of tables implies that one is following the latter rule instead
of the former rule.  But this might cause problems if one wants
to follow the APA style recommendations for exact p-values.
It should be noted that SPSS often truncates p-values if they
have more than three zeros (i.e., if p = .00005, SPSS prints
p = .000, which many students find confusing).  In contrast,
Excel provides many more decimal values, though for very
small probabilities these are expressed in scientific notation
(e.g., p = .00000000912 becomes 9.12E-9), which some
may consider to be more informative.  Alas, APA style also
recommends that p-values less than .001 should be
reported as p < .001, which kind of undermines the recommendations
of the APA statistics committee, but no one ever said that
APA style was supposed to make sense. ;-)
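One way to paper over all three quirks (SPSS's .000, Excel's scientific notation, and the APA cutoff) is a small formatting helper; a stdlib-only sketch, with the formatting details reflecting my own reading of the APA rule:

```python
# Format a p-value in APA style: exact value to three decimals with no
# leading zero, but "p < .001" below that threshold.
def apa_p(p):
    if p < 0.001:                     # covers SPSS's ".000" and 9.12E-9 alike
        return "p < .001"
    return f"p = {p:.3f}".replace("0.", ".")   # APA drops the leading zero

print(apa_p(0.036))          # p = .036
print(apa_p(0.00000000912))  # p < .001
```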

So, should one use a table of critical values when teaching
psych stats?  That will depend upon how one teaches, what
one knows about the phenomena one will be using as
examples to analyze, whether one is a Fisherian or
Neyman-Pearsonian in perspective (but let's not forget
the Bayesians and other "fringe" groups ;-), and the
degree to which one wants to follow APA style. Another
consideration is that if one doesn't use tables, then
one might use computer software or a website app to
get the relevant information.  Of course, in the case of
an electrical outage or a zombie apocalypse or some
other similar catastrophe, computers and the web may
not be available and books will be all that we have.
So, being able to read a statistical table is probably a
useful skill to have as one is running away from zombies. ;-)

Hope this helps.

-Mike Palij
New York University
[email protected]

P.S. NOTE: I wrote the above this morning before my
first cup of coffee and it is possible that I fell
asleep while continuing to type and generated gibberish.
If so, just point out the relevant passages and I'll try to
figure out what I meant to say (hopefully nothing that
is deeply dark and heavily repressed in my memory ;-).

All input and opinions most welcome,

P.P.S. Careful what you wish for. ;-)

