Hi,

First, thanks to Mike for taking the time to track down this information. Just a couple of points ... I've reordered relevant parts of Mike's posting (prefaced by MP:) before my comments (prefaced by JC:). [Apologies if this is a duplicate or triplicate or ... I've had to send it a number of times because of some computer glitch.]
MP: Note that 4.5% of this group have doctorates. In previous posts on this topic, estimates of the percentage in the general population were calculated using the Census* Community Survey data. In retrospect, this is the wrong calculation: one should not take the estimated number of Ph.D.s in the population and divide it by the total number of people in the sample. That does give the percentage of the general population with a Ph.D., but for purposes of comparison the denominator should be the number of people between 24 and 94 years of age, the age range of the richest group. Children, who would be included in the total sample, inflate the denominator and do not provide the appropriate number for comparison. In other words, to determine whether the 4.5% of Ph.D.s in this richest group is an *overrepresentation* or an *underrepresentation*, one must compare 4.5% to the percentage of Ph.D.s among people aged 24 to 94 (excluding the richest).

JC: The tables Mike and I used earlier DO limit the denominator to adults (18 and over, or 25 and over in the case of Mike's earlier estimate of .0125). So the earlier estimates hold.

MP: (3) Given that this dataset represents the richest 400 minus 2 people in the U.S. in 2008, and under the assumption that it is exhaustive, this group is not a sample but a population. Consequently, the usual tests of statistical significance would not apply (e.g., testing whether the correlation between net worth in $billions and educational level is zero would not be appropriate, since we are dealing with the population rho and not the sample r). Bootstrapping and re-sampling techniques can be used to estimate standard errors for various statistics/parameters, but one would do so under specific, explicit assumptions.
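Even if the 398 are treated as a complete population, one can still ask how surprising the observed count of PhDs would be if each member had the adult-population rate. A minimal Python sketch of that check, assuming independent draws with n = 400 and p = .0125 (the figures used in this thread), computes the exact binomial upper tail P(X >= k) and a normal approximation with a continuity correction. The function names are illustrative only; they are not SPSS functions or any package's API.

```python
from math import comb, erf, sqrt


def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k, n, p):
    """Exact P(X >= k), obtained by summing the pmf from k to n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_upper_tail(k, n, p):
    """Normal approximation to P(X >= k), using a continuity correction of 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return 1 - normal_cdf((k - 0.5 - mu) / sigma)


if __name__ == "__main__":
    n, p = 400, 0.0125  # group size and baseline PhD rate assumed in the thread
    for k in (9, 10, 18):
        exact = binom_upper_tail(k, n, p)
        approx = normal_upper_tail(k, n, p)
        print(f"P(X >= {k}): exact {exact:.7f}, normal approx {approx:.7f}")
```

Because the expected count is only n*p = 5, the distribution is right-skewed, so the normal approximation understates the far tail: for k = 18 it is hundreds of times smaller than the exact value. The exact sum is the safer choice here.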
Note also that the usual formulas for the variance and standard deviation, which correct for sampling error (dividing by n - 1 rather than N), would overestimate the true variance and standard deviation of a population.

JC: But some statistical tests would be valid, such as the likelihood of getting 18 or more PhDs among 400 billionaires if p = .0125. Although the current proportion of .045 is close to that of the earlier 100 billionaires, the probability is MUCH lower because the group is larger. Below are the exact binomial probabilities of 0 to 20 PhDs in a group of 400 if p = .0125. In the columns, px is P(X = x), cpx is the cumulative P(X <= x), and upx is the upper tail P(X > x). Because upx excludes x itself, the probability of 18 or more PhDs is the upx value in the x = 17 row, .0000044 (the .0000011 in the x = 18 row is the probability of 19 or more); either way it is extremely small. Indeed the chance of just 10 or more PhDs is less than .05 (.0309); for 9 or more it is .0669. I used SPSS to generate these exact probabilities, but it might be interesting to use the normal approximation as well.

 x      px        cpx       upx
 0   .0065289  .0065289  .9934711
 1   .0330579  .0395868  .9604132
 2   .0834815  .1230683  .8769317
 3   .1401926  .2632609  .7367391
 4   .1761281  .4393890  .5606110
 5   .1765740  .6159630  .3840370
 6   .1471450  .7631079  .2368921
 7   .1048375  .8679454  .1320546
 8   .0651917  .9331371  .0668629
 9   .0359425  .9690796  .0309204
10   .0177893  .9868688  .0131312
11   .0079837  .9948525  .0051475
12   .0032760  .9981285  .0018715
13   .0012377  .9993662  .0006338
14   .0004331  .9997993  .0002007
15   .0001411  .9999403  .0000597
16   .0000430  .9999833  .0000167
17   .0000123  .9999956  .0000044
18   .0000033  .9999989  .0000011
19   .0000008  .9999997  .0000003
20   .0000002  .9999999  .0000001

MP: (3) Mean net worth in $billions for each level of education, using the Degree.2 coding above (which separates MA/MS from MBA). Here are the descriptive statistics (standard errors are provided, but they may not be meaningful):

Estimates for NetWorth$Bil
Degree.2            Mean    Std. Error
00 High School      6.076   0.776
10 Associate        2.600   3.680
20 Bachelors        3.330   0.404
30 Masters          8.817   1.227
31 MBA              3.545   0.575
40 MD or JD         3.389   0.855
50 Doctorate        3.189   1.227

JC: As Mike correctly notes, this is an excellent dataset for making some good points in statistics (and
other) classes. One such point might be about restriction of range. As noted by Rick, we are looking at a tiny proportion of the population defined by the very, very highest of incomes. Is it reasonable to expect any relationship with such a restricted sample/population?

Again, thanks to Mike P for taking the time.

Take care
Jim

James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
j.cl...@uwinnipeg.ca
Department of Psychology
University of Winnipeg
Winnipeg, Manitoba
R3B 2E9
CANADA

---
To make changes to your subscription contact:
Bill Southerly (bsouthe...@frostburg.edu)