Hello, I'm an archaeologist who works on artifacts in North America. There are not many of us who use landmark-based geometric morphometrics, but even we can't seem to agree on how many landmarks (LMs) are appropriate. Because I use discriminant function analysis (DFA) as my workhorse for discriminating groups of artifacts, I worry about misusing that technique. One thing I've read regarding DFA (e.g., Qiao et al. 2009) is that too many variables (LMs) can reduce its discriminatory power through data piling or the related phenomenon of overfitting. I have seen this in my practice but have not tested it rigorously: by reducing the number of LMs, I can sometimes get better discrimination between groups.
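To make the concern concrete, here is a minimal sketch on simulated data, assuming Python with scikit-learn and using its LDA as a stand-in for DFA (the group shift, sample sizes, and variable counts are made up for illustration): resubstitution accuracy climbs as uninformative coordinates are added, while cross-validated accuracy does not.

    # Sketch: overfitting in discriminant analysis as variables grow.
    # Two groups differ only in the first few coordinates; the rest is
    # noise, as when uninformative LMs are digitized.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_per_group = 30          # specimens per artifact group (arbitrary)
    signal_dims = 4           # coordinates that truly differ between groups

    for n_vars in (10, 50, 200):          # total coordinates retained
        shift = np.zeros(n_vars)
        shift[:signal_dims] = 0.5         # modest true group difference
        X = np.vstack([rng.normal(0.0, 1.0, (n_per_group, n_vars)),
                       rng.normal(shift, 1.0, (n_per_group, n_vars))])
        y = np.repeat([0, 1], n_per_group)

        lda = LinearDiscriminantAnalysis()
        resub = lda.fit(X, y).score(X, y)             # training accuracy
        cv = cross_val_score(lda, X, y, cv=5).mean()  # honest estimate
        print(f"{n_vars:4d} coords: resubstitution={resub:.2f}  CV={cv:.2f}")

The gap between the two numbers is the overfitting: with enough noise variables the resubstitution accuracy approaches 100% even though the cross-validated estimate stagnates or falls.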
The number of artifacts (specimens) is not a problem; I'm about to embark on a regional analysis using thousands. Does anyone who understands this phenomenon better than I do care to comment?

Thanks,
Dave Thulman

On Fri, Jun 2, 2017 at 6:12 PM, mitte...@univie.ac.at <mitte...@univie.ac.at> wrote:

> I think a few topics get mixed up here.
>
> Of course, a sample can be too small to be representative (as in Andrea's
> example), and one should think carefully about the measures to take. It is
> also clear that an increase in sample size reduces standard errors of
> statistical estimates, including those of a covariance matrix and its
> eigenvalues. But, as Dean mentioned, the standard errors of the
> eigenvalues are of secondary interest in PCA.
>
> If one has a clear expectation about the signal in the data - and if one
> does not aim at new discoveries - a few specific measurements may suffice,
> perhaps even a few distance measurements. But effective exploratory
> analyses have always been a major strength of geometric morphometrics,
> enabled by its powerful visualization methods together with the large
> number of measured variables.
>
> Andrea, I am actually curious: what worries you if one "collects between
> 2,700 and 10,400 homologous landmarks from each rib" (whatever the term
> "homologous" is supposed to mean here)?
>
> Compared with many other disciplines in contemporary biology and
> biomedicine, a few thousand variables are not particularly many. Consider,
> for instance, 2D and 3D image analysis, FEA, and all the "omics", with
> millions and billions of variables. In my opinion, the challenge with these
> "big data" is not statistical power in testing a signal, but finding the
> signal - the low-dimensional subspace of interest - in the first place.
> This applies to 50 or 100 variables as well, not only to thousands or
> millions. If no prior expectation about the signal exists (and the mere
> presence of so many variables usually implies that none does), no
> hypothesis test should be performed at all. Ignoring this rule is one of
> the main reasons why so many GWAS and voxel-based morphometry studies
> fail to replicate.
>
> Best wishes,
>
> Philipp
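As a rough numeric check of the point above about standard errors of covariance eigenvalues shrinking with sample size, here is a bootstrap sketch on simulated data (NumPy assumed; the dimension, population covariance, and sample sizes are arbitrary choices for illustration):

    # Sketch: bootstrap standard error of the leading covariance
    # eigenvalue shrinks roughly as 1/sqrt(n) as sample size grows.
    import numpy as np

    rng = np.random.default_rng(1)
    p = 20                                      # number of variables
    true_cov = np.diag(np.linspace(5.0, 0.5, p))  # population covariance

    for n in (25, 100, 400):                    # sample sizes compared
        X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
        boot = []
        for _ in range(200):                    # bootstrap resamples
            idx = rng.integers(0, n, size=n)
            lam = np.linalg.eigvalsh(np.cov(X[idx], rowvar=False))
            boot.append(lam[-1])                # largest eigenvalue
        print(f"n={n:4d}: bootstrap SE of leading eigenvalue ~ {np.std(boot):.3f}")

This only illustrates the estimation side; as noted in the quoted message, those standard errors are of secondary interest in PCA, where the harder problem is locating the low-dimensional signal in the first place.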