Dear all,

I am working with taxonomic data, represented as a list of classes, orders, families, genera and finally species.

> class(mydata)
[1] "data.frame"
> mode(mydata)
[1] "list"
> names(mydata)
[1] "tclass"   "torder"   "tfamily"  "tgenus"   "tspecies"
> length(mydata$tclass)
[1] 161590

The first 10 rows look like the following:

> mydata[1:10,]
        tclass        torder            tfamily       tgenus
1  Chlorophyta Chlorophyceae     Dunaliellaceae Collodictyon
2  Chlorophyta Chlorophyceae     Dunaliellaceae Collodictyon
3  Chlorophyta Chlorophyceae     Dunaliellaceae Collodictyon
4  Chlorophyta Chlorophyceae     Dunaliellaceae   Dunaliella
5  Chlorophyta Chlorophyceae     Dunaliellaceae   Dunaliella
6  Chlorophyta Chlorophyceae     Dunaliellaceae   Dunaliella
7  Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
8  Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
9  Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
10 Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
                    tspecies
1    Collodictyontriciliatum
2       Collodictyonciliatum
3   Collodictyonsemiciliatum
4           Dunaliellasalina
5         Dunaliellabardawil
6      Dunaliellatertiolecta
7      Brachiomonassubmarina
8        Brachiomonassimplex
9  Brachiomonasellipsoidalis
10      Brachiomonaswestiana

In total I have 115 (unique) classes, containing 733 orders, containing 16 185 families, etc.
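For reference, counts like these can be obtained by counting the distinct values in each column. The sketch below uses a small made-up data frame (`toy`, hypothetical, standing in for `mydata`) with the same column names; it assumes taxon names are unique across parents, so that counting names is the same as counting nodes.

```r
# Hypothetical toy data with the same columns as mydata
toy <- data.frame(
  tclass   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  torder   = c("a1", "a1", "a2", "a2", "b1", "b1", "c1", "c1"),
  tfamily  = c("f1", "f1", "f2", "f3", "f4", "f4", "f5", "f6"),
  tgenus   = c("g1", "g2", "g3", "g4", "g5", "g5", "g6", "g7"),
  tspecies = c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"),
  stringsAsFactors = FALSE
)

# Number of distinct taxa at each level (one named count per column)
n_unique <- sapply(toy, function(x) length(unique(x)))
print(n_unique)
```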

What I am trying to do is to obtain a subtree represented by, let's say, n1 random classes, containing n2 random orders (restricted to those that belong to the classes chosen earlier), containing n3 random families, etc., all the way down to species, where the number of species will be n5.

So the elements I choose at each level are constrained by the elements already chosen at the level above. If I randomly choose, let's say, 3 classes A, B and C, I want to restrict the randomly chosen orders (let's say a1, a2, a3, b1, b2) to only those that belong to the classes already chosen. Similarly, I need to restrict the list of families to those orders that were chosen and that are known to belong to classes A, B, C.

So I want to obtain a subtree spanning all taxonomic levels, with a randomly defined number of elements at each taxonomic level, but in such a way that I do not end up with orphaned nodes, i.e. species without classes.
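The nesting constraint described above can be sketched by filtering whole rows rather than separate vectors: since every row of the data frame is a complete class-to-species path, any subset of rows is free of orphaned nodes. This is only a minimal sketch; `sample_subtree`, `sizes` and the `toy` data are hypothetical names, and it assumes taxon names are unique across parents.

```r
# Hypothetical toy data with the same columns as mydata
toy <- data.frame(
  tclass   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  torder   = c("a1", "a1", "a2", "a2", "b1", "b1", "c1", "c1"),
  tfamily  = c("f1", "f1", "f2", "f3", "f4", "f4", "f5", "f6"),
  tgenus   = c("g1", "g2", "g3", "g4", "g5", "g5", "g6", "g7"),
  tspecies = c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk the levels top-down; at each level draw taxa
# (without replacement) from those still present, then keep only their rows.
sample_subtree <- function(dat, sizes) {
  for (lev in names(sizes)) {
    pool <- unique(dat[[lev]])
    keep <- sample(pool, min(sizes[[lev]], length(pool)))
    dat  <- dat[dat[[lev]] %in% keep, , drop = FALSE]
  }
  dat
}

set.seed(42)
sub <- sample_subtree(toy, c(tclass = 2, torder = 2, tfamily = 2,
                             tgenus = 2, tspecies = 2))
print(sub)
```

Two caveats under these assumptions: the result can contain fewer taxa at an upper level than requested, because a class disappears when none of its orders is drawn further down; and membership is tested with `%in%`, since `==` against a vector of several chosen names compares element by element with recycling rather than testing membership.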

I have been trying to use 'sample' as follows:

tcla<-sample(tclass,10,replace=T)       # I pick 10 random elements, but I want it to be a random number
torder1<-torder[tclass==tcla]           # I match list of orders with those that belong to classes defined earlier
tord<-sample(torder1, 10,replace=T)     # pick 10 orders from classes that are already chosen

etc all the way down to species level.

The problem with this approach is that I may obtain branches without any leaves. How can I get rid of those branches?
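One possible way to avoid leafless branches, sketched here under assumptions rather than offered as a definitive answer, is to sample children separately for each retained parent and always keep at least one child. Every kept class then has at least one order, every kept order at least one family, and so on down to species. The names `sample_children` and `toy` are hypothetical, and taxon names are assumed unique across parents.

```r
# Hypothetical toy data with the same columns as mydata
toy <- data.frame(
  tclass   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  torder   = c("a1", "a1", "a2", "a2", "b1", "b1", "c1", "c1"),
  tfamily  = c("f1", "f1", "f2", "f3", "f4", "f4", "f5", "f6"),
  tgenus   = c("g1", "g2", "g3", "g4", "g5", "g5", "g6", "g7"),
  tspecies = c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: random number of taxa at every level,
# but at least one child for each parent that was kept.
sample_children <- function(dat, levels = names(dat)) {
  # Random non-empty set of top-level taxa
  top  <- unique(dat[[levels[1]]])
  keep <- top[sample.int(length(top), sample.int(length(top), 1))]
  dat  <- dat[dat[[levels[1]]] %in% keep, , drop = FALSE]

  for (i in seq_along(levels)[-1]) {
    parent <- levels[i - 1]
    child  <- levels[i]
    # For each retained parent, keep between 1 and all of its children
    kept <- unlist(lapply(
      split(dat[[child]], dat[[parent]], drop = TRUE),
      function(ch) {
        ch <- unique(ch)
        ch[sample.int(length(ch), sample.int(length(ch), 1))]
      }
    ), use.names = FALSE)
    dat <- dat[dat[[child]] %in% kept, , drop = FALSE]
  }
  dat
}

set.seed(7)
sub <- sample_children(toy)
print(sub)
```

In this sketch the number of elements at each level is itself random; fixed targets such as n1, n2, ... would need an extra rule for distributing the counts among parents.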

Finally, I want to repeat this procedure, let's say 1000 times, each time obtaining a different number of elements at each taxonomic level.
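The repetition could be sketched with `replicate()`, keeping each draw in a list and tabulating the number of distinct taxa per level afterwards. The sampler used here (`one_draw`, hypothetical) is a deliberately simple stand-in that picks a random number of species rows and lets the ancestors follow; any other sampler that returns a subset of rows could be swapped in. The `toy` data frame is also made up.

```r
# Hypothetical toy data with the same columns as mydata
toy <- data.frame(
  tclass   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  torder   = c("a1", "a1", "a2", "a2", "b1", "b1", "c1", "c1"),
  tfamily  = c("f1", "f1", "f2", "f3", "f4", "f4", "f5", "f6"),
  tgenus   = c("g1", "g2", "g3", "g4", "g5", "g5", "g6", "g7"),
  tspecies = c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"),
  stringsAsFactors = FALSE
)

# Stand-in sampler (assumption): random number of species rows,
# with all higher levels carried along by the rows themselves.
one_draw <- function(dat) {
  n <- sample.int(nrow(dat), 1)
  dat[sample.int(nrow(dat), n), , drop = FALSE]
}

set.seed(1)
draws <- replicate(1000, one_draw(toy), simplify = FALSE)

# One row per repetition, one column per taxonomic level
counts <- t(sapply(draws, function(d) {
  sapply(d, function(col) length(unique(col)))
}))
print(head(counts))
```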

Sorry for this long-winded post, I hope it is clear what I am trying to do.

I would appreciate any tips!

Thanks,
Olga

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
