[R-sig-eco] Clustering large data

2008-10-07 Thread ONKELINX, Thierry
Dear all, We have a problem with a large dataset that we want to cluster. The dataset is in a long format: 1154024 rows with presence data. Each row has the name of the species and the location. We have 1381 species and 6354 locations. The main problem is that we need the data in wide format (one

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread Peter Solymos
Dear Thierry, the 'mefa' package should do this, and I am also interested in the testing of the package for such a large number of species. I have used it before with 75K records, but only with ~160 species and 1052 sites. So please let me know if it worked! You can do the clustering like this (S

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread Farrar . David
Thierry, Search of CRAN with "sparse clustering" yielded cluster.dist {cba}, defined as "Clustering a Sparse Symmetric Distance Matrix". There were also sparse PCA packages and sparse matrix classes. I have no experience with these procedures. As additional background, you might like to s

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread tyler
"ONKELINX, Thierry" <[EMAIL PROTECTED]> writes: > Dear all, > > We have a problem with a large dataset that we want to cluster. The > dataset is in a long format: 1154024 rows with presence data. Each row > has the name of the species and the location. We have 1381 species and > 6354 locations. >

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread Christian A. Parker
This method for converting long to wide format seems to work well with pretty large datasets and it uses only base functions.
# this function will return a site*species matrix
# based on the formula variable. Data does not need
# to be grouped, the xtabs function will take care of
# summing any
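A minimal sketch of the long-to-wide step described above, using only base R. The toy data frame `obs`, its column names, and the function name `long_to_wide` are assumptions for illustration. The original function is only partly visible in this thread, so this is a reconstruction in the same spirit, not the poster's exact code.

```r
# Hypothetical toy data in long format: one row per species record at a site.
obs <- data.frame(
  site    = c("A", "A", "B", "C", "C", "C"),
  species = c("sp1", "sp2", "sp1", "sp2", "sp3", "sp3"),
  stringsAsFactors = FALSE
)

# Cross-tabulate long data into a site-by-species matrix.
# 'formula' names the row and column variables, e.g. ~ site + species;
# a left-hand side (e.g. count ~ site + species) would be summed per cell.
long_to_wide <- function(formula, data, relativize = FALSE) {
  tbl <- xtabs(formula, data = data)
  # Drop the table class so downstream code sees a plain numeric matrix.
  mx <- matrix(as.vector(tbl), nrow = nrow(tbl), dimnames = dimnames(tbl))
  if (isTRUE(relativize)) {
    mx <- mx / rowSums(mx)  # convert each row to proportions
  }
  mx
}

wide <- long_to_wide(~ site + species, obs)
wide_rel <- long_to_wide(~ site + species, obs, relativize = TRUE)
```

With no left-hand side in the formula, each cell holds the number of records for that site and species, so duplicated rows in the long data show up as counts greater than one.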

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread Farrar . David
table)
mx <- as.matrix(tbl)
if (relativize == TRUE) { mx <- mx / rowSums(mx) }
return(mx)
}
"Christian A. Parker" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
10/07/2008 11:04 AM
To "ONKELINX, Thierry" <[EMAIL PROTECTED]>
cc r-sig-ecology@r-project.org
Subject

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread Brian Campbell
te: Tue, 7 Oct 2008 09:56:15 -0400
> CC: [EMAIL PROTECTED]; r-sig-ecology@r-project.org
> Subject: Re: [R-sig-eco] Clustering large data
>
> Thierry,
>
> Search of CRAN with "sparse clustering" yielded cluster.dist {cba},
> defined as "Clustering a Sparse S

Re: [R-sig-eco] Clustering large data

2008-10-07 Thread Christian A. Parker
Sent by: [EMAIL PROTECTED]
10/07/2008 11:04 AM
To "ONKELINX, Thierry" <[EMAIL PROTECTED]>
cc r-sig-ecology@r-project.org
Subject Re: [R-sig-eco] Clustering large data
This method for converting long to wide format seems to work well with pretty large datasets and it uses onl

Re: [R-sig-eco] Clustering large data

2008-10-10 Thread ONKELINX, Thierry
data. ~ John Tukey
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Peter Solymos
Sent: Tuesday 7 October 2008 15:51
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data
Dear Thierry, the 'mefa' package should do this, and I am al

Re: [R-sig-eco] Clustering large data

2008-10-10 Thread ONKELINX, Thierry
pronkelijk bericht-
From: hadley wickham [mailto:[EMAIL PROTECTED]]
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: Peter Solymos; r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data
> Thanks for your responses. The biggest problem seems to be cast() for

Re: [R-sig-eco] Clustering large data

2008-10-10 Thread Farrar . David
ROTECTED]
10/10/2008 10:12 AM
To "hadley wickham" <[EMAIL PROTECTED]>
cc r-sig-ecology@r-project.org
Subject Re: [R-sig-eco] Clustering large data
Hi Hadley, R ran out of memory. I got the "can't allocate vector of xxx mb" type of error. I did something lik

Re: [R-sig-eco] Clustering large data

2008-10-13 Thread hadley wickham
> Thanks for your responses. The biggest problem seems to be cast() for > the reshape package which could not handle the dataset. Peter's solution > using the mefa package worked fine. I found an other solution: table() > which works fine to crosstabulate presence-only data. Exactly what error did

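A small sketch of the `table()` route mentioned in the message above, followed by one possible clustering step. The toy data, the choice of the binary (Jaccard) distance, average-linkage `hclust()`, and `k = 2` are all assumptions for illustration; the thread does not say which distance or clustering method was actually used.

```r
# Hypothetical presence-only records in long format (site C lists sp3 twice).
obs <- data.frame(
  site    = c("A", "A", "B", "C", "C", "C"),
  species = c("sp1", "sp2", "sp1", "sp2", "sp3", "sp3"),
  stringsAsFactors = FALSE
)

# Cross-tabulate sites against species; cells hold record counts.
tab <- table(site = obs$site, species = obs$species)

# Collapse counts to presence/absence (1/0) in a plain integer matrix.
pa <- matrix(as.integer(tab > 0), nrow = nrow(tab), dimnames = dimnames(tab))

# Assumed choice: Jaccard dissimilarity between sites, which is what
# dist(method = "binary") computes on 0/1 data.
d <- dist(pa, method = "binary")

# Assumed choice: average-linkage hierarchical clustering, cut into 2 groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
```

For the dataset described in this thread (6354 locations), a full distance object between sites would hold 6354 * 6353 / 2, roughly 20 million, pairwise values, so memory use at the `dist()` step is worth checking before scaling this up.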
Re: [R-sig-eco] Clustering large data

2008-10-14 Thread ONKELINX, Thierry
a given body of data. ~ John Tukey
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of hadley wickham
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data
> Thanks for

Re: [R-sig-eco] Clustering large data

2008-10-15 Thread hadley wickham
ment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data.
> ~ John Tukey

Re: [R-sig-eco] Clustering large data

2008-10-24 Thread Dave Roberts
body of data. ~ John Tukey
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of hadley wickham
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data
Thanks for your respo

Re: [R-sig-eco] Clustering large data

2008-10-27 Thread ONKELINX, Thierry
d from a given body of data. ~ John Tukey
-----Original Message-----
From: Dave Roberts [mailto:[EMAIL PROTECTED]]
Sent: Friday 24 October 2008 20:11
To: r-sig-ecology@r-project.org
CC: ONKELINX, Thierry
Subject: Re: [R-sig-eco] Clustering large data
Thierry and Hadley,