Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-31 Thread Kevin B. Hendricks
Hi Thomas, Here is a comparison of performance times from my own igroupSums versus using split and rowsum: > x <- rnorm(2e6) > i <- rep(1:1e6,2) > > unix.time(suma <- unlist(lapply(split(x,i),sum))) [1] 8.188 0.076 8.263 0.000 0.000 > > names(suma)<- NULL > > unix.time(sumb <- igroupSum
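
A minimal sketch for reproducing the kind of comparison described in the preview above. It is not the poster's benchmark: the vector is much smaller than the 2e6 elements in the message, `system.time()` is used in place of `unix.time()`, and only the split-based sum and base R's `rowsum()` are compared, because the `igroupSums` call is truncated in this listing.

```r
# Assumption: small illustrative data, not the sizes used in the message.
set.seed(1)
x <- rnorm(2e4)
i <- rep(1:1e4, 2)

# Split-based grouped sum, as in the message.
t_split <- system.time(
  suma <- unlist(lapply(split(x, i), sum), use.names = FALSE)
)

# rowsum() returns a one-column matrix with one row per group.
t_rowsum <- system.time(
  sumb <- as.vector(rowsum(x, i))
)

# Both approaches should agree on the per-group totals.
same <- isTRUE(all.equal(suma, sumb))
```

Timings will vary by machine and R version, so the sketch only checks that the two results agree.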

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-31 Thread Thomas Lumley
On Sat, 29 Jul 2006, Kevin B. Hendricks wrote: > Hi Bill, > sum : igroupSums > > Okay, after thinking about this ... > > # assumes i is the small integer factor with n levels > # v is some long vector > # no sorting required > > igroupSums <- function(v,i) { > sums <- rep(0,max(i)) > f

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-30 Thread Kevin B. Hendricks
Hi Bill, After playing with this some more and adding an implementation to handle NAs in the data vector, I have run into the problem of what to return when the only data values for a particular bin (or level) in the data vector were NAs and the user selected na.rm=T 1. Should it return 0 f
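
For context on the question raised above, the following sketch shows what base R itself returns for a group whose values are all NA when `na.rm = TRUE` is set. It illustrates existing base R behavior only; it does not show which option the poster chose, and the data are made up for the example.

```r
# Assumption: toy data where group 3 contains only NA values.
x <- c(1, NA, 3, NA, NA)
g <- c(1L, 1L, 2L, 3L, 3L)

# sum() over an empty set is 0, so the all-NA group yields 0.
by_sum <- as.vector(tapply(x, g, sum, na.rm = TRUE))

# mean() over an empty set is NaN, so the all-NA group yields NaN.
by_mean <- as.vector(tapply(x, g, mean, na.rm = TRUE))
```

The two summaries disagree on the all-NA group, which is why a grouped-statistics routine has to pick a convention explicitly.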

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-29 Thread Kevin B. Hendricks
Hi Bill, So you wrote one routine that can calculate any one of a variety of stats and allows weights, is that right? Can it return a data frame of any subset of requested stats as well (that is what I was thinking of doing anyway). I think someone can easily calculate all of those thin

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi Bill, >>>sum : igroupSums Okay, after thinking about this ... # assumes i is the small integer factor with n levels # v is some long vector # no sorting required igroupSums <- function(v,i) { sums <- rep(0,max(i)) for (j in 1:length(v)) { sums[[i[[j]]]] <- sums[[i[[j]]]] + v
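
The preview above is cut off, so here is a self-contained sketch of the loop it appears to describe: accumulate each element of `v` into the slot given by its integer group code, with no sorting. The body past the truncation point is an assumption, as are the input checks and the toy data.

```r
# Sketch only: assumes i holds positive integer group codes and
# v is a numeric vector of the same length with no NA values.
igroupSums <- function(v, i) {
  stopifnot(length(v) == length(i))
  sums <- numeric(max(i))
  for (j in seq_along(v)) {
    sums[i[j]] <- sums[i[j]] + v[j]
  }
  sums
}

v <- c(1, 2, 3, 4, 5, 6)
i <- c(1L, 2L, 1L, 3L, 2L, 1L)
res <- igroupSums(v, i)

# Cross-check against base R's rowsum(), which groups by sorted code.
ref <- as.vector(rowsum(v, i))
```

An interpreted loop like this is slow on long vectors; it is useful mainly as a readable reference to compare a compiled version against.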

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Bill Dunlap
On Fri, 28 Jul 2006, Kevin B. Hendricks wrote: > Hi Bill, > > > Splus8.0 has something like what you are talking about > > that provides a fast way to compute > > sapply(split(xVector, integerGroupCode), summaryFunction) > > for some common summary functions. The 'integerGroupCode' > > is typ

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi Bill, > Splus8.0 has something like what you are talking about > that provides a fast way to compute > sapply(split(xVector, integerGroupCode), summaryFunction) > for some common summary functions. The 'integerGroupCode' > is typically the codes from a factor, but you could compute > it in

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Bill Dunlap
On Fri, 28 Jul 2006, Kevin B. Hendricks wrote: > > In my particular case, the problem was my data frame had over 1 > million lines and probably over 500,000 unique sort keys (i.e., think > of it as an R factor with over 500,000 levels). The implementation > of "by" uses "tapply" which in turn uses

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Martin Maechler
> "Kevin" == Kevin B Hendricks <[EMAIL PROTECTED]> > on Fri, 28 Jul 2006 14:53:57 -0400 writes: [.] Kevin> The idea is to somehow make functions that work well Kevin> over small sub-sequences of a much longer vector Kevin> without resorting to splitting the ve

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi, > There was a performance comparison of several moving average > approaches here: > http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html > Thanks for that link. It is not quite the same thing but is very similar. The idea is to somehow make functions that work well over small sub-sequ

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Gabor Grothendieck
There was a performance comparison of several moving average approaches here: http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html The author of that message ultimately wrote the caTools R package which contains some optimized versions. Not sure if these results suggest anything of interest her

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi, I was using my installed R which is 2.3.1 for the first tests. I moved to the r-devel tree (I svn up and rebuild everyday) for my "by" tests to see if it would work any better. I neglected to retest "merge" with the devel version. So it appears "merge" is already fixed and I just need

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-28 Thread Brian D Ripley
Which version of R are you looking at? R-devel has o merge() works more efficiently when there are relatively few matches between the data frames (for example, for 1-1 matching). The order of the result is changed for 'sort = FALSE'. On Thu, 27 Jul 2006, Kevin B. Hendric

Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-27 Thread Seth Falcon
"Kevin B. Hendricks" <[EMAIL PROTECTED]> writes: > My first R attempt was a simple > > # sort the data.frame gd and the sort key > sorder <- order(MDPC) > gd <- gd[sorder,] > MDPC <- MDPC[sorder] > attach(gd) > > # find the length and sum for each unique sort key > XN <- by(MVE, MDPC, length) > XS

[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

2006-07-27 Thread Kevin B. Hendricks
Hi Developers, I am looking for another new project to help me get more up to speed on R and to learn something outside of R internals. One recent R issue I have run into is finding a fast implementation of the equivalent to the following SAS code: /* MDPC is an integer sort key made fro
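
One way to exploit an already-sorted key, in the spirit of the thread's question, is to find the runs of equal keys once and derive per-group counts and sums from them. The sketch below is an illustration under stated assumptions, not the approach any poster settled on: `key` and `val` are hypothetical stand-ins for the sort key and the variable being summarized, and `rle()` plus `cumsum()` are swapped in for a hand-written loop.

```r
# Assumption: toy data; key is an integer sort key, val the values.
key <- c(3L, 1L, 2L, 1L, 3L, 3L)
val <- c(10, 1, 5, 2, 20, 30)

# Sort once by the key (skip this step if the data are already sorted).
ord <- order(key)
key_s <- key[ord]
val_s <- val[ord]

# Each run of equal keys is one group.
runs <- rle(key_s)
ends <- cumsum(runs$lengths)

# Group sums are differences of the running total at each run's end.
sums <- diff(c(0, cumsum(val_s)[ends]))

result <- data.frame(key = runs$values, n = runs$lengths, sum = sums)
```

Taking differences of a running total can lose floating-point precision on long vectors with large values, so this trades some accuracy for avoiding a split into many small pieces.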