Hi Thomas,
Here is a comparison of performance times from my own igroupSums
versus using split and rowsum:
> x <- rnorm(2e6)
> i <- rep(1:1e6,2)
>
> unix.time(suma <- unlist(lapply(split(x,i),sum)))
[1] 8.188 0.076 8.263 0.000 0.000
>
> names(suma)<- NULL
>
> unix.time(sumb <- igroupSum
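
For reference, a minimal sketch of the split-versus-rowsum comparison described above, on a smaller vector so it runs quickly. The sizes and the name `sumr` are assumptions for illustration; igroupSums itself is not reproduced because the message is cut off before its result appears.

```r
# Smaller sizes than in the message above (assumption, for a quick run).
set.seed(1)
x <- rnorm(2e4)
i <- rep(1:1e4, 2)

# Approach timed in the message: split the data, then sum each piece.
t_split <- system.time(suma <- unlist(lapply(split(x, i), sum)))
names(suma) <- NULL

# Base R rowsum() computes the per-group sums without splitting.
t_rowsum <- system.time(sumr <- as.vector(rowsum(x, i)))

# Both approaches should agree on the per-group sums.
all.equal(suma, sumr)
```

Timings depend on the machine and R version, so none are shown here.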
On Sat, 29 Jul 2006, Kevin B. Hendricks wrote:
> Hi Bill,
>
> >>>sum : igroupSums
>
> Okay, after thinking about this ...
>
> # assumes i is the small integer factor with n levels
> # v is some long vector
> # no sorting required
>
> igroupSums <- function(v,i) {
> sums <- rep(0,max(i))
> f
Hi Bill,
After playing with this some more and adding an implementation to
handle NAs in the data vector, I have run into the problem of what to
return when the only data values for a particular bin (or level) in
the data vector were NAs and the user selected na.rm=T
1. Should it return 0 f
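
As one point of comparison for the question above, this is how base R's own tapply() with sum and mean treats a group whose values are all NA when na.rm = TRUE is passed. The data and the names `v`, `g`, `s`, and `m` are made up for illustration; this is not a statement about what igroupSums should do.

```r
v <- c(1, NA, NA, 4)      # made-up data vector
g <- c(1L, 2L, 2L, 1L)    # group 2 contains only NAs

# sum() over an empty set is 0, so the all-NA group sums to 0.
s <- tapply(v, g, sum, na.rm = TRUE)

# mean() over an empty set is NaN, so the all-NA group has mean NaN.
m <- tapply(v, g, mean, na.rm = TRUE)

s
m
```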
Hi Bill,
So you wrote one routine that can calculate any single one of a variety
of stats and allows weights, is that right? Can it return a data
frame of any subset of requested stats as well (that is what I was
thinking of doing anyway).
I think someone can easily calculate all of those thin
Hi Bill,
>>>sum : igroupSums
Okay, after thinking about this ...
# assumes i is the small integer factor with n levels
# v is some long vector
# no sorting required
igroupSums <- function(v,i) {
sums <- rep(0,max(i))
for (j in 1:length(v)) {
sums[[i[[j]]]] <- sums[[i[[j]]]] + v
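
The snippet above is cut off, so the following is only a sketch of the accumulate-by-index idea it describes, assuming `i` holds positive integer codes. The name `igroupSumsSketch` and the loop body are a hypothetical completion, not necessarily the author's code.

```r
# Hypothetical completion of the truncated function above.
igroupSumsSketch <- function(v, i) {
  sums <- rep(0, max(i))          # one accumulator per integer code
  for (j in seq_along(v)) {
    sums[i[j]] <- sums[i[j]] + v[j]
  }
  sums
}

v <- c(1, 2, 3, 4, 5)
i <- c(1L, 2L, 1L, 3L, 2L)
res <- igroupSumsSketch(v, i)
res   # 4 7 4
```

On this input, as.vector(rowsum(v, i)) gives the same sums, which is a convenient cross-check.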
On Fri, 28 Jul 2006, Kevin B. Hendricks wrote:
> Hi Bill,
>
> > Splus8.0 has something like what you are talking about
> > that provides a fast way to compute
> > sapply(split(xVector, integerGroupCode), summaryFunction)
> > for some common summary functions. The 'integerGroupCode'
> > is typ
Hi Bill,
> Splus8.0 has something like what you are talking about
> that provides a fast way to compute
> sapply(split(xVector, integerGroupCode), summaryFunction)
> for some common summary functions. The 'integerGroupCode'
> is typically the codes from a factor, but you could compute
> it in
On Fri, 28 Jul 2006, Kevin B. Hendricks wrote:
>
> In my particular case, the problem was my data frame had over 1
> million lines and probably over 500,000 unique sort keys (i.e., think
> of it as an R factor with over 500,000 levels). The implementation
> of "by" uses "tapply" which in turn uses
> "Kevin" == Kevin B Hendricks <[EMAIL PROTECTED]>
> on Fri, 28 Jul 2006 14:53:57 -0400 writes:
[.]
Kevin> The idea is to somehow make functions that work well
Kevin> over small subsequences of a much longer vector
Kevin> without resorting to splitting the ve
Hi,
> There was a performance comparison of several moving average
> approaches here:
> http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html
>
Thanks for that link. It is not quite the same thing but is very
similar.
The idea is to somehow make functions that work well over small sub-
sequ
There was a performance comparison of several moving average
approaches here:
http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html
The author of that message ultimately wrote the caTools R package
which contains some optimized versions.
Not sure if these results suggest anything of interest her
Hi,
I was using my installed R which is 2.3.1 for the first tests. I
moved to the r-devel tree (I svn up and rebuild everyday) for my "by"
tests to see if it would work any better. I neglected to retest
"merge" with the devel version.
So it appears "merge" is already fixed and I just need
Which version of R are you looking at? R-devel has
o merge() works more efficiently when there are relatively few
matches between the data frames (for example, for 1-1
matching). The order of the result is changed for 'sort = FALSE'.
On Thu, 27 Jul 2006, Kevin B. Hendric
"Kevin B. Hendricks" <[EMAIL PROTECTED]> writes:
> My first R attempt was a simple
>
> # sort the data.frame gd and the sort key
> sorder <- order(MDPC)
> gd <- gd[sorder,]
> MDPC <- MDPC[sorder]
> attach(gd)
>
> # find the length and sum for each unique sort key
> XN <- by(MVE, MDPC, length)
> XS
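
A minimal sketch of getting the same per-key length and sum without by(), assuming MDPC is an integer sort key and MVE a numeric vector as in the quoted code. The data values are made up, and table() and rowsum() are swapped in here as alternatives; they are not what the quoted message uses.

```r
MDPC <- c(3L, 1L, 3L, 2L, 1L, 3L)   # made-up sort key values
MVE  <- c(10, 20, 30, 40, 50, 60)   # made-up data values

# Count of rows per key, in sorted key order.
XN <- as.vector(table(MDPC))         # 2 1 3

# Sum of MVE per key, in sorted key order.
XS <- as.vector(rowsum(MVE, MDPC))   # 70 40 100
```

Neither call requires the data to be sorted by the key first.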
Hi Developers,
I am looking for another new project to help me get more up to speed
on R and to learn something outside of R internals. One recent R
issue I have run into is finding a fast implementation of the
equivalent to the following SAS code:
/* MDPC is an integer sort key made fro