Re: [R] post

2010-09-13 Thread Hadley Wickham
Have a look at: "Computing Thousands of Test Statistics Simultaneously in R" by Holger Schwender and Tina Müller, in http://stat-computing.org/newsletter/issues/scgn-18-1.pdf Hadley On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush wrote: > Hello, > > I have a question regarding how to speed up the t

Re: [R] Problems with reshape2 on Mac

2010-09-13 Thread Hadley Wickham
Hi Uwe, The problem is most likely because the original poster doesn't have the latest version of plyr. I correctly declare this dependency in the DESCRIPTION (http://cran.r-project.org/web/packages/reshape2/index.html), but unfortunately R doesn't seem to use this information at run time, genera

Re: [R] Which language is faster for numerical computation?

2010-09-10 Thread Hadley Wickham
On Fri, Sep 10, 2010 at 10:23 AM, Henrik Bengtsson wrote: > Don't underestimate the importance of the choice of the algorithm you > use.  That often makes a huge difference.  Also, vectorization is key > in R, and when you use that you're really up there among the top > performing languages.  Her

Re: [R] Data.frames : difference between x$a and x[, "a"] ? - How set new values on x$a with a as variable ?

2010-09-10 Thread Hadley Wickham
>>> I think this is a really bad idea. data.frames are not meant to be >>> used in this way. Why not use a list of lists? >> >> It can be very convenient, but I suspect the original poster is >> confused about the difference between vectors and lists. > > I wouldn't be surprised if someone were conf

Re: [R] Data.frames : difference between x$a and x[, "a"] ? - How set new values on x$a with a as variable ?

2010-09-10 Thread Hadley Wickham
>>> I'm having trouble parsing this. What exactly do you want to do? >> 1 - Put a list as an element of a data.frame. That's quite convenient for my >> pricing function. > > I think this is a really bad idea. data.frames are not meant to be > used in this way. Why not use a list of lists? It can

[R] [R-pkgs] reshape2: a reboot of the reshape package

2010-09-10 Thread Hadley Wickham
Reshape2 is a reboot of the reshape package. It's been over five years since the first release of the package, and in that time I've learned a tremendous amount about R programming, and how to work with data in R. Reshape2 uses that knowledge to make a new package for reshaping data that is much mo

[R] [R-pkgs] plyr: version 1.2

2010-09-10 Thread Hadley Wickham
plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to: * fit the same model to each patient subset of a data f

Re: [R] Strange output daply with empty strata

2010-09-09 Thread hadley wickham
> daply(data.test, .(municipality, employed), function(d){mean(d$age)} ) >     employed > municipality   no  yes >    A 41.58759 44.67463 >    B 55.57407 43.82545 >    C 43.59330   NA > > The .drop argument has a different meaning in daply. Some R functio

Re: [R] adding list to data.frame iteratively

2010-09-08 Thread Hadley Wickham
Why don't you read the answers to your stackoverflow question? http://stackoverflow.com/questions/3665885/adding-a-list-of-vectors-to-a-data-frame-in-r/3667753 Hadley On Wed, Sep 8, 2010 at 1:17 AM, raje...@cse.iitm.ac.in wrote: > > Hi, > > I have a preallocated dataframe to which I have to add

Re: [R] Aggregating data from two data frames

2010-09-08 Thread Hadley Wickham
Have a look at match and merge. Hadley On Wednesday, September 8, 2010, Michael Haenlein wrote: > Dear all, > > I'm working with two data frames. > > The first frame (agg_data) consists of two columns. agg_data[,1] is a unique > ID for each row and agg_data[,2] contains a continuous variable. > >
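
As a rough illustration of the match() half of that advice, here is a minimal sketch. It assumes a second data frame that refers to the same IDs as agg_data; the name `detail` and all values are invented, since the original question is cut off.

```r
# agg_data follows the description in the question (unique ID, continuous value).
# 'detail' is a hypothetical second frame that refers to the same IDs.
agg_data <- data.frame(id = c("a", "b", "c"), value = c(1.5, 2.5, 3.5),
                       stringsAsFactors = FALSE)
detail <- data.frame(id = c("c", "a", "a", "b"), stringsAsFactors = FALSE)

# match() gives, for each detail$id, the row position of that ID in agg_data$id
idx <- match(detail$id, agg_data$id)

# Use the positions to pull the matching values across
detail$value <- agg_data$value[idx]
```

IDs with no counterpart in agg_data would get NA from match(), which makes unmatched rows easy to spot.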

Re: [R] Please explain "do.call" in this context, or critique to "stack this list faster"

2010-09-04 Thread Hadley Wickham
> One common way around this is to pre-allocate memory and then to > populate the object using a loop, but a somewhat easier solution here > turns out to be ldply() in the plyr package. The following is the same > idea as do.call(rbind, l), only faster: > >> system.time(u3 <- ldply(l, rbind)) >   u
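
For readers wondering what do.call(rbind, l) does, a small base R sketch may help. The list `l` below is invented for illustration and is much smaller than the one discussed in the thread.

```r
# A hypothetical list of one-row data frames
l <- list(
  data.frame(id = 1, x = "a", stringsAsFactors = FALSE),
  data.frame(id = 2, x = "b", stringsAsFactors = FALSE)
)

# do.call() passes each list element as a separate argument,
# so this is equivalent to rbind(l[[1]], l[[2]])
stacked <- do.call(rbind, l)
by_hand <- rbind(l[[1]], l[[2]])
```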

[R] [R-pkgs] testthat: version 0.3

2010-09-01 Thread Hadley Wickham
# testthat Testing your code is normally painful and boring. `testthat` tries to make testing as fun as possible, so that you get a visceral satisfaction from writing tests. Testing should be fun, not a drag, so you do it all the time. To make that happen, `testthat`: * Provides functions that ma

Re: [R] how to replace NA with a specific score that is dependant on another indicator variable

2010-09-01 Thread Hadley Wickham
> first ddply result did I see that some sort of misregistration had occurred; > Better with: > > res <-ddply(egraw2, .(category), .fun=function(df) { >               sapply(df, >                    function(x) {mnx <- mean(x, na.rm=TRUE); >                                 sapply(x, function(z) if

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
>> That doesn't justify the use of a _histogram_  - and regardless of > > The usage highlights meaningful characteristics of the data. > What better justification for any method of analysis and display is > there? That you're displaying something that is mathematically well founded and meaningful

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
> I have counts ranging over 4-6 orders of magnitude with peaks > occurring at various 'magic' values.  Using a log scale for the > y-axis enables the smaller peaks, which would otherwise > be almost invisible bumps along the x-axis, to be seen That doesn't justify the use of a _histogram_ - and

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
It's not just that counts might be zero, but also that the base of each bar starts at zero. I really don't see how logging the y-axis of a histogram makes sense. Hadley On Sunday, August 29, 2010, Joshua Wiley wrote: > Hi Derek, > > Here is an option using the package ggplot2: > > library(ggplot

[R] [R-pkgs] stringr: version 0.4

2010-08-25 Thread Hadley Wickham
Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they

Re: [R] Plot bar lines like excel

2010-08-25 Thread Hadley Wickham
On Wed, Aug 25, 2010 at 6:05 AM, abotaha wrote: > > Woow, it is amazing, > thank you very much. > yes i forget to attach the dates, however, the dates in my case is every 16 > days. > so how i can use 16 day interval instead of month in by option. Here's one way using the lubridate package: libr

Re: [R] Comparing/diffing strings

2010-08-24 Thread Hadley Wickham
On Tue, Aug 24, 2010 at 11:25 AM, Martin Morgan wrote: > On 08/24/2010 07:27 AM, Doran, Harold wrote: >> There is the stringMatch function in the MiscPsycho package. >> >>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no') >>

Re: [R] change order of plot panels in faceted ggplot/qplot

2010-08-24 Thread Hadley Wickham
On Mon, Aug 23, 2010 at 1:02 PM, Alison Macalady wrote: > Hi, > > I have a 5-paneled figure that i made using the facet function in qplot > (ggplot).  I've managed to arrange the panels into two rows/three columns, > but for the sake of easy visual comparisons between panels in my particular > dat

[R] Comparing/diffing strings

2010-08-24 Thread Hadley Wickham
Hi all, all.equal is generally very useful when you want to find the differences between two objects. It breaks down, however, when you have two long strings to compare: > all.equal(a, b) [1] "1 string mismatch" Does anyone know of any good text diffing tools implemented in R? Thanks, Hadley

Re: [R] Recyclable

2010-08-23 Thread Hadley Wickham
I should note that I realise this function is pretty trivial to write (see below), I just want to avoid reinventing the wheel.
recyclable <- function(...) {
  lengths <- vapply(list(...), length, 1)
  all(max(lengths) %% lengths == 0)
}
Hadley On Mon, Aug 23, 2010 at 10:33 AM, Hadley W
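
A short usage sketch of the function quoted in this message; the input vectors are arbitrary examples chosen for illustration.

```r
# Function as given in the message above
recyclable <- function(...) {
  lengths <- vapply(list(...), length, 1)
  all(max(lengths) %% lengths == 0)
}

# Lengths 6, 3 and 2 all divide 6, so these recycle cleanly
ok <- recyclable(1:6, 1:3, 1:2)

# Length 2 does not divide 3, which is the case that triggers the warning
bad <- recyclable(1:3, 1:2)
```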

[R] Recyclable

2010-08-23 Thread Hadley Wickham
Hi all, Is there a function to determine whether a set of vectors is cleanly recyclable? i.e. is there a common function for detecting the error/warnings that underlie the following two function calls? > 1:3 + 1:2 [1] 2 4 4 Warning message: In 1:3 + 1:2 : longer object length is not a multiple

Re: [R] problems with merge() - the output has many repeated lines

2010-08-21 Thread Hadley Wickham
You may find a close reading of ?merge helpful, particularly this sentence: "If there is more than one match, all possible matches contribute one row each" (so check that you don't have multiple matches). Hadley On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo wrote: > Hi everyone, > > I have be
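
The sentence quoted from ?merge can be seen in a toy case. This is only a sketch: the data frames and column names (`firm`, `year`, `value`) are invented, not taken from the original question.

```r
left <- data.frame(firm = c("A", "B"), year = c(2009, 2009),
                   stringsAsFactors = FALSE)
# Firm "A" appears twice on the right-hand side
right <- data.frame(firm = c("A", "A", "B"), value = c(10, 20, 30),
                    stringsAsFactors = FALSE)

# Every match contributes a row, so "A" shows up twice in the result
merged <- merge(left, right, by = "firm")
n_rows <- nrow(merged)

# One way to check for multiple matches before merging
dup_keys <- right$firm[duplicated(right$firm)]
```

If `dup_keys` is non-empty, the merge key is not unique in that table and repeated lines in the output are expected.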

Re: [R] drawing dot plots with size, shape affecting dot characteristics

2010-08-12 Thread Hadley Wickham
On Wed, Aug 11, 2010 at 10:14 PM, Brian Tsai wrote: > Hi all, > > I'm interested in doing a dot plot where *both* the size and color (more > specifically, shade of grey) change with the associated value. > > I've found examples online for ggplot2 where you can scale the size of the > dot with a va

Re: [R] ggplot2 histograms... a subtle error found

2010-08-09 Thread hadley wickham
> When ggplot2 verifies the widths before stacking (the default position for > histograms), it computes the widths from the minimum and maximum values for > each bin.  However, because the width of the bins (0.28) is much smaller > than the scale of the edges (6.8e+09), there is some underflow and

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
On Mon, Aug 9, 2010 at 4:30 PM, Matthew Dowle wrote: > > > Another option for consideration : > > library(data.table) > mydt = as.data.table(mydf) > > mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac] >     fac X.Intercept.       x1       x2        x3 > [1,]   0  -0.16247059 1.130220 2.988769 -19.14719

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
>> That's exactly what dlply does - so you should never have to do that >> yourself. > > I'm unclear what you are saying. Are you saying that the plyr function > _should_ have examined the objects in that list and determined that there > were 4 rows and properly labeled the rows to indicate which l

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
> There is one further improvement to consider. When I tried using dlply to > tackle a problem on which I had been bashing my head for the last three days > and it gave just the results I had been looking for, I also noticed that the > dlply function returns the grouping variable levels in an attri

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
On Mon, Aug 9, 2010 at 9:29 AM, David Winsemius wrote: > If you look at the output (as I did)  you should see that despite whatever > expectations you have developed regarding plyr, that it did not produce a > grouping variable: > >> ldply(dl, function(x) coef(summary(x)) ) >   fac    Estimate Std

Re: [R] image plot but data not on grid.

2010-08-09 Thread Hadley Wickham
With sweave, you need to explicitly print() the output of ggplot2 and lattice plots. Hadley On Mon, Aug 9, 2010 at 6:32 AM, W Eryk Wolski wrote: > qplot does (?) what I was looking for! > At least it plots what I want to plot in the interactive modus. > However, it seems not to work with Sweave.

Re: [R] image plot but data not on grid.

2010-08-07 Thread Hadley Wickham
On Sat, Aug 7, 2010 at 2:54 AM, Michael Bedward wrote: > On 7 August 2010 06:26, Hadley Wickham wrote: > >> library(ggplot2) >> qplot(x, y, fill = z, data = df, geom = "tile") > > Hi Hadley, > > I read the original question as being about irregularly spac

Re: [R] image plot but data not on grid.

2010-08-06 Thread Hadley Wickham
On Fri, Aug 6, 2010 at 9:24 AM, W Eryk Wolski wrote: > Hi, > > Would like to make an image > however the values in z are not on a uniform grid. > > Have a dataset with > length(x) == length(y) == length(z) > x[1],y[1] gives the position of z[1] > > and would like to encode value of z by a color.

Re: [R] looking for setdiff equivalent on dataset

2010-07-29 Thread Hadley Wickham
> Well, here's one way that "might" work (explanation below): > > The ideas is to turn each row into a character vector and then work with the > two character vectors. > >> bigs <- do.call(paste,TheBigOne) >> ix <-  which(bigs %in% setdiff(bigs,do.call(paste,TheLittleOne))) >> TheBigOne[ix,] > > Ho

Re: [R] looking for setdiff equivalent on dataset

2010-07-29 Thread Hadley Wickham
Here's one way, using a function from the plyr package:
TheLittleOne<-data.frame(cbind(c(2,3),c(2,3)))
TheBigOne<-data.frame(cbind(c(1,1,2),c(1,1,2)))
keys <- plyr:::join.keys(TheBigOne, TheLittleOne)
!(keys$x %in% keys$y)
TheBigOne[!(keys$x %in% keys$y), ]
Hadley On Thu, Jul 29, 2010 at 1:38

[R] [R-pkgs] plyr version 1.1

2010-07-26 Thread Hadley Wickham
plyr is a set of tools for a common set of problems: you need to break down a big data structure into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to: * fit the same model to subsets of a data frame * quickly calculate summary

Re: [R] union data in column

2010-07-24 Thread Hadley Wickham
On Sat, Jul 24, 2010 at 2:23 AM, Jeff Newmiller wrote: > Fahim Md wrote: >> >> Is there any function/way to merge/unite the following data >> >>  GENEID      col1          col2             col3                col4 >>  G234064         1             0                  0                   0 >>  G2340

Re: [R] p-VALUE calculation

2010-07-22 Thread Hadley Wickham
What is your null hypothesis? What is your alternate hypothesis? What is the test statistic? Why do you want a p-value? Hadley On Thu, Jul 22, 2010 at 5:40 PM, jd6688 wrote: > > Here is my dataframe with 1000 rows: > > employee_id         weigth       p-value > > 100                     150 > 1

Re: [R] using "sample()" for a vector of length 1

2010-07-22 Thread Hadley Wickham
Did you look at the examples in sample?
# sample()'s surprise -- example
x <- 1:10
sample(x[x > 8])  # length 2
sample(x[x > 9])  # oops -- length 10!
sample(x[x > 10]) # length 0
## For R >= 2.11.0 only
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x[x > 8]) #

Re: [R] best way to apply a list of functions to a dataset ?

2010-07-21 Thread Hadley Wickham
> ddply(ma, .(variable), summarise, mean = mean(value), sd = sd(value), >       skewness = skewness(value), median = median(value), >       mean.gt.med = mean.gt.med(value)) In principle, you should be able to do: ddply(ma, .(variable), colwise(each(mean, sd, skewness, median, mean.gt.med))) but

Re: [R] NA preserved in logical call - I don't understand this behavior because NA is not equal to 0

2010-07-18 Thread Hadley Wickham
> The problem is in data.frame[ and any NA in a logical vector will return a > row of NA's. This can be avoid by wrapping which() around the logical vector > which seems entirely wasteful or using subset(). The basic philosophy that causes this behaviour is sensible in my opinion: missing values m
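
The behaviour under discussion can be reproduced with a three-row data frame. This is a minimal sketch with invented data, showing the NA row and the two workarounds mentioned in the quoted text.

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

keep <- df$x > 2   # FALSE, NA, TRUE

# An NA in the logical index yields a row filled with NA
with_na <- df[keep, ]

# which() drops the NA positions; subset() treats NA as FALSE
via_which  <- df[which(keep), ]
via_subset <- subset(df, x > 2)
```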

Re: [R] Creating Enumerated Variables

2010-07-16 Thread Hadley Wickham
On Thu, Jul 15, 2010 at 11:08 PM, Dennis Murphy wrote: > Hi: > > I sincerely hope there's an easier way, but one method to get this is as > follows, > with d as the data frame name of your test data: > > d <- d[order(with(d, Age, School, rev(Grade))), ] > d$Count <- do.call(c, mapply(seq, 1, as.ve

Re: [R] a very particular plot

2010-07-16 Thread Hadley Wickham
On Wed, Jul 14, 2010 at 1:32 AM, Ian Bentley wrote: > I've got a couple of more changes that I want to make to my plot, and I > can't figure things out.  Thanks for all the help. > > I'm using this R script > > library(ggplot2) > library(lattice) > # Generate 50 data sets of size 100 and assign th

Re: [R] Recommended way of requiring packages of a certain version?

2010-07-16 Thread Hadley Wickham
> So distributing code to other people is preferably done using R packages, > which gives you this option. However (as far as I am aware), note that this option is checked at package build time, not at load time. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statist

Re: [R] qplot in ggplot2 not working any longer - (what did I do?)

2010-07-15 Thread Hadley Wickham
For a quick fix, you probably need to reinstall plyr. Hadley On Wed, Jul 14, 2010 at 11:03 PM, stephen sefick wrote: > This is the first time that I have tried to update packages with a > tinkered around with .Rprofile.  I start R with R --vanilla and it > does not load my .Rprofile, but when I i

Re: [R] [R-pkgs] New package "list" for analyzing list surveyexperiments

2010-07-15 Thread Hadley Wickham
>> For some reason package writers seem to prefer maximally uninformative >> names for their packages.  To take some examples of recently announced >> packages, can anyone guess what packages 'FDTH', 'rtv', or 'lavaan' >> do?  Why the aversion to informative names along the lines of >> 'Freq_dist_a

Re: [R] How to define a function (with '<-') that has two arguments?

2010-07-14 Thread Hadley Wickham
On Wed, Jul 14, 2010 at 7:39 AM, thmsfuller...@gmail.com wrote: > Hi All, > > The last line of the following code returns the error right below this > paragraph. Essentially, I use the operator %:% to retrieve a variable > in a nested frame. Then I want to use the same operator (with '<-') to > ch

Re: [R] Fast string comparison

2010-07-12 Thread Hadley Wickham
strings <- replicate(1e5, paste(sample(letters, 100, rep = T), collapse = ""))
system.time(strings[-1] == strings[-1e5])
#    user  system elapsed
#   0.016   0.000   0.017
So it takes ~1/100 of a second to do ~100,000 string comparisons. You need to provide a reproducible example that illustrates

Re: [R] a very particular plot

2010-07-11 Thread Hadley Wickham
Hi Ian, Have a look at the examples in http://had.co.nz/ggplot2/geom_tile.html for some ideas on how to do this with ggplot2. Hadley On Sat, Jul 10, 2010 at 8:10 PM, Ian Bentley wrote: > Hi all, > > Thanks for the really great help I've received on this board in the past. > > I have a very part

Re: [R] Fast string comparison

2010-07-11 Thread Hadley Wickham
== ? Hadley On Sun, Jul 11, 2010 at 2:08 PM, Ralf B wrote: > What is the fastest way to compare two strings in R? > > Ralf > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http

[R] [R-pkgs] ggplot2 version 0.8.8

2010-07-07 Thread Hadley Wickham
ggplot2 ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and avoid bad parts. It takes care of many of the fiddly details that make plotting a hassle (l

Re: [R] Profiler for R ? (HFWUtils package)

2010-07-06 Thread Hadley Wickham
And the profr package for an alternative display. Hadley On Tuesday, July 6, 2010, Uwe Ligges wrote: > or just see > > ?Rprof > > and > > ?Rprofmem > > > Uwe Ligges > > > On 06.07.2010 01:21, Jim Callahan wrote: > > Message: 21 > Date: Mon, 5 Jul 2010 02:26:29 -0400 > From: Ralf B > To: "r-help@r

Re: [R] Patch for legend.position={left,top,bottom} in ggplot2

2010-07-05 Thread Hadley Wickham
Or wait a couple of days for the next release of ggplot2... Hadley On Mon, Jul 5, 2010 at 11:28 AM, Sebastian Wurster wrote: > Thank you for this nice patch! > To incorporate it you have to open the ggplot2 file in "path to your R > packages\ggplot2\R", search for the first line of code and repl

Re: [R] Separating out data values

2010-07-04 Thread Hadley Wickham
Hi Mark, Try this to get you started: table(roe1 > median(roe1), roe0 > median(roe0)) Hadley On Sun, Jul 4, 2010 at 6:29 AM, Mark Carter wrote: > I'm not very good at statistics, but I know enough to be dangerous. I'm > completely new to R, having just discovered it yesterday. Now that the >
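
To show what that table() call produces, here is a sketch with made-up numbers. The vectors `roe0` and `roe1` below are invented stand-ins for the poster's data, which is not shown in this snippet.

```r
# Hypothetical values for two periods
roe0 <- c(1, 2, 3, 4, 5, 6)
roe1 <- c(2, 1, 4, 3, 6, 5)

# Rows: is roe1 above its median? Columns: is roe0 above its median?
tab <- table(roe1 > median(roe1), roe0 > median(roe0))
```

Each cell counts the observations falling in that combination, so the diagonal holds the cases that stayed on the same side of the median in both periods.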

Re: [R] ggplot qplot bar removing bars when truncating scale

2010-07-03 Thread Hadley Wickham
This is possible in ggplot2, but it's not an appropriate use of a bar chart - because length is used to convey value, chopping the bottoms of the bars off will give a misleading impression of the data. Instead, use a dot plot: data$Q <- unlist(lapply(data$Q, function(x) paste(strwrap(x, 20), collap

[R] Non-exported data sets?

2010-07-03 Thread Hadley Wickham
> Sure.  The code uses objects() to find the exported objects in the > package, so I guess the offending object will be there.  You can check > for yourself by loading the package and calling objects() on the package > environment. So I guess my question then is how do data sets and namespaces int

Re: [R] Some questions about R's modelling algebra

2010-07-02 Thread Hadley Wickham
> ?formula in R 2.9.2 says in para 2: > "The %in% operator indicates that the terms on its left are nested > within those on the right. For example a + b %in% a expands to the > formula a + a:b. " Ooops, missed that. So b %in% a = a:b, and that's what's meant by "different coding". Hadley -- A

[R] Some questions about R's modelling algebra

2010-07-02 Thread Hadley Wickham
Hi all, In preparation for teaching a class next week, I've been reviewing R's standard modelling algebra. I've used it for a long time and have a pretty good intuitive feel for how it works, but would like to understand more of the technical details. The best (online) reference I've found so far

Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-01 Thread Hadley Wickham
Here's another version that's a bit easier to read: na.roughfix2 <- function (object, ...) { res <- lapply(object, roughfix) structure(res, class = "data.frame", row.names = seq_len(nrow(object))) } roughfix <- function(x) { missing <- is.na(x) if (!any(missing)) return(x) if (is.numer

Re: [R] transposing a data frame from horizontal to vertical (stacking)

2010-06-29 Thread Hadley Wickham
On Tue, Jun 29, 2010 at 12:22 PM, Dimitri Liakhovitski wrote: > Hello, everyone! > I have a very simple task - I have a data frame (see MyData below) and > I need to stack the data (see result below). > I wrote the syntax below - it's very basic and it does what I need. > But I am sure what I am t

Re: [R] Performance enhancement for ave

2010-06-29 Thread Hadley Wickham
On Tue, Jun 29, 2010 at 8:02 AM, Matthew Dowle wrote: > >> dt = data.table(d,key="grp1,grp2") >> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)]) >   user  system elapsed >   3.89    0.00    3.91        # your 7.064 is 12.23 for me though, so this > 3.9 should be faster for y

[R] Performance enhancement for ave

2010-06-28 Thread Hadley Wickham
library(plyr) n<-10 grp1<-sample(1:750, n, replace=T) grp2<-sample(1:750, n, replace=T) d<-data.frame(x=rnorm(n), y=rnorm(n), grp1=grp1, grp2=grp2) system.time({ d$avx1 <- ave(d$x, list(d$grp1, d$grp2)) d$avy1 <- ave(d$y, list(d$grp1, d$grp2)) }) # user system elapsed # 39.300 0.279

Re: [R] Basic question - more efficient method than loop?

2010-06-28 Thread Hadley Wickham
1) Create a table with two columns: payor and payor.group. 2) Merge that table with your original data Hadley On Mon, Jun 28, 2010 at 10:46 AM, GL wrote: > > I'm guessing there's a more efficient way to do the following using the index > features of R. Appreciate any thoughts > > for (i in
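
The two steps above can be sketched in base R as follows. The data frame `claims`, the payor labels and the group names are all invented for illustration; only the column names payor and payor.group come from the message.

```r
# Hypothetical original data
claims <- data.frame(id = 1:4,
                     payor = c("insurer_a", "medicare", "insurer_a", "self"),
                     stringsAsFactors = FALSE)

# Step 1: a lookup table with one row per payor
lookup <- data.frame(payor = c("insurer_a", "medicare", "self"),
                     payor.group = c("commercial", "government", "self-pay"),
                     stringsAsFactors = FALSE)

# Step 2: merge it with the original data, replacing the loop
merged <- merge(claims, lookup, by = "payor")

# merge() sorts by the key, so restore the original order if it matters
merged <- merged[order(merged$id), ]
```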

Re: [R] Stacked Histogram, multiple lines for dates of news stories?

2010-06-28 Thread Hadley Wickham
Hi Simon, Here are two ways to do that with ggplot: qplot(test2, data = test_df, geom = "freqpoly", colour = test, binwidth = 30, drop = F) qplot(test2, data = test_df, geom = "bar", fill = test, binwidth = 30) binwidth is in days. If you want to bin by other intervals (like months), I'd recomm

Re: [R] ggplot2: deterministic position_jitter & geom_line with position_jitter

2010-06-25 Thread Hadley Wickham
> I'm having the same problem as Stephan (see below), but what I'm trying to > jitter is not a numeric vector, but a factor. How do I proceed? (Naively > jittering a factor makes it numeric, no longer factor, so I don't get the > custom ordering which conveniently comes with using a factor. I'm not

Re: [R] About normality tests...

2010-06-23 Thread Hadley Wickham
> Finally, FWIW, 1 is not considered "very large" these days; maybe > 10,000,000,000 might be... It's off topic, but I rather like Mike Driscoll's definition of big data: it's too big to fit on a single machine and must be stored on many (http://www.slideshare.net/dataspora/s-4455027). A smal

Re: [R] xtable for latex: setting some values globally

2010-06-23 Thread Hadley Wickham
> If anybody has quick fix, that would be helpful. Write your own function that wraps xtable... Hadley

Re: [R] Time in ggplot2

2010-06-22 Thread Hadley Wickham
y > similar SQL set. The select clause carries more columns in the failing data > set. > > ottar > > On 20 June 2010 18:28, Hadley Wickham wrote: >> >> Hi Ottar, >> >> It's impossible to tell what the problem is without a reproducible >> example (http:/

Re: [R] Time in ggplot2

2010-06-20 Thread Hadley Wickham
six months, particularly with values that > have no  time portion. You have promised a fix before, but l haven't seen it, > so I convert to Date to work around the bug. > > "Hadley Wickham" wrote: > >>Hi Ottar, >> >>It's impossible to tell what th

Re: [R] Time in ggplot2

2010-06-20 Thread Hadley Wickham
Hi Ottar, It's impossible to tell what the problem is without a reproducible example (http://gist.github.com/270442) Hadley On Sun, Jun 20, 2010 at 4:38 PM, Ottar Kvindesland wrote: > I have a problem that puzzles me a bit today. When loading off data from a > database and plotting using ggplot

Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-20 Thread Hadley Wickham
> I've given thought in the past to the question of estimating the R > user base, and came to the conclusion that it is impossible to get > an estimate of the number of users that one could trust (or even > put anything like a margin of error to). I find it hard to believe that it should be harder

Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-20 Thread Hadley Wickham
> I agree with all your points. What I have so far is nowhere near the big > picture, but it's a start. When you install some software it asks if you > mind it reporting usage stats back to its home site. I know that sort of > thing has been discussed before on R-help. I'd love to see that added so

Re: [R] Graphics question: How to create a changing "smudge factor" for overlapping lines?

2010-06-15 Thread Hadley Wickham
> The glitches are the cases where you would have a bundle of lines belonging > to a specific cluster, but had spaces between them (because the place of one > of the lines was saved for another line that in the meantime moved to > another cluster). I think that display looked just fine! > I just

Re: [R] Graphics question: How to create a changing "smudge factor" for overlapping lines?

2010-06-15 Thread Hadley Wickham
> My current solution is to use a constant jitter (based on "seq") on all the > k number of clusters, but that causes glitches in the produced image (run my > code to see). What are the glitches? It looks pretty good to me. (I'm not sure if the colour does anything apart from make it pretty thou

Re: [R] Displaying "homogeneous groups" in aov post-hoc results ?

2010-06-12 Thread Hadley Wickham
Try multcompView Hadley On Sat, Jun 12, 2010 at 8:42 AM, Tal Galili wrote: > Hello dear R-help mailing list, > > A friend of mine teaches a regression and experimental design course and > asked me the following question. > > She is trying to find a way to display the "homogeneous groups" (after

Re: [R] Transforming simulation data which is spread across many files into a barplot

2010-06-11 Thread Hadley Wickham
On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley wrote: > I'm an R newbie, and I'm just trying to use some of its graphing > capabilities, but I'm a bit stuck - basically in massaging the already > available data into a format R likes. > > I have a simulation environment which produces logs, which re

Re: [R] Patch for legend.position={left,top,bottom} in ggplot2

2010-06-10 Thread Hadley Wickham
Cool! Thanks Karsten. If you send me a github pull request I'll incorporate it. Hadley On Thursday, June 10, 2010, Karsten Loesing wrote: > Hi everyone, > > here's the same patch as a new branch on GitHub. > >  http://github.com/kloesing/ggplot2/commit/a25e4fbfa4017ed1 > > Best, > --Karsten > >

Re: [R] string handling

2010-06-04 Thread Hadley Wickham
On Thu, Jun 3, 2010 at 4:06 PM, Wu Gong wrote: > > Hope it helps. > > text <- "var1        var2 > 9G/G09    abd89C/T90 > 10A/T9    32C/C > 90G/G      A/A" > > x <- read.table(textConnection(text), header = T) Or with the stringr package: library(stringr) str_match(x$var1, "(.)/(.)") Hadley --

Re: [R] data frame manipulation with zero rows

2010-06-02 Thread Hadley Wickham
Hi Arnaud, I've added this case to the set of test cases in plyr and it will be fixed in the next version. Hadley On Tue, Jun 1, 2010 at 2:33 PM, arnaud Gaboury wrote: > Maybe not the cleanest way, but I create a fake data frame with one row so > ddply() is happy!! >> if (nrow(futures)==0) futu

Re: [R] geom_ribbon removes missing values

2010-05-31 Thread Hadley Wickham
Hi Karsten, There's no easy way to do this because behind the scenes geom_ribbon uses grid.polygon. Hadley On Sun, May 30, 2010 at 7:26 AM, Karsten Loesing wrote: > Hi everyone, > > it looks like geom_ribbon removes missing values and plots a single > ribbon over the whole interval of x values.

Re: [R] Apply: Output matrix orientation

2010-05-27 Thread Hadley Wickham
Use aaply from the plyr package. Hadley On Thu, May 27, 2010 at 6:24 AM, Johannes Graumann wrote: > Hi, > > Why is the result of below "apply" call rotated with respect to the input > and how to remedy this? > > Thanks, Joh > > .ZScore <- function(input){ >  #cat(input,"\n") >  z <- (input - mea

Re: [R] help required to melt a data frame

2010-05-25 Thread Hadley Wickham
On Tue, May 25, 2010 at 8:39 AM, Mohan L wrote: > > > On Tue, May 25, 2010 at 6:59 PM, Hadley Wickham wrote: >> >> > I am trying to get a new data frame for 1 bedroom using cast. But I am not >> > able >> > to get the below data for 1 Bedroom using cast. &

Re: [R] help required to melt a data frame

2010-05-25 Thread Hadley Wickham
> I am trying to get a new data frame for 1 bedroom using cast. But I am not able > to get the below data for 1 Bedroom using cast. > > State  Jan Feb >  xxx   2    0 >  yyy   2    2 >  zzz   1    0 What do those numbers represent? Hadley -- Assistant Professor / Dobelman Family Junior Chair Depar

Re: [R] Function that is giving me a headache- any help appreciated (automatic read )

2010-05-18 Thread Hadley Wickham
> precip.1 <- subset(DF, precipitation!="NA") > b <- ddply(precip.1$precipitation, .(precip.1$gauge_name), cumsum) > DF.precip <- precip.1 > DF.precip$precipitation <- b$.data I suspect what you want here is ddply(precip.1, "gauge_name", transform, precipitation = cumsum(precipitation)) Hadley
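
For comparison, the same per-gauge cumulative sum can be sketched in base R with ave(). This is an alternative to the ddply() call suggested above, not a description of what plyr does internally, and the data values are invented.

```r
# Hypothetical data with the column names used in the thread
precip.1 <- data.frame(
  gauge_name = c("g1", "g2", "g1", "g2", "g1"),
  precipitation = c(1, 10, 2, 20, 3),
  stringsAsFactors = FALSE
)

# ave() applies cumsum() within each gauge and keeps the original row order
precip.1$cum_precip <- ave(precip.1$precipitation, precip.1$gauge_name,
                           FUN = cumsum)
```

Unlike the ddply() version, this adds a new column rather than overwriting precipitation, and it leaves the rows where they were instead of grouping them by gauge.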

Re: [R] plot for linear discriminant

2010-05-16 Thread Hadley Wickham
Hi Giovanni, Have a look at the classifly package for an alternative approach that works for all classification algorithms. If you provided a small reproducible example, I could step through it for you. Hadley On Sat, May 15, 2010 at 6:19 AM, Giovanni Azua wrote: > Hello, > > I have a labelled

Re: [R] Using plyr and segmented together - output problem

2010-05-12 Thread Hadley Wickham
Hi Ken, Could you please provide a small reproducible example? There are some hints on how to do so at http://gist.github.com/270442. Hadley On Wed, May 12, 2010 at 11:22 AM, Ken Minns wrote: > I am using the package segmented to fit a simple breakpoint regression to a > large number of sets

Re: [R] meaning of "<<-"

2010-05-08 Thread hadley wickham
You might want to look at the answers to that question on stackoverflow.com: http://stackoverflow.com/questions/2628621 Hadley On Sat, May 8, 2010 at 1:59 AM, Ruihong Huang wrote: > Hi, > > In my memory, "<<-" means assigning via a pointer or alias. But this is not > officially defined in "R Lan
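
As a brief illustration of the operator in question: `<<-` does not assign through a pointer or alias. It searches the enclosing environments for an existing binding and modifies it there, falling back to the global environment if none is found. A minimal sketch, using a hypothetical make_counter() closure:

```r
# make_counter() is a hypothetical example, not taken from the thread.
make_counter <- function() {
  count <- 0
  function() {
    # `<<-` updates `count` in the enclosing function's environment
    # instead of creating a new local variable.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
counter()
```

With a plain `<-` inside the inner function, every call would create a fresh local count and the counter would never advance.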

Re: [R] Any way to apply TWO functions with tapply()?

2010-05-07 Thread hadley wickham
> As pointed out to me offline, data.table should be added to the list > of relevant packages as well.  Its primary advantage is for large data > sets as it is very fast.  Its interface does take some getting used to > but its most recent version on CRAN does have several vignettes which > should e

Re: [R] ggplot2's geom_errorbar legend

2010-05-05 Thread hadley wickham
Hi Giovanni, The basic idea is: classiclimits <- aes(x=x[1:100],ymax = classiccis[1:100,e,p,1], ymin=classiccis[1:100,e,p,2], colour = "classic") ownlimits <- aes(x=x[1:100]+0.4,ymax = owncis[1:100,e,p,1], ymin=owncis[1:100,e,p,2], colour = "own") rbootlimits <- aes(x=x[1:100]+0.8,ymax = rbootcis

Re: [R] Combining ggplot2 objects and/or extracting layers

2010-04-28 Thread hadley wickham
t with widths equal to the durations > of the recessions. Once I create a free-standing plot, I'd like to be able > to use it in various other contexts, including adding it to other existing > plots. The alternative is to reconstruct the plot as a layer and add it to > the other plots, but t

Re: [R] ggplot2 - help with intervals in geom plot

2010-04-27 Thread hadley wickham
> The problem is that I want HCount and HProbCount to use custom > gradients. i.e. a colour for 0-10, next shade for 10-30, next for 30-70 > etc. Use cut to create factor with those levels, and then scale_fill_manual to match values to colours. > Due to some magic done on the data, one uses inter
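
The cut-then-scale_fill_manual approach above might look roughly like this. The counts, break points, and colours are invented for illustration, and bin_colours is a hypothetical name. The plotting step only runs if ggplot2 is installed.

```r
# Invented counts; the breaks mirror the 0-10, 10-30, 30-70 ranges in the question.
counts <- c(0, 5, 10, 25, 30, 60, 70)

# cut() turns the numeric values into a factor with one level per interval.
bins <- cut(counts, breaks = c(0, 10, 30, 70), include.lowest = TRUE)

# One colour per factor level, named so each colour is matched to its level.
bin_colours <- setNames(c("#deebf7", "#9ecae1", "#3182bd"), levels(bins))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x = seq_along(counts), bin = bins),
              aes(x = x, y = 1, fill = bin)) +
    geom_tile() +
    scale_fill_manual(values = bin_colours)
}

table(bins)
```

Naming the colour vector by level keeps the mapping stable even if some levels are absent from a particular subset of the data.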

Re: [R] How to plot a table of numbers as an image using ggplot2?

2010-04-27 Thread hadley wickham
Yes. Hadley PS. If you provide a small reproducible example you are bound to get more useful answers ;) On Tue, Apr 27, 2010 at 8:45 AM, arnaud chozo wrote: > Hi all, > > I'd want to plot a table of numbers such that the values are represented by > gray level. Is there an easy way to do that u

Re: [R] Intersection for two curves

2010-04-24 Thread hadley wickham
> Many people seem to be reluctant to define functions, > even though I think it is a pretty small step from > writing scripts to writing functions. I'm not so sure - I find most students struggle to grasp that next level of abstraction. Generalising from a specific task to a general function is

Re: [R] Intersection for two curves

2010-04-24 Thread hadley wickham
On Sat, Apr 24, 2010 at 12:54 PM, Peter Ehlers wrote: > Well, this has seriously gotten off the original topic. > > While Hadley makes some sense, it is nevertheless > sometimes the case (surely so for David, I would surmise) > that one is putting together a response to an R-help > query when a ne

Re: [R] Intersection for two curves

2010-04-24 Thread hadley wickham
> Perhaps, true in some respects. I am still chiseling out work using > primitive editing tools. But it still takes several minutes to load the > objects I am working on into memory and then several minutes each to build > new models. The models still reside in memory, since I do not know any > met

Re: [R] Intersection for two curves

2010-04-24 Thread hadley wickham
>> If clearing out your workspace destroys *any* work, then something is >> seriously wrong with your workflow. >> > > Yes, of course.  Let's all post viruses to run on each others' > machines.  That will teach those users who don't run antivirus and > backup software between each posting to r-help.

Re: [R] Intersection for two curves

2010-04-24 Thread hadley wickham
>> rm(list=ls()) > > PLEASE, DON'T DO THAT. Or rather you can do it in your workspace but > don't post it. It's not fair to a person who may not read your code line by > line before pasting it into their workspace and having it wiped out. Do you > expect us to completely clear out our workspace

Re: [R] ggplot2: how to specify x-axis limits to geom_abline() ?

2010-04-23 Thread hadley wickham
Use geom_segment and calculate the end points yourself. Hadley On Fri, Apr 23, 2010 at 8:38 AM, arnaud chozo wrote: > Hi all, > > I'd want to plot a segment from a line specified by slope and intercept. > I want to plot this line between two limits, x1 and x2, without imposing > these limits to t
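
Calculating the end points by hand, as suggested above, amounts to evaluating the line at the two x limits. A minimal sketch with invented values for slope, intercept, x1 and x2; the surrounding plot data is also made up, and the plotting step only runs if ggplot2 is installed.

```r
# Invented line parameters and limits.
slope     <- 2
intercept <- 1
x1 <- 0
x2 <- 3

# Evaluate y = intercept + slope * x at both limits.
seg <- data.frame(
  x    = x1,
  xend = x2,
  y    = intercept + slope * x1,
  yend = intercept + slope * x2
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # The base plot spans a wider x range; the segment only covers x1..x2.
  p <- ggplot(data.frame(x = c(-5, 5), y = c(-10, 15)), aes(x, y)) +
    geom_blank() +
    geom_segment(data = seg,
                 aes(x = x, y = y, xend = xend, yend = yend))
}

seg
```

Because the segment is drawn from its own data frame, the axis limits are set by the rest of the plot rather than by x1 and x2.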

Re: [R] ggplot and scale_x_date

2010-04-21 Thread hadley wickham
Hi Liam, Unfortunately this currently isn't supported. It's on my to do list: http://github.com/hadley/ggplot2/issues/issue/94 Hadley On Tue, Apr 20, 2010 at 7:59 PM, Liam Blanckenberg wrote: > Hi all, > > I have a question about setting arbitrary breaks/labels when using GGPLOT > and date/tim

Re: [R] Formatting data, adding column names, use reshape, a newbie question

2010-04-19 Thread hadley wickham
On Mon, Apr 19, 2010 at 5:13 AM, Paul Rigor (ucla) wrote: > Hi all, > I'm an R novice. > > I have data that's already formatted as "molten" that reshape should be able > to work with. For example, the following was read in with > read.csv(filename,sep=" ", header=FALSE) > >      V1               V
