Re: [R] Seeking Code for Chaid or Similar

2008-08-27 Thread Daniel Folkinshteyn
on 08/27/2008 09:26 AM David Cobey said the following: I'm interested in modifying a regression tree algorithm to use the difference between a control and a dosed sample as its dependent variable. I was wondering if anyone knew where I could find code to implement a basic chaid algorithm. Is

Re: [R] filtering out data

2008-08-22 Thread Daniel Folkinshteyn
If I understand correctly the result you are trying to achieve, I think what you may be looking for is this: model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset[merged.dataset$model1 == 1, ]) This will give you a regression using only the data for which model1 == 1. on 08/22/2008
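A minimal sketch of the row-filtering idea in the reply above, on made-up data. Only the names dv, iv1, iv2, iv3, model1 and merged.dataset come from the post; the values are hypothetical. It also shows lm()'s subset argument, which should give the same fit without indexing the data frame by hand.

```r
# Hypothetical data standing in for merged.dataset
set.seed(1)
merged.dataset <- data.frame(
  dv = rnorm(40), iv1 = rnorm(40), iv2 = rnorm(40), iv3 = rnorm(40),
  model1 = rep(c(0, 1), each = 20)
)

# Filter rows by logical indexing, as in the post
fit.index <- lm(dv ~ iv1 + iv2 + iv3,
                data = merged.dataset[merged.dataset$model1 == 1, ])

# Same regression using lm()'s own subset argument
fit.subset <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset,
                 subset = model1 == 1)
```

Both calls use only the 20 rows where model1 == 1, so the coefficients should agree.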

Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread Daniel Folkinshteyn
on 08/22/2008 10:19 AM Martin Ballaschk said the following: how do I read files that have two header fields less than they have columns? The easiest solution would be to insert one or two additional header fields, but I have a lot of files and that would be quite a lot of awful work. Any idea

[R] plm problem: Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

2008-08-18 Thread Daniel Folkinshteyn
Hi, I am having a problem with a fixed-effects regression using the "plm" package. A regression of this form runs just fine: plm(y ~ x*z, data=datasubset, model="within") but when i try to add a lag in there, and run it on the same dataset, like this: plm(y ~ x*z + lag(x,-1)*z, data=datasu

Re: [R] Quantitative analysis of non-standard scatter plots.

2008-07-23 Thread Daniel Folkinshteyn
I second that - quantile regression seems to be what you want. on 07/23/2008 10:10 AM Ben Bolker said the following: Firas Swidan writes: Hi, I am having difficulties in finding ways to analyse scatter plots and quantitatively differentiate between them. Try quantile regress

Re: [R] Need ideas on how to show spikes in my data and how to code it in R

2008-06-27 Thread Daniel Folkinshteyn
togram but since the audience is not very statistically experienced I would prefer to do it this way. Anyone have an idea? Thanks again for your help. Thomas Fröjd On Wed, Jun 25, 2008 at 6:16 PM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: I don't understand this. Why not

Re: [R] Recoding

2008-06-27 Thread Daniel Folkinshteyn
if there's nothing specific for it, you could probably do it with merge? on 06/27/2008 02:41 PM Agustin Lobo said the following: Hi! Given a vector (or a factor within a df),i.e. v1 <- c(1,1,1,2,3,4,1,10,3) and a dictionary cbind(c(1,2,3),c(1001,1002,1003)) is there a function (on the same lin
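A possible alternative to merge for this kind of recoding is match(), sketched below with the vector and dictionary from the question. Keeping values that are missing from the dictionary unchanged is an assumption, since the full question is cut off here. Unlike merge(), which sorts the result by the key by default, match() keeps the original order of v1.

```r
v1 <- c(1, 1, 1, 2, 3, 4, 1, 10, 3)
dict <- cbind(c(1, 2, 3), c(1001, 1002, 1003))

# match() gives the dictionary row for each value (NA when absent)
idx <- match(v1, dict[, 1])

# Use the dictionary code where one exists, otherwise keep the old value
recoded <- ifelse(is.na(idx), v1, dict[idx, 2])
```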

Re: [R] R help

2008-06-27 Thread Daniel Folkinshteyn
oh, unlist - very nice function, thanks :) on 06/27/2008 11:23 AM Jorge Ivan Velez said the following: Hi Ramya, Try something like this: as.character(unlist(lapply(geneset,function(x) x[1]))) HTH, Jorge On Fri, Jun 27, 2008 at 10:33 AM, Rajasekaramya <[EMAIL PROTECTED]> wrote: Hi, I h

Re: [R] R help

2008-06-27 Thread Daniel Folkinshteyn
try this: firstgenes = lapply(geneset, function(x){return(x[1,1])}) firstgenes = do.call(rbind, firstgenes) on 06/27/2008 10:33 AM Rajasekaramya said the following: Hi, I have a problem in assessing the list element. i have list called geneset it contains the following elements
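A small sketch of pulling the first cell out of every element of a list. The structure of geneset is not visible in this snippet, so the list of two-column data frames below is purely hypothetical.

```r
# Hypothetical list of small data frames standing in for 'geneset'
geneset <- list(
  data.frame(gene = c("abc1", "abc2"), score = c(0.1, 0.2),
             stringsAsFactors = FALSE),
  data.frame(gene = c("xyz1", "xyz2"), score = c(0.3, 0.4),
             stringsAsFactors = FALSE)
)

# sapply() simplifies the per-element results to a plain vector
firstgenes <- sapply(geneset, function(x) x[1, 1])

# do.call() takes the function and the argument list separately;
# here it stacks the per-element results into a one-column matrix
stacked <- do.call(rbind, lapply(geneset, function(x) x[1, 1]))
```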

Re: [R] matching problem

2008-06-27 Thread Daniel Folkinshteyn
this should do what you want: > myexstrings = c("*AAA.AA","BBB BB","*.CCC.","**dd- d") > a = gsub("^\\W*","", myexstrings,perl=T) > b = gsub("\\W.*", "", a, perl=T) > b [1] "AAA" "BBB" "CCC" "dd" first one, removes any non-word characters from the beginning (as you already figured out) second o

Re: [R] Problems exporting graphs

2008-06-26 Thread Daniel Folkinshteyn
not sure why it doesn't work, but try the following: first, plot to a regular window, then run: > dev.copy(device=png, file="yourfilename.png") > dev.off() see if that produces a file you want. another note: what do you mean you can't just "copy and paste the graph" in ubuntu? doesn't pressing

Re: [R] Data matrix of all possible response patterns

2008-06-26 Thread Daniel Folkinshteyn
this is probably a kludge, and there may be a "neater" way to do this, but... here's one: > a = 0:1 > for (i in 1:9){ a= merge(unname(a), 0:1) } > a = t(a) after the for loop, 'a' will contain a 1024 row by 10 col dataframe. putting it through a transpose, gives you the 10 rows by 1024 cols ma
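One candidate for the "neater" way is base R's expand.grid(), which builds every combination of its arguments in one call. This is a sketch of a different technique from the merge loop in the post, assuming ten binary items as in the 1024-pattern example.

```r
# All 2^10 response patterns for ten binary items, one pattern per row
patterns <- expand.grid(rep(list(0:1), 10))

# Transpose to get items in rows and patterns in columns
m <- t(as.matrix(patterns))
```

Here patterns has 1024 rows and 10 columns, and m is the 10 by 1024 matrix.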

Re: [R] create new column with colnames resulting from paste()

2008-06-26 Thread Daniel Folkinshteyn
no need for a for loop - we can vectorize this: > dt <- data.frame(a = c(1, 2, 3), b = c(3, 2, 2), c = c(1, 3, 5)) > dt a b c 1 1 3 1 2 2 2 3 3 3 2 5 > dt[,paste("test", 1:2, sep="")] = rep(1:2, each=3) > dt a b c test1 test2 1 1 3 1 1 2 2 2 2 3 1 2 3 3 2 5 1 2 on 06

Re: [R] running R-code outside of R

2008-06-25 Thread Daniel Folkinshteyn
If I analyze a client's data using an R script I created then I can charge the client a $20,000 consulting fee, but, if I let the client push the button to execute the R script and charge him 10 cents for the privilege then I can be sued for violating the GPL? Or are my I think you cannot be su

Re: [R] Need ideas on how to show spikes in my data and how to code it in R

2008-06-25 Thread Daniel Folkinshteyn
I don't understand this. Why not just get hist() to plot on the density scale, thereby making its output commensurate with the output of density()? The hist() function will plot on the density scale if you ask it to. Set freq=FALSE (or prob=TRUE) in the call to hist. ehrm

Re: [R] insert new columns to a matrix

2008-06-24 Thread Daniel Folkinshteyn
just cbind the cols in the appropriate order: m.2 = cbind( m.1[,1:5], yourthreecolumns, m.1[,6:ncol(m.1)] ) on 06/24/2008 07:02 AM Daren Tan said the following: Instead of prepend or append new columns to a matrix, how to insert them to a matrix ? For example, I would like to insert 3 new column
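The cbind approach above can be tried on a toy matrix. The matrix sizes and the zero-filled new columns are hypothetical, and newcols stands in for "yourthreecolumns" from the post.

```r
m.1 <- matrix(1:20, nrow = 2)              # 2 rows, 10 columns
newcols <- matrix(0, nrow = 2, ncol = 3)   # hypothetical three new columns

# Original columns 1-5, then the new ones, then the remaining originals
m.2 <- cbind(m.1[, 1:5], newcols, m.1[, 6:ncol(m.1)])
```

Note that the 6:ncol(m.1) index only makes sense when m.1 has at least six columns, and that a one-row matrix would need drop = FALSE in the subsetting to stay a matrix.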

Re: [R] Need ideas on how to show spikes in my data and how to code it in R

2008-06-23 Thread Daniel Folkinshteyn
on 06/23/2008 03:40 PM Thomas Fröjd said the following: 1. Shift the mean and std on the reference dataset to the mean and std of my clinic birth weight data. to shift the mean by any distance, just add or subtract that distance from each observation (e.g., to move mean from m1 to m2, t
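A sketch of the shift described above, on simulated data. The reply is cut off before it reaches the standard deviation, so the rescaling step is an assumption: it uses the usual linear transformation (standardize, then rescale), and all numbers are made up.

```r
set.seed(42)
reference <- rnorm(1000, mean = 3500, sd = 500)  # hypothetical reference weights
clinic    <- rnorm(200,  mean = 3300, sd = 450)  # hypothetical clinic weights

# Center and scale the reference data, then stretch and shift it
# so that it takes on the clinic data's sd and mean
shifted <- (reference - mean(reference)) / sd(reference) * sd(clinic) +
  mean(clinic)
```

After this, shifted keeps the shape of the reference distribution but has the clinic sample's mean and standard deviation.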

Re: [R] How can I execute a .R/script file

2008-06-19 Thread Daniel Folkinshteyn
source('yourscript.R') on 06/19/2008 03:11 PM [EMAIL PROTECTED] said the following: Dear R-Users, I've written a number of functions in a .R/script file. I would like to call those functions from another script file. How can I execute all the code in a script file so that the functions are av

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
install.packages("profr") library(profr) p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...)) plot(p) That should at least help you see where the slow bits are. Hadley so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are the biggest timesuckers... i suppose

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
on 06/06/2008 06:55 PM hadley wickham said the following: Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools: install.packages("profr") library(profr) p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...)) plot(p)

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
t those columns, convert them to a matrix, do all the matching, and then based on some sort of row index retrieve all of the associated columns. -Don At 2:09 PM -0400 6/5/08, Daniel Folkinshteyn wrote: Hi everyone! I have a question about data processing efficiency. My data are as follows: I

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
e proper index). Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and "A Guide for the Unwilling S User") Daniel Folkinshteyn wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the follow

Re: [R] R + Linux

2008-06-06 Thread Daniel Folkinshteyn
another vote for ubuntu here - works for me, and quite trouble-free. add the r-project repositories, and you're sure to always have the latest, too. (if you don't care for the latest R, you can of course also just get R from the distro's repos as well) on 06/06/2008 05:22 PM Abhijit Dasgupta s

Re: [R] editing a data.frame

2008-06-06 Thread Daniel Folkinshteyn
works for me: > sub('1.00', '1', '1.00E-20') [1] "1E-20" remember, according to ?sub, it's sub(pattern, repl, string) try doing it step by step. first, see what yr1bp$TreeTag[1501] is. then, if it's the right data item, see what the output of sub("1.00", "1", yr1bp$TreeTag[1501]) is. that'll l

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
d not far from optimal. If you pick the possibly too small route, then increasing the size in largish chunks is much better than adding a row at a time. Pat Daniel Folkinshteyn wrote: thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactl

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
== end function code=== on 06/06/2008 01:35 PM Gabor Grothendieck said the following: I think the posting guide may not be clear enough and have suggested that it be clarified. Hopefully this better communicates what is required and why in a shorter amount of space: https://stat.ethz.c

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
just in case, uploaded it to the server, you can get the zip file i mentioned here: http://astro.temple.edu/~dfolkins/helplistfiles.zip on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following: i thought since the function code (which i provided in full) was pretty short, it would be

Re: [R] reorder breaking by half

2008-06-06 Thread Daniel Folkinshteyn
ci = rainbow(7)[c(4:7, 1:3)] on 06/06/2008 01:02 PM avilella said the following: Hi, I want to reorder the colors given by rainbow(7) so that the last half move to the first 4. For example: ci=rainbow(7) ci [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF" [7] "#FF

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
[EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and "A Guide for the Unwilling S User") Daniel Folkinshteyn wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a que

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
008 at 12:03 PM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: Anybod

Re: [R] Store filename

2008-06-06 Thread Daniel Folkinshteyn
well, where are you getting the filename in the first place? are you looping over a list of filenames that comes from somewhere? generally, for concatenating strings, look at function 'paste': write.table(myoutput, paste(myfilename,"_out.txt", sep=''),sep="\t") on 06/06/2008 11:51 AM DAVID ARTE

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had

Re: [R] Merging two dataframes

2008-06-06 Thread Daniel Folkinshteyn
than by.y and by.x? I think when i was playing around i tried the all. command in that setup as well Mike On Fri, Jun 6, 2008 at 2:07 PM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: try this: FullData <- merge(ETC, SURVEY, by.x = "

Re: [R] request: a class having max frequency

2008-06-06 Thread Daniel Folkinshteyn
names(f)[which.max(f)] on 06/06/2008 09:14 AM Muhammad Azam said the following: Dear R users I have a very basic question. I tried but could not find the required result. using dat <- pima f <- table(dat[,9]) f 0 1 500 268 i want to find that class say "0" having maximum frequency i.e
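A self-contained check of the one-liner above. The pima data set is not loaded here, so the class vector is rebuilt from the counts quoted in the question (500 zeros and 268 ones), which is an assumption about what column 9 holds.

```r
# Stand-in for dat[, 9], rebuilt from the counts shown in the question
classes <- c(rep(0, 500), rep(1, 268))
f <- table(classes)

# which.max() returns the position of the largest count;
# names(f) turns that position back into the class label
top <- names(f)[which.max(f)]
```

If two classes tie for the maximum, which.max() returns only the first of them.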

Re: [R] Merging two dataframes

2008-06-06 Thread Daniel Folkinshteyn
try this: FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = T, all.y = F) on 06/06/2008 07:30 AM Michael Pearmain said the following: Hi All, Newbie question for you all but i have been looking at the archives and the help stuff to get a rough idea of what i want to do I wo
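To see what the merge call above does, here is a sketch on two tiny data frames. The key names ord and uid come from the post; the other columns and all values are hypothetical.

```r
ETC    <- data.frame(ord = c(1, 2, 3), spend = c(10, 20, 30))
SURVEY <- data.frame(uid = c(2, 3, 4), answer = c("a", "b", "c"))

# Keep every row of ETC; add SURVEY columns where ord matches uid
FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid",
                  all.x = TRUE, all.y = FALSE)
```

The row with ord 1 has no survey match and gets NA in answer, while the SURVEY row with uid 4 is dropped because all.y is FALSE.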

Re: [R] simple data question

2008-06-06 Thread Daniel Folkinshteyn
should work - don't even have to put them in quotes, if your field separator is not space. why don't you just try it and see what comes out? :) on 06/06/2008 08:43 AM stephen sefick said the following: if I wanted to use a name for a column with two words say Dick Cheney and George Bush can I p

Re: [R] Multiple comment.char under read.table

2008-06-06 Thread Daniel Folkinshteyn
according to the helpfile, comment only takes one character, so you'll have to do some 'magic' :) i'd suggest to first run mydata through sed, and replace one of the comment chars with another, then run read.table with the one comment char that remains. sed -e 's/^\^/!/' mydata.txt > mydata2
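If running sed is not convenient, a similar effect can probably be had inside R by dropping the comment lines before parsing. This is a sketch of a different technique from the sed approach in the post, assuming the two comment characters are ^ and ! and that comments start at the beginning of a line. The sample lines are made up.

```r
# Hypothetical file contents; in practice use readLines("mydata.txt")
lines <- c("^ comment style one", "! comment style two",
           "a\tb", "1\t2", "3\t4")

# Drop every line that starts with either comment character
keep <- lines[!grepl("^[!^]", lines)]

# Parse what is left directly from the character vector
mydata <- read.table(text = keep, sep = "\t", header = TRUE)
```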

Re: [R] problems with reading the data and merge

2008-06-05 Thread Daniel Folkinshteyn
the '00' entries may be in a numeric column, so it gets typecast to a number, and of course 00 == 0, numerically speaking, so they get 'condensed'. to be sure you read everything "as is", specify colClasses='character': data<-read.table("data.txt",sep='\t', header=T, colClasses='character')
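The effect described above can be reproduced with a small temporary file. The file contents and column names are hypothetical.

```r
# Write a tiny tab-separated file with a '00' entry
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tcode", "1\t00", "2\t0"), tmp)

# Default type detection turns '00' into the number 0
as.num <- read.table(tmp, sep = "\t", header = TRUE)

# Reading everything as character keeps the entries as written
as.chr <- read.table(tmp, sep = "\t", header = TRUE,
                     colClasses = "character")
unlink(tmp)
```

With the default settings both codes collapse to 0, while the character version keeps "00" and "0" distinct.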

Re: [R] write.table() error

2008-06-05 Thread Daniel Folkinshteyn
looks like you don't have permission to write a file to C:\ try writing to some other directory where you have write access (e.g., your user's home dir, or your "my documents", or something like that). on 06/05/2008 11:57 PM Megh Dal said the following: Hi, I got following error in write.tab

Re: [R] [Possible SPAM] Reading selected lines in an .html file

2008-06-05 Thread Daniel Folkinshteyn
i know this is an R mailing list :) but... i'll recommend you try python with the beautifulsoup module - makes html processing a cinch. another thing to note is that wunderground provides very handy RSS feeds for every location, so rather than parsing the html page (with its associated bundle

Re: [R] Improving data processing efficiency

2008-06-05 Thread Daniel Folkinshteyn
oosen said the following: Maybe you should provide a minimal, working code with data, so that we all can give it a try. In the mean time: take a look at the Rprof function to see where your code can be improved. Good luck Bart Daniel Folkinshteyn-2 wrote: Hi everyone! I have a question

Re: [R] how to get the distribution curve from a data set?

2008-06-05 Thread Daniel Folkinshteyn
would a density plot do? try plot(density(x)) if you are specifically after the histogram tops rather than a density estimate, then get the hist object with plot=F, then look at the counts component: histobj = hist(x, breaks=1000, plot=F) plot(histobj$counts) hope this helps. o
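A sketch of the histogram-object route on simulated data. The sample and the number of breaks are hypothetical. Using the mids component for the x axis is a small addition to the post: it puts the counts on the data scale rather than against the bin index.

```r
set.seed(7)
x <- rnorm(500)  # hypothetical sample

# plot = FALSE returns the histogram object without drawing anything
histobj <- hist(x, breaks = 20, plot = FALSE)

tops   <- histobj$counts  # number of observations per bin
center <- histobj$mids    # midpoint of each bin

# To draw the outline of the bar tops:
# plot(center, tops, type = "l")
```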

[R] Improving data processing efficiency

2008-06-05 Thread Daniel Folkinshteyn
Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows. My goal is this: For