[R] Condensing data.frame

2014-12-07 Thread Morway, Eric
Using the dataset dat (found below), I'm seeking a way to condense down the data.frame such that each site (i.e., CID_1...CID_13) has a maximum of 7 rows of post-processed data, where the first 6 have the highest countPercentage and the 7th row is the sum of countPercentage from all other rows
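The requirement above can be sketched in base R as follows. This is only an illustration under stated assumptions, not any poster's solution: the toy data frame is made up (only the column names site, tax_name and countPercentage come from the thread), and the helper condense_site() and the "Other" label for the summed row are hypothetical.

```r
# Hypothetical toy data; only the column names are taken from the thread.
dat <- data.frame(
  site = rep(c("CID_1", "CID_2"), times = c(9, 3)),
  tax_name = c(paste0("taxon_", 1:9), paste0("taxon_", 1:3)),
  countPercentage = c(30, 20, 15, 10, 8, 7, 5, 3, 2, 50, 30, 20),
  stringsAsFactors = FALSE
)

# Keep the `keep` rows with the highest countPercentage for one site and
# collapse all remaining rows into a single summed row.
# The label "Other" is an assumption made for this sketch.
condense_site <- function(x, keep = 6) {
  cols <- c("site", "tax_name", "countPercentage")
  x <- x[order(x$countPercentage, decreasing = TRUE), cols]
  if (nrow(x) <= keep) return(x)  # nothing to collapse
  top <- x[seq_len(keep), ]
  other <- data.frame(
    site = x$site[1],
    tax_name = "Other",
    countPercentage = sum(x$countPercentage[-seq_len(keep)]),
    stringsAsFactors = FALSE
  )
  rbind(top, other)
}

# Apply per site and stack the pieces back into one data.frame.
condensed <- do.call(rbind, lapply(split(dat, dat$site), condense_site))
rownames(condensed) <- NULL
print(condensed)
```

With this toy input, CID_1 shrinks from 9 rows to 7 and CID_2 (only 3 rows) is returned unchanged, so each site still sums to the same total countPercentage.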

Re: [R] Condensing data.frame

2014-12-07 Thread Chel Hee Lee
datBySite <- split(dat, dat$site) output <- lapply(datBySite, function(x){ + x$idx <- seq_len(nrow(x)) + x$grp <- ifelse(x$idx < 7, x$idx, 7) + rval <- tapply(x$countPercentage, x$grp, sum) + x$grp <- x$count <- x$countTotal <- NULL + x <- x[seq_len(7), ] + x$tax_name <- as.character(x$tax_name) +

Re: [R] Condensing data.frame

2014-12-07 Thread John Posner
Here's a solution using the plyr library: library(plyr) dat <- read.table(header=TRUE, sep=",", as.is=TRUE, ## as.is=TRUE text="site,tax_name,count,countTotal,countPercentage CID_1,Cyanobacteria,46295,123509,37.483098398 CID_1,Proteobacteria,36120,123509,29.244832360

Re: [R] bad STATA dataset import, how to change value labels

2014-12-07 Thread David Winsemius
On Dec 6, 2014, at 6:37 PM, Edoardo Prestianni wrote: Excuse the inaccuracy, the warning is value label missing. the same variable is considered as factor (w/ values ranging from a to b) in one dataset, as int in another. I want it to be a factor in both. So, you are importing two

Re: [R] boot strapping poisson getting warnings and negative values

2014-12-07 Thread Aravindhan, K
Team, I am giving the exact code that produces the error. Please see below. Can anyone please help? Thanks Aravindhan PROGRAM --- rm(list = ls()) x <- c(1,14,49,26,4,10,25,36,79,15)

[R] Which is the final model for a Boosted Regression Trees (GBM)?

2014-12-07 Thread Samuel Reuther
Hi Kristi, One year later I had the same question and found a solution in the help (see plot.gbm: Marginal plots of fitted gbm objects.) If your GBM-model is gbm1 <- gbm(y ~ x1+x2, .) one can get the coefficients for each x with: print(plot(gbm1, i.var=1, n.trees=1000,

Re: [R] Difference in cummulative variance depending on print command

2014-12-07 Thread William Revelle
Dear Rena, As Peter points out, it is better to ask the maintainer of the program for detailed questions. As Peter correctly surmised, print.psych (which is used to print the output from the fa function), knows that you have an oblique solution and is reporting the amount of variance

[R] date time problem

2014-12-07 Thread Alemu Tadesse
Dear R users, I am puzzled by the following result from an R script. I am trying to convert local time to UTC time. The time zone is -5, therefore I used the following approach. Below is the script. Corrected_SA_data$date_time[k-1] [1] 2007-03-11 01:00:00 Corrected_SA_data$TZ[k-1] [1] -5
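One way to apply a fixed UTC offset without involving daylight saving rules can be sketched in base R as below. This is a minimal sketch under assumptions: the timestamps are made-up strings in the format shown in the post, the names local_txt, tz_offset and utc_time are hypothetical, and the local clock is assumed to be standard time at a constant offset of -5 hours. Whether this matches the poster's data cannot be determined from the truncated script.

```r
# Hypothetical local standard-time stamps at a fixed offset of UTC-5.
local_txt <- c("2007-03-11 01:00:00",
               "2007-03-11 02:00:00",
               "2007-03-11 03:00:00")
tz_offset <- -5  # hours relative to UTC (assumed constant, no DST)

# Parse the wall-clock values as if they were UTC so that no daylight
# saving rules are applied during parsing ...
local_as_utc <- as.POSIXct(local_txt, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# ... then remove the offset: UTC = local - offset.
utc_time <- local_as_utc - tz_offset * 3600

print(format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

Parsing in "UTC" first is a deliberate choice here: in US time zones that observe daylight saving, the wall-clock hour 02:00 on 2007-03-11 does not exist, so parsing such a value directly in one of those zones can give surprising results.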

Re: [R] Condensing data.frame

2014-12-07 Thread Jeff Newmiller
dplyr version (good for large datasets): library(dplyr) # if original example dat data.frame is used # using read.csv with as.is=TRUE or stringsAsFactors=FALSE is better dat$tax_name <- as.character( dat$tax_name ) # dplyr pipe chain ( dat %>% arrange( site, desc( countPercentage )) %>%

Re: [R] date time problem

2014-12-07 Thread jim holtman
I would use the 'lubridate' package for this: z <- Sys.time() z [1] "2014-12-07 15:43:50 EST" require(lubridate) with_tz(z, "UTC") [1] "2014-12-07 20:43:50 UTC" Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do,

Re: [R] date time problem

2014-12-07 Thread Jeff Newmiller
You have not provided a reproducible example, so anything I say could be wrong just due to misinterpretation. Please read [1] for suggestions on making your examples reproducible, particularly regarding the use of dput to provide example data. You have also posted in HTML format, which can

[R] neighborhood competition index in r package

2014-12-07 Thread catalin roibu
Dear all! Is there an R package to compute neighborhood competition indices (Shutz, Hegyi, and many other indices)? Thank you very much! -- --- Catalin-Constantin ROIBU Lecturer PhD, Forestry engineer Forestry Faculty of Suceava Str. Universitatii no. 13, Suceava, 720229, Romania office phone +4 0230

Re: [R] bad STATA dataset import, how to change value labels

2014-12-07 Thread Edoardo Prestianni
Excuse the inaccuracy, the warning is "value label missing". The same variable is considered as factor (w/ values ranging from a to b) in one dataset, as int in another. I want it to be a factor in both. So, you are importing two different Stata formatted files and in only one of them is the
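Making a variable a factor in both datasets can be sketched in base R as below. This is a hedged illustration only: the data frames a and b, the column v and the labels "low" and "high" are made up, and it assumes the integer codes in the second dataset correspond to the factor levels of the first in order (1 = first level, 2 = second level), which would have to be checked against the actual Stata value labels.

```r
# Hypothetical stand-ins for the two imported datasets.
a <- data.frame(v = factor(c("low", "high", "low"), levels = c("low", "high")))
b <- data.frame(v = c(1L, 2L, 2L))  # same variable, imported as integer codes

# Assumption: code i in b corresponds to the i-th level of a$v.
b$v <- factor(b$v,
              levels = seq_along(levels(a$v)),
              labels = levels(a$v))

str(a$v)
str(b$v)
```

After this step both columns are factors with identical levels, so the two data frames can be combined without the variable being coerced to a different type.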

Re: [R] Condensing data.frame

2014-12-07 Thread John Posner
Looking over Jeff's dplyr solution, I see that I forgot this part of the original spec: "where the first 6 have the highest countPercentage". So here's a corrected summ_cid() function for my plyr solution: summ_cid = function(frm) { # sort by countPercentage sorted_frm = arrange(frm,

Re: [R] neighborhood competition index in r package

2014-12-07 Thread Ben Bolker
catalin roibu <catalinroibu at gmail.com> writes: Dear all! Is there an R package to compute neighborhood competition indices (Shutz, Hegyi, and many other indices)? Thank you very much! library(sos) findFn("Hegyi") leads to http://finzi.psych.upenn.edu/R/library/siplab/html/pairwise.html

Re: [R] neighborhood competition index in r package

2014-12-07 Thread Bert Gunter
... and even more directly, googling on "R package Hegyi" brought up the siplab package as the first hit. As it deals with forestry and vegetation, it would appear to be what the OP wanted -- and could have found him/herself with a minimum of effort. Sigh... I don't understand why people don't

Re: [R] date time problem

2014-12-07 Thread Alemu Tadesse
Thank you very much Jeff. Below is the data I used: Corrected_data SA_LST SA_GHI_mean 61759 3/11/2007 1:00 0.0 67517 3/11/2007 2:00 0.0 70017 3/11/2007 3:00 0.0 70524 3/11/2007 4:00 0.0 71061 3/11/2007 5:00 0.0 71638 3/11/2007 6:00 0.0

[R] How to plot bootstrap distribution in rms package?

2014-12-07 Thread Eddie Smith
Hi guys, I am trying to familiarize myself with bootstrapping using the rms package and am stuck on how to plot the bootstrap distribution. I would appreciate it if somebody could help. Thanks in advance. From

Re: [R] metafor - code for analysing geometric means

2014-12-07 Thread Purssell, Ed
Dear All I have tried very hard to work out what to do with putting logged data into metafor; the paper says.. 'geometric mean antibody concentrations (GMCs) or opsonophagocytic activity titres (geometric mean titres [GMT]) were calculated with 95% CIs by taking the antilog of the mean of the

Re: [R] metafor - code for analysing geometric means

2014-12-07 Thread Bert Gunter
While you may get a helpful reply, I think this is really not the forum for such relatively basic math/stat questions. As you seem to be more or less at sea here, I really really suggest that you seek help from a local statistical resource. Cheers, Bert Bert Gunter Genentech Nonclinical