Using the dataset dat (found below), I'm seeking a way to condense down
the data.frame such that each site (i.e., CID_1...CID_13) has a
maximum of 7 rows of post-processed data, where the first 6 have the
highest countPercentage and the 7th row is the sum of countPercentage
from all other rows
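One way to read the request above, sketched in base R. The toy data frame, the helper name condense_site, and the "Other" label are assumptions made for illustration and do not come from the thread; the real dat also carries count and countTotal columns, which are left out here.

```r
# Toy data standing in for `dat` (hypothetical values: one site, 9 taxa)
toy <- data.frame(
  site = "CID_1",
  tax_name = paste0("taxon_", 1:9),
  countPercentage = c(30, 20, 15, 10, 8, 6, 5, 4, 2),
  stringsAsFactors = FALSE
)

# Keep the `keep` rows with the highest countPercentage and collapse
# the remaining rows into a single "Other" row (hypothetical label).
condense_site <- function(x, keep = 6) {
  x <- x[order(x$countPercentage, decreasing = TRUE), ]
  if (nrow(x) <= keep) return(x)
  top <- x[seq_len(keep), ]
  other <- data.frame(
    site = x$site[1],
    tax_name = "Other",
    countPercentage = sum(x$countPercentage[-seq_len(keep)]),
    stringsAsFactors = FALSE
  )
  rbind(top, other)
}

# Apply per site and stack the pieces back into one data frame
condensed <- do.call(rbind, lapply(split(toy, toy$site), condense_site))
```

Sorting before slicing is what guarantees that the first 6 rows are the largest ones; the 7th row then holds the sum of everything else.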
> datBySite <- split(dat, dat$site)
> output <- lapply(datBySite, function(x){
+ x$idx <- seq_len(nrow(x))
+ x$grp <- ifelse(x$idx < 7, x$idx, 7)
+ rval <- tapply(x$countPercentage, x$grp, sum)
+ x$grp <- x$count <- x$countTotal <- NULL
+ x <- x[seq_len(7), ]
+ x$tax_name <- as.character(x$tax_name)
+
Here's a solution using the plyr library:
library(plyr)
dat <- read.table(header=TRUE, sep=",", as.is=TRUE, ## as.is=TRUE
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
On Dec 6, 2014, at 6:37 PM, Edoardo Prestianni wrote:
Excuse the inaccuracy, the warning is "value label missing". The same
variable is considered as a factor (w/ values ranging from a to b) in one
dataset, as int in another. I want it to be a factor in both.
So, you are importing two
Team,
I am giving the exact code that produces the error. Please see below. Can
anyone please help ?
Thanks
Aravindhan
PROGRAM
---
rm(list = ls())
x <- c(1,14,49,26,4,10,25,36,79,15)
Hi Kristi,
One year later I had the same question and found a solution in the help
pages (see plot.gbm: Marginal plots of fitted gbm objects).
If your GBM model is gbm1 <- gbm(y ~ x1+x2, .) one can get the
coefficients for each x with:
print(plot(gbm1, i.var=1, n.trees=1000,
Dear Rena,
As Peter points out, it is better to ask the maintainer of the program for
detailed questions.
As Peter correctly surmised, print.psych (which is used to print the output
from the fa function), knows that you have an oblique solution and is reporting
the amount of variance
Dear R users
I am puzzled by the following result from an R script. I am trying to convert
local time to UTC. The time zone offset is -5, so I used the following
approach.
Below is the script.
Corrected_SA_data$date_time[k-1]
[1] 2007-03-11 01:00:00
Corrected_SA_data$TZ[k-1]
[1] -5
dplyr version (good for large datasets):
library(dplyr)
# if original example dat data.frame is used
# using read.csv with as.is=TRUE or stringsAsFactors=FALSE is better
dat$tax_name <- as.character( dat$tax_name )
# dplyr pipe chain
( dat
%>% arrange( site, desc( countPercentage ))
%>%
I would use the 'lubridate' package for this:
> z <- Sys.time()
> z
[1] "2014-12-07 15:43:50 EST"
> require(lubridate)
> with_tz(z, "UTC")
[1] "2014-12-07 20:43:50 UTC"
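For the fixed-offset case in the question (a local standard time stored alongside an offset of -5), a base R sketch that sidesteps daylight-saving rules might look like the following. The variable names are hypothetical; the timestamp and the offset are the ones shown in the question, and the approach assumes the offset is in whole hours relative to UTC.

```r
# Local clock time and its UTC offset in hours, as shown in the question
local_chr <- "2007-03-11 01:00:00"
tz_offset <- -5

# Parse the clock time as if it were UTC, then subtract the offset:
# local = UTC + offset, so UTC = local - offset.
utc_time <- as.POSIXct(local_chr, tz = "UTC") - tz_offset * 3600

utc_chr <- format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Because the value is never interpreted in a named local zone, the result does not depend on whether that zone switched to daylight time on the date in question.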
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do,
You have not provided a reproducible example, so anything I say could be
wrong just due to misinterpretation. Please read [1] for suggestions on
making your examples reproducible, particularly regarding the use of dput
to provide example data. You have also posted in HTML format,
which can
Dear all!
Is there an R package to compute neighborhood competition indices (Shutz,
Hegyi, and many other indices)?
Thank you very much!
--
---
Catalin-Constantin ROIBU
Lecturer PhD, Forestry engineer
Forestry Faculty of Suceava
Str. Universitatii no. 13, Suceava, 720229, Romania
office phone +4 0230
Excuse the inaccuracy, the warning is "value label missing". The same
variable is considered as a factor (w/ values ranging from a to b) in one
dataset, as int in another. I want it to be a factor in both.
So, you are importing two different Stata formatted files and in only one
of them is the
Looking over Jeff's dplyr solution, I see that I forgot this part of the
original spec:
> where the first 6 have the highest countPercentage
So here's a corrected summ_cid() function for my plyr solution:
summ_cid = function(frm) {
# sort by countPercentage
sorted_frm = arrange(frm,
catalin roibu <catalinroibu at gmail.com> writes:
Dear all!
Is there an R package to compute neighborhood competition indices (Shutz,
Hegyi, and many other indices)?
Thank you very much!
library(sos)
findFn("Hegyi")
leads to
http://finzi.psych.upenn.edu/R/library/siplab/html/pairwise.html
... and even more directly, googling on "R package Hegyi" brought up
the siplab package as the first hit. As it deals with forestry and
vegetation, it would appear to be what the OP wanted -- and could have
found himself/herself with a minimum of effort.
Sigh... I don't understand why people don't
Thank you very much Jeff. Below is the data I used:
Corrected_data
SA_LST SA_GHI_mean
61759 3/11/2007 1:00 0.0
67517 3/11/2007 2:00 0.0
70017 3/11/2007 3:00 0.0
70524 3/11/2007 4:00 0.0
71061 3/11/2007 5:00 0.0
71638 3/11/2007 6:00 0.0
Hi guys,
I am trying to familiarize myself with bootstrapping using the rms package and
am stuck on how to plot the bootstrap distribution. I would appreciate it if
somebody could help.
Thanks in advance.
From
Dear All
I have tried very hard to work out what to do with putting logged data into
metafor; the paper says..
'geometric mean antibody concentrations (GMCs) or opsonophagocytic activity
titres (geometric mean titres [GMT]) were calculated with 95% CIs by taking the
antilog of the mean of the
While you may get a helpful reply, I think this is really not the
forum for such relatively basic math/stat questions. As you seem to be
more or less at sea here, I really really suggest that you seek help
from a local statistical resource.
Cheers,
Bert
Bert Gunter
Genentech Nonclinical