I am merging two data frames:
tuneAcc <- structure(list(select = c(FALSE, TRUE), method =
structure(c(1L, 1L), .Label = "GCV.Cp", class = "factor"), RMSE =
c(29.2102056093962, 28.9743318817886), Rsquared =
c(0.0322612161559773, 0.0281713457306074), RMSESD = c(0.981573768028697,
0.791307778398384),
The problem is not with `caret`. Your output says:
> installation of package ‘minqa’ had non-zero exit status
`caret` has a dependency that itself depends on `minqa`. The same is true
for `RcppEigen` and the others.
What code did you use to do the install? What OS and version of R, etc.?
> Thanks and Kind Regards
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol, BS16 1QY
It is extremely difficult to tell what the issue might be without a
reproducible example.
The only thing that I can suggest is to use the non-formula interface to
`train` so that you can avoid creating dummy variables.
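To make the two interfaces concrete, here is a minimal sketch (using iris, which is my choice for illustration, not the poster's data). The formula interface expands factor predictors into dummy variables before fitting; the non-formula interface passes the predictors through as-is:

```r
library(caret)

data(iris)

## Formula interface: factor predictors on the right-hand side would be
## expanded into dummy variables before the model sees them.
set.seed(1)
fit_formula <- train(Species ~ ., data = iris, method = "knn")

## Non-formula interface: predictors are passed as-is, so factor
## predictors are not converted to dummy variables.
set.seed(1)
fit_matrix <- train(x = iris[, 1:4], y = iris$Species, method = "knn")
```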
On Mon, May 9, 2016 at 11:23 AM, Muhammad Bilal <
muhammad2.bi...@live.uwe.ac.
There is a function called `smda` in the sparseLDA package that implements
the model described in Clemmensen, L., Hastie, T., Witten, D. and Ersbøll,
B. Sparse discriminant analysis, Technometrics, 53(4): 406-413, 2011
Max
On Sun, Jan 24, 2016 at 10:45 PM, TJUN KIAT TEO
wrote:
> Hi
>
> I notice
Providing a reproducible example and the results of `sessionInfo` will help
get your question answered.
Also, what is the point of using glmnet with RFE? It already does feature
selection.
On Wed, Dec 23, 2015 at 1:48 AM, Manish MAHESHWARI wrote:
> Hi,
>
> I am trying to use caret, for feature
Providing a reproducible example and the results of `sessionInfo` will help
get your question answered.
My only guess is that one or more of your predictors are factors and that
the in-sample data (used to build the model during resampling) have
different levels than the holdout samples.
Max
On
Right now, using `method = "cv"` or `method = "repeatedcv"` does stratified
sampling. Depending on what you mean by "ensure" and the nature of your
outcome (categorical?), it probably already does.
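You can see the stratification directly with createFolds(), which is what the cv methods use under the hood; this sketch uses iris as a stand-in for the poster's data:

```r
library(caret)

data(iris)

## createFolds() stratifies on the outcome when it is a factor.
set.seed(42)
folds <- createFolds(iris$Species, k = 5)

## Each of the 5 folds contains 30 rows: 10 from each of the 3 classes.
sapply(folds, function(idx) table(iris$Species[idx]))
```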
On Mon, Nov 23, 2015 at 7:04 PM, TJUN KIAT TEO wrote:
> In the caret train control function, is it
Providing a reproducible example and the results of `sessionInfo` will help
get your question answered. For example, did you use the formula or
non-formula interface to `train`, and so on.
On Thu, Nov 5, 2015 at 1:10 PM, Bert Gunter wrote:
> I am not familiar with caret/Cubist, but assuming they
This might help:
http://bit.ly/1MUP0Lj
On Wed, Jul 29, 2015 at 11:00 AM, jpara3
wrote:
> How can I set up a study with random forest where the response is highly
> imbalanced?
On Tue, Jul 7, 2015 at 8:19 AM, John Fox wrote:
> Dear Peter,
>
> You're correct that these examples aren't verb phrases (though the second
> one contains a verb phrase). I don't want to make the discussion even more
> pedantic (moving it in this direction was my fault), but "Paragraph" isn't
> q
The version of caret just put on CRAN has a function called mnLogLoss that
does this.
Max
On Mon, May 11, 2015 at 11:17 AM, Lorenzo Isella
wrote:
> Dear All,
> I am trying to implement my own metric (a log loss metric) for a
> binary classification problem in Caret.
> I must be making some mist
>
> > -----Original Message-----
> > From: wyl...@ischool.utexas.edu
> > Sent: Fri, 03 Apr 2015 16:07:57 -0500
> > To: r-help@r-project.org
> > Subject: [R] Repeated failures to install "caret" package (of Max Kuhn)
> >
> > For an edx course,
You can create your own:
http://topepo.github.io/caret/custom_models.html
I put a prototype together. Source this file:
https://github.com/topepo/caret/blob/master/models/files/chaid.R
then try this:
library("CHAID")
### fit tree to subsample
set.seed(290875)
USvoteS <- USvote[sample(1:
What you are asking is a bad idea on multiple levels. You will grossly
over-estimate the area under the ROC curve. Consider the 1-NN model: you
will have perfect predictions every time.
To do this, you will need to run train again and modify the index and
indexOut objects:
library(caret)
set.s
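The 1-NN point above is easy to demonstrate: when the training data are resubstituted as the test set, each point's nearest neighbor is itself, so the predictions are perfect. A small sketch (iris is my choice of example data):

```r
library(class)

data(iris)

## Predict the training set with a 1-nearest-neighbor model: every
## point's nearest neighbor is itself, at distance zero.
pred <- knn(train = iris[, 1:4], test = iris[, 1:4],
            cl = iris$Species, k = 1)

mean(pred == iris$Species)  # resubstitution accuracy is 1
```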
You have not shown all of your code and it is difficult to diagnose the
issue.
I assume that you are using the data from:
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
If so, there is example code to analyze these data in that package. See
?scriptLocation.
We have no idea how
That is legacy code but there was a good reason back then.
caret is written to use parallel processing via the foreach package.
There were some cases where the worker processes did not load the
required packages (even when I used foreach's ".packages" argument) so
I would do it explicitly. I don't
You might look at the 'bag' function in the caret package. It will not
do the subsampling of variables at each split but you can bag a tree
and down-sample the data at each iteration. The help page has an
example of bagging ctree (although you might want to play with the tree
depth a little).
Max
O
On Fri, Feb 28, 2014 at 1:13 AM, zhenjiang zech xu
wrote:
> Dear all,
>
> I did a 5-repeat of 10-fold cross validation using partial least square
> regression model provided by caret package. Can anyone tell me how are the
> values in plsTune$resample calculated? Is that predicted on each hold-out
Michael,
On Mon, Feb 24, 2014 at 5:51 AM, Michael Haenlein
wrote:
>
> Dear all,
>
> I am working with a set of variables that are very non-normally
> distributed. To improve the performance of my model, I'm currently applying
> a boxcox transformation to them. While this improves things, the
> pe
I think that the fundamental problem is that you are using the default
value of ntree (500). You should always use at least 1500 and more if n or
p are large.
Also, this link will give you more up-to-date information on that package
and feature selection:
http://caret.r-forge.r-project.org/featur
Describing the problem would help a lot more. For example, if you were
using some of the parallel processing options in R, this can make extra
copies of objects and drive memory usage up very quickly.
Max
On Thu, Jan 2, 2014 at 3:35 PM, Ben Bolker wrote:
> Xebar Saram gmail.com> writes:
>
> >
If you are using the nnet package, the caret package has a variable
importance method based on Gevrey, M., Dimopoulos, I., & Lek, S. (2003).
Review and comparison of methods to study the contribution of variables in
artificial neural network models. Ecological Modelling, 160(3), 249-264. It
is base
Andrew,
> What I still don't quite understand is which accuracy values from train() I
> should trust: those using classProbs=T or classProbs=F?
It depends on whether you need the class probabilities and class
predictions to match (which they would if classProbs = TRUE).
Another option is to use
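A minimal sketch of requesting matched class probabilities and class predictions from train(); the two-class subset of iris and the LDA model are my own illustrative choices:

```r
library(caret)

data(iris)
two_class <- iris[iris$Species != "setosa", ]
two_class$Species <- factor(two_class$Species)

## classProbs = TRUE asks the model for class probabilities during
## resampling; twoClassSummary then computes ROC-based statistics.
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

set.seed(7)
fit <- train(Species ~ ., data = two_class,
             method = "lda", metric = "ROC", trControl = ctrl)
fit$results
```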
(This is because the class designation takes
into account the costs but the class probability predictions do not. I
alerted both package maintainers to the issue some time ago.)
HTH,
Max
On Fri, Nov 15, 2013 at 1:56 PM, Max Kuhn wrote:
> I've looked into this a bit and the issue seems to be with c
There is a sub-object called 'rules' that has the output of C5.0 for this model:
> library(C50)
> mod <- C5.0(Species ~ ., data = iris, rules = TRUE)
> cat(mod$rules)
id="See5/C5.0 2.07 GPL Edition 2013-11-09"
entries="1"
rules="4" default="setosa"
conds="1" cover="50" ok="50" lift="2.94231" class
> How do i make a loop so that the process could be repeated several time,
> producing randomly ROC curve and under ROC values?
Using the caret package
http://caret.r-forge.r-project.org/
--
Max
__
R-help@r-project.org mailing list
https://stat.ethz
Katrina,
I made some changes to accommodate gbm's new feature for 3+ categories,
then had to "harmonize" how gbm and caret work together.
I have a new version of caret that is not released yet (maybe within a
month), but you should get it from:
install.packages("caret", repos="http://R-Forge.R
There isn't much out there. Quinlan didn't open source the code until about
a year ago.
I've been through the code line by line and we have a fairly descriptive
summary of the model in our book (that's almost out):
http://appliedpredictivemodeling.com/
I will say that the pruning is mostly the
Paul,
#1: I've never tried but you might be able to escape the required tags in
your text (e.g. in html you could write out the in your text).
#3: Which output? Is this in text?
#2: It may be possible and maybe easy to implement. So if you want to dig
into it, have at it. For me, I'm completely
See this:
https://code.google.com/p/gradientboostedmodels/issues/detail?id=3
and this:
https://code.google.com/p/gradientboostedmodels/source/browse/?name=parallel
Max
On Sun, Mar 24, 2013 at 7:31 AM, Lorenzo Isella wrote:
> Dear All,
> I am far from being a guru about parallel programm
James,
I did a fresh install from CRAN to get caret_5.15-61 and ran your code with
method.name = "nnet" and grid.len = 3.
I don't get an error, although there were issues:
In nominalTrainWorkflow(dat = trainData, info = trainInfo, ... :
There were missing values in resampled performance
rrelated but the prior equation seems different to me. Could you
> explain if this is the same concept?
>
> Charles
>
>
> On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn wrote:
>
>> > Is there some literature that you make that statement?
>>
>> No, but t
Charles,
You should not be treating the classes as numeric (is virginica really
three times setosa?). Q^2 and/or R^2 are not appropriate for classification.
Max
On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr wrote:
> I have discovered one of my errors. The timematrix was unnecessary and a
That's not a reproducible example. There is no sessionInfo() and you
omitted code (where did 'fp' come from?).
It works fine for me (see sessionInfo below) using the code in ?odfWeave.
As for the file paths: you can point to different paths for the files
(although don't change the working directo
to the method functions from
> each package other than those listed in the CARET documentation (e.g. I
> would like to specify sampsize and nodesize for randomForest, and not just
> mtry).
>
>
Yes. A custom method is how you do that.
> Thanks,
>
> James
>
>
>
>
>
James,
You really need to read the documentation. Almost every question that you
have has been addressed in the existing material. For this one, there is a
section on custom models here:
http://caret.r-forge.r-project.org/training.html
Max
On Wed, Feb 13, 2013 at 9:58 AM, James Jong wrote:
A reproducible example sent to the package maintainer(s)
might yield results.
Max
On Wed, Dec 19, 2012 at 7:47 AM, Ivana Cace wrote:
> Packages pROC and ROCR both calculate/approximate the Area Under (Receiver
> Operator) Curve. However the results are different.
>
> I am computing a new varia
7] reshape_0.8.4 plyr_1.7.1 lattice_0.20-10
>
> loaded via a namespace (and not attached):
> [1] codetools_0.2-8 compiler_2.15.2 grid_2.15.2 iterators_1.0.6
> tools_2.15.2
>
>
> Is there an example that shows a classProbs example, I could try to run it
> to replicate an
You didn't provide the results of sessionInfo().
Upgrade to the version just released on cran and see if you still have the
issue.
Max
On Thu, Nov 29, 2012 at 6:55 PM, Brian Feeny wrote:
> I have never been able to get class probabilities to work and I am
> relatively new to using these tools
Brian,
This is all outlined in the package documentation. The final model is fit
automatically. For example, using 'verboseIter' provides details. From
?train
> knnFit1 <- train(TrainData, TrainClasses,
+ method = "knn",
+ preProcess = c("center", "scale"),
+
Vik,
On Fri, Sep 21, 2012 at 12:42 PM, Vik Rubenfeld wrote:
> Max, I installed C50. I have a question about the syntax. Per the C50 manual:
>
> ## Default S3 method:
> C5.0(x, y, trials = 1, rules= FALSE,
> weights = NULL,
> control = C5.0Control(),
> costs = NULL, ...)
>
> ## S3 method for class
I can reproduce the errors. I'll take a look.
Thanks,
Max
On Thu, Jul 12, 2012 at 5:24 AM, Dominik Bruhn wrote:
> I want to use the caret package and found out about the timingSamps
> option to obtain the time which is needed to predict results. But, as
> soon as I set a value for this option,
Tyrell,
If you want to have the folds contain data from only one site at a
time, you can develop a set of row indices and pass these to the index
argument in trainControl. For example
index = list(site1 = c(1, 6, 8, 12), site2 = c(120, 152, 176, 178),
site3 = c(754, 789, 981))
The first fold
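A runnable version of that sketch (the row numbers are illustrative, as in the original):

```r
library(caret)

## Each element of `index` holds the row numbers used to FIT the model
## in that resample; the corresponding holdout rows can be supplied via
## the indexOut argument.
site_folds <- list(site1 = c(1, 6, 8, 12),
                   site2 = c(120, 152, 176, 178),
                   site3 = c(754, 789, 981))

ctrl <- trainControl(index = site_folds)
```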
data$pred)^2)
> rSquare <- 1-(ssErr/ssTot)
>
> #Calculate MSE
> mse <- mean((data$pred - data$obs)^2)
>
> #Aggregate
> out <- c(sqrt(mse), 1-(ssErr/ssTot))
> names(out) <- c("RMSE", "Rsquared")
hod = method, :
> There were missing values in resampled performance measures.
> -
>
> As I didn't understand your post, I don't know if this confirms your
> assumption.
>
> Thanks anyway,
> Dominik
>
>
> On 16/05/12 17:30, Max Kuhn wrote:
>> M
failure mode would result in a divide by
zero.
Try using your own summary function (see ?trainControl) and put a
print(summary(data$pred)) in there to verify my claim.
Max
On Wed, May 16, 2012 at 11:30 AM, Max Kuhn wrote:
> More information is needed to be sure, but it is most likely that some
>
Matt,
> I've been using a custom summary function to optimise regression model
> methods using the caret package. This has worked smoothly. I've been using
> the default bootstrapping resampling method. For bagging models
> (specifically randomForest in this case) caret can, in theory, uses the
>
Can anyone recommend a good nonparametric density approach for data bounded
(say between 0 and 1)?
For example, using the basic Gaussian density approach doesn't generate a
very realistic shape (nor should it):
> set.seed(1)
> dat <- rbeta(100, 1, 2)
> plot(density(dat))
(note the area outside o
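One option for a density bounded on [0, 1] is the logspline package, which accepts explicit bounds; this is my suggestion, not something established in the thread:

```r
library(logspline)

set.seed(1)
dat <- rbeta(100, 1, 2)

## lbound/ubound constrain the support, so no probability mass is
## placed outside [0, 1] (unlike the Gaussian kernel estimate).
fit <- logspline(dat, lbound = 0, ubound = 1)
plot(fit)
```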
I think you need to read the man pages and the four vignettes. A lot
of your questions have answers there.
If you don't specify the resampling indices, the ones generated for
you are saved in the train object:
> data(iris)
> TrainData <- iris[,1:4]
> TrainClasses <- iris[,5]
>
> knnFit1 <- train
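A complete version of that fragment (a sketch; the stored indices live in the control sub-object of the fitted model):

```r
library(caret)

data(iris)
TrainData <- iris[, 1:4]
TrainClasses <- iris[, 5]

set.seed(2)
knnFit1 <- train(TrainData, TrainClasses, method = "knn")

## One integer vector of in-resample row numbers per bootstrap sample
## (25 by default):
str(knnFit1$control$index, list.len = 3)
```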
You can adjust the candidate set of tuning parameters via the tuneGrid
argument in train() and the process by which the optimal choice is
made (via the 'selectionFunction' argument in trainControl()). Check
out the package vignettes.
The latest version also has an update.train() function that lets
Somewhere I've seen an example of an xyplot() where the key was placed
in a location of a missing panel. For example, if there were 3
conditioning levels, the panel grid would look like:
3 4
1 2
In this (possibly imaginary) example, there were scatter plots in
locations 1:3 and location 4 had no co
Yes, I was aware of the different type and their respective prevalences.
The dichromat package helped me find what I needed.
Thanks,
Max
On Wed, Nov 2, 2011 at 6:38 PM, Thomas Lumley wrote:
> On Thu, Nov 3, 2011 at 11:04 AM, Carl Witthoft wrote:
>>
>> Before you pick out a palette: you are a
Everyone,
I'm working with scatter plots with different colored symbols (via
lattice). I'm currently using these colors for points and lines:
col1 <- c(rgb(1, 0, 0), rgb(0, 0, 1),
rgb(0, 1, 0),
rgb(0.55482458, 0.40350876, 0.0416),
rgb(0, 0, 0))
plot(seq(along = col1
This is failing because it is a saturated model and the contrast
package tries to do a t-test (instead of a z test). I can add code to
do this, but it will take a few days.
Max
On Fri, Oct 28, 2011 at 2:16 PM, John Sorkin
wrote:
> Forgive my resending this post. To data I have received only one
I'm not sure what you mean by full code or the iteration. This uses
foreach to parallelize the loops over different tuning parameters and
resampled data sets.
The only way I could set to split up the parallelism is if you are
fitting different models to the same data. In that case, you could
launc
I have had issues with some parallel backends not finding functions
within a namespace for packages listed in the ".packages" argument or
explicitly loaded in the body of the foreach loop. This has occurred
with MPI but not with multicore. I can get around this to some extent
by calling the functio
On Mon, Oct 3, 2011 at 11:10 AM, wrote:
> Hi Max,
>
> Thanks for the note. In your last paragraph, did you mean "in
> createDataPartition"? I'm a little vague about what returnTrain option does.
>
> Bonnie
>
> Quoting Max Kuhn :
>
>> Basically, create
Basically, createDataPartition is used when you need to make one or
more simple two-way splits of your data. For example, if you want to
make a training and test set and keep your classes balanced, this is
what you could use. It can also make multiple splits of this kind (or
leave-group-out CV aka
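A minimal sketch of that two-way split (iris is my stand-in data set):

```r
library(caret)

data(iris)

## A single stratified 75/25 split: the class frequencies in the
## training rows mirror those of the full data.
set.seed(10)
in_train <- createDataPartition(iris$Species, p = 0.75, list = FALSE)

training <- iris[in_train, ]
testing  <- iris[-in_train, ]
table(training$Species)  # about 38 of each class
```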
tion. Could you perhaps tell me which example I should have a look at?
>
> Regards,
> Jan
>
>
>
> On 09/15/2011 04:47 PM, Max Kuhn wrote:
>>
>> There are examples in the package directory that explain this.
>>
>> On Thu, Sep 15, 2011 at 8:16 AM, Jan van de
There are examples in the package directory that explain this.
On Thu, Sep 15, 2011 at 8:16 AM, Jan van der Laan wrote:
>
> What is the correct way to combine multiple calls to odfCat, odfItemize,
> odfTable etc. inside a function?
>
> As an example lets say I have a function that needs to write
Can you provide a reproducible example and the results of
sessionInfo()? What are the levels of your classes?
On Sat, Aug 27, 2011 at 10:43 PM, Jon Toledo wrote:
>
> Dear developers,
> I have jutst started working with caret and all the nice features it offers.
> But I just encountered a problem
David,
The ROC curve should really be computed with some sort of numeric data
(as opposed to classes). It varies the cutoff to get a continuum of
sensitivity and specificity values. Using the classes as 1's and 2's
implies that the second class is twice the value of the first, which
doesn't reall
> similar. I sent the info to Max Kuhn privately, but did not get a response
> after two tries.) My odfWeave reporting system worked fine prior to R2.12 and
> then the same code that ran fine under R2.11.1 stopped working. Using the
> very same machine and running the very same code unde
Frank,
It depends on how you define "optimal". While I'm not a big fan of
using the area under the ROC to characterize performance, there are a
lot of times when likelihood measures are clearly sub-optimal in
performance. Using resampled accuracy (or Kappa) instead of deviance
(out-of-bag or not)
XiaoLiu,
I can't see the options in bootControl you used here. Your error is
consistent with leaving classProbs and summaryFunction unspecified.
Please double check that you set them with classProbs = TRUE and
summaryFunction = twoClassSummary before you ran.
Max
On Thu, May 12, 2011 at 7:04 PM,
As far as caret goes, you should read
http://cran.r-project.org/web/packages/caret/vignettes/caretVarImp.pdf
and look at rfe() and sbf().
On Fri, May 6, 2011 at 2:53 PM, ypriverol wrote:
> Thanks Max. I'm using now the library caret with my data. But the models
> showed a correlation under
train() uses vectors, matrices and data frames as input. I really
think you need to read materials on basic R before proceeding. Go to
the R web page. There are introductory materials there.
On Tue, May 3, 2011 at 11:19 AM, ypriverol wrote:
> I saw the format of the caret data some days ago. It i
See the examples at the end of:
http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf
for a QSAR data set for modeling the log blood-brain barrier
concentration. SVMs are not used there but, if you use train(), the
syntax is very similar.
On Tue, May 3, 2011 at 9:38 AM, yprive
Yeah, that didn't work. Use
fitControl<-trainControl(index = list(seq(along = mdrrClass)))
See ?trainControl to understand what this does in detail.
Max
Not all modeling functions have both the formula and "matrix"
interface. For example, glm() and rpart() only have formula method,
enet() has only the matrix interface and ksvm() and others have both.
This was one reason I created the package (so we don't have to
remember all this).
train() lets yo
No, the sampling is done on rows. The definition of a bootstrap
(re)sample is one which is the same size as the original data but
taken with replacement. The "Accuracy SD" and "Kappa SD" columns give
you a sense of how the model performance varied across these bootstrap
data sets (i.e. they are not
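The definition above can be sketched in one line: a bootstrap resample is the same size as the original data, drawn with replacement, so some rows repeat and roughly a third are left out ("out of bag"):

```r
set.seed(3)
n <- 150

## Same size as the original data, sampled with replacement.
boot_rows <- sample(n, n, replace = TRUE)

length(boot_rows)            # 150
mean(!(1:n %in% boot_rows))  # ~0.37 of rows are out of bag
```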
When you say "variable" do you mean predictors or responses?
In either case, they do. You can generally tell by reading the help
files and looking at the examples.
Max
On Fri, Apr 29, 2011 at 3:47 PM, ypriverol wrote:
> Hi:
> I'm starting a research of Support Vector Regression. I want to obta
It isn't building the same model since each fit is created from
different data sets.
The resampling is sort of the point of the function, but if you really
want to avoid it, supply your own index in trainControl that has every
index (eg, index = seq(along = mdrrClass)). In this case, the
performan
I don't think that this is the issue, but test it on a file without spaces.
On Mon, Mar 21, 2011 at 2:25 PM, wrote:
>
> I have a very similar error that cropped up when I upgraded to R 2.12 and
> persists at R 2.12.1. I am running R on Windows XP and OO is at version 3.2.
> I did not make any
> Using the 'CARET' package, is it possible to specify weights for features
> used in model prediction?
For what model?
> And for the 'knn' implementation, is there a way
> to choose a distance metric (i.e. Mahalanobis distance)?
>
No, sorry.
Max
It would help if you provided the code that you used for the caret functions.
The most likely issues is not using importance = TRUE in the call to train()
I believe that I've only implemented code for plotting the varImp
objects resulting from train() (eg. there is plot.varImp.train but not
plot.
R for Predictive Modeling: A Hands-On Introduction
Predictive Analytics World in San Francisco
Sunday March 13, 9am to 4:30pm
This one-day session provides a hands-on introduction to R, the
well-known open-source platform for data analysis. Real examples are
employed in order to methodically expo
The objects functions for kernel methods are unrelated to the area
under the ROC curve. However, you can try to choose the cost and
kernel parameters to maximize the ROC AUC.
See the caret package, specifically the train function.
Max
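A sketch of that suggestion: tune an RBF SVM's cost over a small grid and pick the settings that maximize the resampled ROC AUC. The two-class iris subset is my own example data:

```r
library(caret)

data(iris)
two_class <- iris[iris$Species != "setosa", ]
two_class$Species <- factor(two_class$Species)

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

## metric = "ROC" makes train() choose the cost/kernel settings with
## the largest resampled area under the ROC curve.
set.seed(4)
svm_fit <- train(Species ~ ., data = two_class,
                 method = "svmRadial",
                 tuneLength = 5,
                 metric = "ROC",
                 trControl = ctrl)
svm_fit$bestTune
```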
On Mon, Feb 21, 2011 at 5:34 PM, Angel Russo wrote:
> *Hi,
>
> I am using randomForest package to do some prediction job on GWAS data. I
> firstly split the data into training and testing set (70% vs 30%), then
> using training set to grow the trees (ntree=10). It looks that the OOB
> error in training set is good (<10%). However, it is not very good for
Andrew,
ctree only tunes over mincriterion and ctree2 tunes over maxdepth
(while fixing mincriterion = 0).
Seeing both listed as the function is being executed is a bug. I'll
setup checks to make sure that the columns specified in tuneGrid are
actually the tuning parameters that are used.
Max
O
No. Any valid seed should work. In this case, train() should only be
using it to determine which training set samples are in the CV or
bootstrap data sets.
Max
On Wed, Jan 26, 2011 at 9:56 AM, Neeti wrote:
>
> Thank you so much for your reply. In my case it is giving error in some seed
> value f
Sort of. It lets you define a grid of candidate values to test and to
define the rule to choose the best. For some models, it is easy to
come up with default values that work well (e.g. RBF SVM's, PLS, KNN)
while others are more data dependent. In the latter case, the defaults
may not work well.
M
What version of caret and R? We'll also need a reproducible example.
On Mon, Jan 24, 2011 at 12:44 PM, Neeti wrote:
>
> Hi,
> I am trying to construct a svmpoly model using the "caret" package (please
> see code below). Using the same data, without changing any setting, I am
> just changing the
splom(~dat, groups = grps,
lower.panel = panel.circ3,
upper.panel = panel.circ3)
Thanks,
Max
On Thu, Jan 20, 2011 at 11:13 AM, Peter Ehlers wrote:
> On 2011-01-19 20:15, Max Kuhn wrote:
>>
>> Hello everyone,
>>
>> I'm stumped. I'd like to create
Hello everyone,
I'm stumped. I'd like to create a scatterplot matrix with circular
reference lines. Here is an example in 2d:
library(ellipse)
set.seed(1)
dat <- matrix(rnorm(300), ncol = 3)
colnames(dat) <- c("X1", "X2", "X3")
dat <- as.data.frame(dat)
grps <- factor(rep(letters[1:4], 25))
pan
I'd like to make a less than full rank design using dummy variables
for factors. Here is some example data:
when <- data.frame(time = c("afternoon", "night", "afternoon",
"morning", "morning", "morning",
"morning", "afternoon", "afternoon"),
Neeti,
I'm pretty sure that the error is related to the confusionMatrix call,
which is in the caret package, not e1071.
The error message is pretty clear: you need to pass in two factor
objects that have the same levels. You can check by running the
commands:
str(pred_true1)
str(species_tes
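A sketch of the fix: give both vectors the same factor levels before calling confusionMatrix(). The object names and values here are made up for illustration:

```r
library(caret)

lvls <- c("setosa", "versicolor", "virginica")

## Both arguments share the same level set, even if a level has
## zero observations in one of them.
predictions <- factor(c("setosa", "virginica", "virginica"), levels = lvls)
truth       <- factor(c("setosa", "versicolor", "virginica"), levels = lvls)

confusionMatrix(predictions, truth)
```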
Kendric,
I've seen these too and traceback() usually goes back to ksvm(). This
doesn't mean that the error is there, but the results fo traceback()
from you would be helpful.
thanks,
Max
On Mon, Nov 22, 2010 at 6:18 PM, Kendric Wang
wrote:
> Hi. I am trying to construct a svmLinear model using
Can you try it with version 7.16 on R-Forge? Use
install.packages("odfWeave", repos="http://R-Forge.R-project.org";)
to get it.
Thanks,
Max
On Tue, Nov 16, 2010 at 8:26 AM, Søren Højsgaard
wrote:
> Dear Mike,
>
> Good point - thanks. The lines that caused the error mentioned above are
> sim
> The caret package has answers to all your questions.
>> 1) How to obtain a variable (attribute) importance using
>> e1071:SVM (or other
>> svm methods)?
I haven't implemented a model-specific method for variable importance
for SVM models. I know of one package (svmpath) that will return the
re
Ravishankar,
> I used Random Forest with a couple of data sets I had to predict for binary
> response. In all the cases, the AUC of the training set is coming to be 1.
> Is this always the case with random forests? Can someone please clarify
> this?
This is pretty typical for this model.
> I hav
These two resources might also help:
http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
http://cran.r-project.org/web/packages/contrast/vignettes/contrast.pdf
Max
On Thu, Sep 30, 2010 at 1:33 PM, Ista Zahn wrote:
> Hi Professor Howell,
> I think the issue here is simply in the assumpt
You might want to check out the Reproducible Research task view:
http://cran.r-project.org/web/views/ReproducibleResearch.html
There is a section on Microsoft formats, as well as other formats that
can be converted.
Max
On Wed, Sep 15, 2010 at 11:49 AM, Thomas Lumley
wrote:
> On Wed, 15 S
Trafim,
You'll get more answers if you adhere to the posting guide and tell us
you version information and other necessary details. For example, this
function is in the caret package (but nobody but me probably knows
that =]).
The first argument should be a vector of outcome values (not the
possi
A Reproducible Research CRAN task view was recently created:
http://cran.r-project.org/web/views/ReproducibleResearch.html
I will be updating it with some of the information in this thread.
thanks,
Max
On Thu, Sep 9, 2010 at 11:41 AM, Matt Shotwell wrote:
> Well, the attachment was a dud
Ben,
> 1a. am I right in believing that odfWeave does not respect the
> 'keep.source' option? Am I missing something obvious?
I believe it does, since this gets passed directly to Sweave.
> 1b. is there a way to set global options analogous to \SweaveOpts{}
> directives in Sweave? (I looked a
> What does this mean?
It's impossible to tell. Read the posting guide and figure out all the
details that you left out. If we don't have more information, you
should have low expectations about the quality of any replies you might
get.
--
Max
The index indicates which samples should go into the training set.
However, you are using out of bag sampling, so it would use the whole
training set and return the OOB error (instead of the error estimates
that would be produced by resampling via the index).
Which do you want? OOB estimates or ot
Not to beat a dead horse...
I've found that I like the useR conferences more than most statistics
conferences. This isn't due to the difference in content, but the
difference in the audience and the environment.
For example, everyone is at useR because of their appreciation of R.
At most other co