[R] failure with merge

2016-07-14 Thread Max Kuhn
I am merging two data frames: tuneAcc <- structure(list(select = c(FALSE, TRUE), method = structure(c(1L, 1L), .Label = "GCV.Cp", class = "factor"), RMSE = c(29.2102056093962, 28.9743318817886), Rsquared = c(0.0322612161559773, 0.0281713457306074), RMSESD = c(0.981573768028697, 0.791307778398384),

Re: [R] Installing Caret

2016-06-16 Thread Max Kuhn
The problem is not with `caret`. Your output says: > installation of package ‘minqa’ had non-zero exit status `caret` has a dependency that has a dependency on `minqa`. The same is true for `RcppEigen` and the others. What code did you use to do the install? What OS and version of R etc? On Th

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Max Kuhn
anks and > > > Kind Regards > > > > -- > Muhammad Bilal > Research Fellow and Doctoral Researcher, > Bristol Enterprise, Research, and Innovation Centre (BERIC), > University of the West of England (UWE), > Frenchay Campus, > Bristol, > BS16 1QY > > *muh

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Max Kuhn
It is extremely difficult to tell what the issue might be without a reproducible example. The only thing that I can suggest is to use the non-formula interface to `train` so that you can avoid creating dummy variables. On Mon, May 9, 2016 at 11:23 AM, Muhammad Bilal < muhammad2.bi...@live.uwe.ac.
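A minimal base R sketch of what "creating dummy variables" means in practice. The data frame `dat` and its columns are hypothetical, and `model.matrix` is used only to illustrate the expansion; it is an assumption that this mirrors what the formula interface to `train` does internally.

```r
# Hypothetical data: one numeric predictor and one factor predictor
dat <- data.frame(
  y    = c(1.2, 3.4, 2.2, 5.1, 4.8, 3.9),
  size = c(10, 20, 15, 30, 25, 22),
  type = factor(c("a", "b", "c", "a", "b", "c"))
)

# Formula-style expansion: the factor becomes indicator columns
x_formula <- model.matrix(y ~ ., data = dat)[, -1]  # drop the intercept
colnames(x_formula)

# Non-formula style: predictors are passed as they are, factor intact
x_plain <- dat[, c("size", "type")]
sapply(x_plain, class)
```

With the second form, a tree-based model sees `type` as a single factor rather than as separate indicator columns.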

Re: [R] Mixture Discriminant Analysis and Penalized LDA

2016-01-25 Thread Max Kuhn
There is a function called `smda` in the sparseLDA package that implements the model described in Clemmensen, L., Hastie, T., Witten, D. and Ersbøll, B. Sparse discriminant analysis, Technometrics, 53(4): 406-413, 2011 Max On Sun, Jan 24, 2016 at 10:45 PM, TJUN KIAT TEO wrote: > Hi > > I notice

Re: [R] Caret - Recursive Feature Elimination Error

2015-12-23 Thread Max Kuhn
Providing a reproducible example and the results of `sessionInfo` will help get your question answered. Also, what is the point of using glmnet with RFE? It already does feature selection. On Wed, Dec 23, 2015 at 1:48 AM, Manish MAHESHWARI wrote: > Hi, > > I am trying to use caret, for feature

Re: [R] Error in 'Contrasts<-' while using GBM.

2015-11-29 Thread Max Kuhn
Providing a reproducible example and the results of `sessionInfo` will help get your question answered. My only guess is that one or more of your predictors are factors and that the in-sample data (used to build the model during resampling) have different levels than the holdout samples. Max On
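The guess above can be checked with a few lines of base R. This is only a sketch: the factor `pred_factor` and the split into rows 1-6 and 7-8 are hypothetical stand-ins for one resampling iteration.

```r
# Hypothetical predictor with a rare level "c"
pred_factor <- factor(c("a", "a", "b", "b", "a", "b", "c", "a"))

# Hypothetical split: rows 1-6 build the model, rows 7-8 are held out
in_sample <- pred_factor[1:6]
holdout   <- pred_factor[7:8]

# Levels that actually occur in each piece
seen_in_sample  <- unique(as.character(in_sample))
seen_in_holdout <- unique(as.character(holdout))

# Levels the model never saw during fitting
unseen <- setdiff(seen_in_holdout, seen_in_sample)
unseen
```

A non-empty `unseen` for any factor predictor would be consistent with the explanation given.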

Re: [R] Ensure distribution of classes is the same as prior distribution in Cross Validation

2015-11-24 Thread Max Kuhn
Right now, using `method = "cv"` or `method = "repeatedcv"` does stratified sampling. Depending on what you mean by "ensure" and the nature of your outcome (categorical?), it probably already does. On Mon, Nov 23, 2015 at 7:04 PM, TJUN KIAT TEO wrote: > In the caret train control function, is it
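As an illustration of what stratified fold assignment means, here is a base R sketch. It is not caret's own code; the outcome `y`, the number of folds `k`, and the loop are assumptions made for the example.

```r
set.seed(1)
# Hypothetical imbalanced outcome: 80 "no", 20 "yes"
y <- factor(rep(c("no", "yes"), times = c(80, 20)))
k <- 5

# Assign fold numbers within each class so each fold keeps the class ratio
fold <- integer(length(y))
for (cls in levels(y)) {
  rows <- which(y == cls)
  fold[rows] <- sample(rep(seq_len(k), length.out = length(rows)))
}

# Class counts per fold
table(fold, y)
```

Because the fold labels are shuffled separately inside each class, every fold ends up with the same 4:1 class ratio as the full data.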

Re: [R] Caret Internal Data Representation

2015-11-06 Thread Max Kuhn
Providing a reproducible example and the results of `sessionInfo` will help get your question answered. For example, did you use the formula or non-formula interface to `train` and so on On Thu, Nov 5, 2015 at 1:10 PM, Bert Gunter wrote: > I am not familiar with caret/Cubist, but assuming they

Re: [R] Imbalanced random forest

2015-07-29 Thread Max Kuhn
This might help: http://bit.ly/1MUP0Lj On Wed, Jul 29, 2015 at 11:00 AM, jpara3 wrote: > ¿How can i set up a study with random forest where the response is highly > imbalanced? > > > > - > > Guided Tours Basque Country > > Guided tours in the three capitals of the Basque Country: Bilbao, >

Re: [R] what constitutes a 'complete sentence'?

2015-07-07 Thread Max Kuhn
On Tue, Jul 7, 2015 at 8:19 AM, John Fox wrote: > Dear Peter, > > You're correct that these examples aren't verb phrases (though the second > one contains a verb phrase). I don't want to make the discussion even more > pedantic (moving it in this direction was my fault), but "Paragraph" isn't > q

Re: [R] Caret and custom summary function

2015-05-11 Thread Max Kuhn
The version of caret just put on CRAN has a function called mnLogLoss that does this. Max On Mon, May 11, 2015 at 11:17 AM, Lorenzo Isella wrote: > Dear All, > I am trying to implement my own metric (a log loss metric) for a > binary classification problem in Caret. > I must be making some mist

Re: [R] Repeated failures to install "caret" package (of Max Kuhn)

2015-04-04 Thread Max Kuhn
gt; > > -Original Message- > > From: wyl...@ischool.utexas.edu > > Sent: Fri, 03 Apr 2015 16:07:57 -0500 > > To: r-help@r-project.org > > Subject: [R] Repeated failures to install "caret" package (of Max Kuhn) > > > > For an edx course,

Re: [R] #library("CHAID") - Cross validation for chaid

2015-01-05 Thread Max Kuhn
You can create your own: http://topepo.github.io/caret/custom_models.html I put a prototype together. Source this file: https://github.com/topepo/caret/blob/master/models/files/chaid.R then try this: library("CHAID") ### fit tree to subsample set.seed(290875) USvoteS <- USvote[sample(1:

Re: [R] Help with caret, please

2014-10-11 Thread Max Kuhn
What you are asking is a bad idea on multiple levels. You will grossly over-estimate the area under the ROC curve. Consider the 1-NN model: you will have perfect predictions every time. To do this, you will need to run train again and modify the index and indexOut objects: library(caret) set.s
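The 1-NN point can be demonstrated on pure noise. This is a sketch under stated assumptions: the data are simulated, the labels are random, and `knn` comes from the `class` package (a recommended package normally installed with R), not from caret.

```r
library(class)  # recommended package that normally ships with R

set.seed(1)
# Hypothetical data: pure noise predictors and random class labels
x <- matrix(rnorm(200), ncol = 2)
y <- factor(sample(c("a", "b"), 100, replace = TRUE))

# Predict the same rows that were used for training
pred <- knn(train = x, test = x, cl = y, k = 1)

# Each point is its own nearest neighbor, so the labels are reproduced
resub_accuracy <- mean(pred == y)
resub_accuracy
```

The predictors carry no information about `y`, so any apparent skill measured this way comes entirely from evaluating on the training rows.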

Re: [R] Training a model using glm

2014-09-17 Thread Max Kuhn
You have not shown all of your code and it is difficult to diagnose the issue. I assume that you are using the data from: library(AppliedPredictiveModeling) data(AlzheimerDisease) If so, there is example code to analyze these data in that package. See ?scriptLocation. We have no idea how

Re: [R] Use of library(X) in the code of library X.

2014-06-06 Thread Max Kuhn
That is legacy code but there was a good reason back then. caret is written to use parallel processing via the foreach package. There were some cases where the worker processes did not load the required packages (even when I used foreach's ".packages" argument) so I would do it explicitly. I don't

Re: [R] cforest sampling methods

2014-03-19 Thread Max Kuhn
You might look at the 'bag' function in the caret package. It will not do the subsampling of variables at each split but you can bag a tree and down-sample the data at each iteration. The help page has an example of bagging ctree (although you might want to play with the tree depth a little). Max O

Re: [R] how is the model resample performance calculated by caret?

2014-02-28 Thread Max Kuhn
On Fri, Feb 28, 2014 at 1:13 AM, zhenjiang zech xu wrote: > Dear all, > > I did a 5-repeat of 10-fold cross validation using partial least square > regression model provided by caret package. Can anyone tell me how are the > values in plsTune$resample calculated? Is that predicted on each hold-out

Re: [R] boxcox alternative

2014-02-24 Thread Max Kuhn
Michael, On Mon, Feb 24, 2014 at 5:51 AM, Michael Haenlein wrote: > > Dear all, > > I am working with a set of variables that are very non-normally > distributed. To improve the performance of my model, I'm currently applying > a boxcox transformation to them. While this improves things, the > pe

Re: [R] Predictor Importance in Random Forests and bootstrap

2014-01-28 Thread Max Kuhn
I think that the fundamental problem is that you are using the default value of ntree (500). You should always use at least 1500 and more if n or p are large. Also, this link will give you more up-to-date information on that package and feature selection: http://caret.r-forge.r-project.org/featur

Re: [R] R crashes with memory errors on a 256GB machine (and system shoes only 60GB usage)

2014-01-02 Thread Max Kuhn
Describing the problem would help a lot more. For example, if you were using some of the parallel processing options in R, this can make extra copies of objects and drive memory usage up very quickly. Max On Thu, Jan 2, 2014 at 3:35 PM, Ben Bolker wrote: > Xebar Saram gmail.com> writes: > > >

Re: [R] Variable importance - ANN

2013-12-04 Thread Max Kuhn
If you are using the nnet package, the caret package has a variable importance method based on Gevrey, M., Dimopoulos, I., & Lek, S. (2003). Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling, 160(3), 249-264. It is base

Re: [R] Inconsistent results between caret+kernlab versions

2013-11-17 Thread Max Kuhn
Andrew, > What I still don't quite understand is which accuracy values from train() I > should trust: those using classProbs=T or classProbs=F? It depends on whether you need the class probabilities and class predictions to match (which they would if classProbs = TRUE). Another option is to use

Re: [R] Inconsistent results between caret+kernlab versions

2013-11-15 Thread Max Kuhn
ecause the class designation takes into account the costs but the class probability predictions do not. I alerted both package maintainers to the issue some time ago.) HTH, Max On Fri, Nov 15, 2013 at 1:56 PM, Max Kuhn wrote: > I've looked into this a bit and the issue seems to be with c

Re: [R] C50 Node Assignment

2013-11-09 Thread Max Kuhn
There is a sub-object called 'rules' that has the output of C5.0 for this model: > library(C50) > mod <- C5.0(Species ~ ., data = iris, rules = TRUE) > cat(mod$rules) id="See5/C5.0 2.07 GPL Edition 2013-11-09" entries="1" rules="4" default="setosa" conds="1" cover="50" ok="50" lift="2.94231" class

Re: [R] Cross validation in R

2013-07-02 Thread Max Kuhn
> How do i make a loop so that the process could be repeated several time, > producing randomly ROC curve and under ROC values? Using the caret package http://caret.r-forge.r-project.org/ -- Max __ R-help@r-project.org mailing list https://stat.ethz

Re: [R] Error running caret's gbm train function with new version of caret

2013-05-06 Thread Max Kuhn
Katrina, I made some changes to accommodate gbm's new feature for 3+ categories, then had to "harmonize" how gbm and caret work together. I have a new version of caret that is not released yet (maybe within a month), but you should get it from: install.packages("caret", repos="http://R-Forge.R

Re: [R] C50 package in R

2013-04-26 Thread Max Kuhn
There isn't much out there. Quinlan didn't open source the code until about a year ago. I've been through the code line by line and we have a fairly descriptive summary of the model in our book (that's almost out): http://appliedpredictivemodeling.com/ I will say that the pruning is mostly the

Re: [R] odfWeave: Some questions about potential formatting options

2013-04-17 Thread Max Kuhn
Paul, #1: I've never tried but you might be able to escape the required tags in your text (e.g. in html you could write out the in your text). #3: Which output? Is this in text? #2: It may be possible and maybe easy to implement. So if you want to dig into it, have at it. For me, I'm completely

Re: [R] Parallelizing GBM

2013-03-24 Thread Max Kuhn
See this: https://code.google.com/p/gradientboostedmodels/issues/detail?id=3 and this: https://code.google.com/p/gradientboostedmodels/source/browse/?name=parallel Max On Sun, Mar 24, 2013 at 7:31 AM, Lorenzo Isella wrote: > Dear All, > I am far from being a guru about parallel programm

Re: [R] CARET and NNET fail to train a model when the input is high dimensional

2013-03-06 Thread Max Kuhn
James, I did a fresh install from CRAN to get caret_5.15-61 and ran your code with method.name = "nnet" and grid.len = 3. I don't get an error, although there were issues: In nominalTrainWorkflow(dat = trainData, info = trainInfo, ... : There were missing values in resampled performance

Re: [R] caret pls model statistics

2013-03-03 Thread Max Kuhn
rrelated but the prior equation seems different to me. Could you > explain if this is the same concept? > > Charles > > > On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn wrote: > >> > Is there some literature that you make that statement? >> >> No, but t

Re: [R] caret pls model statistics

2013-03-02 Thread Max Kuhn
Charles, You should not be treating the classes as numeric (is virginica really three times setosa?). Q^2 and/or R^2 are not appropriate for classification. Max On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr wrote: > I have discovered on of my errors. The timematrix was unnecessary and a

Re: [R] odfWeave: Trouble Getting the Package to Work

2013-02-18 Thread Max Kuhn
That's not a reproducible example. There is no sessionInfo() and you omitted code (where did 'fp' come from?). It works fine for me (see sessionInfo below) using the code in ?odfWeave. As for the file paths: you can point to different paths for the files (although don't change the working directo

Re: [R] CARET: Any way to access other tuning parameters?

2013-02-13 Thread Max Kuhn
to the method functions from > each package other than those listed in the CARET documentation (e.g. I > would like to specify sampsize and nodesize for randomForest, and not just > mtry). > > Yes. A custom method is how you do that. > Thanks, > > James > > > > >

Re: [R] CARET: Any way to access other tuning parameters?

2013-02-13 Thread Max Kuhn
James, You really need to read the documentation. Almost every question that you have has been addressed in the existing material. For this one, there is a section on custom models here: http://caret.r-forge.r-project.org/training.html Max On Wed, Feb 13, 2013 at 9:58 AM, James Jong wrote:

Re: [R] pROC and ROCR give different values for AUC

2012-12-19 Thread Max Kuhn
A reproducible example sent to the package maintainer(s) might yield results. Max On Wed, Dec 19, 2012 at 7:47 AM, Ivana Cace wrote: > Packages pROC and ROCR both calculate/approximate the Area Under (Receiver > Operator) Curve. However the results are different. > > I am computing a new varia

Re: [R] Help with this error "kernlab class probability calculations failed; returning NAs"

2012-11-29 Thread Max Kuhn
7] reshape_0.8.4 plyr_1.7.1 lattice_0.20-10 > > loaded via a namespace (and not attached): > [1] codetools_0.2-8 compiler_2.15.2 grid_2.15.2 iterators_1.0.6 > tools_2.15.2 > > > Is there an example that shows a classProbs example, I could try to run it > to replicate an

Re: [R] Help with this error "kernlab class probability calculations failed; returning NAs"

2012-11-29 Thread Max Kuhn
You didn't provide the results of sessionInfo(). Upgrade to the version just released on cran and see if you still have the issue. Max On Thu, Nov 29, 2012 at 6:55 PM, Brian Feeny wrote: > I have never been able to get class probabilities to work and I am > relatively new to using these tools

Re: [R] caret train and trainControl

2012-11-23 Thread Max Kuhn
Brian, This is all outlined in the package documentation. The final model is fit automatically. For example, using 'verboseIter' provides details. From ?train > knnFit1 <- train(TrainData, TrainClasses, + method = "knn", + preProcess = c("center", "scale"), +

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-22 Thread Max Kuhn
Vik, On Fri, Sep 21, 2012 at 12:42 PM, Vik Rubenfeld wrote: > Max, I installed C50. I have a question about the syntax. Per the C50 manual: > > ## Default S3 method: > C5.0(x, y, trials = 1, rules= FALSE, > weights = NULL, > control = C5.0Control(), > costs = NULL, ...) > > ## S3 method for class

Re: [R] Caret: Use timingSamps leads to error

2012-07-12 Thread Max Kuhn
I can reproduce the errors. I'll take a look. Thanks, Max On Thu, Jul 12, 2012 at 5:24 AM, Dominik Bruhn wrote: > I want to use the caret package and found out about the timingSamps > obtion to obtain the time which is needed to predict results. But, as > soon as I set a value for this option,

Re: [R] caret() train based on cross validation - split dataset to keep sites together?

2012-05-30 Thread Max Kuhn
Tyrell, If you want to have the folds contain data from only one site at a time, you can develop a set of row indices and pass these to the index argument in trainControl. For example index = list(site1 = c(1, 6, 8, 12), site2 = c(120, 152, 176, 178), site3 = c(754, 789, 981)) The first fold
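A list like the one above does not have to be typed by hand. The sketch below builds it in base R from a hypothetical `site` vector with one label per row. Per `?trainControl`, each element of `index` holds the rows used for fitting in that iteration, so the complement list is also shown; which of the two to pass depends on whether a site should be the fitting set or the held-out set.

```r
# Hypothetical site label for each row of the training data
site <- c("site1", "site2", "site1", "site3", "site2",
          "site3", "site1", "site2", "site3", "site1")

# Row numbers belonging to each site
rows_by_site <- split(seq_along(site), site)

# Complement: all rows except those of the named site
rows_without_site <- lapply(
  rows_by_site,
  function(r) setdiff(seq_along(site), r)
)

rows_by_site$site1
rows_without_site$site1
```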

Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-17 Thread Max Kuhn
data$pred)^2) > rSquare <- 1-(ssErr/ssTot) > > #Calculate MSE > mse <- mean((data$pred - data$obs)^2) > > #Aggregate > out <- c(sqrt(mse), 1-(ssErr/ssTot)) > names(out) <- c("RMSE", "Rsquared"

Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Max Kuhn
hod = method,  : >  There were missing values in resampled performance measures. > - > > As I didn't understand your post, I don't know if this confirms your > assumption. > > Thanks anyway, > Dominik > > > On 16/05/12 17:30, Max Kuhn wrote: >> M

Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Max Kuhn
failure mode would result in a divide by zero. Try using your own summary function (see ?trainControl) and put a print(summary(data$pred)) in there to verify my claim. Max On Wed, May 16, 2012 at 11:30 AM, Max Kuhn wrote: > More information is needed to be sure, but it is most likely that some &
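A sketch of such a summary function in base R. The name `rmse_r2_summary` and the toy data are hypothetical; the `(data, lev, model)` arguments and the `obs`/`pred` columns follow the convention described in `?trainControl`. The last lines show one way the failure could arise, assuming a fit that predicts a single constant value.

```r
# Hypothetical summary function with the (data, lev, model) signature
rmse_r2_summary <- function(data, lev = NULL, model = NULL) {
  print(summary(data$pred))  # inspect the predictions, as suggested
  ss_err <- sum((data$obs - data$pred)^2)
  ss_tot <- sum((data$obs - mean(data$obs))^2)
  c(RMSE = sqrt(mean((data$pred - data$obs)^2)),
    Rsquared = 1 - ss_err / ss_tot)
}

# Hypothetical holdout where the model predicts one constant value
constant_fit <- data.frame(obs = c(1, 2, 3, 4, 5), pred = rep(3, 5))
rmse_r2_summary(constant_fit)

# A correlation-based R^2 is undefined here because sd(pred) is zero
r2_cor <- suppressWarnings(cor(constant_fit$obs, constant_fit$pred)^2)
is.na(r2_cor)
```

If the printed summary shows identical minimum and maximum predictions, a missing value in the resampled R-squared is the expected result.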

Re: [R] caret package: custom summary function in trainControl doesn't work with oob?

2012-04-13 Thread Max Kuhn
Matt, > I've been using a custom summary function to optimise regression model > methods using the caret package. This has worked smoothly. I've been using > the default bootstrapping resampling method. For bagging models > (specifically randomForest in this case) caret can, in theory, uses the >

[R] nonparametric densities for bounded distributions

2012-03-09 Thread Max Kuhn
Can anyone recommend a good nonparametric density approach for data bounded (say between 0 and 1)? For example, using the basic Gaussian density approach doesn't generate a very realistic shape (nor should it): > set.seed(1) > dat <- rbeta(100, 1, 2) > plot(density(dat)) (note the area outside o

Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Max Kuhn
I think you need to read the man pages and the four vignettes. A lot of your questions have answers there. If you don't specify the resampling indices, the ones generated for you are saved in the train object: > data(iris) > TrainData <- iris[,1:4] > TrainClasses <- iris[,5] > > knnFit1 <- train

Re: [R] Choosing glmnet lambda values via caret

2012-02-09 Thread Max Kuhn
You can adjust the candidate set of tuning parameters via the tuneGrid argument in train() and the process by which the optimal choice is made (via the 'selectionFunction' argument in trainControl()). Check out the package vignettes. The latest version also has an update.train() function that lets

[R] lattice key in blank panel

2011-12-15 Thread Max Kuhn
Somewhere I've seen an example of an xyplot() where the key was placed in a location of a missing panel. For example, if there were 3 conditioning levels, the panel grid would look like: 34 12 In this (possibly imaginary) example, there were scatter plots in locations 1:3 and location 4 had no co

Re: [R] palettes for the color-blind

2011-11-02 Thread Max Kuhn
Yes, I was aware of the different types and their respective prevalences. The dichromat package helped me find what I needed. Thanks, Max On Wed, Nov 2, 2011 at 6:38 PM, Thomas Lumley wrote: > On Thu, Nov 3, 2011 at 11:04 AM, Carl Witthoft wrote: >> >> Before you pick out a palette: you are a

[R] palettes for the color-blind

2011-11-02 Thread Max Kuhn
Everyone, I'm working with scatter plots with different colored symbols (via lattice). I'm currently using these colors for points and lines: col1 <- c(rgb(1, 0, 0), rgb(0, 0, 1), rgb(0, 1, 0), rgb(0.55482458, 0.40350876, 0.0416), rgb(0, 0, 0)) plot(seq(along = col1

Re: [R] Contrasts with an interaction. How does one specify the dummy variables for the interaction

2011-10-31 Thread Max Kuhn
This is failing because it is a saturated model and the contrast package tries to do a t-test (instead of a z test). I can add code to do this, but it will take a few days. Max On Fri, Oct 28, 2011 at 2:16 PM, John Sorkin wrote: > Forgive my resending this post. To data I have received only one

Re: [R] help with parallel processing code

2011-10-31 Thread Max Kuhn
I'm not sure what you mean by full code or the iteration. This uses foreach to parallelize the loops over different tuning parameters and resampled data sets. The only way I could see to split up the parallelism is if you are fitting different models to the same data. In that case, you could launc

Re: [R] help with parallel processing code

2011-10-27 Thread Max Kuhn
I have had issues with some parallel backends not finding functions within a namespace for packages listed in the ".packages" argument or explicitly loaded in the body of the foreach loop. This has occurred with MPI but not with multicore. I can get around this to some extent by calling the functio

Re: [R] difference between createPartition and createfold functions

2011-10-03 Thread Max Kuhn
Mon, Oct 3, 2011 at 11:10 AM, wrote: > Hi Max, > > Thanks for the note. In your last paragraph, did you mean "in > createDataPartition"? I'm a little vague about what returnTrain option does. > > Bonnie > > Quoting Max Kuhn : > >> Basically, create

Re: [R] difference between createPartition and createfold functions

2011-10-02 Thread Max Kuhn
Basically, createDataPartition is used when you need to make one or more simple two-way splits of your data. For example, if you want to make a training and test set and keep your classes balanced, this is what you could use. It can also make multiple splits of this kind (or leave-group-out CV aka
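To make "a simple two-way split that keeps the classes balanced" concrete, here is a base R sketch. It does not call createDataPartition and is not its implementation; the outcome `y` and the proportion `p` are assumptions for the example.

```r
set.seed(1)
# Hypothetical imbalanced outcome
y <- factor(rep(c("no", "yes"), times = c(80, 20)))
p <- 0.75  # proportion of each class to put in the training set

# Sample the same proportion of rows within each class
train_rows <- sort(unlist(lapply(levels(y), function(cls) {
  rows <- which(y == cls)
  sample(rows, size = round(p * length(rows)))
})))
test_rows <- setdiff(seq_along(y), train_rows)

table(y[train_rows])
table(y[test_rows])
```

Both pieces keep the 4:1 class ratio of the full data, which a plain random split of an imbalanced outcome does not guarantee.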

Re: [R] odfWeave: Combining multiple output statements in a function

2011-09-16 Thread Max Kuhn
tion. Could you perhaps tell me which example I should have a look at? > > Regards, > Jan > > > > On 09/15/2011 04:47 PM, Max Kuhn wrote: >> >> There are examples in the package directory that explain this. >> >> On Thu, Sep 15, 2011 at 8:16 AM, Jan van de

Re: [R] odfWeave: Combining multiple output statements in a function

2011-09-15 Thread Max Kuhn
There are examples in the package directory that explain this. On Thu, Sep 15, 2011 at 8:16 AM, Jan van der Laan wrote: > > What is the correct way to combine multiple calls to odfCat, odfItemize, > odfTable etc. inside a function? > > As an example lets say I have a function that needs to write

Re: [R] Trying to extract probabilities in CARET (caret) package with a glmStepAIC model

2011-08-28 Thread Max Kuhn
Can you provide a reproducible example and the results of sessionInfo()? What are the levels of your classes? On Sat, Aug 27, 2011 at 10:43 PM, Jon Toledo wrote: > > Dear developers, > I have jutst started working with caret and all the nice features it offers. > But I just encountered a problem

Re: [R] aucRoc in caret package [SEC=UNCLASSIFIED]

2011-06-01 Thread Max Kuhn
David, The ROC curve should really be computed with some sort of numeric data (as opposed to classes). It varies the cutoff to get a continuum of sensitivity and specificity values.  Using the classes as 1's and 2's implies that the second class is twice the value of the first, which doesn't reall
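A small base R sketch of how varying the cutoff on a numeric score yields a continuum of sensitivity and specificity values. The labels, the scores, and the class names "event" and "none" are all hypothetical.

```r
# Hypothetical class labels and numeric scores (e.g. class probabilities)
truth <- factor(
  c("event", "event", "event", "none", "event", "none", "none", "none"),
  levels = c("event", "none")
)
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# Call a sample "event" when its score is at or above the cutoff
cutoffs <- sort(unique(score), decreasing = TRUE)
roc_points <- t(sapply(cutoffs, function(cut) {
  called_event <- score >= cut
  c(cutoff = cut,
    sensitivity = mean(called_event[truth == "event"]),
    specificity = mean(!called_event[truth == "none"]))
}))
roc_points
```

With hard class labels in place of `score` there would be only one meaningful cutoff, so the curve would collapse to a single point plus the two corners.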

Re: [R] issue with odfWeave running on Windows XP; question about installing packages under Linux

2011-05-18 Thread Max Kuhn
> similar. I sent the info to Max Kuhn privately, but did not get a response > after two tries.) My odfWeave reporting system worked fine prior to R2.12 and > then the same code that ran fine under R2.11.1 stopped working. Using the > very same machine and running the very same code unde

Re: [R] Can ROC be used as a metric for optimal model selection for randomForest?

2011-05-13 Thread Max Kuhn
Frank, It depends on how you define "optimal". While I'm not a big fan of using the area under the ROC to characterize performance, there are a lot of times when likelihood measures are clearly sub-optimal in performance. Using resampled accuracy (or Kappa) instead of deviance (out-of-bag or not)

Re: [R] Can ROC be used as a metric for optimal model selection for randomForest?

2011-05-13 Thread Max Kuhn
XiaoLiu, I can't see the options in bootControl you used here. Your error is consistent with leaving classProbs and summaryFunction unspecified. Please double check that you set them with classProbs = TRUE and summaryFunction = twoClassSummary before you ran. Max On Thu, May 12, 2011 at 7:04 PM,

Re: [R] Bigining with a Program of SVR

2011-05-07 Thread Max Kuhn
As far as caret goes, you should read http://cran.r-project.org/web/packages/caret/vignettes/caretVarImp.pdf and look at rfe() and sbf(). On Fri, May 6, 2011 at 2:53 PM, ypriverol wrote: > Thanks Max. I'm using now the library caret with my data. But the models > showed a correlation under

Re: [R] Bigining with a Program of SVR

2011-05-04 Thread Max Kuhn
train() uses vectors, matrices and data frames as input. I really think you need to read materials on basic R before proceeding. Go to the R web page. There are introductory materials there. On Tue, May 3, 2011 at 11:19 AM, ypriverol wrote: > I saw the format of the caret data some days ago. It i

Re: [R] Bigining with a Program of SVR

2011-05-03 Thread Max Kuhn
See the examples at the end of: http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf for a QSAR data set for modeling the log blood-brain barrier concentration. SVMs are not used there but, if you use train(), the syntax is very similar. On Tue, May 3, 2011 at 9:38 AM, yprive

Re: [R] caret - prevent resampling when no parameters to find

2011-05-02 Thread Max Kuhn
Yeah, that didn't work. Use fitControl<-trainControl(index = list(seq(along = mdrrClass))) See ?trainControl to understand what this does in detail. Max

Re: [R] caret - prevent resampling when no parameters to find

2011-05-01 Thread Max Kuhn
Not all modeling functions have both the formula and "matrix" interface. For example, glm() and rpart() only have a formula method, enet() has only the matrix interface and ksvm() and others have both. This was one reason I created the package (so we don't have to remember all this). train() lets yo

Re: [R] caret - prevent resampling when no parameters to find

2011-05-01 Thread Max Kuhn
No, the sampling is done on rows. The definition of a bootstrap (re)sample is one which is the same size as the original data but taken with replacement. The "Accuracy SD" and "Kappa SD" columns give you a sense of how the model performance varied across these bootstrap data sets (i.e. they are not
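The definition above takes only a couple of lines of base R. This is a sketch with a hypothetical row count `n`; treating the rows that were never drawn as the holdout is a common convention and is assumed here rather than taken from the thread.

```r
set.seed(1)
n <- 20  # hypothetical number of rows in the training data

# One bootstrap resample: n row numbers drawn with replacement
boot_rows <- sample(seq_len(n), size = n, replace = TRUE)

# Rows never drawn can serve as the holdout for this resample
holdout_rows <- setdiff(seq_len(n), boot_rows)

length(boot_rows)             # same size as the original data
anyDuplicated(boot_rows) > 0  # whether any row was drawn more than once
```

Repeating this many times and computing performance on each holdout gives the spread that the "SD" columns summarize.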

Re: [R] Bigining with a Program of SVR

2011-05-01 Thread Max Kuhn
When you say "variable" do you mean predictors or responses? In either case, they do. You can generally tell by reading the help files and looking at the examples. Max On Fri, Apr 29, 2011 at 3:47 PM, ypriverol wrote: > Hi: >  I'm starting a research of Support Vector Regression. I want to obta

Re: [R] caret - prevent resampling when no parameters to find

2011-05-01 Thread Max Kuhn
It isn't building the same model since each fit is created from different data sets. The resampling is sort of the point of the function, but if you really want to avoid it, supply your own index in trainControl that has every index (eg, index = seq(along = mdrrClass)). In this case, the performan

Re: [R] odfWeave Error unzipping file in Win 7

2011-03-21 Thread Max Kuhn
I don't think that this is the issue, but test it on a file without spaces. On Mon, Mar 21, 2011 at 2:25 PM, wrote: > > I have a very similar error that cropped up when I upgraded to R 2.12 and > persists at R 2.12.1. I am running R on Windows XP and OO is at version 3.2. > I did not make any

Re: [R] Specify feature weights in model prediction (CARET)

2011-03-16 Thread Max Kuhn
> Using the 'CARET' package, is it possible to specify weights for features > used in model prediction? For what model? > And for the 'knn' implementation, is there a way > to choose a distance metric (i.e. Mahalanobis distance)? > No, sorry. Max

Re: [R] use "caret" to rank predictors by random forest model

2011-03-07 Thread Max Kuhn
It would help if you provided the code that you used for the caret functions. The most likely issue is not using importance = TRUE in the call to train(). I believe that I've only implemented code for plotting the varImp objects resulting from train() (eg. there is plot.varImp.train but not plot.

[R] Course: R for Predictive Modeling: A Hands-On Introduction

2011-03-04 Thread Max Kuhn
R for Predictive Modeling: A Hands-On Introduction Predictive Analytics World in San Francisco Sunday March 13, 9am to 4:30pm This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expo

Re: [R] ROC from R-SVM?

2011-02-22 Thread Max Kuhn
The objective functions for kernel methods are unrelated to the area under the ROC curve. However, you can try to choose the cost and kernel parameters to maximize the ROC AUC. See the caret package, specifically the train function. Max On Mon, Feb 21, 2011 at 5:34 PM, Angel Russo wrote: > *Hi, >

Re: [R] Random Forest & Cross Validation

2011-02-20 Thread Max Kuhn
> I am using randomForest package to do some prediction job on GWAS data. I > firstly split the data into training and testing set (70% vs 30%), then > using training set to grow the trees (ntree=10). It looks that the OOB > error in training set is good (<10%). However, it is not very good for

Re: [R] caret::train() and ctree()

2011-02-16 Thread Max Kuhn
Andrew, ctree only tunes over mincriterion and ctree2 tunes over maxdepth (while fixing mincriterion = 0). Seeing both listed as the function is being executed is a bug. I'll setup checks to make sure that the columns specified in tuneGrid are actually the tuning parameters that are used. Max O

Re: [R] Train error:: subscript out of bonds

2011-01-26 Thread Max Kuhn
No. Any valid seed should work. In this case, train() should only be using it to determine which training set samples are in the CV or bootstrap data sets. Max On Wed, Jan 26, 2011 at 9:56 AM, Neeti wrote: > > Thank you so much for your reply. In my case it is giving error in some seed > value f

Re: [R] Train error:: subscript out of bonds

2011-01-26 Thread Max Kuhn
Sort of. It lets you define a grid of candidate values to test and to define the rule to choose the best. For some models, it is easy to come up with default values that work well (e.g. RBF SVM's, PLS, KNN) while others are more data dependent. In the latter case, the defaults may not work well. M

Re: [R] Train error:: subscript out of bonds

2011-01-25 Thread Max Kuhn
What version of caret and R? We'll also need a reproducible example. On Mon, Jan 24, 2011 at 12:44 PM, Neeti wrote: > > Hi, > I am trying to construct a svmpoly model using the "caret" package (please > see code below). Using the same data, without changing any setting, I am > just changing the

Re: [R] circular reference lines in splom

2011-01-20 Thread Max Kuhn
splom(~dat, groups = grps, lower.panel = panel.circ3, upper.panel = panel.circ3) Thanks, Max On Thu, Jan 20, 2011 at 11:13 AM, Peter Ehlers wrote: > On 2011-01-19 20:15, Max Kuhn wrote: >> >> Hello everyone, >> >> I'm stumped. I'd like to create

[R] circular reference lines in splom

2011-01-19 Thread Max Kuhn
Hello everyone, I'm stumped. I'd like to create a scatterplot matrix with circular reference lines. Here is an example in 2d: library(ellipse) set.seed(1) dat <- matrix(rnorm(300), ncol = 3) colnames(dat) <- c("X1", "X2", "X3") dat <- as.data.frame(dat) grps <- factor(rep(letters[1:4], 25)) pan

[R] less than full rank contrast methods

2010-12-06 Thread Max Kuhn
I'd like to make a less than full rank design using dummy variables for factors. Here is some example data: when <- data.frame(time = c("afternoon", "night", "afternoon", "morning", "morning", "morning", "morning", "afternoon", "afternoon"),

Re: [R] cross validation using e1071:SVM

2010-11-23 Thread Max Kuhn
Neeti, I'm pretty sure that the error is related to the confusionMatrix call, which is in the caret package, not e1071. The error message is pretty clear: you need to pass in two factor objects that have the same levels. You can check by running the commands: str(pred_true1) str(species_tes
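
The level mismatch can be reproduced and repaired in base R. This is a sketch with made-up class labels; `truth` and `pred` are hypothetical stand-ins for the poster's objects, which are not shown in full.

```r
# Observed classes use three levels, but the predictions happen to
# contain only two of them, so the factors have different levels.
truth <- factor(c("a", "b", "c", "a"))
pred  <- factor(c("a", "b", "a", "a"))

str(truth)
str(pred)
same_before <- identical(levels(truth), levels(pred))

# Rebuild the predictions with the full set of levels from the truth.
pred_fixed <- factor(as.character(pred), levels = levels(truth))
same_after <- identical(levels(truth), levels(pred_fixed))

# With matching levels the cross-tabulation is square.
tab <- table(Predicted = pred_fixed, Observed = truth)
```

Once both factors share the same levels, they can be passed to confusionMatrix() in caret.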

Re: [R] Sporadic errors when training models using CARET

2010-11-23 Thread Max Kuhn
Kendric, I've seen these too and traceback() usually goes back to ksvm(). This doesn't mean that the error is there, but the results of traceback() from you would be helpful. thanks, Max On Mon, Nov 22, 2010 at 6:18 PM, Kendric Wang wrote: > Hi. I am trying to construct a svmLinear model using

Re: [R] odfWeave - "Format error discovered in the file in sub-document content.xml at 2, 4047 (row, col)"

2010-11-16 Thread Max Kuhn
Can you try it with version 7.16 on R-Forge? Use install.packages("odfWeave", repos="http://R-Forge.R-project.org") to get it. Thanks, Max On Tue, Nov 16, 2010 at 8:26 AM, Søren Højsgaard wrote: > Dear Mike, > > Good point - thanks. The lines that caused the error mentioned above are > sim

Re: [R] to determine the variable importance in svm

2010-10-26 Thread Max Kuhn
> The caret package has answers to all your questions. >> 1) How to obtain a variable (attribute) importance using >> e1071:SVM (or other >> svm methods)? I haven't implemented a model-specific method for variable importance for SVM models. I know of one package (svmpath) that will return the re

Re: [R] Random Forest AUC

2010-10-22 Thread Max Kuhn
Ravishankar, > I used Random Forest with a couple of data sets I had to predict for binary > response. In all the cases, the AUC of the training set is coming to be 1. > Is this always the case with random forests? Can someone please clarify > this? This is pretty typical for this model. > I hav

Re: [R] Understanding linear contrasts in Anova using R

2010-09-30 Thread Max Kuhn
These two resources might also help: http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf http://cran.r-project.org/web/packages/contrast/vignettes/contrast.pdf Max On Thu, Sep 30, 2010 at 1:33 PM, Ista Zahn wrote: > Hi Professor Howell, > I think the issue here is simply in the assumpt

Re: [R] Creating publication-quality plots for use in Microsoft Word

2010-09-15 Thread Max Kuhn
You might want to check out the Reproducible Research task view: http://cran.r-project.org/web/views/ReproducibleResearch.html There is a section on Microsoft formats, as well as other formats that can be converted. Max On Wed, Sep 15, 2010 at 11:49 AM, Thomas Lumley wrote: > On Wed, 15 S

Re: [R] createDataPartition

2010-09-09 Thread Max Kuhn
Trafim, You'll get more answers if you adhere to the posting guide and tell us your version information and other necessary details. For example, this function is in the caret package (but nobody but me probably knows that =]). The first argument should be a vector of outcome values (not the possi

Re: [R] Reproducible research

2010-09-09 Thread Max Kuhn
A Reproducible Research CRAN task view was recently created: http://cran.r-project.org/web/views/ReproducibleResearch.html I will be updating it with some of the information in this thread. thanks, Max On Thu, Sep 9, 2010 at 11:41 AM, Matt Shotwell wrote: > Well, the attachment was a dud

Re: [R] several odfWeave questions

2010-08-25 Thread Max Kuhn
Ben, > 1a. am I right in believing that odfWeave does not respect the > 'keep.source' option? Am I missing something obvious? I believe it does, since this gets passed directly to Sweave. > 1b. is there a way to set global options analogous to \SweaveOpts{} > directives in Sweave? (I looked a

Re: [R] odfWeave Issue.

2010-08-11 Thread Max Kuhn
> What does this mean? It's impossible to tell. Read the posting guide and figure out all the details that you left out. If we don't have more information, you should have low expectations about the quality of any replies you might get. -- Max

Re: [R] Random Forest - Strata

2010-07-27 Thread Max Kuhn
The index indicates which samples should go into the training set. However, you are using out of bag sampling, so it would use the whole training set and return the OOB error (instead of the error estimates that would be produced by resampling via the index). Which do you want? OOB estimates or ot

Re: [R] UseR! 2010 - my impressions

2010-07-27 Thread Max Kuhn
Not to beat a dead horse... I've found that I like the useR conferences more than most statistics conferences. This isn't due to the difference in content, but the difference in the audience and the environment. For example, everyone is at useR because of their appreciation of R. At most other co
