What software are you using, exactly? I'm the maintainer of the
randomForest package, yet I do not know which "manual" you are quoting.
If you are using the randomForest package, the model object can be saved
to a file by save(Rfobject, file="myRFobject.rda"). If you need that to
be in ascii, us
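The truncated advice presumably continues with `ascii = TRUE`; a minimal sketch (the object and file names follow the post, the iris fit is only a stand-in for the poster's model):

```r
## Sketch: saving and reloading a randomForest model object.
## Assumes the randomForest package is installed.
library(randomForest)
set.seed(1)
Rfobject <- randomForest(Species ~ ., data = iris, ntree = 50)

save(Rfobject, file = "myRFobject.rda")               # binary .rda
save(Rfobject, file = "myRFobject.txt", ascii = TRUE) # ASCII, if required

## In a later session:
load("myRFobject.rda")  # restores the object under the name `Rfobject`
```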
Hello! As a new R user, I'm sure this will be a silly question for the
rest of you. I've been able to successfully run a forest, but have yet to
figure out the proper command lines for the following:
1. saving the forest. The guide just says isavef=1. I'm unsure how to
expand on this to create the com
I've been fixing some problems in the combine() function, but that's
only for regression data. Looks like you are doing classification, and
I don't see the problem:
R> library(randomForest)
randomForest 4.5-19
Type rfNews() to see new features/changes/bug fixes.
R> set.seed(1)
R> rflist <- repli
My apologies, subject corrected.
I'm building a RF 50 trees at a time due to memory limitations (I have
roughly .5 million observations and around 20 variables). I thought I
could combine some or all of my forests later and look at global
importance.
If I have say 2 forests: tree1 and tree2, they have similar Gini and Raw
im
In the words of Simpson (2007), "D'OH!"
I knew it had to be something simple!
On 4/29/07, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>
> On Sat, 2007-04-28 at 21:13 -0400, David L. Van Brunt, Ph.D. wrote:
> > Just out of curiosity, I took the default "iris" example in the RF
> > helpfile...
> > but
Just out of curiosity, I took the default "iris" example in the RF
helpfile...
but seeing the admonition against using the formula interface for large data
sets, I wanted to play around a bit to see how the various options affected
the output. Found something interesting I couldn't find documentati
On Tue, 9 Jan 2007, Bálint Czúcz wrote:
There is an improved version of the original random forest algorithm
available in the "party" package (you can find some additional
information on the details here:
http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper490.pdf ).
I do not know whether it yields a solution to your problem about
mi
You can try the Fortran version of randomForest, which has a function for
doing missing-value replacement automatically. There are two ways of
imputation (one fast and the other time-consuming). I did it a long time
ago. The link is below; if you have any questions, just let me know.
http://www.
> If there is a way around that with randomForest, I'd be interested to
> know too.
>
> Hugues Sicotte
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Darin A. England
> Sent: Thursday, January 04, 2007 3:
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Darin A. England
Sent: Thursday, January 04, 2007 3:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] randomForest and missing data
Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame with missing predictor values? If each
individual tree is built using CART, then it seems like this
should be possible. (I understand that one may impute missing values
using rfImpute or some other metho
Remiss of me, I haven't tried these R tools.
However, I tried a dozen Naive Bayes-like programs, often used to filter
email, where the serious problem with spam has resulted in many
innovations.
The most touted of the worldwide Naive Bayes programs seems to be
CRM114 (not in R, I expect, since its p
When mtry is equal to the total number of features, you just get regular
bagging (in the R package -- Breiman & Cutler's Fortran code samples
variables with replacement, so you can't do bagging with that). There are
cases when bagging will do better than random feature selection (i.e., RF),
even in sim
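The point about the R package can be sketched as follows (iris and the tree counts are only illustrative):

```r
## With mtry equal to the number of predictors, every split considers
## all variables, so the forest reduces to bagged trees.
library(randomForest)
set.seed(1)
p <- ncol(iris) - 1  # 4 predictors
bagged <- randomForest(Species ~ ., data = iris, mtry = p, ntree = 100)
forest <- randomForest(Species ~ ., data = iris, ntree = 100)  # default mtry
c(bagging = bagged$mtry, rf = forest$mtry)
```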
Hello,
I have a question regarding randomForest (from the package of the same
name). I have 16 features (nominal), 159 positive and 318 negative cases
that I'd like to classify (binary classification).
Using the tuning from the e1071 package, it turns out that the best
performance is reached when u
I can't add much to your question, being a complete novice at
classification, but I have tried both randomForest and SVM and I get better
results from randomForest than SVM (even after tuning). randomForest is
also much, much faster. I just thought randomForest was a much better
algorithm, althou
Hi
This is a question regarding classification performance using different methods.
So far I've tried NaiveBayes (klaR package), svm (e1071) package and
randomForest (randomForest). What has puzzled me is that randomForest seems to
perform far better (32% classification error) than svm and NaiveBa
From: Stephen Choularton
>
> Hi
>
> I am trying to use randomForest for classification. I am using this
> code:
>
> > set.seed(71)
> > rf.model <- randomForest(similarity ~ ., data=set1[1:100,],
> importance=TRUE, proximity=TRUE)
> Warning message:
> The response has five or fewer unique valu
Hi
I am trying to use randomForest for classification. I am using this
code:
> set.seed(71)
> rf.model <- randomForest(similarity ~ ., data=set1[1:100,],
importance=TRUE, proximity=TRUE)
Warning message:
The response has five or fewer unique values. Are you sure you want to
do regression? in:
If all you need the formula interface for is auto-deletion of NAs, I'd
suggest doing something like:
varlist <- c("fruit", "apples", "oranges", "blueberries")
fruits.nona <- na.omit(fruits.data[varlist])
model.rf <- randomForest(fruits.nona[-1], fruits.nona[[1]], ...)
If you want to know the cal
mmv wrote:
> I'm attempting to pass a string argument into the function
> randomForest but I get an error:
>
> state <- paste(list("fruit ~", "apples+oranges+blueberries",
> "data=fruits.data, mtry=2, do.trace=100, na.action=na.omit,
> keep.forest=TRUE"), sep= " ", collapse="")
I really don't u
I'm attempting to pass a string argument into the function
randomForest but I get an error:
state <- paste(list("fruit ~", "apples+oranges+blueberries",
"data=fruits.data, mtry=2, do.trace=100, na.action=na.omit,
keep.forest=TRUE"), sep= " ", collapse="")
model.rf <- randomForest(state)
Error in
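The error itself is cut off, but a string is not a formula; one hedged fix is to build a formula object with as.formula() and pass the remaining options as real arguments (the data set and variable names are taken from the post and assumed to exist):

```r
## Sketch: construct a formula from strings, then pass data and options
## as actual arguments rather than pasting everything into one string.
## `fruits.data` (columns fruit, apples, oranges, blueberries) is the
## poster's data.
library(randomForest)
rhs <- paste(c("apples", "oranges", "blueberries"), collapse = " + ")
f <- as.formula(paste("fruit ~", rhs))
model.rf <- randomForest(f, data = fruits.data, mtry = 2, do.trace = 100,
                         na.action = na.omit, keep.forest = TRUE)
```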
> From: Uwe Ligges
>
> [EMAIL PROTECTED] wrote:
>
> > Hello,
> >
> > I'm trying to find out the optimal number of splits (mtry parameter)
> > for a randomForest classification. The classification is binary and
> > there are 32 explanatory variables (mostly factors with each up to 4
> > levels bu
> From: [EMAIL PROTECTED]
>
> Hello,
>
> I'm trying to find out the optimal number of splits (mtry
> parameter) for a randomForest classification. The
> classification is binary and there are 32 explanatory
> variables (mostly factors with each up to 4 levels but also
> some numeric variables
See the tuneRF() function in the package for an implementation of
the strategy recommended by Breiman & Cutler.
BTW, "randomForest" is only for the R package. See Breiman's
web page for notice on trademarks.
Andy
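A sketch of tuneRF() on a built-in data set (the parameter values are illustrative, not recommendations):

```r
## tuneRF() searches over mtry, growing/shrinking it by stepFactor and
## keeping changes that improve the OOB error by at least `improve`.
library(randomForest)
set.seed(1)
res <- tuneRF(x = iris[, -5], y = iris$Species,
              mtryStart = 2, stepFactor = 1.5,
              improve = 0.01, ntreeTry = 100)
res  # a matrix of the mtry values tried and their OOB errors
```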
> From: Weiwei Shi
>
> Hi,
> I found the following lines from Leo's randomFore
Hi,
I found the following lines in Leo's randomForest documentation, and I am
not sure whether they apply here, but I just tried to help:
mtry0 = the number of variables to split on at each node. Default is
the square root of mdim. ATTENTION! DO NOT USE THE DEFAULT VALUES OF
MTRY0 IF YOU WANT TO OPTIMIZE THE P
Hello,
I'm trying to find out the optimal number of splits (mtry parameter) for a
randomForest classification. The classification is binary and there are 32
explanatory variables (mostly factors with each up to 4 levels but also some
numeric variables) and 575 cases.
I've seen that although th
Thanks.
Many people pointed that out. (It was because I only knew lapply at the
time :).
On 7/11/05, Martin Maechler <[EMAIL PROTECTED]> wrote:
> > "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
> > on Thu, 07 Jul 2005 15:44:38 -0400 writes:
>
> Duncan> On 7/7/2005 3:38 PM, W
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
> on Thu, 07 Jul 2005 15:44:38 -0400 writes:
Duncan> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
>> Hi there:
>> I have a question on random forest:
>>
>> recently I helped a friend with her random forest and I came up with
With small sample sizes, the variability of the test set error estimate
will be large. Instead of splitting the data once, you should consider
cross-validation or the bootstrap for estimating performance.
AFAIK gbm as is won't handle more than two classes. You will need to do
quite a bit of work to g
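The cross-validation suggestion can be sketched as follows (iris stands in for the poster's data; k and ntree are arbitrary choices):

```r
## k-fold cross-validation of a randomForest classifier, instead of a
## single train/test split.
library(randomForest)
set.seed(1)
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(iris)))
err <- numeric(k)
for (i in seq_len(k)) {
  fit  <- randomForest(Species ~ ., data = iris[fold != i, ], ntree = 200)
  pred <- predict(fit, iris[fold == i, ])
  err[i] <- mean(pred != iris$Species[fold == i])
}
mean(err)  # cross-validated error estimate
```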
Thanks, but can you suggest some approaches for this classification
problem, since some specific classes have too few observations?
The following is from adding sample.size:
> najie.rf.2 <- randomForest(Diag~., data=one.df[ind==1,4:ncol(one.df)],
> importance=T, sampsize=unlist(sample.size))
>
On 7/7/2005 3:47 PM, Weiwei Shi wrote:
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
>> is.vector(sample.size)
> [1] TRUE
>
> i also tried as.vector(sample.size) and assigned it to sampsz, it still
> does not work.
Sorry, I used "vector" incorrectly. Lists a
> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE
Because a list is also a vector:
> a <- c(list(1), list(2))
> a
[[1]]
[1] 1
[[2]]
[1] 2
> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE
Actually, t
it works.
thanks,
but: (just curious)
why i tried previously and i got
> is.vector(sample.size)
[1] TRUE
i also tried as.vector(sample.size) and assigned it to sampsz, it still
does not work.
On 7/7/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> > Hi the
Hi there:
I have a question on random forest:
recently I helped a friend with her random forest and I came up with this
problem: her dataset has 6 classes, and the sample size is pretty small
(264). The class distribution is like this (Diag is the response variable):
sample.size <- lapply(1:6, functi
The limitation comes from the way categorical splits are represented in the
code: For a categorical variable with k categories, the split is
represented by k binary digits: 0=right, 1=left. So it takes k bits to
store each split on k categories. To save storage, this is `packed' into a
4-byte in
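The arithmetic behind the limit, as a base-R sketch (the 2^(k-1) - 1 count of candidate splits is standard for categorical predictors and is added here for context, not quoted from the thread):

```r
## Each split on a k-category variable is stored as k left/right bits,
## packed into one 4-byte (32-bit) integer -- hence the 32-category cap.
k <- 32
k <= 8 * 4              # TRUE: 32 categories just fit in one integer

## Number of distinct non-trivial binary splits of k categories:
k <- 5
2^(k - 1) - 1           # 15 candidate splits to evaluate at a node
```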
Hello,
I'm using the random forest package. One of my factors in the data set contains
41 levels (I can't code this as a numeric value - in terms of linear models
this would be a random factor). The randomForest call comes back with an error
telling me that the limit is 32 categories.
Is there
All,
I'm trying to set up a function which calls the partialPlot function but
am getting an error that I can't seem to solve. Here's a simplified
version of the function and error...
> pplot <-
function(rf,pred.var){partialPlot(x=rf,pred.data=acoust,x.var=pred.var)}
>
> attach(acoust)
> acoust
> From: luk
>
> When I run randomForest with a 169453x5 matrix, I got the
> following message.
>
> Error in matrix(0, n, n) : matrix: too many elements specified
>
> Can you please advise me how to solve this problem?
>
> Thanks,
>
> Lu
1. When asking new questions, please don't reply to
When I run randomForest with a 169453x5 matrix, I got the following message.
Error in matrix(0, n, n) : matrix: too many elements specified
Can you please advise me how to solve this problem?
Thanks,
Lu
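The reply is cut off, but the error is consistent with an n x n (proximity-type) matrix being requested; a back-of-the-envelope check (my assumption, since the original explanation is truncated):

```r
## matrix(0, n, n) with n = 169453 would need n^2 elements:
n <- 169453
n^2             # ~2.87e10 elements, more than 2^31 - 1 (the limit on
                # the number of matrix elements in R at the time)
n^2 * 8 / 2^30  # ~214 GiB for a double matrix of that size
```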
Uwe Ligges <[EMAIL PROTECTED]> wrote:
Vera Hofer wrote:
> Dear colleagues,
>
> I have
On Tue, 13 Apr 2004, Hui Han wrote:
> Hi,
>
> I am doing feature selection for my dataset. The following is
> the extreme case where only one feature is left. But I got
> the error below. So my question is that do I have to use
> more than one features?
>
> sample.subset
> udomain.edu hpclass
>
-Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Hui Han
> > Sent: Tuesday, 13 April, 2004 17:16
> > To: [EMAIL PROTECTED]
> > Subject: [R] randomForest: more than one variable needed?
> >
> >
> > Hi,
> >
>
> Philippe Grosjean
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Hui Han
> Sent: Tuesday, 13 April, 2004 17:16
> To: [EMAIL PROTECTED]
> Subject: [R] randomForest: more than one variable needed?
>
>
> Hi,
>
iable? there is no
point here!
Philippe Grosjean
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Hui Han
Sent: Tuesday, 13 April, 2004 17:16
To: [EMAIL PROTECTED]
Subject: [R] randomForest: more than one variable needed?
Hi,
I am doing feature selection
Hi,
I am doing feature selection for my dataset. The following is the extreme
case, where only one feature is left. But I got the error below. So my
question is: do I have to use more than one feature?
sample.subset
  udomain.edu hpclass
1        -1.0     not
2        -1.0     not
3        -0
> From: Christian Schulz
>
> Hi,
>
> is it correct that I need ~2 GB of RAM to be able
> to work with the default setting
> ntree=500 and a data.frame with 100,000 rows
> and at most 10 columns for training and testing?
If you have the test set, and don't need the forest for predicting othe
> Hi,
>
> is it correct that I need ~2 GB of RAM to be able
> to work with the default setting
> ntree=500 and a data.frame with 100,000 rows
> and at most 10 columns for training and testing?
>
no. You may parallelize the computations: perform 5 runs of RF with `ntree
= 100' (or less) and sav
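The suggestion above can be sketched with combine() (iris and the counts are placeholders for the poster's data):

```r
## Grow several small forests, then merge them into one with combine().
library(randomForest)
set.seed(1)
rf.list <- lapply(1:5, function(i)
  randomForest(Species ~ ., data = iris, ntree = 100))
rf.all <- do.call(combine, rf.list)
rf.all$ntree  # 500: the merged forest holds all the trees
```

Keep in mind that out-of-bag statistics are not recomputed for the merged forest, so per-forest error estimates may not carry over cleanly.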
Hi,
Is it correct that I need ~2 GB of RAM to be able to work with the default
setting ntree=500 and a data.frame with 100,000 rows and at most 10
columns for training and testing?
P.S.
Is it possible to calculate the approximate memory demand for different
settings with RF?
Many thanks & regar
simplify the splitting.
You might find the heuristics in the CART book, but I'm not sure.
HTH,
Andy
> -Original Message-
> From: Vladimir N. Kutinsky [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 20, 2003 10:26 AM
> To: Liaw, Andy; [EMAIL PROTECTED]
> Subject: R
Andy,
Does it mean that the error rate increases until the aggregated number of
out-of-bag cases reaches the number of all cases? Or, in other words,
because the number of points being predicted (right or wrong) gets larger
during the first steps of the process?
If so, then it's all clea
Andy,
First of all, thank you for your reply.
I'm using R 1.6.1 for Windows. A few days ago I updated the randomForest
package from CRAN. It now gives warnings that the package was built under
R 1.6.2, but it seems to work fine.
To make sure we're talking about the same thing, let's take iris
cla
> -Original Message-
> From: Vladimir N. Kutinsky [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 20, 2003 4:43 AM
> To: [EMAIL PROTECTED]
> Subject: [R] RandomForest
>
>
> Hello,
>
> When I plot or look at the error rate vector for a random forest
> (
Hello,
When I plot or look at the error rate vector for a random forest
(rf$err.rate), it looks like a descending function, except for the first
few points of the vector, whose error rates are lower (sometimes much
lower) than the general level of error rates for a forest with that number
of trees whe