What software are you using, exactly? I'm the maintainer of the
randomForest package, yet I do not know which manual you are quoting.
If you are using the randomForest package, the model object can be saved
to a file by save(Rfobject, file="myRFobject.rda"). If you need that to
be in ASCII, use save(..., ascii=TRUE).
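A minimal sketch of both variants, using the stock iris data as a stand-in (object and file names here are just placeholders):

```r
library(randomForest)

set.seed(1)
Rfobject <- randomForest(Species ~ ., data = iris)

# binary .rda (compact, the usual choice)
save(Rfobject, file = "myRFobject.rda")

# plain-text representation, if ASCII is required
save(Rfobject, file = "myRFobject_ascii.rda", ascii = TRUE)

# in a later session, restore the object under its original name
load("myRFobject.rda")
```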
Hello! As a new R user, I'm sure this will be a silly question for the
rest of you. I've been able to successfully run a forest but have yet to
figure out the proper command lines for the following:
1. saving the forest. The guide just says isavef=1. I'm unsure how to
expand on this to create the
I've been fixing some problems in the combine() function, but that's
only for regression data. Looks like you are doing classification, and
I don't see the problem:
> library(randomForest)
randomForest 4.5-19
Type rfNews() to see new features/changes/bug fixes.
> set.seed(1)
> rflist <-
I'm building a RF 50 trees at a time due to memory limitations (I have roughly
0.5 million observations and around 20 variables). I thought I could combine
some or all of my forests later and look at global importance.
If I have, say, 2 forests: tree1 and tree2, they have similar Gini and Raw
My apologies, subject corrected.
I'm building a RF 50 trees at a time due to memory limitations (I have
roughly .5 million observations and around 20 variables). I thought I
could combine some or all of my forests later and look at global
importance.
If I have say 2 forests : tree1 and
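A sketch of the chunked approach with combine() from the randomForest package, using iris as a stand-in for the large dataset (note that combine() drops the OOB error-rate components of the merged object):

```r
library(randomForest)

# stand-in for the poster's large data: grow the forest in 50-tree chunks
set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

chunks <- lapply(1:4, function(i) randomForest(x, y, ntree = 50))
rf.all <- do.call(combine, chunks)

rf.all$ntree        # 200 trees in the merged forest
importance(rf.all)  # importance carried over from the chunks
```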
On Sat, 2007-04-28 at 21:13 -0400, David L. Van Brunt, Ph.D. wrote:
Just out of curiosity, I took the default iris example in the RF
helpfile...
but seeing the admonition against using the formula interface for large data
sets, I wanted to play around a bit to see how the various options
In the words of Simpson (2007), D'OH!
I knew it had to be something simple!
On 4/29/07, Gavin Simpson [EMAIL PROTECTED] wrote:
On Sat, 2007-04-28 at 21:13 -0400, David L. Van Brunt, Ph.D. wrote:
Just out of curiosity, I took the default iris example in the RF
helpfile...
but seeing the
Just out of curiosity, I took the default iris example in the RF
helpfile...
but seeing the admonition against using the formula interface for large data
sets, I wanted to play around a bit to see how the various options affected
the output. Found something interesting I couldn't find
On Tue, 9 Jan 2007, Bálint Czúcz wrote:
There is an improved version of the original random forest algorithm
available in the party package (you can find some additional
information on the details here:
http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper490.pdf ).
I do not know whether it
There is an improved version of the original random forest algorithm
available in the party package (you can find some additional
information on the details here:
http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper490.pdf ).
I do not know whether it yields a solution to your problem about
Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame with missing predictor values? If each
individual tree is built using CART, then it seems like this
should be possible. (I understand that one may impute missing values
using rfImpute or some other
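A sketch of the two stock workarounds in the randomForest package, on an iris copy with predictor values deliberately set to NA:

```r
library(randomForest)

# make an iris copy with some missing predictor values
set.seed(1)
ir <- iris
ir[sample(nrow(ir), 10), "Sepal.Length"] <- NA

# option 1: quick median/mode fill with na.roughfix()
rf1 <- randomForest(Species ~ ., data = na.roughfix(ir))

# option 2: proximity-based imputation with rfImpute()
ir.imputed <- rfImpute(Species ~ ., data = ir)
rf2 <- randomForest(Species ~ ., data = ir.imputed)
```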
-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Darin A. England
Sent: Thursday, January 04, 2007 3:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] randomForest and missing data
Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame
to
know too.
Hugues Sicotte
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Darin A. England
Sent: Thursday, January 04, 2007 3:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] randomForest and missing data
Does anyone know a reason why
You can try the randomForest Fortran code, which has a function that
does missing-value replacement automatically. There are two ways of
imputation (one is fast and the other is time-consuming).
I did it a long time ago.
The link is below. If you have any questions, just let me know.
Remiss of me, I haven't tried these R tools.
However, I tried a dozen Naive Bayes-like programs, often used to filter
email, where the serious problem with spam has resulted in many
innovations.
The most touted of the worldwide Naive Bayes programs seems to be
CRM114 (not in R, I expect, since its
Hello,
I've a question regarding randomForest (from the package of the same name). I've
16 features (nominal), 159 positive and 318 negative cases that I'd like to
classify (binary classification).
Using the tuning from the e1071 package, it turns out that the best performance
is reached when
When mtry is equal to the total number of features, you just get regular bagging
(in the R package -- Breiman & Cutler's Fortran code samples variables with
replacement, so you can't do bagging with that). There are cases where
bagging will do better than random feature selection (i.e., RF), even in
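A sketch of that distinction on iris (in the R package, mtry = p reduces random feature selection to plain bagging):

```r
library(randomForest)

set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

rf  <- randomForest(x, y)                  # default mtry = floor(sqrt(4)) = 2
bag <- randomForest(x, y, mtry = ncol(x))  # mtry = p: plain bagging
```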
Hi
This is a question regarding classification performance using different methods.
So far I've tried NaiveBayes (klaR package), svm (e1071) package and
randomForest (randomForest). What has puzzled me is that randomForest seems to
perform far better (32% classification error) than svm and
I can't add much to your question, being a complete novice at
classification, but I have tried both randomForest and SVM and I get better
results from randomForest than SVM (even after tuning). randomForest is
also much, much faster. I just thought randomForest was a much better
algorithm,
Hi
I am trying to use randomForest for classification. I am using this
code:
set.seed(71)
rf.model <- randomForest(similarity ~ ., data=set1[1:100,],
importance=TRUE, proximity=TRUE)
Warning message:
The response has five or fewer unique values. Are you sure you want to
do regression? in:
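If the response really is categorical, the usual fix for that warning is to make it a factor so randomForest runs in classification mode; a sketch, assuming set1 and similarity as in the post above:

```r
library(randomForest)

set.seed(71)
# wrapping the response in factor() forces classification rather than
# regression (set1 is the poster's data frame, not defined here)
rf.model <- randomForest(factor(similarity) ~ ., data = set1[1:100, ],
                         importance = TRUE, proximity = TRUE)
```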
From: Stephen Choularton
Hi
I am trying to use randomForest for classification. I am using this
code:
set.seed(71)
rf.model <- randomForest(similarity ~ ., data=set1[1:100,],
importance=TRUE, proximity=TRUE)
Warning message:
The response has five or fewer unique values. Are you
I'm attempting to pass a string argument into the function
randomForest but I get an error:
state <- paste(list("fruit ~", "apples+oranges+blueberries",
    "data=fruits.data", "mtry=2", "do.trace=100", "na.action=na.omit",
    "keep.forest=TRUE"), sep=" ", collapse="")
model.rf <- randomForest(state)
Error in if (n==0)
If all you need the formula interface for is auto-deletion of NAs, I'd
suggest doing something like:
varlist <- c("fruit", "apples", "oranges", "blueberries")
fruits.nona <- na.omit(fruits.data[varlist])
model.rf <- randomForest(fruits.nona[-1], fruits.nona[[1]], ...)
If you want to know the call that
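If the model spec really must be assembled from strings, a sketch (fruits.data and the variable names as in the original post, not defined here) is to convert only the formula part with as.formula() and pass the rest as ordinary arguments:

```r
library(randomForest)

# build just the formula from strings, then convert it
rhs  <- paste("apples", "oranges", "blueberries", sep = " + ")
fstr <- paste("fruit ~", rhs)

model.rf <- randomForest(as.formula(fstr), data = fruits.data,
                         mtry = 2, do.trace = 100,
                         na.action = na.omit, keep.forest = TRUE)
```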
Hello,
I'm trying to find out the optimal number of splits (the mtry parameter) for a
randomForest classification. The classification is binary and there are 32
explanatory variables (mostly factors, each with up to 4 levels, but also some
numeric variables) and 575 cases.
I've seen that although
[EMAIL PROTECTED] wrote:
Hello,
I'm trying to find out the optimal number of splits (mtry parameter)
for a randomForest classification. The classification is binary and
there are 32 explanatory variables (mostly factors with each up to 4
levels but also some numeric variables) and 575
Hi,
I found the following lines in Leo's randomForest manual, and I am not sure
if they can be applied here, but just tried to help:
mtry0 = the number of variables to split on at each node. Default is
the square root of mdim. ATTENTION! DO NOT USE THE DEFAULT VALUES OF
MTRY0 IF YOU WANT TO OPTIMIZE THE
See the tuneRF() function in the package for an implementation of
the strategy recommended by Breiman & Cutler.
BTW, randomForest is only for the R package. See Breiman's
web page for notice on trademarks.
Andy
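A minimal tuneRF() sketch on iris, with the default search settings spelled out:

```r
library(randomForest)

set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

# starts from the default mtry and doubles/halves it (stepFactor = 2)
# until the OOB error stops improving by at least 5% (improve = 0.05)
tuneRF(x, y, ntreeTry = 100, stepFactor = 2, improve = 0.05)
```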
From: Weiwei Shi
Hi,
I found the following lines from Leo's randomForest,
From: [EMAIL PROTECTED]
Hello,
I'm trying to find out the optimal number of splits (mtry
parameter) for a randomForest classification. The
classification is binary and there are 32 explanatory
variables (mostly factors with each up to 4 levels but also
some numeric variables) and
From: Uwe Ligges
[EMAIL PROTECTED] wrote:
Hello,
I'm trying to find out the optimal number of splits (mtry parameter)
for a randomForest classification. The classification is binary and
there are 32 explanatory variables (mostly factors with each up to 4
levels but also some
>>>>> "Duncan" == Duncan Murdoch [EMAIL PROTECTED]
>>>>>     on Thu, 07 Jul 2005 15:44:38 -0400 writes:
    Duncan> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
Hi there:
I have a question on random forest:
recently I helped a friend with her random forest and I came up with this
problem:
Thanks.
Many people pointed that out. (It was because I only knew lapply at
that time :).
On 7/11/05, Martin Maechler [EMAIL PROTECTED] wrote:
>>>>> "Duncan" == Duncan Murdoch [EMAIL PROTECTED]
>>>>>     on Thu, 07 Jul 2005 15:44:38 -0400 writes:
    Duncan> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
Hi there:
I have a question on random forest:
recently I helped a friend with her random forest and I came up with this problem:
her dataset has 6 classes and since the sample size is pretty small:
264, and the class distr is like this (Diag is the response variable)
sample.size <- lapply(1:6,
On 7/7/2005 3:38 PM, Weiwei Shi wrote:
Hi there:
I have a question on random forest:
recently I helped a friend with her random forest and I came up with this
problem:
her dataset has 6 classes and since the sample size is pretty small:
264, and the class distr is like this (Diag is the
it works.
thanks,
but: (just curious)
why, when I tried previously, I got
is.vector(sample.size)
[1] TRUE
I also tried as.vector(sample.size) and assigned it to sampsz; it still
does not work.
On 7/7/05, Duncan Murdoch [EMAIL PROTECTED] wrote:
On 7/7/2005 3:38 PM, Weiwei Shi wrote:
Hi there:
From: Weiwei Shi
it works.
thanks,
but: (just curious)
why, when I tried previously, I got
is.vector(sample.size)
[1] TRUE
Because a list is also a vector:
> a <- c(list(1), list(2))
> a
[[1]]
[1] 1

[[2]]
[1] 2

> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE
Actually, the way I
On 7/7/2005 3:47 PM, Weiwei Shi wrote:
it works.
thanks,
but: (just curious)
why, when I tried previously, I got
is.vector(sample.size)
[1] TRUE
I also tried as.vector(sample.size) and assigned it to sampsz; it still
does not work.
Sorry, I used vector incorrectly. Lists are vectors.
Thanks. But can you suggest some ways for the classification problem,
since for some specific classes there are too few observations?
The following is from adding sample.size:
najie.rf.2 <- randomForest(Diag ~ ., data=one.df[ind==1, 4:ncol(one.df)],
    importance=TRUE, sampsize=unlist(sample.size))
With small sample sizes the variability of the test set error estimate will
be large. Instead of splitting the data once, you should consider
cross-validation or the bootstrap for estimating performance.
AFAIK gbm as-is won't handle more than two classes. You will need to do
quite a bit of work to
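A minimal k-fold sketch of that cross-validation suggestion, on iris:

```r
library(randomForest)

set.seed(1)
k <- 5
fold <- sample(rep(1:k, length.out = nrow(iris)))

# fit on k-1 folds, measure misclassification on the held-out fold
cv.err <- sapply(1:k, function(i) {
  fit  <- randomForest(Species ~ ., data = iris[fold != i, ])
  pred <- predict(fit, iris[fold == i, ])
  mean(pred != iris$Species[fold == i])
})
mean(cv.err)   # cross-validated error estimate
```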
Hello,
I'm using the random forest package. One of my factors in the data set contains
41 levels (I can't code this as a numeric value - in terms of linear models
this would be a random factor). The randomForest call comes back with an error
telling me that the limit is 32 categories.
Is
The limitation comes from the way categorical splits are represented in the
code: For a categorical variable with k categories, the split is
represented by k binary digits: 0=right, 1=left. So it takes k bits to
store each split on k categories. To save storage, this is `packed' into a
4-byte
All,
I'm trying to set up a function which calls the partialPlot function but
am getting an error that I can't seem to solve. Here's a simplified
version of the function and error...
pplot <-
function(rf, pred.var) { partialPlot(x=rf, pred.data=acoust, x.var=pred.var) }
attach(acoust)
acoust.rf
When I run randomForest with a 169453x5 matrix, I got the following message.
Error in matrix(0, n, n) : matrix: too many elements specified
Can you please advise me how to solve this problem?
Thanks,
Lu
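The n x n allocation is most likely the proximity matrix rather than the forest itself; a rough size check under that assumption:

```r
n <- 169453
# an n x n matrix of doubles, as allocated for proximities:
n^2 * 8 / 2^30   # roughly 214 GiB -- far beyond available RAM
# workaround sketch: leave proximity = FALSE (the default), or compute
# proximities on a subsample if they are really needed
```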
Uwe Ligges [EMAIL PROTECTED] wrote:
Vera Hofer wrote:
Dear colleagues,
I have a
From: luk
When I run randomForest with a 169453x5 matrix, I got the
following message.
Error in matrix(0, n, n) : matrix: too many elements specified
Can you please advise me how to solve this problem?
Thanks,
Lu
1. When asking new questions, please don't reply to other
Hi,
I am doing feature selection for my dataset. The following is
the extreme case, where only one feature is left. But I got
the error below. So my question is: do I have to use
more than one feature?
> sample.subset
  udomain.edu hpclass
1        -1.0     not
2        -1.0     not
3
? there is no
point here!
Philippe Grosjean
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Hui Han
Sent: Tuesday, 13 April, 2004 17:16
To: [EMAIL PROTECTED]
Subject: [R] randomForest: more than one variable needed?
Hi,
I am doing feature selection for my dataset
-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Hui Han
Sent: Tuesday, 13 April, 2004 17:16
To: [EMAIL PROTECTED]
Subject: [R] randomForest: more than one variable needed?
Hi,
I am doing feature selection for my dataset. The following is
the extreme case where only one
: [R] randomForest: more than one variable needed?
Hi,
I am doing feature selection for my dataset. The following is
the extreme case, where only one feature is left. But I got
the error below. So my question is: do I have to use
more than one feature?
sample.subset
On Tue, 13 Apr 2004, Hui Han wrote:
Hi,
I am doing feature selection for my dataset. The following is
the extreme case, where only one feature is left. But I got
the error below. So my question is: do I have to use
more than one feature?
sample.subset
udomain.edu hpclass
1
Hi,
is it correct that I need ~2 GB RAM to be able to work with the default
setting ntree=500 and a data.frame with 100,000 rows
and at most 10 columns for training and testing?
P.S.
Is it possible to approximately calculate the
memory demand for different settings with RF?
Many thanks
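There is no exact formula I know of, but a rough sketch of the dominant pieces for the sizes mentioned (assuming double-precision storage; tree sizes depend on nodesize and the data):

```r
n <- 1e5; p <- 10; ntree <- 500

# the data matrix itself is small:
n * p * 8 / 2^20          # about 7.6 MiB

# per-tree storage grows with the node count (at most roughly 2*n nodes
# for fully grown classification trees); each stored vector of that
# length, over all trees, costs:
nodes <- 2 * n
nodes * ntree * 8 / 2^30  # about 0.75 GiB per vector set
# several such integer/double vectors per tree can plausibly reach 2 GB
```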
Hi,
is it correct that I need ~2 GB RAM to be able to work with the default
setting ntree=500 and a data.frame with 100,000 rows
and at most 10 columns for training and testing?
no. You may parallelize the computations: perform 5 runs of RF with `ntree
= 100' (or less) and save the
From: Christian Schulz
Hi,
is it correct that I need ~2 GB RAM to be able to work with the default
setting ntree=500 and a data.frame with 100,000 rows
and at most 10 columns for training and testing?
If you have the test set, and don't need the forest for predicting other
data,
Hello,
When I plot or look at the error rate vector for a random forest
(rf$err.rate), it looks like a descending function, except for the first
few points of the vector, whose error rates are lower (sometimes much lower)
than the general level of error rates for a forest with that number of trees
: Vladimir N. Kutinsky [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 20, 2003 4:43 AM
To: [EMAIL PROTECTED]
Subject: [R] RandomForest
Hello,
When I plot or look at the error rate vector for a random forest
(rf$err.rate), it looks like a descending function, except for
the first few points
Andy,
First of all, thank you for you reply.
I'm using R1.6.1 for Windows. A few days ago I updated the randomForest
package from CRAN. It gives warning messages now that the package was built
under R1.6.2 but seems to work fine.
To make sure we're talking about the same thing, let's take iris
Andy,
Does it mean that the error rate does increase until the aggregated
number of out-of-bag cases reaches the number of all cases? Or, in other
words, because the number of points being predicted (right or wrong) gets
larger at the first steps of the process?
If so, then it's all
,
Andy
-Original Message-
From: Vladimir N. Kutinsky [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 20, 2003 10:26 AM
To: Liaw, Andy; [EMAIL PROTECTED]
Subject: RE: [R] RandomForest
Andy,
Does it mean that the error rate does increase as long as the
aggregating number of out