What software are you using, exactly? I'm the maintainer of the
randomForest package, yet I do not know which "manual" you are quoting.
If you are using the randomForest package, the model object can be saved
to a file with save(Rfobject, file="myRFobject.rda"). If you need that to
be in ASCII, use the ascii = TRUE argument to save().
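A minimal round-trip sketch of the above (the dataset and object names are illustrative, not from the original question):

```r
library(randomForest)
set.seed(1)
Rfobject <- randomForest(Species ~ ., data = iris, ntree = 50)
# binary save (compact):
save(Rfobject, file = "myRFobject.rda")
# plain-text (ASCII) save, if a text representation is required:
save(Rfobject, file = "myRFobject_ascii.rda", ascii = TRUE)
# either file can be restored in a fresh session with load():
load("myRFobject.rda")
```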
I've been fixing some problems in the combine() function, but that's
only for regression data. Looks like you are doing classification, and
I don't see the problem:
R> library(randomForest)
randomForest 4.5-19
Type rfNews() to see new features/changes/bug fixes.
R> set.seed(1)
R> rflist <- repli
My apologies, subject corrected.
I'm building an RF 50 trees at a time due to memory limitations (I have
roughly 0.5 million observations and around 20 variables). I thought I
could combine some or all of my forests later and look at global
importance.
If I have, say, two forests: tree1 and tree2
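The intended workflow might be sketched with combine() from the randomForest package (a toy two-forest example on iris, not the poster's data; note the maintainer's remark earlier in the thread that combine() was being fixed at the time):

```r
library(randomForest)
set.seed(1)
rf1 <- randomForest(Species ~ ., data = iris, ntree = 50)
rf2 <- randomForest(Species ~ ., data = iris, ntree = 50)
rf.all <- combine(rf1, rf2)  # a single forest with 100 trees
rf.all$ntree                 # 100
```

Note that combine() does not recompute every component of the fitted object (e.g. the OOB error rates are dropped), so check the combined object before relying on its summaries.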
In the words of Simpson (2007), "D'OH!"
I knew it had to be something simple!
On 4/29/07, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>
> On Sat, 2007-04-28 at 21:13 -0400, David L. Van Brunt, Ph.D. wrote:
> > Just out of curiosity, I took the default "iris" example in the RF
> > helpfile...
> > but
On Sat, 2007-04-28 at 21:13 -0400, David L. Van Brunt, Ph.D. wrote:
> Just out of curiosity, I took the default "iris" example in the RF
> helpfile...
> but seeing the admonition against using the formula interface for large data
> sets, I wanted to play around a bit to see how the various options
On Tue, 9 Jan 2007, Bálint Czúcz wrote:
There is an improved version of the original random forest algorithm
available in the "party" package (you can find some additional
information on the details here:
http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper490.pdf ).
I do not know whether it yields a solution to your problem about
mi
You can try the Fortran version of randomForest, which has a function
that does missing-value replacement automatically. There are two ways of
imputation (one fast, the other time-consuming).
I did it a long time ago.
The link is below. If you have any questions, just let me know.
http://www.
Yes I completely agree with your statements. As far as a way around
it, I would say that CART has some facilities for dealing with
missing data. e.g. when an observation is dropped into the tree and
encounters a split at which the variable is missing, then one option
is to simply not send it furthe
I don't know about this module, but a general answer is that if you have
missing data, it may affect your model. If your data is missing at
random, then you might be lucky in your model building.
If, however, your data were not missing at random (e.g. due to censoring),
you might build a misleading predictor.
I confess I haven't tried these R tools myself.
However, I tried a dozen Naive Bayes-like programs, often used to filter
email, where the serious problem with spam has resulted in many
innovations.
The most touted of the worldwide Naive Bayes programs seems to be
CRM114 (not in R, I expect, since its p
When mtry is equal to the total number of features, you just get regular bagging
(in the R package -- Breiman & Cutler's Fortran code samples variables with
replacement, so you can't do bagging with that). There are cases where
bagging will do better than random feature selection (i.e., RF), even in
sim
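As a concrete illustration of the first sentence above (iris used as a stand-in dataset):

```r
library(randomForest)
set.seed(1)
# iris has 4 predictors; with mtry = 4 every split considers all of
# them, so the random-feature step is a no-op and this is plain bagging
bag <- randomForest(Species ~ ., data = iris, mtry = 4, ntree = 200)
bag$mtry  # 4
```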
I can't add much to your question, being a complete novice at
classification, but I have tried both randomForest and SVM and I get better
results from randomForest than SVM (even after tuning). randomForest is
also much, much faster. I just thought randomForest was a much better
algorithm, althou
From: Stephen Choularton
>
> Hi
>
> I am trying to use randomForest for classification. I am using this
> code:
>
> > set.seed(71)
> > rf.model <- randomForest(similarity ~ ., data=set1[1:100,],
> importance=TRUE, proximity=TRUE)
> Warning message:
> The response has five or fewer unique valu
If all you need the formula interface for is auto deletion of NAs, I'd
suggest doing something like:
varlist <- c("fruit", "apples", "oranges", "blueberries")
fruits.nona <- na.omit(fruits.data[varlist])
model.rf <- randomForest(fruits.nona[-1], fruits.nona[[1]], ...)
If you want to know the cal
mmv wrote:
> I'm attempting to pass a string argument into the function
> randomForest but I get an error:
>
> state <- paste(list("fruit ~", "apples+oranges+blueberries",
> "data=fruits.data, mtry=2, do.trace=100, na.action=na.omit,
> keep.forest=TRUE"), sep= " ", collapse="")
I really don't u
> From: Uwe Ligges
>
> [EMAIL PROTECTED] wrote:
>
> > Hello,
> >
> > I'm trying to find out the optimal number of splits (mtry parameter)
> > for a randomForest classification. The classification is binary and
> > there are 32 explanatory variables (mostly factors with each up to 4
> > levels bu
> From: [EMAIL PROTECTED]
>
> Hello,
>
> I'm trying to find out the optimal number of splits (mtry
> parameter) for a randomForest classification. The
> classification is binary and there are 32 explanatory
> variables (mostly factors with each up to 4 levels but also
> some numeric variables
See the tuneRF() function in the package for an implementation of
the strategy recommended by Breiman & Cutler.
BTW, "randomForest" is only for the R package. See Breiman's
web page for notice on trademarks.
Andy
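A short tuneRF() illustration on a built-in dataset (the tuning parameters here are arbitrary choices for the example, not a recommendation):

```r
library(randomForest)
set.seed(1)
# search over mtry starting from the default, stepping by a factor of 1.5,
# and accepting a step only if OOB error improves by at least 1%
res <- tuneRF(iris[, -5], iris$Species,
              stepFactor = 1.5, improve = 0.01,
              ntreeTry = 100, plot = FALSE, trace = FALSE)
res  # a matrix of mtry values searched and their OOB error estimates
```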
> From: Weiwei Shi
>
> Hi,
> I found the following lines from Leo's randomFore
Hi,
I found the following lines in Leo's randomForest documentation; I am not
sure whether they apply here, but they may help:
mtry0 = the number of variables to split on at each node. Default is
the square root of mdim. ATTENTION! DO NOT USE THE DEFAULT VALUES OF
MTRY0 IF YOU WANT TO OPTIMIZE THE P
[EMAIL PROTECTED] wrote:
> Hello,
>
> I'm trying to find out the optimal number of splits (mtry parameter)
> for a randomForest classification. The classification is binary and
> there are 32 explanatory variables (mostly factors with each up to 4
> levels but also some numeric variables) and 575
Thanks.
Many people pointed that out. (That was because I only knew lapply at
the time. :)
On 7/11/05, Martin Maechler <[EMAIL PROTECTED]> wrote:
> > "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
> > on Thu, 07 Jul 2005 15:44:38 -0400 writes:
>
> Duncan> On 7/7/2005 3:38 PM, W
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
> on Thu, 07 Jul 2005 15:44:38 -0400 writes:
Duncan> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
>> Hi there:
>> I have a question on random forest:
>>
>> recently i helped a friend with her random forest and i came with
With small sample sizes, the variability of the test-set error estimate will
be large. Instead of splitting the data once, you should consider
cross-validation or the bootstrap for estimating performance.
AFAIK gbm as is won't handle more than two classes. You will need to do
quite a bit of work to g
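The cross-validation suggestion might be sketched like this (a generic k-fold estimate on iris, purely illustrative):

```r
library(randomForest)
set.seed(1)
k <- 5
# assign each row to one of k folds at random
folds <- sample(rep(1:k, length.out = nrow(iris)))
cv.err <- sapply(1:k, function(i) {
  # train on all folds except i, test on fold i
  fit <- randomForest(Species ~ ., data = iris[folds != i, ], ntree = 200)
  pred <- predict(fit, iris[folds == i, ])
  mean(pred != iris$Species[folds == i])
})
mean(cv.err)  # cross-validated misclassification estimate
```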
Thanks, but can you suggest some approaches for this classification problem,
since some specific classes have too few observations?
the following is from adding sample.size :
> najie.rf.2 <- randomForest(Diag~., data=one.df[ind==1,4:ncol(one.df)],
> importance=T, sampsize=unlist(sample.size))
>
On 7/7/2005 3:47 PM, Weiwei Shi wrote:
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
>> is.vector(sample.size)
> [1] TRUE
>
> i also tried as.vector(sample.size) and assigned it to sampsz, it still
> does not work.
Sorry, I used "vector" incorrectly. Lists a
> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE
Because a list is also a vector:
> a <- c(list(1), list(2))
> a
[[1]]
[1] 1
[[2]]
[1] 2
> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE
Actually, t
it works.
thanks,
but: (just curious)
why i tried previously and i got
> is.vector(sample.size)
[1] TRUE
I also tried as.vector(sample.size) and assigned it to sampsz; it still
does not work.
On 7/7/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> > Hi the
On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> Hi there:
> I have a question on random forest:
>
> recently i helped a friend with her random forest and i came with this
> problem:
> her dataset has 6 classes and since the sample size is pretty small:
> 264 and the class distr is like this (Diag is th
The limitation comes from the way categorical splits are represented in the
code: For a categorical variable with k categories, the split is
represented by k binary digits: 0=right, 1=left. So it takes k bits to
store each split on k categories. To save storage, this is `packed' into a
4-byte in
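The representation described above can be mimicked in a few lines of base R (an illustration of the encoding, not the package's actual internal code):

```r
# one bit per category: bit (cat - 1) set means "send category cat left"
split.code <- 11L   # binary 1011 -> categories 1, 2 and 4 go left
goes.left <- function(cat, code) {
  bitwAnd(bitwShiftR(code, cat - 1L), 1L) == 1L
}
goes.left(1, split.code)  # TRUE
goes.left(3, split.code)  # FALSE
# a 4-byte integer holds 32 bits, hence the limit of roughly
# 32 categories per categorical variable in the Fortran code
```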
> From: luk
>
> When I run randomForest with a 169453x5 matrix, I got the
> following message.
>
> Error in matrix(0, n, n) : matrix: too many elements specified
>
> Can you please advise me how to solve this problem?
>
> Thanks,
>
> Lu
1. When asking new questions, please don't reply to
On Tue, 13 Apr 2004, Hui Han wrote:
> Hi,
>
> I am doing feature selection for my dataset. The following is
> the extreme case where only one feature is left. But I got
> the error below. So my question is that do I have to use
> more than one features?
>
> sample.subset
> udomain.edu hpclass
>
With only one `x' variable, RF will be identical to bagging.
This looks like a bug. I will check it out.
Andy
> From: Hui Han
>
> I agree with you about the less practical meaning of this sample of
> the extreme case. I am just curious about the "grammar" syntax of
> randomForest.
>
> Thank
I agree with you about the limited practical meaning of this extreme-case
example. I am just curious about the "grammar" (syntax rules) of
randomForest.
Thanks.
Hui
On Tue, Apr 13, 2004 at 05:29:06PM +0200, Philippe Grosjean wrote:
> I don't see much why to use random forest with only one predictive
I don't see much point in using random forest with only one predictive variable!
Recall that random forest grows trees with a random subset of variables "in
competition" for growing each node of the trees in the forest... How do you
make such a random subset with only one predictive variable? There is n
> From: Christian Schulz
>
> Hi,
>
> is it correct that I need ~2 GB of RAM to be able to work with
> the default setting ntree=500 and a data.frame with 100,000 rows
> and at most 10 columns for training and testing?
If you have the test set, and don't need the forest for predicting othe
> Hi,
>
> is it correct that I need ~2 GB of RAM to be able to work with
> the default setting ntree=500 and a data.frame with 100,000 rows
> and at most 10 columns for training and testing?
>
No. You may parallelize the computations: perform 5 runs of RF with `ntree
= 100' (or fewer) and sav
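That suggestion might look like the following (sequential here for simplicity; each run could equally be done in a separate R process and the saved pieces combined afterwards):

```r
library(randomForest)
set.seed(1)
# five independent 100-tree forests on the same data
rflist <- lapply(1:5, function(i)
  randomForest(Species ~ ., data = iris, ntree = 100))
# merge them into one 500-tree forest
rf.all <- do.call(combine, rflist)
rf.all$ntree  # 500
```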
simplify the splitting.
You might find the heuristics in the CART book, but I'm not sure.
HTH,
Andy
> -Original Message-
> From: Vladimir N. Kutinsky [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 20, 2003 10:26 AM
> To: Liaw, Andy; [EMAIL PROTECTED]
> Subject: R
Andy,
Does that mean the error rate increases until the aggregated number of
out-of-bag cases reaches the total number of cases? Or, in other
words, because the number of points being predicted (right or wrong) gets
larger during the first steps of the process?
If so, then it's all clea
Andy,
First of all, thank you for you reply.
I'm using R1.6.1 for Windows. A few days ago I updated the randomForest
package from CRAN. It gives warning messages now that the package was built
under R1.6.2 but seems to work fine.
To make sure we're talking about the same thing, let's take iris
cla
Please tell us the version of the package, the version of R, and the
platform you're working on.
Sounds like you should upgrade to a newer version of the randomForest
package. In Breiman's original code, he is counting the number of
misclassified cases and dividing that by the total number of cas