Re: [R] Random Forest: OOB performance = test set performance?

2021-04-10 Thread Peter Langfelder
I think the only thing you are doing wrong is not setting the random seed (set.seed()), so your results are not reproducible. Depending on the random sample used to select the training and test sets, you get slightly varying accuracy for both; sometimes one is better and sometimes the other. HTH,
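
As a minimal sketch of the point about set.seed(), the base-R snippet below draws the same train/test split twice. The data size (100 rows), the 70/30 split, and the seed value are assumptions made for illustration; they are not taken from the original post.

```r
# Hypothetical setup: 100 row indices to split into training and test sets.
n <- 100

set.seed(42)                      # fix the RNG state before sampling
train_a <- sample(n, size = 70)   # 70 training rows

set.seed(42)                      # same seed, so the same RNG state
train_b <- sample(n, size = 70)

# With the seed fixed, the split is identical on every run.
same_split <- identical(train_a, train_b)

# The remaining rows form the test set.
test_a <- setdiff(seq_len(n), train_a)
```

Without the set.seed() calls, each run would draw a different split, which is one source of the run-to-run variation in accuracy described above.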

[R] Random Forest: OOB performance = test set performance?

2021-04-10 Thread thebudget72
Hi ML, For random forest, I thought that the out-of-bag performance should be the same (or at least very similar) to the performance calculated on a separated test set. But this does not seem to be the case. In the following code, the accuracy computed on out-of-bag sample is 77.81%, while

Re: [R] Stata/Rstudio evil attributes

2021-04-10 Thread William Michels via R-help
Hi Roger, You could look at the attributes() function in base-R. See: > ?attributes From the help-page: > ## strip an object's attributes: > attributes(x) <- NULL HTH, Bill. W. Michels, Ph.D. On Sat, Apr 10, 2021 at 4:20 AM Koenker, Roger W wrote: > > Wolfgang, > > Thanks, this is

Re: [R] Comparing dates in two large data frames

2021-04-10 Thread Rui Barradas
Hello, The following solution seems to work and is fast, like findInterval is. It first determines where each value of df1$Time falls in df2$start. It then uses that index to see if those Times are not greater than the corresponding df2$end. I checked against a small subset of df1 and the results
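
The two steps described above can be sketched roughly as follows. This is not the code from the original message (which is cut off here); the example data are made up, and the sketch assumes df2 is sorted by start and that its timespans do not overlap.

```r
# Hypothetical data; the column names Time, start and end follow the thread.
df1 <- data.frame(Time = as.POSIXct(c("2021-04-10 00:30:00",
                                      "2021-04-10 01:30:00",
                                      "2021-04-10 03:15:00"), tz = "UTC"))
df2 <- data.frame(
  start = as.POSIXct(c("2021-04-10 00:00:00", "2021-04-10 03:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2021-04-10 01:00:00", "2021-04-10 04:00:00"), tz = "UTC")
)

# Step 1: for each Time, find the last start that is <= Time.
# An index of 0 means the Time lies before the first start.
idx <- findInterval(as.numeric(df1$Time), as.numeric(df2$start))

# Step 2: a Time is inside a timespan if it is not greater than the
# end that belongs to the start found in step 1.
inside <- logical(nrow(df1))      # defaults to FALSE
ok <- idx > 0
inside[ok] <- df1$Time[ok] <= df2$end[idx[ok]]

# Store the result as 0/1 in a new column of df1.
df1$within <- as.integer(inside)
```

If the timespans in df2 can overlap, a single index per Time is no longer enough and this sketch would need to be adapted.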

Re: [R] Stata/Rstudio evil attributes

2021-04-10 Thread Koenker, Roger W
Wolfgang, Thanks, this is _extremely_ helpful. Roger > On Apr 10, 2021, at 11:59 AM, Viechtbauer, Wolfgang (SP) > wrote: > > Dear Roger, > > The problem is this. qss() looks like this: > > if (is.matrix(x)) { > [...] > } > if (is.vector(x)) { > [...] > } > qss > > Now let's check

[R] Comparing dates in two large data frames

2021-04-10 Thread Kulupp
Dear all, I have two data frames (df1 and df2) and for each timepoint in df1 I want to know: is it within any of the timespans in df2? The result (e.g. "no" or "yes" or 0 and 1) should be shown in a new column of df1. Here is the code to create the two data frames (the size of the two data

Re: [R] Stata/Rstudio evil attributes

2021-04-10 Thread Viechtbauer, Wolfgang (SP)
Dear Roger, The problem is this. qss() looks like this: if (is.matrix(x)) { [...] } if (is.vector(x)) { [...] } qss Now let's check these if() statements: is.vector(B$x) # TRUE is.vector(D$x) # FALSE is.matrix(B$x) # FALSE is.matrix(D$x) # FALSE is.vector(D$x) being FALSE may be
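
The behaviour of is.vector() described above can be reproduced in base R without the Stata file. In this small sketch the attribute name "label" and its value are made up for illustration; they stand in for whatever attributes the imported column carried.

```r
x <- c(1.5, 2.5, 3.5)
plain_is_vector <- is.vector(x)        # plain numeric vector

# Attach a made-up attribute, similar in spirit to imported metadata.
y <- x
attr(y, "label") <- "hypothetical variable label"
labelled_is_vector <- is.vector(y)     # FALSE: attribute other than names
labelled_is_matrix <- is.matrix(y)     # FALSE as well

# Stripping all attributes, as suggested elsewhere in the thread,
# makes is.vector() return TRUE again.
z <- y
attributes(z) <- NULL
stripped_is_vector <- is.vector(z)
```

An object like y therefore passes neither an is.matrix() nor an is.vector() check, which matches the pattern of results reported in the message.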

[R] Stata/Rstudio evil attributes

2021-04-10 Thread Koenker, Roger W
As shown in the reproducible example below, I used the RStudio package haven to read a Stata .dta file, and then tried to do some fitting with the resulting data.frame. This produced an error from my fitting function rqss() in the package quantreg. After a bit of frustrated cursing, I

Re: [R] Assigning several lists to variables whose names are contained in other variables

2021-04-10 Thread Rui Barradas
Hello, I believe that the point we are missing is that datatable$column stores the *names* of the graphs, not the graph objects themselves. So in the loop the objects must be retrieved with mget() or get(). First create a reproducible example. library(tidygraph) my_function <-
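
A small base-R sketch of the distinction between names and objects follows. It does not reproduce the tidygraph example from the message, which is cut off here; the objects graph_a and graph_b are hypothetical stand-ins for the graphs, and only the names datatable and column come from the thread.

```r
# Hypothetical objects standing in for the graph objects.
graph_a <- list(nodes = 3)
graph_b <- list(nodes = 5)

# The column stores the *names* of the objects as character strings.
datatable <- data.frame(column = c("graph_a", "graph_b"),
                        stringsAsFactors = FALSE)

# get() retrieves one object by name; mget() retrieves several
# and returns them as a named list.
one_graph  <- get(datatable$column[1])
all_graphs <- mget(datatable$column)

# Loop over the retrieved objects rather than over the strings.
node_counts <- vapply(all_graphs, function(g) g$nodes, numeric(1))
```

Passing datatable$column[1] directly to a function would hand it the string "graph_a", not the object, which is why the lookup step is needed.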