Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Rui Barradas
Hello, If performance is important, and with 73M rows it probably is, take a look at this StackOverflow post. [1] https://stackoverflow.com/a/36058634/8245406 Hope this helps, Rui Barradas At 21:33 on 08/11/19, Martin Morgan wrote: With this example df = data.frame(a = c(1, 1, 2, 2),

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Martin Morgan
With this example
> df = data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
> df
  a b value
1 1 1     1
2 1 1     2
3 2 2     3
4 2 3     4
The approach to drop duplicates in the first and second columns has as a consequence the arbitrary choice of 'value' for the duplicate entries -
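To make the "arbitrary choice" concrete, here is a minimal sketch using the example data frame from this message. The names keys, keep_first and keep_last are illustrative only, and this is not necessarily the alternative the (truncated) message goes on to propose.

```r
# Example data from the message above
df <- data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
keys <- c("a", "b")  # columns that define a duplicate (illustrative name)

# Keep the first row of each (a, b) combination
keep_first <- df[!duplicated(df[keys]), ]

# Keep the last row of each (a, b) combination instead
keep_last <- df[!duplicated(df[keys], fromLast = TRUE), ]

keep_first$value  # 1 3 4
keep_last$value   # 2 3 4
```

Both results have one row per (a, b) combination, but the surviving 'value' for the duplicated pair depends only on row order.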

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Ana Marija
Thank you so much!!!
On Fri, Nov 8, 2019 at 11:40 AM Bert Gunter wrote:
> Correction:
> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
> df[!duplicated(df[,2:3]), ] ## Note the ! sign
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming

[R] About separate train and test data

2019-11-08 Thread javed khan
Hi For instance, we have separate train and test data files (not want to do k fold), so we will not use the function trainControl? In that case if we have to tune the parameters, do we need to specify search =grid in the train function? My second question is how we can measure MCC classification
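For the second question, the Matthews correlation coefficient can be computed directly from the counts of a two-class confusion matrix. The following is a minimal base R sketch; the function name mcc, its arguments, and the toy label vectors are assumptions for illustration, and it does not address the trainControl part of the question.

```r
# Matthews correlation coefficient for a two-class problem.
# 'actual' and 'predicted' are vectors of class labels; 'positive' names
# the class treated as positive. All names here are illustrative.
mcc <- function(actual, predicted, positive) {
  # as.numeric() avoids integer overflow in the products below
  tp <- as.numeric(sum(actual == positive & predicted == positive))
  tn <- as.numeric(sum(actual != positive & predicted != positive))
  fp <- as.numeric(sum(actual != positive & predicted == positive))
  fn <- as.numeric(sum(actual == positive & predicted != positive))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(NA_real_)  # undefined when a margin is empty
  (tp * tn - fp * fn) / denom
}

# Toy labels standing in for test-set truth and model predictions
actual    <- c("yes", "yes", "no", "no", "yes", "no")
predicted <- c("yes", "no",  "no", "no", "yes", "yes")
mcc_value <- mcc(actual, predicted, positive = "yes")
```

With a model fitted on the training file, predicted would be the predictions for the separate test file and actual the test labels.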

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Bert Gunter
Correction:
df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
df[!duplicated(df[,2:3]), ] ## Note the ! sign
Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County"

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Boris Steipe
Good. duplicated() returns a logical index vector that you can use to extract the non-unique rows. B. > On 2019-11-08, at 11:30, Ana Marija wrote: > > I am trying to first identify how many duplicate rows are there determined by > the unique values in the first 3 columns. Now I know that is
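A minimal sketch of that idea follows. Note that duplicated() flags only the second and later occurrences, so getting every row that takes part in a duplication (with all of its columns) needs a second pass with fromLast = TRUE. The toy data frame and the names keys, dup_later, dup_any and non_unique are made up for illustration.

```r
# Toy data: rows 1 and 2 share the same (chr, pos, gene_id) key
df <- data.frame(
  chr     = c("chr1", "chr1", "chr1", "chr2"),
  pos     = c(100L, 100L, 200L, 300L),
  gene_id = c("geneA", "geneA", "geneA", "geneB"),
  META    = c(0.1, 0.9, 0.5, 0.7),
  stringsAsFactors = FALSE
)
keys <- c("chr", "pos", "gene_id")

# TRUE for the second and later occurrences of a key
dup_later <- duplicated(df[keys])

# TRUE for every row whose key occurs more than once
dup_any <- dup_later | duplicated(df[keys], fromLast = TRUE)

sum(dup_later)              # number of surplus (repeated) rows
non_unique <- df[dup_any, ] # all columns for all rows involved
```

Inspecting non_unique side by side is one way to see how the remaining columns differ between rows that share a key.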

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Ana Marija
I am trying to first identify how many duplicate rows there are, as determined by the unique values in the first 3 columns. Now I know that is about 2 rows which are non unique. But I would like to extract all 8 columns for those non unique rows and see what is going on with META value I have in th

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Bert Gunter
Sorry, but you ask basic questions. You really need to spend some more time with an R tutorial or two. This list is not meant to replace your own learning efforts. You also do not seem to be reading the docs carefully. Under ?unique, it links ?duplicated and tells you that it gives indices of dupli

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Boris Steipe
Are you trying to eliminate duplicated rows from your dataframe? Because that would be better achieved with duplicated(). B. > On 2019-11-08, at 10:32, Ana Marija wrote: > > would you know how would I extract from my original data frame, just > these unique rows? > because this gives me on

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Ana Marija
Would you know how I would extract from my original data frame just these unique rows? Because this gives me only those 3 columns, and I want all columns from the original data frame > head(udt) chr pos gene_id 1 chr1 54490 ENSG0227232 2 chr1 58814 ENSG0227232 3 chr1 60351 EN

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Ana Marija
Thank you so much! Converting it to data frame resolved the issue! On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner wrote: > > It seems as if dt is not a (base R) data frame but a > data table. I assume, you will have to transform dt > into a data frame (maybe with as.data.frame) to be > able to app

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Gerrit Eichner
It seems as if dt is not a (base R) data frame but a data table. I assume, you will have to transform dt into a data frame (maybe with as.data.frame) to be able to apply unique in the suggested way. However, I am not familiar with data tables. Perhaps somebody else can provide a more profound gues
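For context, the error reported in this thread arises because a data.table treats a character vector in the first index position as a join on keyed columns rather than as a column selection. A hedged sketch of two ways around it is shown below: data.table's own unique() and uniqueN() with a by argument, or a base R fallback when the package is not installed. The toy data and the names keys, udt and n_unique are illustrative.

```r
# Toy data: rows 1 and 2 share the same (chr, pos, gene_id) key
df <- data.frame(
  chr          = c("chr1", "chr1", "chr1", "chr2"),
  pos          = c(100L, 100L, 200L, 300L),
  gene_id      = c("geneA", "geneA", "geneA", "geneB"),
  pval_nominal = c(0.61, 0.30, 0.12, 0.45),
  stringsAsFactors = FALSE
)
keys <- c("chr", "pos", "gene_id")

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  # Number of distinct key combinations, without converting to a data frame
  n_unique <- data.table::uniqueN(dt, by = keys)
  # First row of each key combination, keeping all columns
  udt <- unique(dt, by = keys)
} else {
  # Base R fallback on a plain data frame
  udt <- df[!duplicated(df[keys]), ]
  n_unique <- nrow(udt)
}
```

Staying inside data.table avoids copying a very large table, which may matter at the 73M-row scale mentioned elsewhere in the thread; converting with as.data.frame() and using base unique() is the simpler route that resolved the issue here.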

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Ana Marija
I tried it but I got this error: > udt <- unique(dt[c("chr", "pos", "gene_id")]) Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) : When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and,

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Gerrit Eichner
Hi, Ana, doesn't udt <- unique(dt[c("chr", "pos", "gene_id")]) nrow(udt) get close to what you want? Hth -- Gerrit - Dr. Gerrit Eichner Mathematical Institute, Room 212 gerrit.eich...@math.uni-giessen.de
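As a small extension of this suggestion, the sketch below counts the unique combinations and also tallies how often each one occurs, assuming a plain data frame. The toy data and the names keys, udt and counts are illustrative.

```r
# Toy data frame with one repeated (chr, pos, gene_id) combination
df <- data.frame(
  chr     = c("chr1", "chr1", "chr1", "chr2"),
  pos     = c(100L, 100L, 200L, 300L),
  gene_id = c("geneA", "geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)
keys <- c("chr", "pos", "gene_id")

# Number of unique combinations of the key columns
udt <- unique(df[keys])
nrow(udt)

# How many rows fall into each combination
counts <- aggregate(n ~ chr + pos + gene_id,
                    data = transform(df[keys], n = 1L),
                    FUN = sum)
counts[counts$n > 1, ]  # combinations that occur more than once
```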

[R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Ana Marija
Hello, I have a data frame like this: > head(dt,20) chr pos gene_id pval_nominal pval_ret wl wr 1: chr1 54490 ENSG0227232 0.6084950 0.7837780 31.62278 21.2838 2: chr1 58814 ENSG0227232 0.2952110 0.8975820 31.62278 21.2838 3: chr1 60351 ENSG02272

Re: [R] Global curve fitting/shared parameters with nls() alternatives

2019-11-08 Thread Martin Maechler
> James Wagstaff > on Fri, 8 Nov 2019 13:20:41 + writes: > Dear Bert Thanks for getting back to me. Yes that is > exactly the sort of problem I am trying to solve. I am > aware of the option of hard coding the experimental groups > as you suggested, but was hoping

Re: [R] Global curve fitting/shared parameters with nls() alternatives

2019-11-08 Thread James Wagstaff
Dear Bert Thanks for getting back to me. Yes that is exactly the sort of problem I am trying to solve. I am aware of the option of hard coding the experimental groups as you suggested, but was hoping for an easy out of the box approach as I have many groups! Thanks James On Tue, 5 Nov 2019 at 20:2
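One way to fit group-specific and shared parameters together in base R is nls() with parameters indexed by a grouping factor, as in the indexed-parameter examples in ?nls. The sketch below is an assumption-laden illustration, not necessarily what was proposed in the truncated replies: the exponential-decay model, the simulated data, and the names dat, a, k and fit are all made up.

```r
set.seed(1)

# Simulated data: three groups with their own amplitude 'a'
# and one decay rate 'k' shared by all groups
x <- rep(seq(0, 10, length.out = 20), times = 3)
g <- factor(rep(c("g1", "g2", "g3"), each = 20))
a_true <- c(g1 = 2, g2 = 5, g3 = 8)
k_true <- 0.3
y <- as.numeric(a_true[as.character(g)]) * exp(-k_true * x) +
  rnorm(length(x), sd = 0.05)
dat <- data.frame(x = x, y = y, g = g)

# a[g] gives one amplitude per level of g; k is a single shared parameter.
# Starting values: per-group maximum of y for 'a', a rough guess for 'k'.
fit <- nls(y ~ a[g] * exp(-k * x),
           data  = dat,
           start = list(a = as.numeric(tapply(dat$y, dat$g, max)),
                        k = 0.2))

coef(fit)  # a1, a2, a3 (per group) and k (shared)
```

Because the start vector for a is built from the factor levels, the same call works for any number of groups without writing each one out by hand.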