Hello,
If performance is important, and with 73M rows it probably is, take a
look at this StackOverflow post.
[1] https://stackoverflow.com/a/36058634/8245406
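For instance, a data.table-style sketch in that spirit (the object and column names
here are assumed from later messages in this thread, so adapt as needed):
library(data.table)
setDT(dt)                                          # convert in place to a data.table
# all rows whose (chr, pos, gene_id) combination occurs more than once,
# keeping every column:
dups <- dt[, if (.N > 1L) .SD, by = .(chr, pos, gene_id)]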
Hope this helps,
Rui Barradas
At 21:33 on 08/11/19, Martin Morgan wrote:
With this example
> df = data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
> df
  a b value
1 1 1     1
2 1 1     2
3 2 2     3
4 2 3     4
The approach of dropping duplicates in the first and second columns has the
consequence that 'value' is chosen arbitrarily for the duplicate entries -
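A short illustration of that point (the aggregate() line is just one possible
alternative, not something proposed in this thread):
df <- data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
df[!duplicated(df[, c("a", "b")]), ]   # keeps value = 1, silently drops value = 2
# if duplicates should be combined rather than dropped, e.g.:
aggregate(value ~ a + b, data = df, FUN = mean)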
Thank you so much!!!
On Fri, Nov 8, 2019 at 11:40 AM Bert Gunter wrote:
>
> Correction:
> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
> df[!duplicated(df[,2:3]), ] ## Note the ! sign
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming
Hi
For instance, we have separate train and test data files (we do not want to do
k-fold cross-validation), so we will not use the trainControl function? In that
case, if we have to tune the parameters, do we need to specify search = "grid" in
the train function?
My second question is how we can measure MCC for classification
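For what it's worth, a minimal sketch of how this is often done with caret (the
data frames train_df / test_df, the outcome column name, and method = "rf" are
all assumptions for illustration, not from this thread):
library(caret)
ctrl <- trainControl(method = "cv", number = 5)   # resample within the training file only
grid <- expand.grid(mtry = c(2, 4, 8))            # explicit tuning grid for method = "rf"
fit  <- train(outcome ~ ., data = train_df,
              method = "rf", trControl = ctrl, tuneGrid = grid)
pred <- predict(fit, newdata = test_df)
# Matthews correlation coefficient from the test-set confusion matrix:
cm <- table(pred, test_df$outcome)
tp <- cm[2, 2]; tn <- cm[1, 1]; fp <- cm[2, 1]; fn <- cm[1, 2]
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
Note that when an explicit tuneGrid is supplied, search = "grid" (the default in
trainControl) does not need to be set separately.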
Correction:
df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
df[!duplicated(df[,2:3]), ] ## Note the ! sign
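With the example data frame above, that last line should keep rows 1 and 3 only:
  a b d
1 1 a A
3 2 b B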
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County")
Good. duplicated() returns a logical index vector that you can use to extract the
non-unique rows.
B.
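A minimal sketch of that idea, assuming the column names used elsewhere in this
thread (and coercing to a plain data frame first, in case dt is a data.table):
dt  <- as.data.frame(dt)
key <- dt[c("chr", "pos", "gene_id")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)  # TRUE for every row of a duplicated key
dt[dup, ]                                                  # all columns, only the non-unique rows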
> On 2019-11-08, at 11:30, Ana Marija wrote:
>
> I am trying to first identify how many duplicate rows are there determined by
> the unique values in the first 3 columns. Now I know that is
I am trying to first identify how many duplicate rows there are, determined
by the unique values in the first 3 columns. Now I know that there are about 2
rows which are non-unique. But I would like to extract all 8 columns for
those non-unique rows and see what is going on with the META value I have in
th
Sorry, but you ask basic questions. You really need to spend some more time
with an R tutorial or two. This list is not meant to replace your own
learning efforts.
You also do not seem to be reading the docs carefully. Under ?unique, it
links ?duplicated and tells you that it gives indices of dupli
Are you trying to eliminate duplicated rows from your data frame? Because that
would be better achieved with duplicated().
B.
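For instance, assuming the column names from this thread and a plain data frame:
dedup <- dt[!duplicated(dt[, c("chr", "pos", "gene_id")]), ]  # one row per key, all columns kept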
> On 2019-11-08, at 10:32, Ana Marija wrote:
>
> would you know how would I extract from my original data frame, just
> these unique rows?
> because this gives me on
Would you know how I would extract just these unique rows from my original
data frame?
because this gives me only those 3 columns, and I want all columns
from the original data frame
> head(udt)
   chr   pos     gene_id
1 chr1 54490 ENSG0227232
2 chr1 58814 ENSG0227232
3 chr1 60351 EN
Thank you so much! Converting it to a data frame resolved the issue!
On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
wrote:
>
> It seems as if dt is not a (base R) data frame but a
> data table. I assume, you will have to transform dt
> into a data frame (maybe with as.data.frame) to be
> able to app
It seems as if dt is not a (base R) data frame but a
data table. I assume you will have to transform dt
into a data frame (maybe with as.data.frame) to be
able to apply unique in the suggested way. However,
I am not familiar with data tables. Perhaps somebody
else can provide a more profound guess.
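For completeness, a sketch of both routes (column names taken from the thread; the
data.table-native lines are one possible alternative, not part of the original
suggestion):
library(data.table)
# base-R route, as suggested:
udt  <- unique(as.data.frame(dt)[c("chr", "pos", "gene_id")])
# data.table-native equivalents, no conversion needed:
udt2 <- unique(dt[, .(chr, pos, gene_id)])           # just the three key columns
udt3 <- unique(dt, by = c("chr", "pos", "gene_id"))  # all columns, one row per key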
I tried it but I got this error:
> udt <- unique(dt[c("chr", "pos", "gene_id")])
Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
When i is a data.table (or character vector), the columns to join by
must be specified using 'on=' argument (see ?data.table), by keying x
(i.e. sorted, and,
Hi, Ana,
doesn't
udt <- unique(dt[c("chr", "pos", "gene_id")])
nrow(udt)
get close to what you want?
Hth -- Gerrit
-
Dr. Gerrit Eichner Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de
Hello,
I have a data frame like this:
> head(dt,20)
     chr   pos     gene_id pval_nominal  pval_ret       wl      wr
1:  chr1 54490 ENSG0227232    0.6084950 0.7837780 31.62278 21.2838
2:  chr1 58814 ENSG0227232    0.2952110 0.8975820 31.62278 21.2838
3:  chr1 60351 ENSG02272
> James Wagstaff
> on Fri, 8 Nov 2019 13:20:41 + writes:
> Dear Bert Thanks for getting back to me. Yes that is
> exactly the sort of problem I am trying to solve. I am
> aware of the option of hard coding the experimental groups
> as you suggested, but was hoping
Dear Bert
Thanks for getting back to me. Yes that is exactly the sort of problem I am
trying to solve. I am aware of the option of hard coding the experimental
groups as you suggested, but was hoping for an easy out of the box approach
as I have many groups!
Thanks
James
On Tue, 5 Nov 2019 at 20:2