Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-29 Thread Fabien Tarrade
Hi Martin and everybody, sorry for the long delay. Thanks for all the suggestions. With my code and my training data I found numbers similar to the ones below. Thanks Cheers Fabien I did this to generate and search 40 million unique strings > grams <- as.character(1:4e7) ## a long t

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Martin Morgan
On 04/10/2016 03:27 PM, Fabien Tarrade wrote: Hi Duncan, Didn't you post the same question yesterday? Perhaps nobody answered because your question is unanswerable. Sorry, I got an email that my message was waiting for approval, and when I looked at the forum I didn't see my message and this is

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Fabien Tarrade
Hi Jim, I didn't know this one. I will have a look. Thanks Cheers Fabien Hi Fabien, I was going to send this last night, but I thought it was too simple. Runs in about one millisecond. df <- data.frame(freq=runif(1000), strings=apply(matrix(sample(LETTERS,10000,TRUE),ncol=10), 1,paste,collap

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Jim Lemon
Hi Fabien, I was going to send this last night, but I thought it was too simple. Runs in about one millisecond. df <- data.frame(freq=runif(1000), strings=apply(matrix(sample(LETTERS,10000,TRUE),ncol=10), 1, paste, collapse="")) match.ind <- grep("DF", df$strings) match.ind [1] 2 11 91 133 169 444
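
For readers following the thread, here is a minimal sketch of the grep-based search described above, assuming the goal is to find a literal substring rather than a regular expression. The object names (`strings`, `idx_regex`, `idx_fixed`, `hits`) and the seed are illustrative and not taken from the original posts. The `fixed = TRUE` argument tells `grep` to treat the pattern as a plain string, which is often faster on large vectors, though actual timings depend on the data.

```r
# Hypothetical data: 1000 random 10-letter strings with a frequency each.
# 10000 letters are drawn so that a 10-column matrix yields 1000 rows.
set.seed(42)
strings <- apply(matrix(sample(LETTERS, 10000, TRUE), ncol = 10),
                 1, paste, collapse = "")
df <- data.frame(freq = runif(1000), strings = strings,
                 stringsAsFactors = FALSE)

# Regular-expression search and literal (fixed) search for the same substring.
idx_regex <- grep("DF", df$strings)
idx_fixed <- grep("DF", df$strings, fixed = TRUE)

# "DF" has no regex metacharacters, so both calls select the same rows.
stopifnot(identical(idx_regex, idx_fixed))

# Subset the data frame to the matching rows.
hits <- df[idx_fixed, ]
```

To compare the two calls on a larger vector, each one can be wrapped in `system.time()`.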

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Bert Gunter
Fabien: I was unable to make any sense of your latest response (maybe I'm just dense). If others have similar difficulties, and you fail to get a satisfactory response, I suggest that you read and follow the posting guide's request for a **small, reproducible example** (perhaps the first few dozen

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Fabien Tarrade
Hi Duncan, Didn't you post the same question yesterday? Perhaps nobody answered because your question is unanswerable. Sorry, I got an email that my message was waiting for approval, and when I looked at the forum I didn't see my message, and this is why I sent it again and this time I did check t

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Duncan Murdoch
On 10/04/2016 2:03 PM, Fabien Tarrade wrote: Hi there, I have a data frame DF with 40 million strings and their frequencies. I am searching for strings with a given pattern and I am trying to speed up this part of my code. I tried many options but so far I am not satisfied. I tried: - grepl and sub

[R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Fabien Tarrade
Hi there, I have a data frame DF with 40 million strings and their frequencies. I am searching for strings with a given pattern and I am trying to speed up this part of my code. I tried many options but so far I am not satisfied. I tried: - grepl and subset are equivalent in terms of processing t