Hi Martin and everybody,
sorry for the long delay. Thanks for all the suggestions. With my code
and my training data I found numbers similar to the ones below.
Thanks
Cheers
Fabien
I did this to generate and search 40 million unique strings
> grams <- as.character(1:4e7)## a long t
On 04/10/2016 03:27 PM, Fabien Tarrade wrote:
Hi Duncan,
Didn't you post the same question yesterday? Perhaps nobody answered
because your question is unanswerable.
sorry, I got an email that my message was waiting for approval and when I
looked at the forum I didn't see my message and this is
Hi Jim,
I didn't know this one. I will have a look.
Thanks
Cheers
Fabien
Hi Fabien,
I was going to send this last night, but I thought it was too simple.
Runs in about one millisecond.
df<-data.frame(freq=runif(1000),
strings=apply(matrix(sample(LETTERS,10000,TRUE),ncol=10),
1,paste,collapse=""))
match.ind<-grep("DF",df$strings)
match.ind
[1] 2 11 91 133 169 444
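For what it's worth, here is a minimal sketch of the same search with a literal (non-regex) match, in case it helps when scaling up. It assumes the pattern contains no regular-expression metacharacters, so grep(..., fixed=TRUE) returns the same indices as the default regex search. The seed, the size n, and the names idx_regex, idx_fixed and hits are only illustrative and are not from the original post.

```r
# Hypothetical small-scale data, built the same way as the example above
set.seed(42)  # assumption: fixed seed so the run is repeatable
n <- 1000     # assumption: 1000 strings of 10 letters each
df <- data.frame(freq = runif(n),
                 strings = apply(matrix(sample(LETTERS, n * 10, TRUE), ncol = 10),
                                 1, paste, collapse = ""),
                 stringsAsFactors = FALSE)

# Default regex search and literal search with fixed = TRUE
idx_regex <- grep("DF", df$strings)
idx_fixed <- grep("DF", df$strings, fixed = TRUE)

# For a pattern without metacharacters both return the same indices
stopifnot(identical(idx_regex, idx_fixed))

# Matching rows together with their frequencies
hits <- df[idx_fixed, ]

# Timings depend on machine and data size, so no numbers are claimed here
print(system.time(grep("DF", df$strings)))
print(system.time(grep("DF", df$strings, fixed = TRUE)))
```

Whether fixed=TRUE is actually faster on 40 million strings would have to be measured on the real data; the two system.time() calls only show how to compare the two variants.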
Fabien:
I was unable to make any sense of your latest response (maybe I'm just
dense). If others have similar difficulties, and you fail to get a
satisfactory response, I suggest that you read and follow the posting
guide's request for a **small, reproducible example** (perhaps the
first few dozen
Hi Duncan,
Didn't you post the same question yesterday? Perhaps nobody answered
because your question is unanswerable.
sorry, I got an email that my message was waiting for approval and when I
looked at the forum I didn't see my message and this is why I sent it
again and this time I did check t
On 10/04/2016 2:03 PM, Fabien Tarrade wrote:
Hi there,
I have a data frame DF with 40 million strings and their frequencies. I
am searching for strings with a given pattern and I am trying to speed
up this part of my code. I have tried many options but so far I am not
satisfied. I tried:
- grepl and sub
Hi there,
I have a data frame DF with 40 million strings and their frequencies. I
am searching for strings with a given pattern and I am trying to speed
up this part of my code. I have tried many options but so far I am not
satisfied. I tried:
- grepl and subset are equivalent in terms of processing t