RE: Using lucene to search a bunch of keywords?

2008-07-23 Thread Steven A Rowe
On 07/23/2008 at 5:09 PM, Steven A Rowe wrote: > Karl Wettin's recently committed ShingleMatrixAnalyzer Oops, "ShingleMatrixAnalyzer" -> "ShingleMatrixFilter". Steve

RE: Using lucene to search a bunch of keywords?

2008-07-23 Thread Steven A Rowe
Hi Ryan, Well, at 100 million+ keywords, Lucene might be the right tool. One thing that you might check out for the query side is Karl Wettin's recently committed ShingleMatrixAnalyzer (not in any Lucene release yet - only on the trunk). The JUnit test class TestShingleMatrixFilter has an exam
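
For readers unfamiliar with the term, a "shingle" is a run of adjacent tokens treated as a single term. The sketch below only illustrates plain word shingles in Python; it is not the ShingleMatrixFilter API and does not reproduce its matrix handling. The function name `shingles` and its parameters are made up for this illustration.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return every run of min_size..max_size adjacent tokens, joined by spaces."""
    out = []
    for size in range(min_size, max_size + 1):
        # Slide a window of `size` tokens across the token list.
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


result = shingles(["new", "york", "city"])
```

Each shingle could then be looked up as a candidate keyword, so that multi-word keywords are matched as a unit.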

Re: Using lucene to search a bunch of keywords?

2008-07-23 Thread Ryan D
Heh, actually I'm using Perl, but I've always associated text search with Lucene; I'm not sure if it's the best solution or not. On the small side there are 1.6 million keywords, on the large side there are well over 100 million but I might find another way to break down the searches into sm

RE: Using lucene to search a bunch of keywords?

2008-07-23 Thread Steven A Rowe
Hi Ryan, I'm not sure Lucene's the right tool for this job. I have used regular expressions and ternary search trees in the past to do similar things. Is the set of keywords too large for an in-memory solution like these? If not, consider using a tool like the Perl package Regex::PreSuf
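
A minimal sketch of the in-memory regular-expression approach, written in Python rather than Perl. It assumes the keyword set fits in memory and builds one plain alternation; it does not do the prefix/suffix factoring that a dedicated regex-building package would. The names `compile_keywords`, `matcher` and `hits` are hypothetical.

```python
import re


def compile_keywords(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole word."""
    # Longest keywords first, so "search tree" is tried before "search".
    ordered = sorted(set(keywords), key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


matcher = compile_keywords(["search", "search tree", "regex"])
hits = matcher.findall("A regex or a search tree can do keyword search.")
```

A plain alternation like this is likely to become slow or unwieldy as the keyword list grows into the millions, which is where a trie-like structure such as a ternary search tree would be the more natural fit.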

RE: Using lucene to search a bunch of keywords?

2008-07-23 Thread Robert Stewart
You need to invert the process. Using Lucene may not be the best option... You need to make your document a key into an index of keywords. I've done the same thing, but not with Lucene. You need to pass through the document and, for each word (token), look it up in some index (hashtable) to find po
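
A rough Python sketch of the inverted lookup described above: the keywords go into a hashtable and the document is scanned once, token by token. Keying each keyword phrase by its first token, and the names `build_index` and `find_keywords`, are assumptions made for this illustration, not details from the message.

```python
import re


def build_index(keywords):
    """Map the first token of each keyword phrase to the phrases that start with it."""
    index = {}
    for kw in keywords:
        tokens = tuple(kw.lower().split())
        if tokens:
            index.setdefault(tokens[0], []).append(tokens)
    return index


def find_keywords(text, index):
    """Scan the document once and report every keyword phrase found, in order."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    for i, tok in enumerate(tokens):
        # One hashtable lookup per token; only phrases starting here are compared.
        for phrase in index.get(tok, ()):
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                found.append(" ".join(phrase))
    return found


index = build_index(["lucene", "ternary search tree", "perl"])
matches = find_keywords("I use Perl, not Lucene, with a ternary search tree.", index)
```

The cost per document depends on the document's length rather than on the total number of keywords, provided the hashtable itself fits in memory.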