On 07/23/2008 at 5:09 PM, Steven A Rowe wrote:
> Karl Wettin's recently committed ShingleMatrixAnalyzer
Oops, "ShingleMatrixAnalyzer" -> "ShingleMatrixFilter".
Steve
Hi Ryan,
Well, at 100 million+ keywords, Lucene might be the right tool.
One thing that you might check out for the query side is Karl Wettin's recently
committed ShingleMatrixAnalyzer (not in any Lucene release yet - only on the
trunk).
The JUnit test class TestShingleMatrixFilter has an exam
Heh, actually I'm using Perl, but I've always associated text search
with Lucene; I'm not sure if it's the best solution or not. On the
small side there are 1.6 million keywords, on the large side there are
well over 100 million, but I might find another way to break down the
searches into sm
Hi Ryan,
I'm not sure Lucene's the right tool for this job.
I have used regular expressions and ternary search trees in the past to do
similar things.
Is the set of keywords too large for an in-memory solution like these? If not,
consider using a tool like the Perl package Regex::PreSuf
keywords, and then enumerate the matches. But unless you have a
lot of keywords to index, it probably doesn't make sense to use Lucene for that.
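A minimal sketch of the in-memory regex idea, in Python rather than Perl. It is only an illustration: it joins the keywords into a plain alternation and does not attempt the prefix/suffix factoring that Regex::PreSuf performs. The function names and the sample phrase are made up for the example; the three keywords are the ones from the original question.

```python
import re

def build_keyword_pattern(keywords):
    # Hypothetical helper: longest keywords first, so "red ball" is
    # preferred over "red" when both are in the list.
    ordered = sorted(set(keywords), key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    # \b keeps "computer" from matching inside "computers". This assumes
    # every keyword starts and ends with a word character.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def find_keywords(pattern, phrase):
    # Enumerate non-overlapping matches left to right, lowercased.
    return [m.group(0).lower() for m in pattern.finditer(phrase)]

keywords = ["big boy", "red ball", "computer"]
pattern = build_keyword_pattern(keywords)
# Made-up sample phrase, not from the thread.
matches = find_keywords(pattern, "The big boy threw the red ball at my computer")
```

A single alternation like this is only reasonable while the keyword list fits comfortably in memory; for very large lists a trie-based structure such as the ternary search trees mentioned above is probably the better fit.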
-----Original Message-----
From: Ryan Detzel [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 23, 2008 3:31 PM
To: java-user@lucene.apache.org
Subject
Everything I've read and seen about Lucene is about searching for keywords in
documents; I want to do the reverse. I have a huge list of
keywords ("big boy", "red ball", "computer") and I have phrases that I
want to check to see if the keywords are in them. For example, using the small
keyword list above (store in documents