Yes, #5 is the same thing (sorry, I didn't read them all thoroughly). Your
description of the phrases as 'tags' suggests that you don't need term
positions for matching, and, as you noted, with tokenized terms you would
get unwanted partial matches. The TermQuerys would also be much faster than
phrase queries.
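
To illustrate the difference, here is a minimal sketch in plain Python. It is
not Lucene code: build_index, whitespace_terms, keyword_term and term_query
are made-up names, and the toy index only models which terms map to which
documents. The assumption is that a keyword-style field stores each whole
value as one term, so a lookup needs no position data.

```python
from collections import defaultdict


def build_index(docs, analyze):
    """Map each term produced by `analyze` to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, values in docs.items():
        for value in values:
            for term in analyze(value):
                index[term].add(doc_id)
    return index


def whitespace_terms(value):
    # Tokenized field: one term per word (positions are not modeled here).
    return value.lower().split()


def keyword_term(value):
    # Keyword-style field: the whole value is a single term.
    return [value.lower()]


def term_query(index, term):
    # A term lookup is a single dictionary access; no positions are consulted.
    return sorted(index.get(term.lower(), set()))


# Hypothetical multivalued "tag" field for two documents.
docs = {
    1: ["brazilian pop", "this used to be a long phrase query"],
    2: ["brazilian"],
}

tokenized = build_index(docs, whitespace_terms)
keyword = build_index(docs, keyword_term)

partial_hits = term_query(tokenized, "brazilian")
exact_hits = term_query(keyword, "brazilian")
long_hits = term_query(keyword, "this used to be a long phrase query")
```

In this toy model the tokenized index returns both documents for "brazilian"
(the partial match), the keyword index returns only document 2, and the long
phrase matches document 1 as a single term.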

Peter


On Wed, Oct 24, 2012 at 8:33 PM, Aaron Daubman <daub...@gmail.com> wrote:

> Hi Peter,
>
> Thanks for the recommendation - I believe we are thinking along the
> same lines, but wanted to check to make sure. Are you suggesting
> something different than my #5 (below) or are we essentially
> suggesting the same thing?
>
> On Wed, Oct 24, 2012 at 1:20 PM, Peter Keegan <peterlkee...@gmail.com>
> wrote:
> > Could you index your 'phrase tags' as single tokens? Then your phrase
> > queries become simple TermQuerys.
>
> >>
> >> 5) *This is my current favorite*: stop tokenizing/analyzing these
> >> terms and just use KeywordTokenizer. Most of these phrases are
> >> pre-vetted, and it may be possible to clean/process any others before
> >> creating the docs. My main worry: currently, if I understand
> >> correctly, a document with the phrase "brazilian pop" would still be
> >> returned as a match to a seed document containing only the phrase
> >> "brazilian" (not the other way around, but that is not necessary).
> >> With KeywordTokenizer, this would no longer be the case. If I
> >> switched from the current dubious tokenize/stem/etc. analysis to just
> >> KeywordTokenizer, would queries like "this used to be a long phrase
> >> query" match documents that have "this used to be a long phrase
> >> query" as one of the multivalued values in the field, without having
> >> to pull term positions (and thus significantly speed up performance)?
> >>
>
> Thanks again,
>      Aaron
>
