Extracting terms from a query splitting a phrase.

2008-02-05 Thread Spencer Tickner
Hi List, Thanks in advance for the help. I'm trying to extract terms from a query. From the reading I've done, a phrase such as "General Act" is considered a term: http://lucene.apache.org/java/docs/queryparsersyntax.html#Terms . However, when I'm testing to get the extractTerms of my query it

Re: Extracting terms from a query splitting a phrase.

2008-02-05 Thread Erick Erickson
I don't think WhitespaceAnalyzer is doing what you think it is. From the Javadoc: "public class WhitespaceTokenizer extends CharTokenizer. A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens."
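
To make the Javadoc description concrete, here is a minimal plain-Java sketch of whitespace-only splitting. It does not call Lucene at all; the class WhitespaceSplitSketch and its tokenize method are hypothetical names invented for this illustration, and the sketch only mimics the rule "adjacent sequences of non-whitespace characters form tokens". It says nothing about how QueryParser itself treats quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics splitting at whitespace, not Lucene's actual classes.
public class WhitespaceSplitSketch {

    // Collects each run of non-whitespace characters as one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The quoted phrase is raw text here; quotes get no special treatment.
        for (String token : tokenize("\"General Act\"")) {
            System.out.println(token);
        }
    }
}
```

Under this rule the raw text "General Act" (quotes included) yields two tokens, one starting with a quote character and one ending with it, so splitting at whitespace alone never keeps the phrase together as a single token.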

Re: Extracting terms from a query splitting a phrase.

2008-02-05 Thread Spencer Tickner
Hi Erick, Thanks for your response. I think you're right about the WhitespaceAnalyzer. I was actually using the StandardAnalyzer before and tried the WhitespaceAnalyzer to see if the StandardAnalyzer was pulling off the quotes. I guess what I'm trying to mimic is the information found: http://

Re: Extracting terms from a query splitting a phrase.

2008-02-05 Thread Spencer Tickner
I guess to be more concise, I'm looking to get all the terms that were searched for so I can highlight them in the original document. After looking through the highlighter contrib class I figured I had found my solution with query.extractTerms. Works great for searches like: genera* -> generally, ge

Re: Extracting terms from a query splitting a phrase.

2008-02-10 Thread Doron Cohen
PhraseQuery.extractTerms() returns the terms making up the phrase, so it is not adequate for 'finding' a single term that represents the phrase query, one that represents the entire searched text. It seems you are trying to obtain a string that can be matched against the displayed text for e.g
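
One way to picture the gap described here: if the individual phrase terms are already in hand (for example, collected from extractTerms in phrase order), they still have to be recombined before they can be matched against displayed text. The plain-Java sketch below is an assumption-laden illustration, not the approach of the highlighter contrib package and not anything proposed in this thread. The class PhraseHighlightSketch, the method highlightPhrase, and the <b> markers are all hypothetical, and it assumes the displayed text contains the terms in unanalyzed form separated only by whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only: recombines phrase terms for matching in displayed text.
public class PhraseHighlightSketch {

    // Wraps each occurrence of the terms, in order and separated by whitespace, in <b> tags.
    static String highlightPhrase(String text, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return text; // nothing to match
        }
        // Quote each term so regex metacharacters in a term are taken literally.
        String regex = phraseTerms.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("\\s+"));
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        // $0 stands for the whole matched span, so original casing and spacing are kept.
        return pattern.matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("general", "act");
        System.out.println(highlightPhrase("See the General Act for details.", terms));
    }
}
```

The sketch ignores word boundaries, stemming, stop words, and punctuation between terms, all of which an analyzer may change, so it would only line up with real search results when the analyzed terms closely resemble the displayed words.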