Re: When does QueryParser creates PhraseQueries

2008-02-29 Thread duiduder
Thanks a lot for your help, Daniel. I have found a solution :) The 'token' field is public inside QueryParser, and inside 'token.image' you can read the original String with apostrophe. Thus, I can distinguish between the two situations - and simply return

Re: When does QueryParser creates PhraseQueries

2008-02-26 Thread duiduder
Daniel, thank you very much for the hint! I stepped through the code and tried some scenarios. When I type in, with whitespace delimiters, ~ termA termB this results in two invocations of getFieldQuery, one for each term. When I type ~

Re: When does QueryParser creates PhraseQueries

2008-02-26 Thread duiduder
So, I stepped through the QueryParser code further, and I have now found the source of this behaviour: the QueryParserTokenManager ~System.out.println("This one returns the whole String:"); ~String strQuery = "home/reuschling"; ~

When does QueryParser creates PhraseQueries

2008-02-25 Thread duiduder
Hi all, I have the behaviour that when I search with Luke (version 0.7.1, Lucene version 2.2.0) inside an arbitrary field, the QueryParser creates a PhraseQuery when I type in ~ termA/termB (no ...) When I read the documentation at

Re: getting term offset information for fields with multiple value entiries

2007-08-20 Thread duiduder
this is the 2.2.0 release Grant Ingersoll wrote: What version of Lucene are you using? On Aug 17, 2007, at 12:44 PM, [EMAIL PROTECTED] wrote: Hello community, dear Grant I have built a JUnit test case that illustrates the problem -

Re: getting term offset information for fields with multiple value entiries

2007-08-20 Thread duiduder
Hello Grant, dear community I have written some lines of code to adapt the offset values from Lucene to values where the terms really appear in the concatenated field value entries. My tests are successful :) There are two additional methods inside

Re: getting term offset information for fields with multiple value entiries

2007-08-17 Thread duiduder
Hello community, dear Grant I have built a JUnit test case that illustrates the problem - there, I try to cut out the right substring with the offset values given from Lucene - and fail :( A few remarks: In this example, the 'é' from 'Bosé' makes

getting term offset information for fields with multiple value entiries

2007-08-16 Thread duiduder
Hello, I have an index with an 'actor' field; for each actor there exists a single field value entry, e.g. stored/compressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition movie_actors movie_actors:Mayrata O'Wisiedo (as

result explanations / how to get the current document id inside a similarity subclass

2006-11-10 Thread duiduder
Hello folks, we want to work with explanations of document scores inside result lists. In this context we are interested in the scores of the single terms from a query, for each document inside the result list: Query: termA termB Result: doc1 =

Re: Quotes dependent StopWords removal

2006-08-16 Thread duiduder
Hello Sameer, what about this: - during indexing, use the StandardAnalyzer without stopwords - during the search, use 2 different Analyzers - one with and one without stopwords. Thereby, you first check whether the user has typed in quotes inside her query String. # If so, look whether

Re: How does the lucene normalize the score?

2006-01-27 Thread duiduder
..but this means that the scores are not comparable across queries, because a hit with the score '0.7' from one query need not be as 'good' as a '0.7' from another query...and this is only the case if the original, unnormalized top score value was less than 1.0. Does this really look like a

lucene similarity value range

2005-12-13 Thread duiduder
Hi, I am wondering whether the similarity values are guaranteed to lie inside a well-defined range (e.g. [0..1]). I use the DefaultSimilarity implementation from the SVN Lucene version and actually receive values of e.g. 1.84. Is this a bug? Is there any range guaranteed?