On Mon, Aug 30, 2010 at 3:30 PM, H. Wilson <[email protected]> wrote:
> Ard,
>
> You are absolutely right.. and this didn't make sense to me either. I think
> I was too worn out from my week and too excited to have code that "worked"
> to notice the obvious... this must be a workaround. However, I will need a
> little guidance on how to inspect the tokens. I have Luke, but never really
> understood how to use it properly. Could you give me a clear list of steps,
> or point me to a resource I missed, on how I would go about inspecting
> tokens during insert/search? Thanks.

I'd just print them to your console with Token#term(), or use a
debugger. If you do that during both indexing and searching, I think
you will see a difference in the tokens that explains *why* Lucene
doesn't find a hit for your use case with spaces.

Luke is hard to use with Jackrabbit because of its multi-index setup
and because of the field value prefixing. The prefixing is unfortunate
and no longer strictly necessary, but it exists for historical reasons:
older Lucene versions could not handle very many unique field names.

Regards
Ard

>
> H. Wilson
>
> On 08/30/2010 03:30 AM, Ard Schrijvers wrote:
>>
>> Hello,
>>
>> On Fri, Aug 27, 2010 at 9:06 PM, H. Wilson<[email protected]> wrote:
>>>
>>> OK, well I got the spaces part figured out, and will post it for anyone who
>>> needs it. Putting quotes around the spaces unfortunately did not work.
>>> During testing, I determined that if you performed the following query for
>>> the exact fullName property:
>>>
>>> filter.addContains ( @fullName,
>>> '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Land"));
>>>
>>> It would return nothing. But tweak it a little and add a wildcard, and it
>>> would return results:
>>>
>>> filter.addContains ( @fullName,
>>> '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Lan*"));
>>
>> This does not make sense...see below
>>
>>> But since I did not want to throw in wild cards where they might not be
>>> wanted, if a search string contained spaces, did not contain wild cards and
>>> the user was not concerned with case sensitivity, I used the fn:lower-case.
>>> So I ended up with the following excerpt (our clients wanted options for
>>> case sensitive and case insensitive searching).
>>>
>>> public OurParameter[] getOurParameters (boolean performCaseSensitiveSearch,
>>>     String searchTerm, String srchField ) { //srchField in this case was fullName
>>>
>>> .....
>>>
>>> if ( performCaseSensitiveSearch) {
>>>
>>>     //jcr:like for case sensitive
>>>     filter.orJCRExpression ("jcr:like(@" + srchField +", '"+Text.escapeIllegalXpathSearchChars (searchTerm)+"')");
>>>
>>> }
>>> else {
>>>
>>>     //only use fn:lower-case if there is spaces, with NO wild cards
>>>
>>>     if ( searchTerm.contains (" ")&& !searchTerm.contains ("*")&& !searchTerm.contains ("?") ) {
>>>
>>>         filter.addJCRExpression ("fn:lower-case(@"+srchField+") = '"+Text.escapeIllegalXpathSearchChars(searchTerm.toLowerCase())+"'");
>>>
>>>     }
>>>
>>>     else {
>>>
>>>         //jcr:contains for case insensitive
>>>         filter.addContains ( srchField, Text.escapeIllegalXpathSearchChars(searchTerm));
>>>
>>>     }
>>>
>>> }
>>
>> This seems to me a workaround around the real problem, because, it
>> just doesn't make sense to me. Can you inspect the tokens that are
>> created by your analyser. Make sure you inspect the tokens during
>> indexing (just store something) and during searching: just search in
>> the property. I am quite sure you'll see the issue then. Perhaps
>> something with Text.escapeIllegalXpathSearchChars though it seems that
>> it should leave spaces untouched
>>
>> Regards
>> Ard
>>
>>
>>> ....
>>>
>>> }
>>>
>>>
>>> Hope that helps anyone who needs it.
>>>
>>> H. Wilson
>>>
>>>>> OK so it looks like I have one other issue. Using the configuration as
>>>>> posted below and sticking to my previous examples, with the addition of one
>>>>> with whitespace. With the following three in our repository:
>>>>>
>>>>> .North.South.East.WestLand
>>>>> .North.South.East.West_Land
>>>>> .North.South.East.West Land //yes that's a space
>>>>>
>>>>> ...using a jcr:contains, with exact name search with NO wild cards: the
>>>>> first two return properly, but the last one yields no result.
>>>>>
>>>>> filter.addContains(@fullName,
>>>>> '"+org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(".North.South.East.West Land") +"'));
>>>>
>>>> I think the space in a contains is seen as an AND by the
>>>> Jackrabbit/Lucene QueryParser. I should test this however as I am not
>>>> sure. Perhaps you can put quotes around it, not sure if that works
>>>> though
>>>>
>>>> Regards
>>>> Ard
>>>>
>>>>> According to the Lucene documentation, KeywordAnalyzer should be creating
>>>>> one token, plus combined with escaping the Illegal Characters (i.e. spaces),
>>>>> shouldn't this search work? Thanks again.
>>>>>
>>>>> H. Wilson
>
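
To make the suspected mismatch in this thread concrete, here is a
minimal sketch in plain Java. It does not use the Lucene or Jackrabbit
APIs: the class TokenSketch and both methods are hypothetical
stand-ins. keywordTokens imitates an analyzer that keeps the whole
value as one token (what KeywordAnalyzer is documented to do), and
whitespaceTokens imitates a query parser that splits the search string
on spaces. Whether the Jackrabbit/Lucene QueryParser really splits
this way is the unverified guess from the thread, not a confirmed
fact.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, NOT the Lucene analyzer API.
final class TokenSketch {

    // Imitates a keyword-style analyzer: the whole value is one token.
    static List<String> keywordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    // Imitates a parser that treats whitespace as a term separator
    // (assumption: this is what happens on the search side).
    static List<String> whitespaceTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String fullName = ".North.South.East.West Land";
        // Print both token lists side by side, as one would do with
        // the real tokens during indexing and during searching.
        System.out.println("index side : " + keywordTokens(fullName));
        System.out.println("search side: " + whitespaceTokens(fullName));
    }
}
```

If the real tokens differ in the same way (one token at index time,
two terms at search time), an exact match cannot succeed for the value
containing a space, while the values without a space are unaffected.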
