Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-09-16 Thread Shawn Heisey
On 9/16/2015 5:42 AM, Alessandro Benedetti wrote: > Any update on this ? I found two workarounds, and went with the second one -- removing the PatternReplaceFilterFactory from fieldType definitions that also include WDF. They are both documented in the issue: https://issues.apache.org/jira/brows

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-09-16 Thread Alessandro Benedetti
Any update on this ? Cheers 2015-08-21 0:22 GMT+01:00 Shawn Heisey : > On 7/8/2015 6:13 PM, Yonik Seeley wrote: > > On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey > wrote: > >> After the fix (with luceneMatchVersion at 4.9), both "aaa" and "bbb" end > >> up at position 2. > > Yikes, that's defini

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-08-20 Thread Shawn Heisey
On 7/8/2015 6:13 PM, Yonik Seeley wrote: > On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey wrote: >> After the fix (with luceneMatchVersion at 4.9), both "aaa" and "bbb" end >> up at position 2. > Yikes, that's definitely wrong. I have filed LUCENE-6889 for this problem. I'd like to write a unit te

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-14 Thread Shawn Heisey
On 7/14/2015 11:42 AM, Shawn Heisey wrote: > So the problem might be with the rulefile, or with some strange > combination of these analysis components. I did not build this > rulefile myself. It was built by another, either Robert Muir or Steve > Rowe if I remember right, when SOLR-4123 was underwa

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-14 Thread Shawn Heisey
On 7/14/2015 10:46 AM, Alessandro Benedetti wrote: > Furthermore I was checking with Solr 5.1 to find the WDFilter factory > actually to work in a proper way. > Is it possible to know what was the conclusion for this issue ? > Is there an issue in the WordDelimiter token filter in the current Solr

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-14 Thread Alessandro Benedetti
Furthermore I was checking with Solr 5.1 to find the WDFilter factory actually to work in a proper way. Is it possible to know what was the conclusion for this issue ? Is there an issue in the WordDelimiter token filter in the current Solr version? Has it been fixed ? Any update ? Cheers 2015-07-

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-14 Thread Alessandro Benedetti
Just found this interesting article by Mike, which explains the sausagization problem that is related to the strange positions in some cases. http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html Cheers 2015-07-09 1:13 GMT+01:00 Yonik Seeley : > On Wed,

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Yonik Seeley
On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey wrote: > After the fix (with luceneMatchVersion at 4.9), both "aaa" and "bbb" end > up at position 2. Yikes, that's definitely wrong. -Yonik
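To illustrate why two terms sharing a position is a problem for phrase searches, here is a minimal sketch of exact phrase matching over term positions. It is a simplified model, not Lucene's PhraseQuery implementation: the function name phrase_matches and the "sequential" positions (aaa at 1, bbb at 2) are assumptions made for illustration. Only the "both at position 2" case comes from the thread.

```python
def phrase_matches(positions, phrase):
    """Return True if the terms of `phrase` occur at consecutive positions.

    `positions` maps each term to the list of positions where it was indexed.
    This is a simplified model of exact (slop 0) phrase matching.
    """
    first, rest = phrase[0], phrase[1:]
    for start in positions.get(first, []):
        # Each following term must sit exactly one position after the previous one.
        if all(start + offset in positions.get(term, [])
               for offset, term in enumerate(rest, 1)):
            return True
    return False


# Positions reported in the thread: both terms end up at position 2.
overlapping = {"aaa": [2], "bbb": [2]}

# Hypothetical positions, assuming the two terms are meant to be adjacent.
sequential = {"aaa": [1], "bbb": [2]}

matches_when_sequential = phrase_matches(sequential, ["aaa", "bbb"])
matches_when_overlapping = phrase_matches(overlapping, ["aaa", "bbb"])
```

Under this model the phrase "aaa bbb" is found only when "bbb" sits one position after "aaa", so stacking both terms at position 2 makes the phrase unmatchable.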

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Shawn Heisey
On 7/8/2015 4:01 PM, Jack Krupansky wrote: > In Lucene 4.8, LUCENE-5111: Fix WordDelimiterFilter offsets > > https://issues.apache.org/jira/browse/LUCENE-5111 > > Make sure the documents are queried and indexed with the same Lucene match > version. Since I have updated the luceneMatchVersion on th

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Jack Krupansky
In Lucene 4.8, LUCENE-5111: Fix WordDelimiterFilter offsets https://issues.apache.org/jira/browse/LUCENE-5111 Make sure the documents are queried and indexed with the same Lucene match version. -- Jack Krupansky On Wed, Jul 8, 2015 at 5:19 PM, Shawn Heisey wrote: > On 7/8/2015 2:19 PM, Shawn

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Alessandro Benedetti
Yes Shawn, I was raising the fact that I see strange values in the positions as well. You said you fixed it by going back to an old version? That should not be OK; I mean, I assume the latest version should be the best… Any idea or clarification, guys? 2015-07-08 21:10 GMT+01:00 Shawn Heisey : > On

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Shawn Heisey
On 7/8/2015 2:19 PM, Shawn Heisey wrote: > It appears that changing luceneMatchVersion from LUCENE_4_9 to LUCENE_47 > has fixed this problem ... so I think somebody must have "fixed" WDF to > its current behavior, but put in a version check for the old behavior. The luceneMatchVersion change has f

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Shawn Heisey
On 7/8/2015 2:10 PM, Shawn Heisey wrote: > At this point I think I should probably file a bug in Jira ... anyone > have any thoughts on that? It appears that changing luceneMatchVersion from LUCENE_4_9 to LUCENE_47 has fixed this problem ... so I think somebody must have "fixed" WDF to its current

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Shawn Heisey
On 7/8/2015 9:26 AM, Alessandro Benedetti wrote: > Taking a look into the documentation I see this inconsistent orderings in > my opinion : Alessandro, thank you for your reply. I couldn't really tell what you were saying. I *think* you were agreeing with me that the current behavior seems like

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Alessandro Benedetti
Taking a look at the documentation, I see what are, in my opinion, inconsistent orderings:
*Example:* Concatenate word parts and number parts, but not word and number parts that occur in the same token.
*In:* "hot-spot 100+42 XL40"
*Tokenizer to Filter:* "hot-spot"(1), "100+42"(2), "XL40"(3

Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Shawn Heisey
On 7/8/2015 8:44 AM, Shawn Heisey wrote:
> This is what 4.9.1 does with it:
>
> 1 rrr-coleccion
> 2 rrr
> 2 coleccion
> 2 rrrcoleccion
> 3 coleccion
> 4 gracita
> 5 morales
> 6 foobar
Followup: This is what Solr 5.2.1 does for query analysis, which also seems wrong, and doesn't match the phrase q
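As a reading aid for the 4.9.1 listing quoted above, the following sketch groups the listed tokens by position and derives the position increment of each token, assuming a starting position of 0 so that it lines up with the 1-based numbering in the listing. The helper names group_by_position and position_increments are hypothetical and are not part of Solr or Lucene. The token data is copied from the message.

```python
from collections import defaultdict

# (position, term) pairs as listed for Solr 4.9.1 in the quoted message.
tokens_491 = [
    (1, "rrr-coleccion"),
    (2, "rrr"),
    (2, "coleccion"),
    (2, "rrrcoleccion"),
    (3, "coleccion"),
    (4, "gracita"),
    (5, "morales"),
    (6, "foobar"),
]


def group_by_position(tokens):
    """Collect the terms that share each position (stacked tokens)."""
    grouped = defaultdict(list)
    for position, term in tokens:
        grouped[position].append(term)
    return dict(grouped)


def position_increments(tokens):
    """Distance of each token from the previous one; 0 means 'stacked'."""
    increments = []
    previous = 0  # assumed starting point, matching the 1-based listing
    for position, _term in tokens:
        increments.append(position - previous)
        previous = position
    return increments


stacked = group_by_position(tokens_491)
increments = position_increments(tokens_491)
```

Grouping makes it easier to see that three terms share position 2 while "coleccion" appears at both position 2 and position 3.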

Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Shawn Heisey
I'm not sure if this is a bug, but it does break searches that work fine in 4.7.2 if we put the same config and index on 4.9.1. Here's a slightly redacted bit of text that's been sent to the index, and is also used as a phrase query: RRR-COLECCION: COLECCIÓN: Gracita Morales foobar Here are the f