RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-25 Thread Steven A Rowe
have been working there all along.) Steve -----Original Message----- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 3:56 PM To: solr-user@lucene.apache.org Subject: Re: HTMLStripCharFilterFactory not working in Solr4? Thanks for the responses everyone. Steve

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-25 Thread Mike Hugo
- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 3:56 PM To: solr-user@lucene.apache.org Subject: Re: HTMLStripCharFilterFactory not working in Solr4? Thanks for the responses everyone. Steve, the test method you provided also works for me. However

HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Yonik Seeley
You can use LegacyHTMLStripCharFilterFactory to get the previous behavior. See https://issues.apache.org/jira/browse/LUCENE-3690 for more details. -Yonik http://www.lucidimagination.com On Tue, Jan 24, 2012 at 1:34 PM, Mike Hugo m...@piragua.com wrote: We recently updated to the latest build

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
Thanks for the response, Yonik. Interestingly enough, changing to the LegacyHTMLStripCharFilterFactory does NOT solve the problem - in fact I get the same result. I can see that the LegacyHTMLStripCharFilterFactory is being applied at startup: Jan 24, 2012 1:25:29 PM

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Steven A Rowe
To: solr-user@lucene.apache.org Subject: HTMLStripCharFilterFactory not working in Solr4? We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Michael Ryan
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory instead of after it. I vaguely recall being burned by something like this before. -Michael
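
For readers following along, the ordering Michael describes might look roughly like the schema.xml fragment below. This is only a sketch: the field type name "text_html_stripped" and the lowercase filter are assumptions for illustration, not taken from Mike's actual schema, and the thread does not confirm whether the declaration order was the cause of the problem.

```xml
<!-- Hypothetical field type; the name "text_html_stripped" is illustrative only -->
<fieldType name="text_html_stripped" class="solr.TextField">
  <analyzer>
    <!-- Char filters act on the raw character stream, so the char filter is listed first -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- The tokenizer then receives text with the markup already removed -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumed token filter, included only to show where token filters go -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Running sample text through the Analysis page of the Solr admin UI after a change like this is one way to check what each stage actually emits.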

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Yonik Seeley
Oops, I didn't read carefully enough to see that you wanted those constructs entirely stripped out. Given that you're seeing numbers indexed, this strongly indicates an escaping bug in the SolrJ client that must have been introduced at some point. I'll see if I can reproduce it in a unit test.

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
] Sent: Tuesday, January 24, 2012 1:34 PM To: solr-user@lucene.apache.org Subject: HTMLStripCharFilterFactory not working in Solr4? We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way