Re: merge problems

2016-10-11 Thread Michael McCandless
OK I have a small test case showing the issue! I opened https://issues.apache.org/jira/browse/LUCENE-7491 Thanks for reporting this, Hans. Mike McCandless http://blog.mikemccandless.com On Tue, Oct 11, 2016 at 12:08 PM, Hans Lund wrote: > hmm you're right - when it revealed a bug in our index

Re: PhraseQuery

2016-10-11 Thread lukes
Thanks Mike. I discovered that earlier. Regards.

Re: How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-11 Thread Kumaran Ramasubramanian
@Ahmet, Uwe: Thanks a lot for your suggestions. I have already written a custom analyzer as you said, but I am trying to avoid adding a new component to my search flow. @Adrien: how do I add a filter using AnalyzerWrapper? Any pointers? On Tue, Oct 11, 2016 at 8:16 PM, Uwe Schindler wrote: > I'd sugges

Re: merge problems

2016-10-11 Thread Hans Lund
hmm you're right - when it revealed a bug in our indexing code I stopped wondering ;-) but now I tried to create small tests to show the behavior - until now without success. I'm pretty sure that I can reproduce it by re-introducing our index bug, unfortunately it occurs after some hours parsing an

RE: How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-11 Thread Uwe Schindler
I'd suggest using CustomAnalyzer to define your own analyzer. It lets you build an analyzer from the components (tokenizers and filters) you want. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Me

Re: How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-11 Thread Ahmet Arslan
Hi, I forgot to include: .addTokenFilter("asciifolding") Ahmet On Tuesday, October 11, 2016 5:37 PM, Ahmet Arslan wrote: Hi Kumaran, Writing a custom analyzer is easier than it seems. Please see how I added kstem to classic analyzer: return CustomAnalyzer.builder() .withTokenizer("classic"

Re: How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-11 Thread Ahmet Arslan
Hi Kumaran, Writing a custom analyzer is easier than it seems. Please see how I added kstem to classic analyzer:

    return CustomAnalyzer.builder()
        .withTokenizer("classic")
        .addTokenFilter("classic")
        .addTokenFilter("lowercase")
        .addTokenFilter("kstem")
        .build();

Ahmet On Tuesday, October 11,

Re: How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-11 Thread Adrien Grand
Hi Kumaran, If it is fine to add the ascii folding filter at the end of the analysis chain, then you could use AnalyzerWrapper. Otherwise, you need to create a new analyzer that has the same analysis chain as ClassicAnalyzer, plus an ASCIIFoldingFilter. Le mar. 11 oct. 2016 à 16:22, Kumaran Ramas
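
To illustrate why the position of the folding step matters in the reply above, here is a minimal sketch in plain Java. It is not Lucene code: the class name, the method names, and the tiny stop-word list are hypothetical, and the folding step uses java.text.Normalizer, which only approximates what ASCIIFoldingFilter does. It models a chain of tokenize, lowercase, and stop-word removal, then compares folding appended at the end (the wrapper-style option) with folding placed before stop-word removal (which needs a new chain).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FoldingChainSketch {

    // Hypothetical stop-word list, chosen only for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a"));

    // Split on whitespace; a stand-in for a real tokenizer.
    static List<String> tokenize(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }

    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Approximate ASCII folding: decompose accented characters (NFD),
    // then drop the combining marks.
    static List<String> fold(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String decomposed = Normalizer.normalize(t, Normalizer.Form.NFD);
            out.add(decomposed.replaceAll("\\p{M}", ""));
        }
        return out;
    }

    // Wrapper-style: the existing chain runs unchanged, folding is appended last.
    static List<String> foldAtEnd(String text) {
        return fold(removeStopWords(lowercase(tokenize(text))));
    }

    // New chain: folding runs before stop-word removal.
    static List<String> foldBeforeStop(String text) {
        return removeStopWords(fold(lowercase(tokenize(text))));
    }

    public static void main(String[] args) {
        String text = "Un Th\u00e9 Glac\u00e9";
        System.out.println(foldAtEnd(text));      // [un, the, glace]
        System.out.println(foldBeforeStop(text)); // [un, glace]
    }
}
```

With folding at the end, the accented token passes the stop-word check first and only afterwards becomes "the". With folding earlier, the same token is folded to "the" and then dropped as a stop word. Whether that difference matters depends on the application, which is presumably why the reply distinguishes the two options.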

How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-11 Thread Kumaran Ramasubramanian
Hi All, Is there any way to add ASCIIFoldingFilter on top of ClassicAnalyzer without writing a new custom analyzer? Should I extend StopwordAnalyzerBase again? I know that ClassicAnalyzer is final. Is there any special reason for making it final, given that StandardAnalyzer was not final before? public

Clarification Regarding Directory & Merging

2016-10-11 Thread aravinth thangasami
Hi all, Do the Directory implementations (SimpleFSDirectory, NIOFSDirectory, MMapDirectory) have any performance impact while indexing? If the choice of Directory improves read performance depending on the platform, will it also have any impact on merging? Thanks Aravinth

Re: merge problems

2016-10-11 Thread Michael McCandless
Hmm, that should be "OK" from Lucene's standpoint. I mean, it should not result in strange merge exceptions later on. I think there's a bug somewhere in Lucene's efforts to pretend it's fully schema-less ... I'll try to reproduce this. Mike McCandless http://blog.mikemccandless.com On Tue, Oct

Re: merge problems

2016-10-11 Thread Hans Lund
Turned out to be much simpler - we had added a new 'dynamic' field to a stats doc: a count of articles based on the identified language code. Having a set of test documents in German, English, Swedish - no one had suspected the obvious, that the language detection categorized a single document as be