RE: Can you escape characters you don't want the analyzer to modify

2013-09-18 Thread Портнов Дмитрий
I guess you have to provide a customized tokenizer in your analyzer. -Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Wednesday, September 18, 2013 12:26 AM To: java-user@lucene.apache.org Subject: Can you escape characters you don't want the analyzer to

Re: A question about seek past EOF: MMapIndexInput

2013-09-18 Thread Adrien Grand
Hi, This means that there is either a bug in Lucene or that your index is corrupted. Can you reproduce this failure if you reindex data? The output of CheckIndex would be interesting as well, see

IndexFileNameFilter

2013-09-18 Thread Yonghui Zhao
In Lucene 4.3.0 there is no IndexFileNameFilter. And I find that in org.apache.lucene.index.IndexFileNames the index file extensions have only 3 types: public static final String INDEX_EXTENSIONS[] = new String[] { COMPOUND_FILE_EXTENSION, COMPOUND_FILE_ENTRIES_EXTENSION, GEN_EXTENSION,

Re: Position problems in 4.3.0

2013-09-18 Thread Adrien Grand
Hi, This looks bad! Can you write a small test case that reproduces the issue so that we can try to understand what happens here? Thanks! -- Adrien

Re: IndexFileNameFilter

2013-09-18 Thread Adrien Grand
Hi, Since Lucene 4.0, which introduced codecs, it is no longer possible to tell from filename extensions whether files were created by Lucene: every codec is free to use any file extension. On Wed, Sep 18, 2013 at 1:03 PM, Yonghui Zhao zhaoyong...@gmail.com wrote: In lucene

Re: How to modify the Lucene 4 index?

2013-09-18 Thread Adrien Grand
Hi, Are you talking about updating the content of the index or customizing the file formats of the index? On Tue, Sep 17, 2013 at 11:31 PM, Ralf Bierig ralf.bie...@gmail.com wrote: Hi all, is there any good documentation of how to change and modify the index of Lucene version 4 other than

Re: IndexFileNameFilter

2013-09-18 Thread Yonghui Zhao
Got it. Currently I don't use any custom codecs. 2013/9/18 Adrien Grand jpou...@gmail.com Hi, Since Lucene 4.0 which introduced codecs, it is not possible anymore to know based on filename extensions whether files have been created by Lucene or not: every codec is free to use any file

Re: IndexFileNameFilter

2013-09-18 Thread Adrien Grand
Hi, On Wed, Sep 18, 2013 at 1:39 PM, Yonghui Zhao zhaoyong...@gmail.com wrote: Got it. Currently I don't use any custom codecs. Part of the problem is that even the current codec keeps evolving, and file extensions that exist today might not be used anymore in 6 months and vice versa. I would

Document not searchable after IndexWrite.updateDocument

2013-09-18 Thread Sanket Paranjape
Hi, I wrote some simple code to update a Lucene document with new values. Code Snippet: Term term = new Term(PRODUCT_CODE, productCode); TermQuery query = new TermQuery(term); TopDocs productDoc = this.searcher.search(query, 1); int docNum = scoreDoc.doc; Document doc =

RE: Document not searchable after IndexWrite.updateDocument

2013-09-18 Thread Uwe Schindler
Hi, the problem is that a document retrieved by IndexReader.document() only contains stored fields and no indexed fields (they are no longer accessible from the index). Also, the field types only contain "stored" as an attribute, so when reindexing with IndexWriter you just create a document with

Question about the CompoundWordTokenFilterBase

2013-09-18 Thread Alex Parvulescu
Hi, While trying to play with the CompoundWordTokenFilterBase I noticed that the behavior is to include the original token together with the new sub-tokens. I assume this is expected (haven't found any relevant docs on this), but I was wondering if it's a hard requirement or can I propose a

TotalHitCountCollector performance

2013-09-18 Thread Nicola Buso
Hello, I was going to use the TotalHitCountCollector in cases where I'm interested just in the number of results. Obviously I was hoping to gain performance compared to a scored query. From my tests it seems it's not as performant compared to the scored search. At this point I'm wondering if

RE: TotalHitCountCollector performance

2013-09-18 Thread Uwe Schindler
Hi, The ConstantScoreQuery part is just overhead. If scores are not requested, they should not be calculated - but CSQ cannot prevent this from happening at all. It just prevents the collector from seeing the scores. As the counting collector does not request any scores, you just add a

Re: Question about the CompoundWordTokenFilterBase

2013-09-18 Thread Jack Krupansky
Out of curiosity, what is your use case? I mean, the normal use of this filter is to permit a shorthand reference to a long term, but why would you necessarily want to preclude direct reference to the full term? -- Jack Krupansky -Original Message- From: Alex Parvulescu Sent:

Query performance in Lucene 4.x

2013-09-18 Thread Desidero
Hello, Over the last few weeks I've been working on upgrading an application from Lucene 3.x to Lucene 4.x in hopes of improving performance. Unfortunately, after going through the full migration process and playing with all sorts of tweaks I found online and in the documentation, Lucene 4 is

RE: Can you escape characters you don't want the analyzer to modify

2013-09-18 Thread Scott Smith
That's the conclusion I was coming to. Thanks -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, September 17, 2013 9:20 PM To: java-user@lucene.apache.org Subject: Re: Can you escape characters you don't want the analyzer to modify It sounds like

Re: Regarding Compression Tool

2013-09-18 Thread Jebarlin Robertson
Hi, Thanks Mark Miller for your advice. I had missed part of it, that's why I could not get the proper value. I should get the binary value instead of get() for compressed content. I tested all the scenarios and I have some doubts, 1. I observed that while searching with highlighter tool, it

Re: Document not searchable after IndexWrite.updateDocument

2013-09-18 Thread Sanket Paranjape
Hi Uwe, Thanks for explaining. Earlier our system was using version 2.4, and in that version this was possible. Anyway, I will implement it correctly as you suggested. On 18-09-2013 07:41 PM, Uwe Schindler wrote: Hi, the problem is that a document retrieved by IndexReader.document() only contains