Dynamic Indexing?

2009-03-12 Thread Thomas J. Buhr
Lucene, from what I have read on your website, indexing does seem like a useful thing. I'm considering the possible use of Lucene in a company project and have a few research questions. What I'm considering is using Lucene as a backend data store for a graphic editor. The typical usage

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, please find attached a test case plus a document. Just to mention, this occurs sometimes for other files. Cheers Amin On Wed, Mar 11, 2009 at 6:11 PM, markharw00d markharw...@yahoo.co.uk wrote: If you can supply a JUnit test that recreates the problem I think we can start to make progress

Re: Memory during Indexing

2009-03-12 Thread Michael McCandless
Niels Ott wrote: Hi Mark, markharw00d wrote: Hi Niels, See the javadocs for IndexWriter.setRAMBufferSizeMB() I tried different settings. Apart from the fact that my memory issue seems to be my own fault, I'm wondering what Lucene does in the background. Apparently it does flush(),

StandardTokenizer issue ?

2009-03-12 Thread iMe
I spotted an unexpected behavior when using the StandardAnalyzer. This analyzer uses the StandardTokenizer, whose javadoc states: Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split. But looking

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread mark harwood
The attachment didn't make it through here. Can you add it as an attachment to a new JIRA issue? Thanks, Mark From: Amin Mohammed-Coleman ami...@gmail.com To: java-user@lucene.apache.org Sent: Thursday, 12 March, 2009 7:47:20 Subject: Re: Lucene Highlighting

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, did both attachments not come through? Cheers Amin On Thu, Mar 12, 2009 at 9:52 AM, mark harwood markharw...@yahoo.co.uk wrote: The attachment didn't make it through here. Can you add it as an attachment to a new JIRA issue? Thanks, Mark From:

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
JIRA raised: https://issues.apache.org/jira/browse/LUCENE-1559 Thanks On Thu, Mar 12, 2009 at 11:29 AM, Amin Mohammed-Coleman ami...@gmail.com wrote: Hi, did both attachments not come through? Cheers Amin On Thu, Mar 12, 2009 at 9:52 AM, mark harwood markharw...@yahoo.co.uk wrote: The

Re: Memory during Indexing

2009-03-12 Thread Niels Ott
Michael McCandless wrote: When RAM is full, IW flushes the pending changes to disk, but does not commit them, meaning external (newly opened or reopened) readers will not see the changes. Is there a built-in mechanism in the IndexReader to reload the index every now and then, after having

What kind of performance to expect from a MultiTermQuery being used in BooleanQuery?

2009-03-12 Thread ArtemGr
Hi! I have this NotEmptyQuery class (http://gist.github.com/78115) which extends the MultiTermQuery. The class is added into a BooleanQuery, after some other queries (e.g. after TermQuery and LongTrieRangeFilter queries). I wonder: does Lucene need to scan all the terms in the inverted index and

Getting Field details on a hit

2009-03-12 Thread NickHirst
Hello Experts, I am using a MultiFieldQueryParser to search my index. The index has been set up with the following structure: design: [designcode] att1: [att1Value] att2: [att2Value] ... attn: [attnValue] Where the attvalues all correspond to the designcode. The search works well, and it

Re: Memory during Indexing

2009-03-12 Thread Grant Ingersoll
On Mar 12, 2009, at 10:47 AM, Niels Ott wrote: Michael McCandless wrote: When RAM is full, IW flushes the pending changes to disk, but does not commit them, meaning external (newly opened or reopened) readers will not see the changes. Is there a built-in mechanism in the IndexReader to

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, I have found that it is not an issue with POI. I extracted the text using POI, but differently, and the term is extracted properly. When I store the text and retrieve it, the term exists. However, running the text through the highlighter doesn't work. I will post a test case with a plain text file on

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
JIRA updated. Includes a new test case which shows the highlighter not working as expected. On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman ami...@gmail.com wrote: Hi, I have found that it is not an issue with POI. I extracted the text using POI, but differently, and the term is extracted properly.

Re: search problem when indexed using Field.setOmitTf()

2009-03-12 Thread Otis Gospodnetic
I bet omitTf will be confusing to people. When I see omitTf I read that as "aha, don't store term frequency". I don't read that as "don't store term frequency and don't store positional information". We'll have to document this well or maybe even consider renaming this so it's more

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
I did the following: highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE); which works. On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman ami...@gmail.com wrote: JIRA updated. Includes a new test case which shows the highlighter not working as expected. On Thu, Mar 12, 2009 at 5:56 PM,

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Michael McCandless
IndexWriter has such behavior too, and because it was such a common trap (developers could not understand why their content was being truncated), we made that setting explicit, up front so you were aware of it. I think this in general is a reasonable approach for settings that lose stuff

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, I think that would be good. Probably a silly thing to ask, but I guess there is a performance implication in setting it to the max value. Is there a general setting that other developers use? Cheers Amin On 12 Mar 2009, at 22:03, Michael McCandless luc...@mikemccandless.com wrote: