Re: A model for predicting indexing memory costs?

2009-03-10 Thread mark harwood
But... how come setting IW's RAM buffer doesn't prevent the OOMs? I've been setting the IndexWriter RAM buffer to 300 meg and giving the JVM 1 gig. Last run I gave the JVM 3 gig, with writer settings of RAM buffer=300 meg, merge factor=20, term interval=8192, usecompound=false. All fields
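A RAM buffer caps only one consumer. Per the IndexWriter javadoc, setRAMBufferSizeMB governs the RAM used to buffer added documents and deletions before a flush; merging, other Lucene structures and the rest of the application allocate outside that budget. A minimal sanity check, assuming nothing beyond the JDK, compares the configured buffer with the JVM's maximum heap. The class and method names are hypothetical, and the one-third threshold is an arbitrary assumption rather than a Lucene recommendation.

```java
/**
 * Back-of-envelope heap check. Hypothetical helper, not a Lucene API; the
 * one-third threshold used in main() is an arbitrary assumption.
 */
public final class HeapHeadroom {

    private static final long MB = 1024L * 1024L;

    /** Heap (MB) left for everything else once the indexing RAM buffer is full. */
    static long headroomMb(long maxHeapBytes, double ramBufferMb) {
        return maxHeapBytes / MB - (long) Math.ceil(ramBufferMb);
    }

    /** True if the RAM buffer takes at most maxFraction of the maximum heap. */
    static boolean bufferFits(long maxHeapBytes, double ramBufferMb, double maxFraction) {
        return ramBufferMb * MB <= maxFraction * maxHeapBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx setting
        double ramBufferMb = 300.0;                       // value used in the thread
        System.out.printf("max heap %d MB, headroom after RAM buffer %d MB%n",
                maxHeap / MB, headroomMb(maxHeap, ramBufferMb));
        System.out.println("buffer within 1/3 of heap: "
                + bufferFits(maxHeap, ramBufferMb, 1.0 / 3));
    }
}
```

Headroom that looks comfortable here is still only an upper bound on what indexing can use; it says nothing about peak usage during merges.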

Re: A model for predicting indexing memory costs?

2009-03-10 Thread Ian Lea
That's not the usual OOM message, is it? java.lang.OutOfMemoryError: GC overhead limit exceeded. Looks like you might be able to work round it with -XX:-UseGCOverheadLimit: http://java-monitor.com/forum/archive/index.php/t-54.html

Re: A model for predicting indexing memory costs?

2009-03-10 Thread mark harwood
Thanks, Ian. I forgot to mention I tried that setting and it then seemed to hang indefinitely. I then switched back to a strategy of trying to minimise memory usage, or at least gain an understanding of how much memory my application would require. Cheers, Mark

RE: A model for predicting indexing memory costs?

2009-03-10 Thread Uwe Schindler
It does not indefinitely hang; I think the problem is that the GC takes up all processor resources and nothing else runs any more. You should also enable the parallel GC. We had similar problems on the searching side, when the webserver suddenly stopped for about 20 minutes (!) and did nothing
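One way to test the diagnosis that the collector, not the indexer, is eating the CPU, without a profiler or admin rights, is to sample the JVM's own GC counters through java.lang.management. The sketch below is a hypothetical helper: it prints which collectors are active (on HotSpot the parallel collector is selected with -XX:+UseParallelGC) and the share of a wall-clock interval spent collecting. A share close to 100% would fit the "seems to hang" symptom.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class GcShare {

    /** Sum of cumulative collection time (ms) across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    /** Fraction of a wall-clock interval spent in GC, clamped to [0, 1]. */
    static double gcFraction(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) return 0.0;
        return Math.min(1.0, Math.max(0.0, (double) gcMillisDelta / wallMillisDelta));
    }

    public static void main(String[] args) throws InterruptedException {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector in use: " + gc.getName());
        }
        long gc0 = totalGcMillis(), t0 = System.currentTimeMillis();
        Thread.sleep(1000); // in an indexer, sample periodically from a background thread
        long gc1 = totalGcMillis(), t1 = System.currentTimeMillis();
        System.out.printf("GC share of last interval: %.1f%%%n",
                100 * gcFraction(gc1 - gc0, t1 - t0));
    }
}
```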

Re: A model for predicting indexing memory costs?

2009-03-10 Thread mark harwood
"It does not indefinitely hang." I guess I just need to be more patient. Thanks for the GC settings. I don't currently have the luxury of 15 other processors, but this will definitely be of use in other environments. "How does TrieRange work for you?" I used it back when it was tucked away in PanFMP

Re: Questions about analyzer

2009-03-10 Thread Erick Erickson
Yes, I replied 4 days ago; is your spam filter interfering? On Tue, Mar 10, 2009 at 8:35 AM, Ganesh emailg...@yahoo.co.in wrote: Any reply on this? - Original Message - From: Ganesh emailg...@yahoo.co.in To: java-user@lucene.apache.org Sent: Monday, March 09, 2009 11:28 AM Subject:

Re: Questions about analyzer

2009-03-10 Thread Ganesh
Erick, I got your reply, but I asked one more question. Mike, in one of his replies to the thread "Faceted search using Lucene", gave the following code review comment: * You are creating a new Analyzer and QueryParser every time, also creating unnecessary garbage; instead, they should be created once
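The "create once" advice can be sketched with the JDK alone. In Lucene, an Analyzer can generally be shared, while QueryParser is documented as not thread-safe, so one common choice is one parser per thread. The ExpensiveParser class below is a hypothetical stand-in for QueryParser, not a Lucene class; the construction counter is only there to make the difference visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class ReuseParser {

    /** Hypothetical stand-in for a costly, non-thread-safe object such as a QueryParser. */
    static final class ExpensiveParser {
        static final AtomicInteger CREATED = new AtomicInteger();

        ExpensiveParser() {
            CREATED.incrementAndGet(); // count constructions
        }

        String parse(String query) {
            return "parsed(" + query + ")";
        }
    }

    /** Pattern criticised in the review: a fresh instance (and garbage) per query. */
    static String parseWasteful(String query) {
        return new ExpensiveParser().parse(query);
    }

    /** Created once per thread, then reused for every query on that thread. */
    private static final ThreadLocal<ExpensiveParser> PARSER =
            ThreadLocal.withInitial(ExpensiveParser::new);

    static String parseReused(String query) {
        return PARSER.get().parse(query);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) parseReused("q" + i);
        System.out.println("instances after 1000 reused parses: "
                + ExpensiveParser.CREATED.get());
    }
}
```

If queries are only ever parsed on a single thread, a plain field initialised once works just as well; the ThreadLocal only matters when several threads parse concurrently.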

RE: A model for predicting indexing memory costs?

2009-03-10 Thread Uwe Schindler
"It does not indefinitely hang, I guess I just need to be more patient. Thanks for the GC settings. I don't currently have the luxury of 15 other processors but this will definitely be of use in other environments." Even with one processor, a parallel GC is sometimes better. The traditional

Re: A model for predicting indexing memory costs?

2009-03-10 Thread mark harwood
OK. What do you think about LUCENE-1541? Does the more complicated API justify the space improvement and reduced term number? I don't see the Trie terms being the main contributor to the term pool. Using the Luke vocabulary-growth plugin, I can see the number of unique terms tailing off fairly
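For context on the term-count question: trie-style numeric indexing writes each value at several precisions. A simplified sketch, assuming a 64-bit value and a precisionStep p, produces one prefix per shift 0, p, 2p, ..., i.e. ceil(64/p) terms per value. It illustrates only the counting, not the byte encoding TrieRange actually uses, and the class name is hypothetical. Because the coarser prefixes are shared by many neighbouring values, unique terms grow more slowly than terms written.

```java
import java.util.ArrayList;
import java.util.List;

public final class TriePrefixes {

    /**
     * One prefix per precision level, made by zeroing the lowest `shift` bits.
     * Simplified illustration only; not Lucene's real term encoding.
     */
    static List<Long> prefixes(long value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        // Flip the sign bit so unsigned bit order matches signed value order.
        long sortable = value ^ Long.MIN_VALUE;
        List<Long> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add((sortable >>> shift) << shift);
        }
        return out;
    }

    /** Terms written per value: ceil(64 / precisionStep). */
    static int termsPerValue(int precisionStep) {
        return (64 + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        for (int step : new int[] {4, 8, 16}) {
            System.out.println("precisionStep " + step + ": "
                    + termsPerValue(step) + " terms per value");
        }
    }
}
```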

Re: index large size file

2009-03-10 Thread Mark Miller
Amy Zhou wrote: "Hi, I have a couple of questions about indexing large files. As I understand it, the default MaxFieldLength is 100,000. In Lucene 2.4, we can set the MaxFieldLength in the constructor. My questions are:" The default is 10,000. "1) How's the performance if
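With the default limit, only the first 10,000 terms of a field are indexed and the rest are dropped without an error, and a 10 MB text file will usually contain far more tokens than that. The sketch below estimates how many tokens would be lost. The class is hypothetical, and splitting on whitespace is an assumption standing in for a real Analyzer, whose token counts will differ.

```java
public final class FieldTruncation {

    /** Mirrors the 10,000-term default discussed in the thread. */
    static final int DEFAULT_MAX_FIELD_LENGTH = 10_000;

    /** Tokens beyond the limit; whitespace splitting is a rough stand-in for an Analyzer. */
    static int droppedTokens(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return 0;
        int tokens = trimmed.split("\\s+").length;
        return Math.max(0, tokens - maxFieldLength);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25_000; i++) sb.append("word").append(i).append(' ');
        System.out.println("tokens dropped: "
                + droppedTokens(sb.toString(), DEFAULT_MAX_FIELD_LENGTH));
    }
}
```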

RE: index large size file

2009-03-10 Thread Amy Zhou
My issue here is that large files are truncated during indexing with the default MaxFieldLength of 10,000. The files I index could be 10 MB or larger. My questions are: 1) If I choose MaxFieldLength UNLIMITED instead of 100,000, what would the performance be? 2) Are there any other options?

Re: A model for predicting indexing memory costs?

2009-03-10 Thread mark harwood
"Could you get a heap dump (e.g. with YourKit) of what's using up all the memory when you hit OOM?" On this particular machine I have a JRE, no admin rights and therefore limited profiling capability :( That's why I was trying to come up with some formula for estimating memory usage. When you

Re: index large size file

2009-03-10 Thread Erick Erickson
Sure there are other options. You could decide to index in chunks rather than entire documents. You could decide many things. None of which we can recommend unless we have a clue what you're really trying to accomplish or whether you're encountering a specific problem. I can say that we've
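The chunking option could look like the following minimal sketch. It splits a text into consecutive pieces of at most N whitespace-separated tokens, so each piece stays under MaxFieldLength and can be indexed as its own document. The class and method are hypothetical, and whitespace tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class Chunker {

    /** Consecutive chunks of at most maxTokens whitespace-separated tokens. */
    static List<String> chunk(String text, int maxTokens) {
        if (maxTokens < 1) throw new IllegalArgumentException("maxTokens must be >= 1");
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return chunks;
        String[] tokens = trimmed.split("\\s+");
        for (int start = 0; start < tokens.length; start += maxTokens) {
            int end = Math.min(start + maxTokens, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> parts = chunk("one two three four five", 2);
        for (int i = 0; i < parts.size(); i++) {
            // Each chunk would become its own document, tagged with source file and chunk number.
            System.out.println("chunk " + i + ": " + parts.get(i));
        }
    }
}
```

One trade-off: a phrase that straddles two chunks will not match either one. Overlapping chunks by a few tokens is a common mitigation, at the cost of some duplicate indexing.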

RE: index large size file

2009-03-10 Thread Amy Zhou
Thanks, Erick, for your quick response and useful information. I'll try bumping up the MaxFieldLength and check the performance. It seems the quickest way to handle the issue. Amy -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, March 10,

Re: A model for predicting indexing memory costs?

2009-03-10 Thread Erick Erickson
You have my sympathy. Let's see, you're being told "we can't give you the tools you need to diagnose/fix the problem, but fix it anyway." Probably with the addendum "And fix it by Friday." You might want to consider staging a mutiny until the powers that be can give you a solution. Perhaps working

Re: A model for predicting indexing memory costs?

2009-03-10 Thread mark harwood
"I get really belligerent when being told to solve problems while wearing a ball-and-chain." I seem to have touched quite a nerve there then, Erick ;) I appreciate your sympathy. To be fair, I haven't exhausted all possible avenues in changing the environment, but I do remain interested in

Re: A model for predicting indexing memory costs?

2009-03-10 Thread Michael McCandless
mark harwood wrote: "Could you get a heap dump (e.g. with YourKit) of what's using up all the memory when you hit OOM?" On this particular machine I have a JRE, no admin rights and therefore limited profiling capability :( That's why I was trying to come up with some formula for estimating

RE: A model for predicting indexing memory costs?

2009-03-10 Thread Jon Loken
Hi, I haven't followed the whole thread, so pardon me if I am off topic. In terms of OutOfMemoryErrors, why not attempt to alleviate this in your code, rather than relying too heavily on garbage collection? In other words: set big objects to null when you are finished with them, in particular

Re: A model for predicting indexing memory costs?

2009-03-10 Thread Grant Ingersoll
On Mar 10, 2009, at 7:55 AM, mark harwood wrote: "It does not indefinitely hang, I guess I just need to be more patient. Thanks for the GC settings. I don't currently have the luxury of 15 other processors but this will definitely be of use in other environments." It is also, usually

Re: How to search both Tokenized and Untokenized fields

2009-03-10 Thread rokham
Thanks a bunch for your very prompt reply. I looked into the PerFieldAnalyzerWrapper class and I understand how you can add a specific analyzer for each field. My question is: how does this link to the query that's sent to me? If I'm given a query as follows: (+tokenized:value1 +tokenized:value2)
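In Lucene, QueryParser passes each clause's field name to the analyzer. Handing it the same PerFieldAnalyzerWrapper used at index time (for example, with KeywordAnalyzer registered for the untokenized fields) therefore analyzes each field:value clause the way that field was indexed. The JDK-only sketch below mimics that dispatch. The analyzers are hypothetical stand-ins, not Lucene classes, and the field names come from the query in the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public final class PerFieldDispatch {

    // Hypothetical stand-ins for analyzers: text -> tokens.
    static final Function<String, List<String>> TOKENIZING =
            s -> Arrays.asList(s.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    static final Function<String, List<String>> KEYWORD =
            s -> Collections.singletonList(s); // whole value as a single token

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldDispatch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    /** The field name selects the analyzer, falling back to the default. */
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldDispatch d = new PerFieldDispatch(TOKENIZING);
        d.addAnalyzer("untokenized", KEYWORD);
        System.out.println(d.analyze("tokenized", "Foo Bar"));   // two tokens
        System.out.println(d.analyze("untokenized", "Foo Bar")); // one token
    }
}
```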