Re: Best practices for searcher memory usage?

2010-07-14 Thread Lance Norskog
Glen, thank you for this very thorough and informative post. Lance Norskog

Re: Best practices for searcher memory usage?

2010-07-14 Thread Glen Newton
There are a number of strategies, on the Java or OS side of things: - Use huge pages[1]. Especially on 64-bit systems with lots of RAM. For long-running, large-memory (and GC-busy) applications, this has achieved significant improvements, like 300% on EJBs. See [2],[3],[4]. For a great article introducing and benchmarking
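For context, on the JVM side this usually comes down to a startup flag; a minimal sketch assuming a HotSpot JVM on a Linux host that already has huge pages reserved (e.g. via the vm.nr_hugepages sysctl), with placeholder heap sizes and jar name:

  java -XX:+UseLargePages -Xms6g -Xmx6g -jar search-app.jar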

Re: Continuously iterate over documents in index

2010-07-14 Thread Erick Erickson
Hmmm, if you somehow know the last date you processed, why wouldn't using a range query work for you? I.e. date:[ TO ]? Best Erick On Wed, Jul 14, 2010 at 10:37 AM, Max Lynch wrote: > You could have a field within each doc say "Processed" and store a value Yes/No, next run a searcher que
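For reference, a minimal sketch of the range-query approach Erick describes, assuming Lucene 3.x, an already-open IndexSearcher, and a "date" field indexed as a lexicographically sortable string such as yyyyMMddHHmmss (the field name, format, and values are placeholders):

  // Only match documents dated after the last processed timestamp (open upper bound).
  Query newerThanLastRun = new TermRangeQuery("date", "20100714103700", null, false, true);
  TopDocs hits = searcher.search(newerThanLastRun, 1000);
  // Process hits.scoreDocs, then remember the newest date seen as the next lower bound.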

Re: Continuously iterate over documents in index

2010-07-14 Thread Erick Erickson
Kiran: Please start a new thread when asking a new question. From Hossman's apache page: When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track whi

Re: Out of memory problem in search

2010-07-14 Thread Erick Erickson
This doesn't make sense to me. Are you saying that you only have 200,000 documents in your index? Because keeping a score for 200K documents should consume a relatively trivial amount of memory. The fact that you're sorting by time is a red flag, but it's only a long, so 200K documents shouldn't st

RE: Best practices for searcher memory usage?

2010-07-14 Thread Christopher Condit
Hi Toke- > > * 20 million documents [...] > > * 140GB total index size > > * Optimized into a single segment > > I take it that you do not have frequent updates? Have you tried to see if you > can get by with more segments without significant slowdown? Correct - in fact there are no updates and n

Re: Continuously iterate over documents in index

2010-07-14 Thread Kiran Kumar
All, Issue: Unable to get the proper results after searching. I added sample code which I used in the application. If I use the *numHitPerPage* value as 1000 it gives the expected results, e.g. the expected result is 32 docs and it shows 32 docs. Instead, if I use *numHitPerPage* as 2^32-1 it is not giving
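For context, a minimal sketch of how a per-page hit count is normally passed to the searcher, assuming Lucene 3.x; the requested count sizes an internal priority queue, so it should be a modest positive int (2^32-1 does not even fit in a Java int) rather than a "give me everything" value:

  int numHitsPerPage = 1000;                          // page size, not Integer.MAX_VALUE
  TopDocs topDocs = searcher.search(query, numHitsPerPage);
  System.out.println("total matches: " + topDocs.totalHits);
  for (ScoreDoc sd : topDocs.scoreDocs) {             // at most numHitsPerPage entries
      Document doc = searcher.doc(sd.doc);
      // process doc ...
  }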

Re: Continuously iterate over documents in index

2010-07-14 Thread Max Lynch
> You could have a field within each doc say "Processed" and store a value Yes/No, next run a searcher query which should give you the collection of unprocessed ones. That sounds like a reasonable idea, and I just realized that I could have done that in a way specific to my application. Howe
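For reference, a minimal sketch of the mark-as-processed idea, assuming Lucene 3.x; Lucene documents cannot be edited in place, so the usual pattern is to replace the whole document via IndexWriter.updateDocument keyed on a unique id field (the field names and the docId variable are placeholders):

  // Flip the flag by re-adding the document under its unique id.
  doc.removeField("processed");
  doc.add(new Field("processed", "yes", Field.Store.YES, Field.Index.NOT_ANALYZED));
  writer.updateDocument(new Term("id", docId), doc);
  writer.commit();
  // The next run can then query processed:no to find only the unprocessed documents.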

Re: Out of memory problem in search

2010-07-14 Thread ilkay polat
I am also confused about the memory management of Lucene. Is this out-of-memory problem mainly arising from Reason-1 or Reason-2? Reason-1: The problem comes from searching the big index file (nearly 40 GB); if there are 100 records (a small number of records) returned

subset query :query filter or boolean query

2010-07-14 Thread suman.holani
Hi, I have 4 query search fields. Case 1: I use one search field to build a query filter and then apply that filter while searching on the other 3 fields, so as to reduce the subset of documents searched. Case 2: I use all query parameters in one boolean query, and the whole index is searched. Which
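To make the two cases concrete, a minimal sketch assuming Lucene 3.x with placeholder field names and values; which case is faster mostly depends on how selective the filter clause is and whether it can be cached and reused across queries:

  // Case 1: one field as a (cacheable) filter, the other fields as the query.
  Filter catFilter = new CachingWrapperFilter(
      new QueryWrapperFilter(new TermQuery(new Term("category", "books"))));
  BooleanQuery rest = new BooleanQuery();
  rest.add(new TermQuery(new Term("title", "lucene")), BooleanClause.Occur.MUST);
  rest.add(new TermQuery(new Term("author", "smith")), BooleanClause.Occur.MUST);
  TopDocs case1 = searcher.search(rest, catFilter, 10);

  // Case 2: everything combined in a single BooleanQuery.
  BooleanQuery all = new BooleanQuery();
  all.add(new TermQuery(new Term("category", "books")), BooleanClause.Occur.MUST);
  all.add(new TermQuery(new Term("title", "lucene")), BooleanClause.Occur.MUST);
  all.add(new TermQuery(new Term("author", "smith")), BooleanClause.Occur.MUST);
  TopDocs case2 = searcher.search(all, 10);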

Re: Out of memory problem in search

2010-07-14 Thread ilkay polat
Hi, we have hardware restrictions (max RAM is 8 GB), so unfortunately increasing memory is not an option for us in today's situation. Yes, as you said, the problem is hit when going to the last pages of the search screen, because the search method used finds the top n records. In other words,

RE: Out of memory problem in search

2010-07-14 Thread ilkay polat
Indeed, this is a good solution to that kind of problem. But the same problem can occur in the future as more logs are added to the index. For example, here 200,000 records cause the problem (these logs were collected over 13 days). With the reverse approach, the maximum search range becomes 100,000. But

RE: Out of memory problem in search

2010-07-14 Thread Uwe Schindler
Reverse the query sorting to display the last page. - Uwe Schindler
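For reference, a minimal sketch of this suggestion, assuming Lucene 3.x, a long-valued "time" field, and query/pageSize variables already defined (all placeholders): to render the last page, flip the sort direction and fetch the first page of the reversed ordering instead of asking for hundreds of thousands of hits.

  // Ascending order: oldest entries first (used for the first pages).
  Sort ascending  = new Sort(new SortField("time", SortField.LONG, false));
  // For the "last" page, reverse the sort and read the first page of results instead.
  Sort descending = new Sort(new SortField("time", SortField.LONG, true));
  TopDocs lastPage = searcher.search(query, null, pageSize, descending);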

Re: Out of memory problem in search

2010-07-14 Thread findbestopensource
Certainly it will. Either you need to increase your memory or refine your query. Even though you display paginated results, the first couple of pages will display fine, while going towards the last pages may hit the problem. This is because 200,000 objects are created and iterated; 190,900 objects are skipped and

Out of memory problem in search

2010-07-14 Thread ilkay polat
Hello friends, recently I have had a problem with Lucene search - a memory problem stemming from the index file being so big. (I have indexed some kinds of information and this index file's size is more than 40 gigabytes.) I search the Lucene index with org.apache.lucene.search.Searc

Best open source

2010-07-14 Thread findbestopensource
Hello all, We have launched a new site, which provides the best open source products and libraries across all categories. This site is powered by Solr search. There are many open source products available in all categories and it is sometimes difficult to identify which is the best. The main probl

Re: How to create a fuzzy suggest

2010-07-14 Thread Alexander Rothenberg
Hi, I had a similar need to create something that acts not like a "filter" or "tokenizer" but only inserts self-generated tokens into the token stream. (My purpose was to generate all kinds of word forms for German umlauts...) The following code base helped me a lot to create it: http://207.44
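For context, a minimal sketch of a token-injecting TokenFilter of the kind described here (it is not the code base linked above). It assumes the Lucene 3.0 attribute API, where TermAttribute is used; later releases replace it with CharTermAttribute. The variant-generation rule is purely illustrative:

  import java.io.IOException;
  import java.util.LinkedList;
  import java.util.Queue;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;
  import org.apache.lucene.util.AttributeSource;

  // Emits each original token unchanged, and additionally injects a generated
  // variant token at the same position (position increment 0).
  public final class VariantInjectingFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
    private final Queue<String> pending = new LinkedList<String>();
    private AttributeSource.State savedState;

    public VariantInjectingFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!pending.isEmpty()) {
        restoreState(savedState);               // reuse offsets, type, etc. of the original token
        termAtt.setTermBuffer(pending.poll());  // emit the generated variant
        posIncrAtt.setPositionIncrement(0);     // same position as the original
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      String term = termAtt.term();
      String variant = term.replace("ae", "ä"); // hypothetical umlaut variant rule
      if (!variant.equals(term)) {
        pending.add(variant);
        savedState = captureState();
      }
      return true;
    }
  }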

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
> Trying to analyze PositionFilter: didn't understand why earlier the search for 'Nina Simone I Put' failed, since at least the phrase 'Nina Simone' should have matched against the title_0 field. Any clue? Please note that I have configured the ShingleFilter as bigrams without unigrams. [Honestly, I
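For context, a minimal sketch of the analysis chain under discussion, assuming Lucene 3.x and a Reader over the field text: a ShingleFilter configured for bigrams without unigrams, wrapped in the PositionFilter suggested elsewhere in this thread. This is an illustration, not the poster's actual configuration:

  // Tokenize, emit bigrams only, and flatten position increments for query-side use.
  TokenStream tokens = new WhitespaceTokenizer(reader);
  ShingleFilter shingles = new ShingleFilter(tokens, 2);   // max shingle size 2 = bigrams
  shingles.setOutputUnigrams(false);                       // "bigrams without unigrams"
  TokenStream chain = new PositionFilter(shingles);        // wrap as suggested in the thread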

Re: Best practices for searcher memory usage?

2010-07-14 Thread Michael McCandless
You can also set the termsIndexDivisor when opening the IndexReader. The terms index is an in-memory data structure and it can consume a LOT of RAM when your index has many unique terms. Flex (only on Lucene's trunk / next major release (4.0)) has reduced this RAM usage (as well as the RAM required
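For reference, a minimal sketch assuming Lucene 3.x, where one of the expert IndexReader.open overloads accepts a termInfosIndexDivisor; a divisor of N loads only every Nth indexed term into RAM, trading some term-lookup speed for memory (the directory variable and the value 4 are placeholders):

  // Read-only reader that samples the terms index at 1/4 density.
  IndexReader reader = IndexReader.open(directory, null, true, 4);  // null = default deletion policy
  IndexSearcher searcher = new IndexSearcher(reader);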

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve, Thanks, wrapping with PositionFilter actually made the search and scoring work -- I made a mistake while re-indexing last time. Trying to analyze PositionFilter: I didn't understand why earlier the search for 'Nina Simone I Put' failed, since at least the phrase 'Nina Simone' should have matched

Re: Best practices for searcher memory usage?

2010-07-14 Thread Toke Eskildsen
On Tue, 2010-07-13 at 23:49 +0200, Christopher Condit wrote: > * 20 million documents [...] > * 140GB total index size > * Optimized into a single segment I take it that you do not have frequent updates? Have you tried to see if you can get by with more segments without significant slowdown? > Th

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve, Thanks for your kind response. I checked PositionFilterFactory (re-indexed as well) but that also didn't solve the problem. Interestingly, the problem is not reproducible from Solr's Field Analysis page; it manifests only when it's in a query. I guess the subject for this post is not very c

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
You have two options: 1. Store the compressed text as part of a stored field in Solr. 2. Use external caching: http://www.findbestopensource.com/tagged/distributed-caching - you could use ehcache / Memcache / Membase. The problem with external caching is that you need to synchronize the deletions and
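For option 1, a minimal sketch of the Lucene-level equivalent, assuming Lucene 3.x where CompressionTools is available (field names are placeholders); on the Solr side the same idea is expressed through the schema rather than code:

  // Index time: analyze the text for search, and store a compressed binary copy for retrieval.
  doc.add(new Field("content", fullText, Field.Store.NO, Field.Index.ANALYZED));
  doc.add(new Field("content_stored", CompressionTools.compressString(fullText), Field.Store.YES));

  // Retrieval time: decompress the stored bytes back into the original text.
  // (decompressString declares DataFormatException, so handle or rethrow it.)
  Document stored = searcher.doc(scoreDoc.doc);
  String original = CompressionTools.decompressString(stored.getBinaryValue("content_stored"));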