Glen, thank you for this very thorough and informative post.
Lance Norskog
There are a number of strategies, on the Java or OS side of things:
- Use huge pages[1], especially on 64-bit systems with lots of RAM. For
long-running, large-memory (and GC-busy) applications this has achieved
significant improvements, like 300% on EJBs. See [2],[3],[4]. For a great
article introducing and benchmarking
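For illustration, a hedged sketch of what enabling this might look like on a
HotSpot JVM (flag names vary by JVM vendor and version, and the app jar name
is hypothetical):

  # Linux example; requires the admin to reserve pages first (vm.nr_hugepages)
  java -XX:+UseLargePages -Xms6g -Xmx6g -jar myapp.jar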
H, if you somehow know the last date you processed, why wouldn't a range
query work for you? I.e.
date:[ TO ]? (see the sketch after the quoted message below)
Best
Erick
On Wed, Jul 14, 2010 at 10:37 AM, Max Lynch wrote:
> You could have a field within each doc say "Processed" and store a
> value Yes/No, next run a searcher que
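A minimal sketch of that range query in Lucene's Java API (assuming Lucene
2.9+/3.x and a "date" field indexed as a sortable string; lastProcessed is a
hypothetical variable holding the last date you handled):

  // open-ended range: everything after the last processed date
  Query q = new TermRangeQuery("date", lastProcessed, null, false, false);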
Kiran:
Please start a new thread when asking a new question. From Hossman's apache
page:
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track whi
This doesn't make sense to me. Are you saying that you only have 200,000
documents in your index? Because keeping a score for 200K documents should
consume a relatively trivial amount of memory. The fact that you're sorting
by time is a red flag, but it's only a long, so 200K documents shouldn't
st
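For reference, a minimal sketch of such a sort (Lucene 3.x API; the "time"
field name is an assumption):

  // sort hits by an indexed long field, newest first
  Sort byTime = new Sort(new SortField("time", SortField.LONG, true));
  TopDocs hits = searcher.search(query, null, 10, byTime);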
Hi Toke-
> > * 20 million documents [...]
> > * 140GB total index size
> > * Optimized into a single segment
>
> I take it that you do not have frequent updates? Have you tried to see if you
> can get by with more segments without significant slowdown?
Correct - in fact there are no updates and n
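If more segments were ever worth testing, a partial optimize is one hedged
way to try it (Lucene 3.x; the target of 10 segments is arbitrary):

  // merge down to at most 10 segments instead of forcing a single one
  writer.optimize(10);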
All,
Issue: Unable to get the proper results after searching. I added sample code
which I used in the application.
If I use a *numHitPerPage* value of 1000 it gives the expected results.
ex: The expected result is 32 docs and it shows 32 docs.
If instead I use *numHitPerPage* of 2^32-1 it is not giving
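One hedged way to avoid passing a huge n at all is a bounded collector
(Lucene 2.9+/3.x; the query variable is assumed):

  // collect at most 1000 hits rather than asking for 2^32-1
  TopScoreDocCollector collector = TopScoreDocCollector.create(1000, true);
  searcher.search(query, collector);
  ScoreDoc[] hits = collector.topDocs().scoreDocs;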
> You could have a field within each doc say "Processed" and store a
> value Yes/No, next run a searcher query which should give you the
> collection of unprocessed ones.
>
That sounds like a reasonable idea, and I just realized that I could have
done that in a way specific to my application. Howe
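For reference, a minimal sketch of that flag-field approach (Lucene 3.x; the
field name and values are assumptions, and note that changing the flag later
means deleting and re-adding the document):

  // at index time: mark every new document as unprocessed
  doc.add(new Field("processed", "no", Field.Store.NO, Field.Index.NOT_ANALYZED));

  // at processing time: fetch only the unprocessed ones
  Query q = new TermQuery(new Term("processed", "no"));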
I am also confused about the memory management of Lucene.
Does this out-of-memory problem mainly arise from Reason-1 or Reason-2?
Reason-1: The problem comes from searching a big indexed file
(nearly 40 GB), even if there are only 100 (a small number of) records returned
Hi,
I have 4 query search fields.
Case 1: I use one search field to build a query filter and then use that
filter when searching on the other 3 fields, so as to reduce the subset of
docs searched.
Case 2: I use all query parameters in one boolean query, so the whole index
is searched.
Which
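A sketch of case 1 under those assumptions (Lucene 3.x; field names and
values are hypothetical):

  // build a cached filter from one field, then search the other three against it
  Filter f = new CachingWrapperFilter(
      new QueryWrapperFilter(new TermQuery(new Term("field1", "value1"))));
  BooleanQuery bq = new BooleanQuery();
  bq.add(new TermQuery(new Term("field2", "value2")), BooleanClause.Occur.MUST);
  bq.add(new TermQuery(new Term("field3", "value3")), BooleanClause.Occur.MUST);
  bq.add(new TermQuery(new Term("field4", "value4")), BooleanClause.Occur.MUST);
  TopDocs td = searcher.search(bq, f, 50);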
Hi,
We have hardware restrictions (the max RAM can be 8GB). So, unfortunately,
increasing memory is not an option for us in today's situation.
Yes, as you said, the problem appears when going to the last pages of the search
screen, because the search method finds the top n records. In other way,
Indeed, this is a good solution to that kind of problem. But the same problem
can occur in the future as more logs are added to the index file.
For example, here 200,000 records have the problem (these logs were collected
over 13 days).
With that reverse approach, the maximum search range will be 100,000.
But
Reverse the query sorting to display the last page.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: ilkay polat [mailto:ilkay_po...@yahoo.com]
> Sent: Wednesday, July 14, 2010 12:44 PM
> To: java-user@lu
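A hedged sketch of that reverse-sort trick (Lucene 3.x; the "time" sort field
and pageSize are assumptions):

  // instead of paging 190,000 hits deep, flip the sort and read the first page
  Sort reversed = new Sort(new SortField("time", SortField.LONG, true));
  TopDocs td = searcher.search(query, null, pageSize, reversed);
  // then render these hits in reverse order so the page still reads the same way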
Certainly it will. Either you need to increase your memory or refine your
query. Even though you display paginated results, the first couple of pages
will display fine, and going towards the last may face problems. This is
because 200,000 objects are created and iterated: 190,900 objects are skipped
and la
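In code terms, the cost looks roughly like this (a sketch; pageNum, pageSize
and render() are hypothetical):

  // to show page N, every hit before it must still be collected and skipped
  int start = (pageNum - 1) * pageSize;  // e.g. 190,900 skipped hits
  TopDocs td = searcher.search(query, start + pageSize);
  for (int i = start; i < td.scoreDocs.length; i++) {
      render(td.scoreDocs[i]);
  }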
Hello Friends;
Recently, I have had a problem with Lucene search - a memory problem arising
because the indexed file is so big. (I have indexed some kinds of information
and this index's size is more than 40 gigabytes.)
I search the Lucene index with
org.apache.lucene.search.Searc
Hello all,
We have launched a new site, which provides the best open source products
and libraries across all categories. This site is powered by Solr search.
There are many open source products available in all categories and it is
sometimes difficult to identify which is the best. The main probl
Hi,
I had a similar need to create something that acts not like a "filter"
or "tokenizer" but only inserts self-generated tokens into the token stream
(my purpose was to generate all kinds of word forms for German umlauts...).
The following code base helped me a lot in creating it:
http://207.44
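For illustration, a hedged sketch of such a token-injecting filter against the
Lucene 2.9/3.x TokenStream API (the umlaut variant logic is a toy example):

  import java.io.IOException;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  // emits each input token, then injects a generated variant at the same position
  public final class VariantInjectingFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final PositionIncrementAttribute posIncAtt =
        addAttribute(PositionIncrementAttribute.class);
    private State saved;           // state of the token we are stacking onto
    private String pendingVariant; // next self-generated token, if any

    public VariantInjectingFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (pendingVariant != null) {
        restoreState(saved);
        termAtt.setTermBuffer(pendingVariant);
        posIncAtt.setPositionIncrement(0); // same position as the original token
        pendingVariant = null;
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      String variant = variantOf(termAtt.term());
      if (variant != null) {
        saved = captureState();
        pendingVariant = variant;
      }
      return true;
    }

    // toy umlaut expansion; a real implementation would generate all word forms
    private String variantOf(String term) {
      String v = term.replace("ue", "\u00fc");
      return v.equals(term) ? null : v;
    }
  }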
> Trying to analyze PositionFilter: didn't understand why earlier the
> search for 'Nina Simone I Put' failed, since at least the phrase 'Nina
> Simone' should have matched against the title_0 field. Any clue?
Please note that I have configured the ShingleFilter as bigrams without unigrams.
[Honestly, I
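For context, a sketch of that configuration at the Lucene level (the tokens
input stream is assumed; Solr's factories wire up the same filters):

  // bigrams only, no unigrams, then collapsed to a single position
  ShingleFilter shingles = new ShingleFilter(tokens, 2);
  shingles.setOutputUnigrams(false);
  TokenStream stream = new PositionFilter(shingles);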
You can also set the termsIndexDivisor when opening the IndexReader.
The terms index is an in-memory data structure and it can consume a LOT
of RAM when your index has many unique terms.
Flex (only on Lucene's trunk / next major release (4.0)) has reduced
this RAM usage (as well as the RAM required
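A minimal sketch of setting the divisor at open time (Lucene 2.9+/3.x; a
divisor of 4 loads every 4th indexed term, trading some seek time for RAM):

  // readOnly=true, termInfosIndexDivisor=4 -> roughly 1/4 the terms-index RAM
  IndexReader reader = IndexReader.open(dir, null, true, 4);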
Hi Steve,
Thanks, wrapping with PositionFilter actually fixed the search and
scoring -- I had made a mistake while re-indexing last time.
Trying to analyze PositionFilter: didn't understand why earlier the
search for 'Nina Simone I Put' failed, since at least the phrase 'Nina
Simone' should have matched
On Tue, 2010-07-13 at 23:49 +0200, Christopher Condit wrote:
> * 20 million documents [...]
> * 140GB total index size
> * Optimized into a single segment
I take it that you do not have frequent updates? Have you tried to see
if you can get by with more segments without significant slowdown?
> Th
Hi Steve,
Thanks for your kind response. I checked PositionFilterFactory
(re-indexed as well) but that also didn't solve the problem. Interestingly,
the problem is not reproducible from Solr's Field Analysis page; it
manifests only when it's in a query.
I guess the subject of this post is not very c
You have two options:
1. Store the compressed text as part of a stored field in Solr.
2. Use external caching:
http://www.findbestopensource.com/tagged/distributed-caching
You could use ehcache / Memcache / Membase.
The problem with external caching is that you need to synchronize the deletions
and
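A hedged sketch of option 1 at the Lucene level (CompressionTools has existed
since Lucene 2.9; the field name and variables are assumptions):

  // index time: compress the large text and keep it as a stored binary field
  byte[] compressed = CompressionTools.compressString(largeText);
  doc.add(new Field("body_z", compressed, Field.Store.YES));

  // display time: decompress the stored bytes
  String text = CompressionTools.decompressString(
      searcher.doc(docId).getBinaryValue("body_z"));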