Index writing performance of 3.5

2012-02-08 Thread Vitaly Funstein
Hello, I am currently evaluating Lucene 3.5.0 for upgrading from 3.0.3, and in the context of my usage, the most important parameter is index writing throughput. To that end, I have been running various tests, but seeing some contradictory results from different setups, which hopefully someone wit

RE: Please explain DisjunctionMaxQuery JavaDoc.

2012-02-08 Thread Paul Allan Hill
> -----Original Message----- > From: Paul Allan Hill [mailto:p...@metajure.com] > Sent: Wednesday, February 08, 2012 2:42 PM > To: java-user@lucene.apache.org > Subject: Please explain DisjunctionMaxQuery JavaDoc. > > What the heck is the JavaDoc for DisjunctionMaxQuery saying: > >[...] pl

Please explain DisjunctionMaxQuery JavaDoc.

2012-02-08 Thread Paul Allan Hill
What the heck is the JavaDoc for DisjunctionMaxQuery saying: "A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matchi
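To make the quoted rule concrete: for one document, take the score each matching subquery gives it, keep the largest, and add the remaining scores scaled by the query's tie-breaker multiplier (the `tieBreakerMultiplier` argument of the DisjunctionMaxQuery constructor). The following is a minimal plain-Java sketch of that arithmetic only, not Lucene code; the `dismaxScore` helper and the example scores are hypothetical.

```java
/** Plain-Java illustration of the DisjunctionMaxQuery scoring rule; not Lucene code. */
class DisMaxScoreSketch {

    /**
     * Combines the scores that the matching subqueries gave ONE document:
     * the maximum score, plus tieBreakerMultiplier times each of the other scores.
     */
    static float dismaxScore(float[] subScores, float tieBreakerMultiplier) {
        if (subScores.length == 0) {
            throw new IllegalArgumentException("the document matched no subquery");
        }
        float max = Float.NEGATIVE_INFINITY;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        // Only the best subquery counts fully; the rest add a weighted tie-breaking increment.
        return max + (sum - max) * tieBreakerMultiplier;
    }

    public static void main(String[] args) {
        // Hypothetical scores for one document: one field scored 2.0, another 0.5.
        float[] scores = {2.0f, 0.5f};
        System.out.println(dismaxScore(scores, 0.0f)); // 2.0: pure maximum
        System.out.println(dismaxScore(scores, 0.1f)); // slightly above 2.0
        System.out.println(dismaxScore(scores, 1.0f)); // 2.5: every match counts fully, like a sum
    }
}
```

With a multiplier of 0 the document simply gets its best subquery score; with 1 every matching subquery counts fully; a small value such as 0.1 lets a second match break ties without letting it dominate.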

Working with MemoryIndex results

2012-02-08 Thread Dave Seltzer
Hello, I'm using a MemoryIndex in order to search a block of in-memory text using a Lucene query. I'm able to search the text, produce a result, and excerpt a highlight using the highlighter. Right now I'm doing this: MemoryIndex index = new MemoryIndex(); index.addField("content", fullText, Luc

Re: slow speed of searching

2012-02-08 Thread Cheng
Thanks a lot. On Wed, Feb 8, 2012 at 9:48 PM, Ian Lea wrote: > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed > > (the 3rd item is Use a local filesystem!) > > -- > Ian. > > > On Wed, Feb 8, 2012 at 12:44 PM, Cheng wrote: > > Hi, > > > > I have about 6.5 million documents which lead to

Re: slow speed of searching

2012-02-08 Thread Ian Lea
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed (the 3rd item is "Use a local filesystem!") -- Ian. On Wed, Feb 8, 2012 at 12:44 PM, Cheng wrote: > Hi, > > I have about 6.5 million documents which lead to a 1.5G index. The speed of > searching for a couple of terms, like "dvd" and "price", causes a

Re: how to create directory on a remote server protected by password

2012-02-08 Thread Ian Lea
Don't. Likely to cause more problems than it's worth. See recent thread on "Why read past EOF". But if you really feel you must, either write your own implementation of FSDirectory or mount the remote folder locally at the OS level using SMB or NFS or whatever. I know which one I'd go for, exce

Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Simon Willnauer
Are you closing the NRTManager while other threads are still accessing the SearcherManager? simon On Wed, Feb 8, 2012 at 1:48 PM, Cheng wrote: > I use it exactly the same way. So there must be some other reason causing the > problem. > > On Wed, Feb 8, 2012 at 8:21 PM, Ian Lea wrote: > >> Releasing a se

Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Cheng
I use it exactly the same way. So there must be some other reason causing the problem. On Wed, Feb 8, 2012 at 8:21 PM, Ian Lea wrote: > Releasing a searcher is not the same as closing the searcher manager, > if that is what you mean. > > The searcher should indeed be released, but once only for each

Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Ian Lea
Releasing a searcher is not the same as closing the searcher manager, if that is what you mean. The searcher should indeed be released, but once only for each acquire(). Your searching threads should have code like that shown in the SearcherManager javadocs. IndexSearcher s = manager.acquire();
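The rule above (each acquire() is paired with exactly one release(), however many threads call the search method) can be illustrated with a toy reference-counting model. This is a hypothetical sketch, not Lucene's SearcherManager: `ToySearcherManager`, `AcquireReleaseDemo`, the `search` helper and the `IllegalStateException` messages are invented for illustration. It shows that balanced per-call acquire/release leaves the count where it started, and that once the manager is closed (or the count reaches zero) every later acquire() fails, loosely comparable to the AlreadyClosedException discussed in this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical toy model of acquire/release reference counting; NOT Lucene's SearcherManager. */
class ToySearcherManager {
    // Starts at 1: the manager itself holds one reference until close() drops it.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** Takes one reference to the shared "searcher"; fails after close() or once the count hit zero. */
    Object acquire() {
        while (true) {
            int count = refCount.get();
            if (closed.get() || count <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            if (refCount.compareAndSet(count, count + 1)) {
                return this; // stands in for an IndexSearcher
            }
        }
    }

    /** Must be called exactly once per successful acquire(); the token is unused in this toy. */
    void release(Object token) {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("released more times than acquired");
        }
    }

    /** Drops the manager's own reference; later acquire() calls fail. */
    void close() {
        if (closed.compareAndSet(false, true)) {
            release(this);
        }
    }

    int refCount() {
        return refCount.get();
    }
}

class AcquireReleaseDemo {
    // Each call pairs its own acquire() with exactly one release(), in a finally block,
    // no matter how many threads are running this method at the same time.
    static void search(ToySearcherManager manager) {
        Object searcher = manager.acquire();
        try {
            // ... run the query against 'searcher' here ...
        } finally {
            manager.release(searcher);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToySearcherManager manager = new ToySearcherManager();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> search(manager));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Balanced acquire/release leaves only the manager's own reference: prints 1.
        System.out.println("refCount after searches: " + manager.refCount());

        manager.close(); // only once no other thread still needs a searcher
        try {
            manager.acquire();
        } catch (IllegalStateException e) {
            System.out.println("acquire after close: " + e.getMessage());
        }
    }
}
```

Calling release() a second time for the same acquire() drops the count one step too low; in this toy it silently consumes the manager's own reference, so later acquire() calls fail even though close() was never called.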

Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Cheng
You are right. There is a method in which I do the searching. At the end of the method, I release the index searcher (not the SearcherManager). Since this method is called by multiple threads, I think the index searcher will be released multiple times. First, I wonder if releasing a searcher is the same

Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Ian Lea
Are you closing the SearcherManager? Calling release() multiple times? From the exception message, the first sounds most likely. -- Ian. On Wed, Feb 8, 2012 at 5:20 AM, Cheng wrote: > Hi, > > I am using NRTManager and NRTManagerReopenThread. Though I don't close > either writer or the reopen

Re: Why read past EOF

2012-02-08 Thread Michael McCandless
Hmm, there's a problem with the logic here (sorry: this is my fault -- my prior suggestion is flat out wrong!). The problem is... say you commit once, creating commit point 1. Two hours later, you commit again, creating commit point 2. The bug is, at this point, immediately on committing commit

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-08 Thread Petite Abeille
On Feb 8, 2012, at 10:14 AM, Danil ŢORIN wrote: > For example if you only query data for 1 month intervals, and you > partition by date, you can calculate in which shard your data can be > found, and query just that shard. This is what one calls "partition pruning" in database terms. http://en.

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-08 Thread Danil ŢORIN
It also depends on your queries. For example if you only query data for 1 month intervals, and you partition by date, you can calculate in which shard your data can be found, and query just that shard. If you can find a partition key that is always present in the query, you can create a gazillion
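Danil's example of calculating which shard holds the data (what Petite Abeille calls partition pruning) can be sketched as follows, assuming a hypothetical layout of one index per calendar month named `<prefix>yyyy-MM`. The class name `MonthlyShardRouter`, the `docs-` prefix and the naming scheme are illustrations only, not anything from the thread. Writes go to the shard for the document's date, and a date-range query opens only the shards whose month overlaps the range.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical monthly partitioning scheme: one index (shard) per calendar month. */
class MonthlyShardRouter {
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("uuuu-MM");
    private final String prefix;

    MonthlyShardRouter(String prefix) {
        this.prefix = prefix;
    }

    /** The shard a document dated 'date' is written to. */
    String shardFor(LocalDate date) {
        return prefix + YearMonth.from(date).format(MONTH);
    }

    /** Only shards whose month overlaps [from, to] need to be searched; all others are pruned. */
    List<String> shardsFor(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' is before 'from'");
        }
        List<String> shards = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            shards.add(prefix + m.format(MONTH));
        }
        return shards;
    }

    public static void main(String[] args) {
        MonthlyShardRouter router = new MonthlyShardRouter("docs-");
        System.out.println(router.shardFor(LocalDate.of(2012, 2, 8))); // docs-2012-02
        System.out.println(router.shardsFor(LocalDate.of(2011, 12, 15), LocalDate.of(2012, 2, 8)));
        // [docs-2011-12, docs-2012-01, docs-2012-02]
    }
}
```

A query confined to one month touches exactly one shard; a range crossing month boundaries touches one shard per month, and every other shard can be skipped without being opened.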