Re: What's 'java -server' option ?

2009-11-16 Thread Wenbo Zhao
Sorry guys, AGAIN I used the wrong search words; please ignore this thread. I found a doc here: http://java.sun.com/j2se/1.3/docs/guide/performance/hotspot.html When I searched for 'java -server', Google gave no useful info; just now I searched for 'java server vm' and found the doc. It's really ironic for anyone who is studying

Re: What's 'java -server' option ?

2009-11-16 Thread Max Lynch
http://stackoverflow.com/questions/198577/real-differences-between-java-server-and-java-client On Mon, Nov 16, 2009 at 7:54 PM, Wenbo Zhao wrote: > Hi, all > I found a suggestion in 'Lucene in Action': use 'java -server' to run > faster. > As I tested, it's 2 times faster than normal 'java', which uses '-client' as default.

What's 'java -server' option ?

2009-11-16 Thread Wenbo Zhao
Hi, all. I found a suggestion in 'Lucene in Action': use 'java -server' to run faster. As I tested, it's 2 times faster than normal 'java', which uses '-client' as default. But I can't find any doc about this '-server' option. Does anybody know about it? Thanks. -- Best Regards, ZHAO, Wenbo
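
A quick way to confirm which HotSpot variant your process is actually running is the standard `java.vm.name` system property. This is a minimal sketch, not a benchmark; the exact string (e.g. "Java HotSpot(TM) Server VM") varies by vendor and JVM version.

```java
// Minimal sketch: report which HotSpot VM variant (client or server) is running.
// java.vm.name is a standard JVM system property; its exact value is
// vendor/version dependent, so only look for the "Server"/"Client" substring.
public class VmInfo {
    public static String vmName() {
        return System.getProperty("java.vm.name", "unknown");
    }

    public static void main(String[] args) {
        // Launched as 'java -server VmInfo', the name typically contains "Server VM";
        // with '-client' (where the client VM is available) it contains "Client VM".
        System.out.println(vmName());
    }
}
```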

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
>Can you remap your external data to be per segment? That would provide the tightest integration but would require a major redesign. Currently, the external data is in a single file created by reading a stored field after the Lucene index has been committed. Creating this file is very fast with 2.

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Michael McCandless
Can you remap your external data to be per segment? Presumably that would make reopens faster for your app. For your custom sort comparator, are you using FieldComparator? If so, Lucene calls setNextReader to tell you the reader & docBase. Failing these, Lucene currently visits the readers in

Re: Open source search social evening in London - 18th Nov

2009-11-16 Thread Richard Marr
Hi all, Just a reminder that we're meeting up this Wednesday at the location below. It just occurred to me that we haven't put anything about ourselves in the invite... René is a consultant specialising in Lucene, Solr, and linguistics (René please correct me if I'm misrepresenting you horribly)

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Erick Erickson
From your original e-mail: "if the metadata contains the primary key defined, we have to do the search/update for every row based on the primary key". Jake and I are both assuming that you're using primary key in the database sense. That is, there is exactly one document in the index with that primary key

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Jake Mannix
You will want to have one Lucene field which contains this composite key - it could be the un-tokenized concatenation of all of the subkeys, for example. Then one Term would have the full composite key, and the updateDocument technique would work fine. -jake On Mon, Nov 16, 2009 at 11:09
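
Jake's suggestion can be sketched as a small helper that joins the subkey values into one string. This is only an illustration: the separator character and the field name "pk" are assumptions, and the separator must not occur in any subkey value; the result would then be indexed untokenized and used as `new Term("pk", key)` with `IndexWriter.updateDocument`.

```java
// Sketch: build a single composite-key value from several subkey fields.
// The \u0001 separator is an assumption (pick any character guaranteed
// not to appear in the subkey values). In real code the result would go
// into one un-tokenized field, e.g. new Term("pk", compositeKey(f1, f2)).
public class CompositeKey {
    static String compositeKey(String... parts) {
        return String.join("\u0001", parts);
    }
}
```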

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
The same thing is occurring in my custom sort comparator. The ScoreDocs passed to the 'compare' method have docIds that seem to be relative to the segment. Is there any way to translate these into index-wide docIds? Peter On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan wrote: > I forgot to mention

RE: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread java8964 java8964
But can IndexWriter.updateDocument(Term, Document) handle the composite key case? If my primary key contains field1 and field2, can I use one Term to include both field1 and field2? Thanks > Date: Mon, 16 Nov 2009 09:44:35 -0800 > Subject: Re: What is the best way to handle the primary key ca

RE: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread java8964 java8964
What I mean is that, for one index, the client can define multiple fields in the index as the primary key (a composite key). > Date: Mon, 16 Nov 2009 12:45:40 -0500 > Subject: Re: What is the best way to handle the primary key case during > luceneindexing > From: erickerick...@gmail.com > To: java

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I forgot to mention that this is with V2.9.1 On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan wrote: > I have a custom query object whose scorer uses the 'AllTermDocs' to get all > non-deleted documents. AllTermDocs returns the docId relative to the > segment, but I need the absolute (index-wide) do

Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I have a custom query object whose scorer uses the 'AllTermDocs' to get all non-deleted documents. AllTermDocs returns the docId relative to the segment, but I need the absolute (index-wide) docId to access external data. What's the best way to get the unique, non-deleted docId? Thanks, Peter
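
The translation Peter is after is simple arithmetic: in Lucene 2.9, the segments are visited in index order and each segment's docBase is the sum of maxDoc() over the preceding segments, so the index-wide docId is docBase plus the segment-relative docId (this is the same docBase Lucene passes to FieldComparator.setNextReader). A pure-Java sketch, with hypothetical maxDoc values:

```java
// Sketch of translating a segment-relative docId to an index-wide docId.
// docBase for segment i is the sum of maxDoc() over segments 0..i-1.
public class DocBase {
    // compute each segment's docBase from the per-segment maxDoc() values
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // index-wide docId = segment's docBase + segment-relative docId
    static int globalDocId(int segment, int segmentDocId, int[] bases) {
        return bases[segment] + segmentDocId;
    }
}
```

For example, with segments of maxDoc 100, 250 and 75, the docBases are 0, 100 and 350, so doc 10 of the second segment is index-wide doc 110.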

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Erick Erickson
Sorry, forgot to add "then re-add the documents in question". On Mon, Nov 16, 2009 at 12:45 PM, Erick Erickson wrote: > What is the form of the unique key? I'm a bit confused here by your > comment: > "which can contain one or multi fields". > > But it seems like IndexWriter.deleteDocuments shoul

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Erick Erickson
What is the form of the unique key? I'm a bit confused here by your comment: "which can contain one or multi fields". But it seems like IndexWriter.deleteDocuments should work here. It's easy if your PKs are single terms; there's even a deleteDocuments(Term[]) form. But this really *requires* that

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Jake Mannix
The usual way to do this is to use: IndexWriter.updateDocument(Term, Document) This method deletes all documents with the given Term in it (this would be your primary key), and then adds the Document you want to add. This is the traditional way to do updates, and it is fast. -jake On Mo
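
The delete-then-add semantics Jake describes can be modeled with a toy in-memory "index" (all names here are illustrative; the real API is `IndexWriter.updateDocument(Term, Document)`, where the Term carries the primary-key field and value):

```java
import java.util.*;

// Toy model of IndexWriter.updateDocument(Term, Document) semantics:
// delete every document whose key field matches the given value, then add
// the new document. Documents are plain maps standing in for Lucene Documents.
public class UpdateByKey {
    final List<Map<String, String>> docs = new ArrayList<>();

    void updateDocument(String keyField, String keyValue, Map<String, String> newDoc) {
        docs.removeIf(d -> keyValue.equals(d.get(keyField)));  // delete-by-term
        docs.add(newDoc);                                      // then add the replacement
    }
}
```

After two updates with the same key, exactly one document remains, holding the latest values, which is why this is the standard way to keep a primary key unique.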

Re: Sort fields shouldn't be tokenized

2009-11-16 Thread Yonik Seeley
On Mon, Nov 16, 2009 at 11:38 AM, Jeff Plater wrote: > Thanks - so if my sort field is a single term then I should be ok with > using an analyzer (to lowercase it for example). Correct - the key is that there is not more than one token per document for the field being sorted on. -Yonik http://ww

What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread java8964 java8964
Hi, In our application, we will allow the user to create a primary key defined in the document. We are using Lucene 2.9. In this case, when we index the data coming from the client, if the metadata contains the primary key defined, we have to do the search/update for every row based on the primary key

Re: Sort fields shouldn't be tokenized

2009-11-16 Thread J.J. Larrea
You can certainly use an analyzer chain to process the incoming text for a sort field, as long as a single Term emerges, or as long as only the first Term is significant for sorting. I don't believe the fact that the field would have the tokenized flag set makes any difference to the sort l

RE: Sort fields shouldn't be tokenized

2009-11-16 Thread Jeff Plater
Thanks - so if my sort field is a single term then I should be ok with using an analyzer (to lowercase it for example). -Jeff -Original Message- From: J.J. Larrea [mailto:j...@panix.com] Sent: Monday, November 16, 2009 11:19 AM To: java-user@lucene.apache.org Subject: Re: Sort fields sho

Re: Sort fields shouldn't be tokenized

2009-11-16 Thread J.J. Larrea
It's not universally true that a tokenized field cannot be used as a sort field, but it is true that you will not get the desired sort order except in special cases: Lucene's indexes of course contain inverted tables which map Term -> DocumentID, DocumentID, ... But for sorting, once a set
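
The requirement of one term per document can be illustrated without Lucene at all. A minimal sketch, assuming a "keyword + lowercase" style of normalization: the whole field value becomes a single lowercased key (no word splitting), and sorting on that key gives the expected case-insensitive order, which is what word-level tokenization would break.

```java
import java.util.*;

// Sketch: sorting needs exactly one sort key (term) per document.
// singleTokenKey mimics what a keyword-style analyzer with a lowercase
// filter would emit: the entire value as one lowercased token.
public class SortKeys {
    static String singleTokenKey(String fieldValue) {
        return fieldValue.toLowerCase(Locale.ROOT);  // no splitting into words
    }

    static List<String> sortByKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(SortKeys::singleTokenKey));
        return sorted;
    }
}
```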

Sort fields shouldn't be tokenized

2009-11-16 Thread Jeff Plater
I am looking at adding some sorting functionality to my application and read that sort fields should not be tokenized - can anyone explain why? I have code that tokenizes the sort fields and it seems to be working. Is it just because some tokenizing can change the value (like removing stop words

Re: Can Lucene unite multiple instances run as one ?

2009-11-16 Thread Wenbo Zhao
I just checked the 2.9.1 doc at http://lucene.apache.org/java/2_9_1/api/core/index.html and I can't find the RemoteSearchable you mentioned. I don't know SOLR yet; I'll look at it tomorrow. Thanks 2009/11/16 Erick Erickson : > I should have read more carefully. > > Look at the Searchable definition. One of t

Re: Can Lucene unite multiple instances run as one ?

2009-11-16 Thread Erick Erickson
I should have read more carefully. Look at the Searchable definition. One of the concrete realizations of that interface is a RemoteSearchable, which is what you're asking for I think. Have you thought about SOLR? It's built on top of Lucene and has lots of stuff built in for handling distributed

Re: Can Lucene unite multiple instances run as one ?

2009-11-16 Thread Wenbo Zhao
About the ParallelMultiSearcher, I don't really know it yet; I just took a quick look at the javadoc. It seems to be a searcher that searches other searchables. If all searchables are in the same JVM, it won't help. If there is some Searchable implementation that can work as a proxy for a 'remote' Lucene instance, then i
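
The idea behind ParallelMultiSearcher can be sketched in plain Java: fan the same query out to several shards in parallel and merge the hits. The shards here are just in-memory lists standing in for Lucene Searchables, and `contains` stands in for real scoring; in a distributed setup the same interface would be backed by a remote proxy (e.g. the contrib RemoteSearchable over RMI).

```java
import java.util.*;
import java.util.concurrent.*;

// Conceptual sketch of parallel multi-shard search: run one query against
// every shard concurrently, then merge the per-shard hit lists in shard order.
// Shards are in-memory string lists; substring match stands in for scoring.
public class ParallelSearchSketch {
    static List<String> search(List<List<String>> shards, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (List<String> shard : shards) {
            futures.add(pool.submit(() -> {
                List<String> hits = new ArrayList<>();
                for (String doc : shard) {
                    if (doc.contains(query)) hits.add(doc);  // stand-in for real matching
                }
                return hits;
            }));
        }
        List<String> merged = new ArrayList<>();
        try {
            for (Future<List<String>> f : futures) merged.addAll(f.get());  // merge results
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return merged;
    }
}
```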

Re: Can Lucene unite multiple instances run as one ?

2009-11-16 Thread Wenbo Zhao
1. No, I'm not using sort. Actually I'm just going to start read that section. 2. No, I did only one search '1234567' to 'warmup' the searcher, then OOM 3. After IndexReader/searcher is created, I do a finalize and print total mem used, then use '1234567' to do a search for warmup, and another fin

Re: IndexingChain and TermHash

2009-11-16 Thread Renaud Delbru
Hi, On 16/11/09 13:01, Michael McCandless wrote: Yes, the branch is here: https://svn.apache.org/repos/asf/lucene/java/branches/flex_1458 Mark (Miller) periodically re-syncs it to trunk. Good, thanks! All tests should pass, and if you create a new Codec, please share the experience

Re: Can Lucene unite multiple instances run as one ?

2009-11-16 Thread Erick Erickson
I confess that I've just skimmed your e-mail, but there's absolutely no requirement that the entire index fit in RAM. The fact that your index is larger than available RAM isn't the reason you're hitting OOM. Typical reasons for this are: 1> you're sorting on a field with many, many, many unique values

Re: Max number of open IndexWriters

2009-11-16 Thread Erick Erickson
Ah, I understand now. No, Lucene imposes no limits that I know of. But see Ganesh's comments. How many you want to keep open is going to depend upon your traffic. It's the classic tradeoff between efficiency and simple code. Best Erick On Sun, Nov 15, 2009 at 11:14 PM, Hrishikesh Agashe < hrishik

Re: IndexingChain and TermHash

2009-11-16 Thread Michael McCandless
Yes, the branch is here: https://svn.apache.org/repos/asf/lucene/java/branches/flex_1458 Mark (Miller) periodically re-syncs it to trunk. All tests should pass, and if you create a new Codec, please share the experience! There are not yet many Codecs in existence... the branch has the "sta

Re: IndexingChain and TermHash

2009-11-16 Thread Renaud Delbru
Hi Michael, I see there is already a huge amount of work done in LUCENE-1458. Is there a way to check out the corresponding branch and start to use it? At least to see if I can extend it and create my own Codec. I have started on my side to abstract the indexing chain of Lucene 2.9,

Re: How to limit fields being loaded into the FieldCache ?

2009-11-16 Thread Michael McCandless
Sounds like you need a better search engine ;) Mike On Sun, Nov 15, 2009 at 10:21 PM, Wenbo Zhao wrote: > Sorry, all folks, please ignore this thread. > I found the section in doc, just start to read that. > I just used wrong term to search before :-) > I searched for 'FieldCache' but in the boo

RE: share some numbers for range queries

2009-11-16 Thread Uwe Schindler
From: Jake Mannix [mailto:jake.man...@gmail.com] > On Sun, Nov 15, 2009 at 11:02 PM, Uwe Schindler wrote: > > > > the second approach is slower, when deleted docs > > are involved and 0 is inside the range (need to consult TermDocs). > > > > This is a good point (and should be mentioned in your