Re: knowing which field contributed the search result

2005-02-22 Thread John Wang
Hi David: Can you further explain which calls specifically would solve my problem? Thanks -John On Mon, 21 Feb 2005 12:20:15 -0800, David Spencer <[EMAIL PROTECTED]> wrote: > John Wang wrote: > > > Does anyone have any thoughts on this? > > Does this help? > > ht

Re: knowing which field contributed the search result

2005-02-21 Thread John Wang
Does anyone have any thoughts on this? Thanks -John On Wed, 16 Feb 2005 14:39:52 -0800, John Wang <[EMAIL PROTECTED]> wrote: > Hi: > > Is there a way, given a hit from a search, to find out which > fields contributed to the hit? > > e.g. > > If my search

knowing which field contributed the search result

2005-02-16 Thread John Wang
Hi: Is there a way, given a hit from a search, to find out which fields contributed to the hit? E.g., if my search is: contents1="brown fox" OR contents2="black bear", can the document found by this query also carry information on whether it was found via contents1, contents2, or both
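A minimal, Lucene-independent sketch of the idea: evaluate each top-level OR clause on its own against a document's fields and record which ones matched. The `Clause` class and the substring-based phrase test are hypothetical stand-ins for real query evaluation; within Lucene itself, `Searcher.explain(Query, int)` returns an `Explanation` broken down by clause, which may serve a similar purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Reports which OR clauses (field + phrase) matched a document. Hypothetical model, not Lucene's API. */
final class FieldMatchReport {

    /** One clause of a disjunction such as contents1:"brown fox". */
    static final class Clause {
        final String field;
        final String phrase;

        Clause(String field, String phrase) {
            this.field = field;
            this.phrase = phrase;
        }
    }

    /** Returns the fields whose clause matched; an empty list means the document is not a hit. */
    static List<String> matchingFields(Map<String, String> doc, List<Clause> orClauses) {
        List<String> matched = new ArrayList<>();
        for (Clause c : orClauses) {
            String text = doc.get(c.field);
            // Naive case-insensitive substring test standing in for real phrase matching.
            if (text != null
                    && text.toLowerCase(Locale.ROOT).contains(c.phrase.toLowerCase(Locale.ROOT))) {
                matched.add(c.field);
            }
        }
        return matched;
    }
}
```

For the example query, a document whose contents1 contains "brown fox" but whose contents2 does not contain "black bear" would report only contents1.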

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread John Wang
I think Google Mini also includes crawling and a server wrapper, so it is not entirely a 1-to-1 comparison. Of course, extending Lucene to have those features is not at all difficult anyway. -John On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon) <[EMAIL PROTECTED]> wrote: > Hi,

lucene2.0 and transaction support

2005-01-20 Thread John Wang
Hi: When is lucene 2.0 scheduled to be released? Is there a javadoc somewhere so we can check out the new APIs? Is there a plan to add transaction support into lucene? This is something we need and if we do implement it ourselves, is it too large of a change for a patch? Thanks -John --

Re: setting Similarity at search time

2005-01-07 Thread John Wang
Hi Chuck: Trying to follow up on this thread. Do you know if this feature will be incorporated in the next Lucene release? How would someone find out which patches will go into the next release? Thanks -John On Mon, 15 Nov 2004 13:05:36 -0800, Chuck Williams <[EMAIL PROTECTED]> wrot

Re: reading fields selectively

2005-01-07 Thread John Wang
Thanks guys for the info! After looking at the patch code I have two problems: 1) The patch implementation doesn't help with performance. It still reads the data for every field in the document; it just doesn't store all of them. So this implementation helps if there are memory restrictions, but not i

reading fields selectively

2005-01-06 Thread John Wang
Hi: Is there some way to read only 1 field value from an index given a docID? With the current API, to get a field given a docID, I would call IndexSearcher.document(docID), which in turn reads all fields from disk. Here is my problem: After
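One way to avoid reading every stored field is a length-prefixed record layout, so a reader can skip the fields it does not need without decoding them. The sketch below uses a hypothetical format built on `DataOutputStream`/`DataInputStream`; it is not Lucene's actual stored-fields file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class SelectiveFieldReader {

    /** Writes a field count, then each field as: name (UTF), value length (int), value bytes. */
    static byte[] writeDocument(Map<String, String> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(fields.size());
            for (Map.Entry<String, String> e : fields.entrySet()) {
                byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
                out.writeUTF(e.getKey());
                out.writeInt(value.length);
                out.write(value);
            }
        }
        return bytes.toByteArray();
    }

    /** Returns one field's value, skipping the bytes of every other field; null if absent. */
    static String readField(byte[] record, String wanted) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                int length = in.readInt();
                if (name.equals(wanted)) {
                    byte[] value = new byte[length];
                    in.readFully(value);
                    return new String(value, StandardCharsets.UTF_8);
                }
                // Skip the unwanted value; skipBytes may skip fewer bytes than asked, so loop.
                int remaining = length;
                while (remaining > 0) {
                    int skipped = in.skipBytes(remaining);
                    if (skipped <= 0) {
                        throw new EOFException("truncated record");
                    }
                    remaining -= skipped;
                }
            }
        }
        return null;
    }
}
```

With a file-backed input such as a `RandomAccessFile`, `skipBytes` moves the file pointer, so unneeded field values need not be read at all.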

Re: multi-threaded thru-put in lucene

2005-01-06 Thread John Wang
unaccounted for. Is that due to thread scheduling/context switching? Thanks -John On Thu, 6 Jan 2005 10:36:12 -0800, John Wang <[EMAIL PROTECTED]> wrote: > Is the operation IndexSearcher.search I/O or CPU bound if I am doing > 100's of searches on the same query? > > Thank

Re: multi-threaded thru-put in lucene

2005-01-06 Thread John Wang
Is the operation IndexSearcher.search I/O or CPU bound if I am doing 100's of searches on the same query? Thanks -John On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > > 1 thread: 445 ms. > > 2 threads: 870

Re: multi-threaded thru-put in lucene

2005-01-06 Thread John Wang
I actually ran a few tests but saw similar behavior. After removing all the possible variations, this is what I used: 1 index, doc count 15,000. Using FSDirectory, e.g. new IndexSearcher(String path); by default I think it uses FSDirectory. Each thread is doing 100 iterations of search, e
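A minimal harness for this kind of measurement, assuming a hypothetical `Runnable` that performs one search: each of `threads` workers runs `iterations` calls, and the wall-clock time of the whole run is recorded so that aggregate throughput, not just per-call latency, can be compared across thread counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class SearchBenchmark {

    /** Result of one run: total calls completed and elapsed wall-clock milliseconds. */
    static final class Result {
        final int completed;
        final long elapsedMillis;

        Result(int completed, long elapsedMillis) {
            this.completed = completed;
            this.elapsedMillis = elapsedMillis;
        }

        /** Aggregate throughput in calls per second. */
        double throughputPerSecond() {
            return completed * 1000.0 / Math.max(1, elapsedMillis);
        }
    }

    static Result run(Runnable search, int threads, int iterations)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        search.run();              // e.g. one IndexSearcher.search(...) call
                        completed.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();                           // wait, and propagate any worker exception
            }
        } finally {
            pool.shutdown();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(completed.get(), elapsedMillis);
    }
}
```

Comparing `throughputPerSecond()` for 1, 2 and 5 threads shows directly whether adding threads increases total work done or only stretches each call.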

multi-threaded thru-put in lucene

2005-01-05 Thread John Wang
Hi folks: We are trying to measure Lucene's throughput in a multi-threaded environment. This is what we found: 1 thread, search takes 20 ms. 2 threads, search takes 40 ms. 5 threads, search takes 100 ms. Seems like under a multi-threaded scenario, throughput isn't go
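The figures quoted above can be turned into aggregate throughput with simple arithmetic: if `n` threads each see a latency of `t` ms per search, the combined rate is `n * 1000 / t` searches per second. A tiny helper (hypothetical, for illustration) shows that 20 ms at 1 thread, 40 ms at 2 threads and 100 ms at 5 threads all work out to the same 50 searches/s, i.e. throughput stays flat instead of scaling with threads.

```java
final class Throughput {

    /** Aggregate searches per second when `threads` threads each observe `latencyMillis` per search. */
    static double searchesPerSecond(int threads, double latencyMillis) {
        return threads * 1000.0 / latencyMillis;
    }
}
```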

Re: Remotely Index

2004-12-16 Thread John Wang
One way is to create a reader from a URL to your file (assuming the file is hosted somewhere reachable by a URL): Reader r=new InputStreamReader(url.openStream()); Document doc=new Document(); doc.add(Field.Keyword("url",url.toString())); doc.add(Field.Text("contents",r)); iw.add
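A self-contained sketch of the first step, opening a `Reader` over whatever a `java.net.URL` points to. Note that `URL` exposes `openStream()` (it is `URLConnection` that has `getInputStream()`), and UTF-8 is an assumed charset here; the resulting `Reader` is what would be handed to `Field.Text("contents", r)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class UrlContent {

    /** Opens a Reader over the URL's content (UTF-8 assumed); the caller must close it. */
    static Reader open(URL url) throws IOException {
        return new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    /** Reads the whole resource into a String, closing the stream afterwards. */
    static String readAll(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = open(url)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

The same code works for `http:` and `file:` URLs, which makes it easy to test against a local file before pointing it at a remote host.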

File locking using java.nio.channels.FileLock

2004-12-15 Thread John Wang
Hi: When is Lucene planning on moving toward Java 1.4+? I see there are some problems caused by the current lock file implementation, e.g. Bug# 32171. The problems could easily be fixed by using the java.nio.channels.FileLock object. Thanks -John
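A minimal sketch of lock acquisition with `FileChannel.tryLock()`, assuming a hypothetical lock-file path supplied by the caller. One caveat worth knowing: NIO file locks are held on behalf of the whole JVM, so they coordinate separate processes, not threads of one JVM; a second overlapping `tryLock` from the same JVM throws `OverlappingFileLockException` rather than returning `null`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class IndexLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private IndexLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Tries to lock the file; returns null if another process or this JVM already holds it. */
    static IndexLock tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock acquired = null;
        try {
            acquired = ch.tryLock();               // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            // Already held inside this JVM: FileLock cannot arbitrate between its threads.
        } finally {
            if (acquired == null) {
                ch.close();                        // don't leak the channel on failure
            }
        }
        return acquired == null ? null : new IndexLock(ch, acquired);
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
        } finally {
            channel.close();
        }
    }
}
```

Because the OS releases the lock when the process dies, this approach avoids stale lock files left behind by a crash, which is the usual weakness of create-a-file locking.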

Re: finalize delete without optimize

2004-12-14 Thread John Wang
'll get access to > 'internal' methods, of course. If you end up creating this, we could > stick it in the Sandbox, where we should really create a new section > for handy command-line tools that manipulate the index. > > Otis > > > > > --- John Wang

Re: Lucene Vs Ixiasoft

2004-12-08 Thread John Wang
I thought Lucene implements the Boolean model. -John On Thu, 9 Dec 2004 00:19:21 +0100, Nicolas Maisonneuve <[EMAIL PROTECTED]> wrote: > hi, > think first of the relevance of the model in this 2 search engine for > XML document retrieval. > > Lucene is classic fulltext search engine using the

Re: finalize delete without optimize

2004-12-08 Thread John Wang
Hi folks: I sent this out a few days ago without a response. Please help. Thanks in advance -John On Mon, 6 Dec 2004 21:15:00 -0800, John Wang <[EMAIL PROTECTED]> wrote: > Hi: > > Is there a way to finalize delete, e.g. actually remove them from > the segments

finalize delete without optimize

2004-12-06 Thread John Wang
Hi: Is there a way to finalize deletes, i.e. actually remove the deleted documents from the segments and make sure the docIDs are contiguous again? The only explicit way to do this is by calling IndexWriter.optimize(). But this call does a lot more (it also merges all the segments), hence is very expensive. Is t
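To make the "contiguous docIDs" requirement concrete: once deleted documents are physically removed, every surviving document's number shifts down by the count of deletions before it. The helper below is a hypothetical illustration of that renumbering, not Lucene's merge code; `-1` marks a removed document.

```java
import java.util.BitSet;

final class DocIdCompaction {

    /** Maps each old docID in [0, maxDoc) to its new docID, or -1 if the document was deleted. */
    static int[] compact(int maxDoc, BitSet deleted) {
        int[] oldToNew = new int[maxDoc];
        int next = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            oldToNew[doc] = deleted.get(doc) ? -1 : next++;
        }
        return oldToNew;
    }
}
```

With 5 documents and docs 1 and 3 deleted, the mapping is `{0, -1, 1, -1, 2}`. It also shows that any docIDs cached outside the index become stale once compaction happens.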

Re: Recommended values for mergeFactor, minMergeDocs, maxMergeDocs

2004-12-04 Thread John Wang
We've found something interesting about mergeFactor. We are indexing a million documents in batches of 1000. We first set the mergeFactor to 1000. What we found is that at every 10th commit, we see a significant spike in indexing time. The reason is that the indexer is trying to merge the segments
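A simplified model of a logarithmic, mergeFactor-style policy can help reason about when spikes occur: each flush adds a level-0 segment, and whenever `mergeFactor` segments accumulate at one level they are merged into one segment a level up, which may cascade. This is an assumption-laden sketch for intuition only; it does not reproduce Lucene's actual `IndexWriter` merge logic or its `minMergeDocs`/`maxMergeDocs` handling.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSimulation {

    /** Returns, for each flush in order, how many merges that flush triggered (simplified model). */
    static List<Integer> mergesPerFlush(int flushes, int mergeFactor) {
        List<Integer> segmentsPerLevel = new ArrayList<>();  // segment count at each level
        List<Integer> result = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            addSegment(segmentsPerLevel, 0);
            int merges = 0;
            int level = 0;
            // Cascade: a full level collapses into a single segment at the next level.
            while (level < segmentsPerLevel.size() && segmentsPerLevel.get(level) >= mergeFactor) {
                segmentsPerLevel.set(level, 0);
                addSegment(segmentsPerLevel, level + 1);
                merges++;
                level++;
            }
            result.add(merges);
        }
        return result;
    }

    private static void addSegment(List<Integer> levels, int level) {
        while (levels.size() <= level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
    }
}
```

In this model, with `mergeFactor` 10 and 100 flushes, a merge fires on every 10th flush and the 100th flush cascades into a second, larger merge; the cascading merges rewrite the most data and so would be the largest spikes.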

Re: URGENT: Help indexing large document set

2004-11-27 Thread John Wang

Re: Too many open files issue

2004-11-24 Thread John Wang
I have also seen this problem. In the Lucene code, I don't see where the reader specified when creating a field is closed; that holds on to the file. I am looking at DocumentWriter.invertDocument(). Thanks -John On Mon, 22 Nov 2004 16:21:35 -0600, Chris Lamprecht <[EMAIL PROTECTED]> wrote: >
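Regardless of whether the library closes it internally, a defensive pattern is for the caller to own and close any `Reader` it passes in for a field, using try-with-resources so the file handle is released even if indexing throws. The `ReaderConsumer` callback is a hypothetical stand-in for building and adding the document.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClosingIndexer {

    /** Hypothetical indexing step that consumes a field Reader (e.g. builds and adds a Document). */
    interface ReaderConsumer {
        void accept(Reader contents) throws IOException;
    }

    /** Opens the file, hands the Reader to the indexer, and always closes it afterwards. */
    static void indexFile(Path file, ReaderConsumer indexer) throws IOException {
        try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            indexer.accept(r);  // the document must be fully added before the Reader is closed
        }
    }
}
```

The important ordering is that the add completes inside the `try` block, since the field's content is only read from the `Reader` while the document is being indexed.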

Re: URGENT: Help indexing large document set

2004-11-24 Thread John Wang
+0100, Paul Elschot <[EMAIL PROTECTED]> wrote: > On Wednesday 24 November 2004 00:37, John Wang wrote: > > > > Hi: > > > >I am trying to index 1M documents, with batches of 500 documents. > > > >Each document has a unique text key, which is added

Re: URGENT: Help indexing large document set

2004-11-23 Thread John Wang
It looks to me like it scans sequentially > only within a small buffer window (of size > SegmentTermEnum.indexInterval) and that it uses binary search otherwise. > See TermInfosReader.getIndexOffset(Term). > > Chuck > > > > > -Original Message- >
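The two-stage lookup described above can be sketched in plain Java: keep every `indexInterval`-th term of a sorted term list in memory, binary-search that small index for the last entry not greater than the target, then scan at most `indexInterval` terms sequentially. Class and field names are hypothetical; only the two-stage idea follows the description of `TermInfosReader`.

```java
import java.util.Arrays;

final class SparseTermIndex {
    private final String[] terms;       // all terms, sorted (stands in for the on-disk dictionary)
    private final String[] indexTerms;  // every indexInterval-th term, held in memory
    private final int indexInterval;

    SparseTermIndex(String[] sortedTerms, int indexInterval) {
        this.terms = sortedTerms;
        this.indexInterval = indexInterval;
        int n = (sortedTerms.length + indexInterval - 1) / indexInterval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * indexInterval];
        }
    }

    /** Returns the term's position in the dictionary, or -1 if it is absent. */
    int find(String target) {
        // Stage 1: binary search the in-memory index for the block that could hold the target.
        int pos = Arrays.binarySearch(indexTerms, target);
        int block = pos >= 0 ? pos : -pos - 2;  // last index term <= target
        if (block < 0) {
            return -1;                          // target sorts before the first term
        }
        // Stage 2: sequential scan of at most indexInterval terms.
        int start = block * indexInterval;
        int end = Math.min(start + indexInterval, terms.length);
        for (int i = start; i < end; i++) {
            int cmp = terms[i].compareTo(target);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break;                          // passed where the target would be
            }
        }
        return -1;
    }
}
```

The cost per lookup is therefore O(log(N / indexInterval)) comparisons in memory plus at most `indexInterval` sequential reads, rather than a scan of the whole dictionary.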

URGENT: Help indexing large document set

2004-11-23 Thread John Wang
Hi: I am trying to index 1M documents, in batches of 500 documents. Each document has a unique text key, which is added as a Field.Keyword(name,value). For each batch of 500, I need to make sure I am not adding a document with a key that is already in the current index. To do this
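One alternative to querying the index once per incoming document, assuming the set of existing keys fits in memory, is to load the keys once and check each batch against a `HashSet`, adding keys as they are accepted so duplicates within a batch are caught too. How the existing keys are loaded is left open and is hypothetical here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class KeyDeduper<D> {
    private final Set<String> seenKeys;
    private final Function<D, String> keyOf;

    /** existingKeys would be loaded once, e.g. by enumerating the key field's terms (hypothetical). */
    KeyDeduper(Iterable<String> existingKeys, Function<D, String> keyOf) {
        this.seenKeys = new HashSet<>();
        existingKeys.forEach(seenKeys::add);
        this.keyOf = keyOf;
    }

    /** Returns the documents of the batch whose keys have not been seen before. */
    List<D> filterBatch(List<D> batch) {
        List<D> accepted = new ArrayList<>();
        for (D doc : batch) {
            if (seenKeys.add(keyOf.apply(doc))) {  // add() returns false for a duplicate key
                accepted.add(doc);
            }
        }
        return accepted;
    }
}
```

Each check is then an O(1) hash lookup instead of an index query, at the cost of holding all keys in memory (a million short keys is typically manageable, but that depends on key length and heap size).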

Re: Index in RAM - is it realy worthy?

2004-11-22 Thread John Wang
In my test, I have 12,900 documents. Each document is small: a few discrete fields (Keyword type) and 1 Text field containing only 1 sentence. With both mergeFactor and maxMergeDocs set to 1000: using RAMDirectory, the indexing job took about 9.2 seconds; not using RAMDirectory, the indexing job too

indexing benchmark

2004-11-22 Thread John Wang
Hi folks: Is there an indexing benchmark somewhere? I see a search benchmark on the Lucene home site. Thanks -John

lucene transaction and roll back implementation

2004-11-17 Thread John Wang
Hi folks: How does Lucene implement transactions and rollback? E.g., what happens if the machine crashes (from a power outage etc.) in the middle of a write, e.g. indexWriter.close()? From examining the code, it seems possible that such a crash could corrupt the index. (in segmentInfos, new data
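A common way to make a commit crash-safe, shown here as a generic sketch rather than as what Lucene does, is to write the new metadata to a temporary file, force it to disk, and then atomically rename it over the old file, so a crash leaves either the old or the new version but never a half-written one. `ATOMIC_MOVE` support depends on the file system, which is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class AtomicCommit {

    /** Replaces target with data so that readers see either the old or the new content. */
    static void commit(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);  // flush data and metadata before the rename makes it visible
        }
        // A crash before this line leaves the old file untouched. (A fully durable version
        // would also sync the parent directory on POSIX systems; omitted here.)
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Rollback then amounts to deleting any leftover `.tmp` file on startup, since the committed file is never modified in place.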

lucene file locking question

2004-11-11 Thread John Wang
Hi folks: My application builds a super-index around the lucene index, e.g. stores some additional information outside of lucene. I am using my own locking outside of the lucene index via FileLock object in the jdk1.4 nio package. My code does the following: FileLock lock=nu

authentication support in lucene

2004-07-22 Thread John Wang
Hi: Maybe this has been asked before. Is there a plan to support ACL checks on documents in Lucene? Say I have a customized ACL check module, e.g.: boolean ACLCheck(int docID,String user,String password); and some sort of framework to plug in something like that.
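The plug-in idea can be sketched as a small interface applied to the ranked hit list after the search runs. `AclCheck` mirrors the signature proposed above, but both it and `filterHits` are hypothetical, not part of Lucene's API. Filtering after the search keeps the ACL module independent of the index, though hit counts computed before filtering will include documents the user cannot see.

```java
import java.util.ArrayList;
import java.util.List;

final class AclFiltering {

    /** Hypothetical pluggable check, mirroring ACLCheck(int docID, String user, String password). */
    interface AclCheck {
        boolean allowed(int docId, String user, String password);
    }

    /** Keeps only the hits the user may see, preserving their original rank order. */
    static List<Integer> filterHits(List<Integer> rankedDocIds, AclCheck acl, String user, String password) {
        List<Integer> visible = new ArrayList<>();
        for (int docId : rankedDocIds) {
            if (acl.allowed(docId, user, password)) {
                visible.add(docId);
            }
        }
        return visible;
    }
}
```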

Re: speeding up lucene search

2004-07-21 Thread John Wang
In general, yes. By splitting up a large index into smaller indices, you are linearizing the search time. Furthermore, that allows you to make your search distributable. -John On Wed, 21 Jul 2004 13:00:28 +1000, Anson Lau <[EMAIL PROTECTED]> wrote: > Hello guys, > > What are some general techni
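The split-and-distribute idea can be sketched as a scatter-gather: search each smaller index concurrently, then merge the per-shard hits by score. The `Hit` class and the shard search suppliers are hypothetical placeholders, and scores are assumed to be comparable across shards, which in practice requires consistent term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class ShardedSearch {

    /** A hypothetical hit: which shard, which local docID, and its score. */
    static final class Hit {
        final int shard;
        final int docId;
        final float score;

        Hit(int shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Runs every shard search concurrently and returns the overall top-k hits by score. */
    static List<Hit> search(List<Supplier<List<Hit>>> shards, int k) {
        List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
        for (Supplier<List<Hit>> shard : shards) {
            pending.add(CompletableFuture.supplyAsync(shard));  // runs on the common pool
        }
        List<Hit> all = new ArrayList<>();
        for (CompletableFuture<List<Hit>> f : pending) {
            all.addAll(f.join());                              // wait for each shard
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Each shard only needs to return its own top k, since no hit outside a shard's top k can be in the global top k.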

Re: lucene cutomized indexing

2004-07-21 Thread John Wang
Hi Eric and Grant: Thanks for the replies and this is certainly encouraging. As suggested, I will post further such discussions to the dev list. Thanks -John On Tue, 20 Jul 2004 15:37:35 -0400, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > It seems to me the answer to this is not necessari

Re: lucene cutomized indexing

2004-07-20 Thread John Wang
wrote: > On Tuesday 20 July 2004 18:12, John Wang wrote: > > > They make sure during deployment their "versions" > > gets loaded before the same classes in the lucene .jar. > > I don't see why people cannot just make their own lucene.jar. Just remove > the

Re: lucene cutomized indexing

2004-07-20 Thread John Wang
On Tue, 20 Jul 2004 13:40:28 -0400, Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Jul 20, 2004, at 12:12 PM, John Wang wrote: > > There are few things I want to do to be able to customize lucene: > > > [...] > > > > 3) to be able to customize analyzers to a

Re: lucene cutomized indexing

2004-07-20 Thread John Wang
EMAIL PROTECTED]> wrote: > On Tuesday 20 July 2004 17:28, John Wang wrote: > > >I have asked to make the Lucene API less restrictive many many many > > times but got no replies. > > I suggest you just change it in your source and see if it works. Then you can > s

lucene cutomized indexing

2004-07-20 Thread John Wang
Hi: I am trying to store some database-like field values in Lucene. I have my own way of storing field values in a customized format. I guess my question is whether we can make the Reader/Writer classes, e.g. FieldReader, FieldWriter, DocumentReader/Writer classes, non-final? I have a

lucene philosophy

2004-07-13 Thread John Wang
Hi: I am trying to store certain types of document fields in my own format by intercepting the indexing process. Along the same lines as the previous discussions on final modifiers on classes, it would be nice to be able to extend the FieldWriter, DocumentWriter etc. classes. Otis suggested

Re: Why is Field.java final?

2004-07-13 Thread John Wang
cessful when it is able to do what its creator never even dreamt of". And I think Lucene is certainly capable of that. Just my two cents. Thanks -John On Tue, 13 Jul 2004 09:12:09 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > >

Re: Why is Field.java final?

2004-07-12 Thread John Wang
Hi: On the same thought, how about the org.apache.lucene.analysis.Token class. Can we make it non-final? I sent out this question 3 different times and still got no responses... Thanks -John On Mon, 12 Jul 2004 18:33:04 -0700, Kevin A. Burton <[EMAIL PROTECTED]> wrote: > Doug Cutting wrot

Re: Why is Field.java final?

2004-07-10 Thread John Wang
I was running into similar problems with Lucene classes being final, in my case the Token class. I sent out an email but no one responded :( -John On Sat, 10 Jul 2004 15:50:28 -0700, Kevin A. Burton <[EMAIL PROTECTED]> wrote: > I was going to create a new IDField class which just calls super

Re: indexing help

2004-07-08 Thread John Wang
Thanks Doug. I will do just that. Just for my education, can you maybe elaborate on using the "implement an IndexReader that delivers a synthetic index" approach? Thanks in advance -John On Thu, 08 Jul 2004 10:01:59 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang

final Token

2004-07-08 Thread John Wang
Hi gurus: Please forgive some more of my ignorant questions :) The Token class is declared as final. The tokenizers and the analyzers I am writing produce token objects with more information encapsulated than the Token class defined in lucene. So it makes sense to me to be able to derive from

Re: indexing help

2004-07-08 Thread John Wang
would create a total of 11 Tokens whereas only 2 are > > necessary. > > > >Given many documents with many terms and frequencies, it would > > create many extra Token instances. > > > > The reason I was looking at deriving from the Field class is because I > >

Re: indexing help

2004-07-08 Thread John Wang
tting the frequency. But > the class is final... > > Any other suggestions? > > Thanks > > -John > > On Wed, 07 Jul 2004 14:20:24 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote: > > John Wang wrote: > > > While lucene tokenizes the words in

Re: indexing help

2004-07-07 Thread John Wang
, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > > While lucene tokenizes the words in the document, it counts the > > frequency and figures out the position, we are trying to bypass this > > stage: For each document, I have a set of words with a know

indexing help

2004-07-07 Thread John Wang
Hi gurus: I am trying to be able to control the indexing process. While Lucene tokenizes the words in a document, it counts the frequency and figures out the position; we are trying to bypass this stage: For each document, I have a set of words with a known frequency, e.g. java (5), l
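When only per-document word frequencies are known, one workaround is to synthesize a token stream that repeats each word `frequency` times, which is exactly why the token count grows with the sum of the frequencies rather than with the number of distinct words. The sketch below is hypothetical: positions are simply the sequential list index (the real positions are unknown), and plain strings stand in for Lucene `Token` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FrequencyTokens {

    /** Expands word -> frequency into a flat token list, one entry per occurrence. */
    static List<String> expand(Map<String, Integer> frequencies) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, Integer> e : frequencies.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                tokens.add(e.getKey());  // synthetic position = index in this list
            }
        }
        return tokens;
    }
}
```

For two distinct words with frequencies 5 and 6, this produces 11 tokens, which is the overhead that a frequency-aware indexing hook would avoid.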