Lucene Taglib

2004-03-08 Thread Iskandar Salim
Hi, I've worked a bit on the taglib and added an index and field tag for basic indexing capability, though I don't think it's really useful apart from, in my case, quick prototyping of web applications. What do you guys think? I'm new to Lucene and taglibs so I may have missed out lots of

RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread hui
Hi, Here is the indexing performance testing result for the two index formats. Test machine: 1000 MHz Intel Pentium III (2 installed); 32 KB primary memory cache; 256 KB secondary memory cache; SCSI hard drive, 145.45 GB; RAM 3 GB; Windows 2000 Advanced Server, Service Pack 2; JDK 140; JVM

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Andrzej Bialecki
hui wrote: Hi, Here is the indexing performance testing result for the two index formats. A shameless plug: you can use Luke (http://www.getopt.org/luke) to convert the same index between compound/non-compound formats. Which could be useful to rule out any possible differences in the

RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread hui
Thank you, the conversion option in Luke is really helpful for migrating existing user indexes. Regards, Hui -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Monday, March 08, 2004 10:57 AM To: Lucene Users List Subject: Re: Sys properties Was: java.io.tmpdir as

Re: Storing numbers

2004-03-08 Thread Doug Cutting
Erik Hatcher wrote: private static final DecimalFormat formatter = new DecimalFormat(0); // make this as wide as you need For ints, ten digits is probably safest. Since Lucene uses prefix compression on the term dictionary, you don't pay a penalty at search time for long shared
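
As a rough illustration of the zero-padding idea in this thread, the sketch below formats non-negative ints to a fixed width so that lexicographic term order matches numeric order. The class name `PaddedNumbers` and the helper `pad` are hypothetical, and the ten-zero pattern is an assumption taken from the "ten digits" remark rather than the pattern used in the original message; negative values would need separate handling.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class PaddedNumbers {
    // Ten zeros: wide enough for any non-negative int (max 2147483647).
    private static final String PATTERN = "0000000000";

    // DecimalFormat is not thread-safe, so this sketch creates one per call.
    // Locale.ROOT keeps the output in ASCII digits regardless of the default locale.
    public static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        DecimalFormat formatter =
                new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return formatter.format(value);
    }

    public static void main(String[] args) {
        String[] terms = { pad(42), pad(7), pad(1000) };
        Arrays.sort(terms); // plain string sort now agrees with numeric order
        System.out.println(Arrays.toString(terms));
    }
}
```

The same padded string would have to be used both when indexing the field and when building query terms, otherwise lookups will not match.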

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Doug Cutting
hui wrote: Index time: compound format is 89 seconds slower. compound format: 1389507 total milliseconds non-compound format: 1300534 total milliseconds The index size is 85m with 4 fields only. The files are stored in the index. The compound format has only 3 files and the other has 13 files.

Caching and paging search results

2004-03-08 Thread Clandes Tino
Hi all, could someone describe their experience implementing caching, sorting and paging of search results? Is a Stateful Session bean appropriate for this? My wish is to obtain all search hits only in the first call, and after that, to iterate through the Hit Collection and display cached results. I

Re: Caching and paging search results

2004-03-08 Thread Erik Hatcher
In the RealWorld... many applications actually just re-run a search and jump to the appropriate page within the hits; searching is generally plenty fast enough to alleviate concerns of caching. However, if you need to cache Hits, you need to be sure to keep around the originating
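
The "re-run the search and jump to the page" approach can be sketched without any Lucene classes. The example below is a minimal illustration, assuming the search results are available as an ordered list; `ResultPager` and `page` are hypothetical names, and the list of strings stands in for whatever the search returns on each request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class ResultPager {
    /** Returns the results for a zero-based page, or an empty list if the page is past the end. */
    public static <T> List<T> page(List<T> results, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long start = (long) pageIndex * pageSize; // long avoids int overflow
        if (start >= results.size()) {
            return Collections.emptyList();
        }
        int from = (int) start;
        int to = (int) Math.min(start + pageSize, results.size());
        return new ArrayList<>(results.subList(from, to)); // copy so the page outlives the source list
    }

    public static void main(String[] args) {
        // Stand-in for re-running the search on each request.
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add("doc-" + i);
        }
        System.out.println(page(hits, 2, 10)); // third page holds the last five entries
    }
}
```

Because each request only needs the page index and page size, nothing has to be kept in session state between requests.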

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Terry Steichen
I tend to agree (but with the same uncertainty as to why I feel that way). Regards, Terry - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, March 08, 2004 2:34 PM Subject: Re: Sys properties Was: java.io.tmpdir as lock

Filtering out duplicate documents...

2004-03-08 Thread Michael Giles
I'm looking for a way to filter out duplicate documents from an index (either while indexing, or after the fact). It seems like there should be an approach of comparing the terms for two documents, but I'm wondering if any other folks (i.e. nutch) have come up with a solution to this problem.

RE: Filtering out duplicate documents...

2004-03-08 Thread Chong, Herb
That kind of fuzzy equality is an area of open research. You need to define what is an acceptable error rate for Type 1 and Type 2 errors before you can think about implementations that scale better. Approaches range from identifying document vocabulary and statistics to raw hashing of the
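
At the "raw hashing" end of that range, one possible sketch is to fingerprint each document's normalized text and skip any document whose fingerprint has already been seen. This is only an assumption about what such hashing could look like, not a method described in the thread; `DuplicateDetector`, `fingerprint`, and `isDuplicate` are hypothetical names, and the approach catches only documents that are identical after whitespace and case normalization, not fuzzy near-duplicates.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Nutch.
final class DuplicateDetector {
    private final Set<String> seen = new HashSet<>();

    /** Hex SHA-256 digest of the text after collapsing whitespace and lower-casing. */
    public static String fingerprint(String text) {
        String normalized = text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Returns true if an equivalent document was already offered to this detector. */
    public boolean isDuplicate(String text) {
        return !seen.add(fingerprint(text));
    }
}
```

A caller would check `isDuplicate(text)` before adding a document to the index; storing the fingerprint as an indexed field would be one way to extend the check across indexing runs.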

Re: Filtering out duplicate documents...

2004-03-08 Thread Erik Hatcher
My impression is the new term vector support should at least make this type of comparison feasible in some manner. I'd be interested to see what you come up with if you give this a try. You will need the latest CVS codebase. Erik On Mar 8, 2004, at 4:37 PM, Michael Giles wrote: I'm

which query matched in a Boolean query

2004-03-08 Thread Supun Edirisinghe
I have a BooleanQuery that takes 3 TermQueries, for example (title:colombo OR txt:colombo OR city:colombo). I would like to mark hits that match in the field title in red on display, txt in blue, and city in green, and maybe those that match in 2 fields in another color. Is this possible? Thanks

Re: Caching and paging search results

2004-03-08 Thread Tatu Saloranta
On Monday 08 March 2004 12:34, Erik Hatcher wrote: In the RealWorld... many applications actually just re-run a search and jump to the appropriate page within the hits; searching is generally plenty fast enough to alleviate concerns of caching. However, if you need to cache Hits, you need

DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-08 Thread Kevin A. Burton
I'm looking at StopFilter.java right now... I did a kill -3 java and a number of my threads were blocked here: ksa-task-thread-34 prio=1 tid=0xad89fbe8 nid=0x1c6e waiting for monitor entry [b9bff000..b9bff8d0] at java.util.Hashtable.get(Hashtable.java:332) - waiting to lock

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-08 Thread Erik Hatcher
I don't see any reason for this to be a Hashtable. It seems an acceptable alternative to not share analyzer/filter instances across threads - they don't really take up much space, so is there a reason to share them? Or I'm guessing you're sharing it implicitly through an IndexWriter, huh?
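
The contention in the thread dump comes from `java.util.Hashtable.get`, which is synchronized. A minimal sketch of the alternative, assuming the stop-word table is built once and never modified afterwards, is shown below; `StopWords` is a hypothetical class for illustration and is not the actual patch to `StopFilter`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a stop-word table; not Lucene's StopFilter.
final class StopWords {
    // Populated once in the constructor and never mutated, so concurrent
    // reads need no locking; the final field gives safe publication.
    private final Set<String> words;

    public StopWords(String... stopWords) {
        this.words = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(stopWords)));
    }

    /** Unsynchronized lookup, unlike Hashtable.get. */
    public boolean isStopWord(String term) {
        return words.contains(term);
    }

    public static void main(String[] args) {
        StopWords stop = new StopWords("a", "an", "the");
        System.out.println(stop.isStopWord("the"));
        System.out.println(stop.isStopWord("lucene"));
    }
}
```

This is only safe because the set is read-only after construction; a table that is modified while other threads read it would still need synchronization or a concurrent collection.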

Re: Lucene Taglib

2004-03-08 Thread Iskandar Salim
Thanks for the tips and comments. Regards, Iskandar - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, March 08, 2004 7:48 PM Subject: Re: Lucene Taglib On Mar 8, 2004, at 3:46 AM, Iskandar Salim wrote: I've worked on a

Re: Lucene Taglib

2004-03-08 Thread Erik Hatcher
On Mar 8, 2004, at 10:21 PM, Iskandar Salim wrote: Thanks for the tips and comments. Also, there was a big smiley implicit in my JSP taglib rantings below. Certainly no offense intended. I've paid my Struts/taglib dues and am now deep into a completely different web development paradigm that I

Re: Lucene Taglib

2004-03-08 Thread Iskandar Salim
- Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, March 09, 2004 11:51 AM Subject: Re: Lucene Taglib Also, there was a big smiley implicit in my JSP taglib rantings below. Certainly no offense intended. None taken. :)