Hi,
I've worked a bit on the taglib and added an index and field tag for
basic indexing capability, though I don't think it's really useful apart
from, in my case, quick prototyping of web applications. What do you guys
think? I'm new to Lucene and taglibs so I may have missed out lots of
Hi,
Here is the indexing performance testing result for the two index formats.
1000 megahertz Intel Pentium III (2 installed)
32 kilobyte primary memory cache
256 kilobyte secondary memory cache
SCSI Hard drive 145.45 GB
RAM 3 GB
Windows 2000 Advanced Server, Service Pack 2
JDK 1.4.0
JVM
hui wrote:
Hi,
Here is the indexing performance testing result for the two index formats.
A shameless plug: you can use Luke (http://www.getopt.org/luke) to
convert the same index between compound/non-compound formats. Which
could be useful to rule out any possible differences in the
Thank you, the conversion option in Luke is really helpful for migrating
existing user indexes.
Regards,
Hui
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Monday, March 08, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as
Erik Hatcher wrote:
private static final DecimalFormat formatter =
new DecimalFormat("0"); // pattern of zeros; make this as wide as you need
For ints, ten digits is probably safest. Since Lucene uses prefix
compression on the term dictionary, you don't pay a penalty at search
time for long shared
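A minimal sketch of the zero-padding idea discussed above, assuming non-negative ints and a ten-digit pattern. The class name PaddedNumbers and the pad helper are hypothetical, not part of Lucene. It shows that once numbers are padded to a fixed width, plain string ordering agrees with numeric ordering.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class PaddedNumbers {
    // Ten zeros: wide enough for any non-negative int (max 2147483647).
    private static final String PATTERN = "0000000000";

    // DecimalFormat is not thread-safe, so this sketch creates one per call.
    // Locale.ROOT symbols keep the output in ASCII digits on every machine.
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        DecimalFormat format =
            new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int n : new int[] {42, 7, 1000}) {
            terms.add(pad(n));
        }
        Collections.sort(terms); // string order now matches numeric order
        System.out.println(terms);
    }
}
```

The same padded string would have to be used both when indexing the field and when building range or term queries against it.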
hui wrote:
Index time:
The compound format is about 89 seconds (roughly 6.8%) slower.
compound format:
1389507 total milliseconds
non-compound format:
1300534 total milliseconds
The index size is 85 MB with only 4 fields. The files are stored in the index.
The compound format has only 3 files and the non-compound format has 13 files.
Hi all,
could someone describe their experience with
implementing caching, sorting and paging of search
results?
Is a stateful session bean appropriate for this?
My wish is to obtain all search hits only on the first
call and, after that, to iterate through the Hit
collection and display cached results.
I
In the RealWorld... many applications actually just re-run a search and
jump to the appropriate page within the hits; searching is generally
plenty fast enough to alleviate concerns about caching.
However, if you need to cache Hits, you need to be sure to keep around
the originating
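The "re-run the search and jump to the page" approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the hits are modelled as a plain java.util.List rather than a Lucene Hits object, and the Pager class and its page method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class Pager {
    // Returns the hits for a 1-based page number, or an empty list past the end.
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long start = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        if (start >= hits.size()) {
            return new ArrayList<>();
        }
        int from = (int) start;
        int to = (int) Math.min(start + pageSize, hits.size());
        // Copy the slice so the caller does not hold a view of the full result list.
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

With this shape, each request only needs the query string and the page number, so nothing has to be kept in session state between requests.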
I tend to agree (but with the same uncertainty as to why I feel that way).
Regards,
Terry
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 08, 2004 2:34 PM
Subject: Re: Sys properties Was: java.io.tmpdir as lock
I'm looking for a way to filter out duplicate documents from an index
(either while indexing, or after the fact). It seems like there should be
an approach of comparing the terms for two documents, but I'm wondering if
any other folks (i.e. nutch) have come up with a solution to this problem.
that kind of fuzzy equality is an area of open research. you need to define what is an
acceptable error rate for Type 1 and Type 2 errors before you can think about
implementations that scale better. approaches range from identifying document
vocabulary and statistics to raw hashing of the
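One simple way to compare the terms of two documents is Jaccard similarity over their term sets. The sketch below is an assumption-laden illustration, not what Nutch or Lucene does: the tokenizer is deliberately crude, and the NearDuplicates class and the threshold parameter are hypothetical. Raising the threshold trades more missed duplicates for fewer false matches, which is the error-rate decision described above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class NearDuplicates {
    // Crude tokenizer: lowercase, split on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty documents are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Treats two texts as duplicates when their similarity reaches the threshold.
    static boolean isDuplicate(String x, String y, double threshold) {
        return jaccard(terms(x), terms(y)) >= threshold;
    }
}
```

Comparing every pair of documents this way is quadratic in the number of documents, so it is only practical for small collections or as a second pass over candidates found by cheaper means.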
My impression is the new term vector support should at least make this
type of comparison feasible in some manner. I'd be interested to see
what you come up with if you give this a try. You will need the latest
CVS codebase.
Erik
On Mar 8, 2004, at 4:37 PM, Michael Giles wrote:
I'm
I have a BooleanQuery that takes 3 TermQueries,
for example (title:colombo OR txt:colombo OR city:colombo).
I would like to mark hits that match in the field title in red on
display, txt in blue, and city in green, and maybe those that match in 2
fields in another color.
Is this possible?
thanks
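One possible direction, sketched without any Lucene calls: after the search, check each displayed document's stored field text for the term and pick a color from the set of fields that matched. Everything here is an assumption for illustration, including the FieldMatchColors class, the representation of a document as a Map from field name to text, the whole-word match rule, and the choice of purple for multi-field matches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class FieldMatchColors {
    // Color scheme from the question: one color per field.
    private static final Map<String, String> COLORS = new LinkedHashMap<>();
    static {
        COLORS.put("title", "red");
        COLORS.put("txt", "blue");
        COLORS.put("city", "green");
    }

    // Returns the known fields whose text contains the term as a whole word.
    static List<String> matchingFields(Map<String, String> doc, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matched = new ArrayList<>();
        for (String field : COLORS.keySet()) {
            String text = doc.get(field);
            if (text == null) {
                continue;
            }
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    matched.add(field);
                    break;
                }
            }
        }
        return matched;
    }

    // One color per single-field match; an arbitrary extra color for several fields.
    static String colorFor(List<String> matched) {
        if (matched.isEmpty()) {
            return "none";
        }
        if (matched.size() > 1) {
            return "purple";
        }
        return COLORS.get(matched.get(0));
    }
}
```

This only works for fields whose text is stored and retrievable at display time, and its word splitting will not necessarily agree with the analyzer used at index time.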
On Monday 08 March 2004 12:34, Erik Hatcher wrote:
In the RealWorld... many applications actually just re-run a search and
jump to the appropriate page within the hits; searching is generally
plenty fast enough to alleviate concerns about caching.
However, if you need to cache Hits, you need
I'm looking at StopFilter.java right now...
I did a kill -3 java and a number of my threads were blocked here:
ksa-task-thread-34 prio=1 tid=0xad89fbe8 nid=0x1c6e waiting for
monitor entry [b9bff000..b9bff8d0]
at java.util.Hashtable.get(Hashtable.java:332)
- waiting to lock
I don't see any reason for this to be a Hashtable.
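To illustrate the point about Hashtable: every Hashtable.get call is synchronized, so threads sharing one table queue up on its monitor, as in the thread dump above. A read-only set built once avoids that lock. The sketch below is not the StopFilter source; the StopWords class and its methods are hypothetical names used only to show the idea.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class StopWords {
    // Built once in the constructor and never modified afterwards; the final
    // field makes it safe to share the instance across threads.
    private final Set<String> words;

    StopWords(String... stopWords) {
        this.words = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(stopWords)));
    }

    // Lookup takes no lock, unlike Hashtable.get, so readers do not block each other.
    boolean isStopWord(String token) {
        return words.contains(token);
    }
}
```

This is safe only because the set is never changed after construction; a stop list that must be updated at runtime would need a different structure.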
It seems an acceptable alternative is to not share analyzer/filter
instances across threads - they don't really take up much space, so is
there a reason to share them? Or I'm guessing you're sharing it
implicitly through an IndexWriter, huh?
Thanks for the tips and comments.
Regards,
Iskandar
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 08, 2004 7:48 PM
Subject: Re: Lucene Taglib
On Mar 8, 2004, at 3:46 AM, Iskandar Salim wrote:
I've worked on a
On Mar 8, 2004, at 10:21 PM, Iskandar Salim wrote:
Thanks for the tips and comments.
Also, there was a big smiley implicit in my JSP taglib rantings below.
Certainly no offense intended. I've paid my Struts/taglib dues and am
now deep into a completely different web development paradigm that I
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, March 09, 2004 11:51 AM
Subject: Re: Lucene Taglib
Also, there was a big smiley implicit in my JSP taglib rantings below.
Certainly no offense intended.
None taken. :)