Re: "Orphan" segment files

2004-10-03 Thread Dmitry Serebrennikov
t.org/luke/) to look into them and see what documents they contain? Good luck! Dmitry. Thanks, Ed --- Dmitry Serebrennikov <[EMAIL PROTECTED]> wrote: This is not a normal behavior, unless you are running on Windows and have searchers open for that long that are still locking the segments

Re: Too many Open Files + lucene 1.4.1 + Linux O/s

2004-10-02 Thread Dmitry Serebrennikov
Karthik N S wrote: Hi Luceners, Apologies. Other day was Trying to Search using the "Luceneweb" version with Lucene1-4-1.zip and O/s = Linux, J2SDK version "1.4.2_03-b02" With Roughly around 500 Documents (715116 kb ) Indexed using Lucene1.4-final.jar and writer.setUseCompoundFile(t

Re: "Orphan" segment files

2004-10-02 Thread Dmitry Serebrennikov
This is not a normal behavior, unless you are running on Windows and have searchers open for that long that are still locking the segments (but then they would be in deletable...). What version of Lucene are you running? At some point during the past two months there were a few days when CVS sn

Re: Proxy Con. Problem in Weblogic.

2004-09-21 Thread Dmitry Serebrennikov
This, of course, isn't the right forum for this question... Not to encourage off-topic posts, but I just happened to know at least part of the answer since we just went through the same issue. First thing to do is to make sure you are setting these properties before the first URLStreamHandler fo
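The snippet is cut off before it names the properties. Assuming they are the standard JDK networking properties http.proxyHost and http.proxyPort, a minimal sketch of setting them at startup, ahead of any URL handling, might look like this. The host and port values are hypothetical placeholders.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class ProxySetup {
    // Hypothetical proxy settings; substitute the values for your network.
    static final String PROXY_HOST = "proxy.example.com";
    static final String PROXY_PORT = "8080";

    /** Sets the standard JDK HTTP proxy system properties. */
    static void configureProxy() {
        System.setProperty("http.proxyHost", PROXY_HOST);
        System.setProperty("http.proxyPort", PROXY_PORT);
    }

    public static void main(String[] args) throws MalformedURLException {
        // Following the advice in the post: set the properties first,
        // before the application creates its first URL / URLStreamHandler.
        configureProxy();
        URL url = URI.create("http://www.example.com/").toURL();
        System.out.println("Using proxy " + System.getProperty("http.proxyHost")
                + ":" + System.getProperty("http.proxyPort") + " for " + url);
    }
}
```

In an application server the same properties can also be passed on the JVM command line (-Dhttp.proxyHost=...), which guarantees they are in place before any application code runs.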

Re: indexing size

2004-09-08 Thread Dmitry Serebrennikov
Niraj Alok wrote: Hi PA, Thanks for the detail ! Since we are using lucene to store the data also, I guess I would not be able to use it. By the way, I could be wrong, but I think the 35% figure you referenced in your first e-mail actually does not include any stored fields. The deal with

Re: Spam:too many open files

2004-09-07 Thread Dmitry Serebrennikov
Hi Wallen, Actually, the files Daniel listed were modified on 8/11 and then again on 8/15. In the time between 8/11 and 8/15, I believe there could have been any number of problems, including corrupt indexes and poor multithreaded performance. However, I think after 8/15, the files should be in g

Re: multiple fields to be indexed

2004-06-17 Thread Dmitry Serebrennikov
his component of the query will be returned ahead of those that do not. Good luck! Dmitry jitender ahuja wrote: Hi, Can u pl. clarify some more how to use the BooleanQuery class as I am clueless still. As far as I can gather it deals with multiple query terms and not with searching a (may be

RE: Performance: compound vs. multi-file index, indexing and searching

2004-06-13 Thread Dmitry Serebrennikov
Hui, thanks for running the numbers and providing this to the list! The results are very interesting. Of course, the primary purpose of the compound file format is to reduce the use of filehandles (at the expense of performance, and especially during indexing). I put your data into Excel to try

Re: crash in Lucene

2003-11-06 Thread Dmitry Serebrennikov
Hi Herb, While I agree with Eric that the demo was not meant for production load, that still does not explain the NullPointerException in IndexWriter.close... I had a look at the source of the 1.2 final release and the line 146 appears to be a call to writeLock.release(). But you seem to think

Re: lucene-user Digest 15 Oct 2002 12:32:47 -0000 Issue 170

2002-10-15 Thread Dmitry Serebrennikov
> As results are sorted by score, you just need to look at the first to set the score normalizer. The following code is from Hits.java: > float scoreNorm = 1.0f; > if (length > 0 && scoreDocs[0].score > 1.0f) > scoreNorm = 1.0f / scoreDocs[0].score; > int end = scoreDocs.leng
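The normalization quoted from Hits.java can be isolated into a small standalone sketch. It uses a plain float array in place of Lucene's scoreDocs (an assumption made so the example runs without Lucene) and applies the same rule: because results are sorted by score, only the top score is inspected, and everything is scaled by 1/top when top exceeds 1.0.

```java
public class ScoreNormalizer {
    /**
     * Mirrors the quoted Hits.java rule on a descending-sorted score array:
     * if the top score exceeds 1.0, every score is multiplied by 1/top;
     * otherwise scores are left unchanged.
     */
    static float[] normalize(float[] sortedScores) {
        float scoreNorm = 1.0f;
        // Results are sorted, so the first entry is the maximum.
        if (sortedScores.length > 0 && sortedScores[0] > 1.0f) {
            scoreNorm = 1.0f / sortedScores[0];
        }
        float[] normalized = new float[sortedScores.length];
        for (int i = 0; i < sortedScores.length; i++) {
            normalized[i] = sortedScores[i] * scoreNorm;
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] scores = normalize(new float[] {2.0f, 1.0f, 0.5f});
        for (float s : scores) {
            System.out.println(s);
        }
    }
}
```

Note that the rule only ever scales scores down: a result set whose best raw score is below 1.0 keeps that score, so the top hit is not guaranteed to be exactly 1.0.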

Re: Are score values always between 0 and 1?

2002-10-15 Thread Dmitry Serebrennikov
Ype Kingma wrote: > On Tuesday 15 October 2002 04:16, Dmitry Serebrennikov wrote: >> Greetings, >> I know that the FAQ says that they are, but in at least one instance in my index it appears to be equal to 1.94something. Are the scores guaranteed

Question: using boost for sorting

2002-10-14 Thread Dmitry Serebrennikov
Greetings Everyone, I'm thinking of trying to build something that manipulates a query score in order to achieve a sort order other than the default relevance sort. The idea is to create a new type of query: SortingQuery( Query query, String sortByField ) It would run the sub-query and return
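SortingQuery is only a proposal in this post, not an existing Lucene class. The core idea, replacing each matching document's relevance score with a score derived from its sort-field value, can be sketched without Lucene. Match and scoresByField are hypothetical names, and the rank-based mapping is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class FieldOrderScores {
    /** Hypothetical match record: a document id plus its value in the sort-by field. */
    static final class Match {
        final int docId;
        final String sortValue;

        Match(int docId, String sortValue) {
            this.docId = docId;
            this.sortValue = sortValue;
        }
    }

    /**
     * Assigns each match a score in (0, 1] that depends only on its sort-field value:
     * the smallest value gets 1.0, the next 1 - 1/n, and so on (n = distinct values).
     * Sorting by descending score then yields ascending field order; equal values tie.
     */
    static Map<Integer, Float> scoresByField(List<Match> matches) {
        TreeSet<String> distinct = new TreeSet<>();
        for (Match m : matches) {
            distinct.add(m.sortValue);
        }
        List<String> ordered = new ArrayList<>(distinct); // ascending order
        int n = ordered.size();
        Map<Integer, Float> scores = new LinkedHashMap<>();
        for (Match m : matches) {
            int rank = Collections.binarySearch(ordered, m.sortValue); // 0 = smallest
            scores.put(m.docId, 1.0f - (float) rank / n);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Match> hits = Arrays.asList(
                new Match(7, "cherry"), new Match(3, "apple"), new Match(5, "banana"));
        System.out.println(scoresByField(hits));
    }
}
```

Keeping the synthetic scores at or below 1.0 matters if results are read through Hits: per the Hits.java code quoted in the thread on score normalization, scores are only rescaled when the top score exceeds 1.0, so this mapping would pass through unchanged.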

Are score values always between 0 and 1?

2002-10-14 Thread Dmitry Serebrennikov
Greetings, I know that the FAQ says that they are, but in at least one instance in my index it appears to be equal to 1.94something. Are the scores guaranteed to be between 0 and 1, and if not, what would it take to make them such? Thanks. Dmitry.

RE: batch indexing

2002-08-08 Thread Dmitry Serebrennikov
I was just thinking about doing something similar, but after looking at your code I thought: couldn't the same thing be done by manipulating the mergeFactor on the existing IndexWriter? It already indexes n documents into memory before writing a new disk segment. I just looked at it again but I
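To see how mergeFactor shapes the on-disk index, here is a simplified simulation, not Lucene's actual IndexWriter code. It assumes, as the post describes, that documents are buffered in memory and flushed as a segment every mergeFactor documents, and further assumes that whenever mergeFactor equal-sized segments accumulate they are merged into one. Both the class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSimulation {
    /**
     * Returns the sizes (in documents) of the on-disk segments after adding numDocs
     * documents under the simplified policy described above. Documents still sitting
     * in the in-memory buffer are not counted.
     */
    static List<Integer> segmentSizes(int numDocs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < numDocs; i++) {
            buffered++;
            if (buffered == mergeFactor) {
                segments.add(buffered); // flush the buffer as a new small segment
                buffered = 0;
                mergeEqualSizedTail(segments, mergeFactor);
            }
        }
        return segments;
    }

    /** Merges the last mergeFactor segments into one while they all have the same size. */
    private static void mergeEqualSizedTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int last = segments.get(segments.size() - 1);
            for (int j = segments.size() - mergeFactor; j < segments.size(); j++) {
                if (segments.get(j) != last) {
                    return; // tail has mixed sizes: nothing to merge yet
                }
            }
            for (int j = 0; j < mergeFactor; j++) {
                segments.remove(segments.size() - 1);
            }
            segments.add(last * mergeFactor);
        }
    }

    public static void main(String[] args) {
        System.out.println(segmentSizes(1234, 10));
    }
}
```

Under this model the segment count follows the base-mergeFactor digits of the flush count, so raising mergeFactor means fewer merges during batch indexing but more segments (and open files) left behind until the index is optimized.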

List admin request

2002-05-24 Thread Dmitry Serebrennikov
Could someone please change the Reply-To header on the digest messages from the lucene-user list? Right now it goes back to the lucene-user-digest address, which bounces back. It's no big deal, but it bites me from time to time and I imagine a few other people as well... Thanks. Dmitry. PS: I'm

Re: Small indexes

2002-05-24 Thread Dmitry Serebrennikov
[EMAIL PROTECTED] wrote: >> Subject: Small indexes >> From: "David Elworthy" <[EMAIL PROTECTED]> >> Date: Thu, 23 May 2002 17:04:58 -0400 >> To: <[EMAIL PROTECTED]> >> Are there are known problems w

Re: lucene-user Digest 24 May 2002 14:31:22 -0000 Issue 105

2002-05-24 Thread Dmitry Serebrennikov
> Subject: powerpoint: sometimes it works...sometimes it doesn't > From: Bruce Altner <[EMAIL PROTECTED]> > Date: Wed, 22 May 2002 20:13:46 -0400 > To: [EMAIL PROTECTED] > Greetings: I am brand n

Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-05-01 Thread Dmitry Serebrennikov
Subject: Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException) From: petite_abeille <[EMAIL PROTECTED]> Date: Wed, 1 May 2002 08:37:51 +0200 To: "Lucene Users List" <[EMAIL PROTECTED]> On Wednesday, May 1, 2002, at 12:41 AM, Dmitry Serebrennik

Re: FileNotFoundException: Too many open files

2002-05-01 Thread Dmitry Serebrennikov
PA, > On average, there seem to be less than one hundred Lucene files per index. You are probably past this point by now, but since I didn't see anyone pick up on this, I wanted to respond. "Less than a hundred" is definitely too many files for a Lucene index, unless you have a very large nu

Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-04-30 Thread Dmitry Serebrennikov
Just a couple of clarification points: - the number of files that Lucene uses depends on the number of segments in the index and the number of *stored* fields - if your fields are not stored but only indexed, they do not require separate files. Otherwise, an .fnn file is created for each field.
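The two factors named here, segment count and per-field files, multiply. The rough estimator below makes that concrete. It assumes the Lucene 1.x multi-file layout of seven fixed files per segment (.fnm, .fdx, .fdt, .tis, .tii, .frq, .prx), plus one .fN file per field that gets one, plus the index-wide "segments" and "deletable" files, and it assumes a compound-format segment collapses to a single .cfs file. Which fields get their own .fN file varies by version, so that count is an input rather than something the sketch decides.

```java
public class IndexFileEstimate {
    // Assumed fixed per-segment files in the multi-file format:
    // .fnm, .fdx, .fdt, .tis, .tii, .frq, .prx
    static final int FIXED_FILES_PER_SEGMENT = 7;
    // Assumed index-wide files: "segments" and "deletable".
    static final int INDEX_LEVEL_FILES = 2;

    /**
     * Rough file count for an index (ignores .del files and files held open
     * temporarily during merges).
     *
     * @param segments          number of segments in the index
     * @param fieldsWithOwnFile number of fields that get a separate .fN file
     * @param compound          true if each segment is stored as a single .cfs file
     */
    static int estimateFiles(int segments, int fieldsWithOwnFile, boolean compound) {
        int perSegment = compound ? 1 : FIXED_FILES_PER_SEGMENT + fieldsWithOwnFile;
        return INDEX_LEVEL_FILES + segments * perSegment;
    }

    public static void main(String[] args) {
        System.out.println("10 segments, 5 fields, multi-file: " + estimateFiles(10, 5, false));
        System.out.println("10 segments, 5 fields, compound:   " + estimateFiles(10, 5, true));
        System.out.println("optimized (1 segment), 5 fields:   " + estimateFiles(1, 5, false));
    }
}
```

With these assumptions, ten unoptimized segments of a five-field index already need over a hundred files, while optimizing down to one segment or switching to the compound format brings the count close to the handful the post suggests is normal.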

Re: Getting the terms that matched the HitDoc? & Relevance Feedback

2002-03-31 Thread Dmitry Serebrennikov
> Subject: RE: Relevance Feedback > From: Doug Cutting <[EMAIL PROTECTED]> > Date: Sat, 30 Mar 2002 08:51:39 -0800 > To: Lucene Users List <[EMAIL PROTECTED]> > Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a

Re: Retrieving Field info from an index

2002-03-21 Thread Dmitry Serebrennikov
>> Lex Lawrence wrote: >> You miss my point. The value of an "unstored" Field is not stored in the index; however, its name most certainly is. That's what I'm interested in. What I'd like to know is if there is a way to get the names of all searchable Fields in an index.

RE: Question Deleting/Reindexing Files

2002-03-21 Thread Dmitry Serebrennikov
>> [1] There's no update so delete and then add is what you want. >> [2] I have had the same problems w/ using an IndexWriter and IndexReader at the same time and getting a locking problem when deleting. I think I sent mail to the list w/ a test case a week ago [disclaimer: this is not

Re: Indexing and Duplication

2002-03-21 Thread Dmitry Serebrennikov
> This seems like a silly question, but will keeping hold of Document objects cause me to run into "Too many files open" problems? If each document object has a Field.Text which contains a Reader, and the Reader isn't closed till the document is indexed, would this be an issue? Is the memory

Re: many analyzers, same index.

2001-10-20 Thread Dmitry Serebrennikov
> The issue is that the set of features for queries on different types of contextual units (used to define Lucene documents) will be different. An example is that our XML and text documents need fuzzy-matching and Porter stemming capabilities and on others (created and maintained from metad