Re: Too many open files issue

2004-11-22 Thread Dmitry
l will help you see what files are open and you can validate that all of them really need to be open. Best of luck. Dmitry. Neelam Bhatnagar wrote: Hi, I had requested help on an issue we have been facing with the "Too many open files" Exception garbling the search indexes and crashing the

Re: "Orphan" segment files

2004-10-03 Thread Dmitry Serebrennikov
t.org/luke/) to look into them and see what documents they contain? Good luck! Dmitry. Thanks, Ed --- Dmitry Serebrennikov <[EMAIL PROTECTED]> wrote: This is not normal behavior, unless you are running on Windows and have searchers open for that long that are still locking the segments

Re: Too many Open Files + lucene 1.4.1 + Linux O/s

2004-10-02 Thread Dmitry Serebrennikov
ay be the answer. - look into the "lsof" utility. It can display all file handles in use by a given process. This is a good tool to troubleshoot "too many open files" issues. Good luck. Dmitry.
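
Alongside lsof, a process can watch its own descriptor count, which makes a slow leak of unclosed index files visible from inside the application. The sketch below assumes a Linux-style /proc/self/fd directory; the class name OpenFileCounter is hypothetical.

```java
import java.io.File;

// Rough in-process counterpart of `lsof -p <pid>`: counts the entries in
// /proc/self/fd. Assumes a Linux-style /proc filesystem; elsewhere returns -1.
// Note: listing the directory briefly opens one descriptor itself.
final class OpenFileCounter {
    static int countOpenDescriptors() {
        File fdDir = new File("/proc/self/fd");
        String[] entries = fdDir.list(); // null if the directory does not exist
        return entries == null ? -1 : entries.length;
    }
}
```

Logging this value before and after opening and closing a searcher is one way to spot growth; the JVM itself holds a baseline of descriptors, so the trend matters more than the absolute number.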

Re: "Orphan" segment files

2004-10-02 Thread Dmitry Serebrennikov
snapshot would have had this problem. If you are running from CVS, try the latest release and see if this occurs again. Dmitry. Edwin Tang wrote: Hello, I'm seeing in my index directory some segment files that are not included in the segments or deletable files. These segment files show

Re: Proxy Con. Problem in Weblogic.

2004-09-21 Thread Dmitry Serebrennikov
of Sun's stack. Sun's handler is called sun.net.www.protocol.http.Handler. Hope this helps. Good luck! Dmitry. Natarajan.T wrote: Hi, FYI, I am doing web crawling in my application using proxy settings, like the code below: Properties systemSettings = System.getProperties(); systemSettings.put("http
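
The quoted question sets proxy values through System.getProperties(). For reference, the standard JDK networking properties and the per-connection java.net.Proxy class (Java 5 and later, so newer than this thread) are sketched below. Host and port are placeholders. These settings are read by the JDK's own handler (the sun.net.www.protocol.http.Handler named in the post); a replacement handler installed by a container may behave differently, and that is not assumed either way here.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Two standard ways to route Java HTTP traffic through a proxy.
// ProxyConfig is a hypothetical helper; host/port values are placeholders.
final class ProxyConfig {
    // JVM-wide: the standard networking properties read by the JDK's HTTP handler.
    static void setJvmWideProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    // Per-connection: pass the result to URL.openConnection(Proxy).
    static Proxy httpProxy(String host, int port) {
        // createUnresolved avoids a DNS lookup when the object is built.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }
}
```

For example, `ProxyConfig.httpProxy("proxy.example.com", 8080)` can be passed to `URL.openConnection(Proxy)` for a single request without changing JVM-wide settings.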

Re: indexing size

2004-09-08 Thread Dmitry Serebrennikov
35% was, I think, to illustrate that index data structures used for searching by Lucene are efficient. But Lucene does nothing special about stored content - no compression or anything like that. So you end up with the pure size of your data plus the 35% of the indexed data. Cheers. Dmitry
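
The arithmetic in the post can be written out as a rough estimate: stored content costs about its raw size, plus roughly 35% of the indexed text for the search structures. The 35% figure is the approximate ratio cited in the thread, not a fixed constant, and the names below are hypothetical.

```java
// Back-of-the-envelope index size: stored fields are kept uncompressed, so they
// cost roughly their raw size; searchable structures cost roughly 35% of the
// indexed text. The ratio is the rough figure from the thread, not a guarantee.
final class IndexSizeEstimate {
    static final int INDEX_OVERHEAD_PERCENT = 35;

    static long estimateBytes(long storedBytes, long indexedBytes) {
        return storedBytes + indexedBytes * INDEX_OVERHEAD_PERCENT / 100;
    }
}
```

For 1,000 MB of text that is both indexed and stored, the estimate is about 1,350 MB; indexing the same text without storing it brings it down to about 350 MB.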

Re: Spam:too many open files

2004-09-07 Thread Dmitry Serebrennikov
good working order. If you are not sure if you saw problems with pre-8/15 or post-8/15 version of the code, is it possible for you to try the latest CVS and see if the problem exists now? If it does, it will of course require urgent attention. Thanks very much! Dmitry. Daniel Naber wrote: On

Offer of services

2004-07-18 Thread Dmitry
References and the resume are available upon request. Payment hourly, or as a fixed bid. May the source be with you! :) Thanks very much, and best wishes to everyone. Dmitry Serebrennikov

Re: multiple fields to be indexed

2004-06-17 Thread Dmitry Serebrennikov
o match. If it is neither (no prefix in the query parser), it is not required to match for the query to match, provided some other component of the query does. This last one may seem useless, except that if this query component does match, the score will be boosted. So documents that do match t
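
The three clause kinds described above (required with +, prohibited with -, and unprefixed optional clauses that only add to the score) can be modeled with a toy scorer. Scoring here just counts matched clauses, which is far simpler than Lucene's real scoring; all names are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Toy model of a query such as "+lucene -spam index". Not Lucene's API.
final class ToyBooleanQuery {
    enum Occur { REQUIRED, PROHIBITED, OPTIONAL }

    static final class Clause {
        final String term;
        final Occur occur;
        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
    }

    // Returns 0 when the document does not match; otherwise the number of
    // non-prohibited clauses it satisfies, so optional hits raise the score.
    // If no positive clause matched at all, the score stays 0 (no match).
    static int score(List<Clause> clauses, Set<String> docTerms) {
        int score = 0;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            switch (c.occur) {
                case PROHIBITED:
                    if (present) return 0;   // "-term": must not appear
                    break;
                case REQUIRED:
                    if (!present) return 0;  // "+term": must appear
                    score++;
                    break;
                case OPTIONAL:
                    if (present) score++;    // no prefix: only boosts the score
                    break;
            }
        }
        return score;
    }
}
```

With the query `+lucene -spam index`, a document containing only "lucene" scores 1, one containing "lucene" and "index" scores 2, and any document containing "spam" or lacking "lucene" does not match.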

RE: Performance: compound vs. multi-file index, indexing and searching

2004-06-13 Thread Dmitry Serebrennikov
script I'm including will parse that kind of data and produce a comma-separated output of GC stats that can be graphed more easily. 1458110 Hope you guys find the above useful. Good luck! Dmitry. #!/usr/local/bin/python import sys text = open(sys.argv[1], "r
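
The Python script itself is cut off in this excerpt. As a separate illustration of the same idea in Java (not a port of that script), the sketch below turns -verbose:gc lines into CSV rows. It assumes the classic HotSpot line format, for example `[GC 325407K->83000K(776768K), 0.2300771 secs]`; newer JVMs log differently and would need another pattern. GcLogToCsv is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Converts classic -verbose:gc lines into graphable CSV rows.
// Lines that do not match the assumed format are skipped.
final class GcLogToCsv {
    private static final Pattern LINE = Pattern.compile(
        "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    static List<String> toCsv(List<String> logLines) {
        List<String> rows = new ArrayList<>();
        rows.add("kind,before_kb,after_kb,heap_kb,seconds");
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                rows.add(String.join(",", m.group(1), m.group(2), m.group(3),
                                     m.group(4), m.group(5)));
            }
        }
        return rows;
    }
}
```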

Re: crash in Lucene

2003-11-06 Thread Dmitry Serebrennikov
close() was called twice on the same IndexWriter. Perhaps the demo has a bug that ends up doing this in some cases? Dmitry. Subject: RE: crash in Lucene From: "Chong, Herb" <[EMAIL PROTECTED]> Date: Tue, 4 Nov 2003 16:04:38 -0500 To: "Lucene Users List" <[EMAIL PROTECTE
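
If the crash does come from calling close() twice, one defensive option is to make closing idempotent. The wrapper below is a hypothetical helper, not part of Lucene. For a class that does not implement java.io.Closeable, a method reference such as `new CloseOnce(writer::close)` adapts it, assuming Java 8 or later and a close() that throws at most IOException.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: the first close() reaches the delegate, later calls are no-ops.
final class CloseOnce implements Closeable {
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnce(Closeable delegate) { this.delegate = delegate; }

    @Override
    public void close() throws IOException {
        // compareAndSet lets exactly one caller through, even across threads.
        if (closed.compareAndSet(false, true)) {
            delegate.close();
        }
    }
}
```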

Re: lucene-user Digest 15 Oct 2002 12:32:47 -0000 Issue 170

2002-10-15 Thread Dmitry Serebrennikov
MultiSearcher). There, hits arrive in the order in which they are found, which is the insertion order. So I don't know when a hit with the highest score will come about. Dmitry.
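
Because hits arrive in insertion order here, the highest-scoring ones can be kept with a bounded min-heap while the results stream in. This is a generic sketch; TopK and Hit are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Keeps the k best hits from an unordered stream. The weakest retained hit
// sits at the head of the min-heap and is evicted when a better one arrives.
final class TopK {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int k;
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopK(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    void collect(int doc, float score) {
        if (heap.size() < k) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                    // drop the current weakest hit
            heap.add(new Hit(doc, score));
        }
    }

    // Returns the retained hits, best first.
    List<Hit> results() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

With k = 2 and scores 0.1, 0.9, 0.5, 0.3 arriving in that order, the retained hits are the 0.9 and 0.5 documents, best first.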

Re: Are score values always between 0 and 1?

2002-10-15 Thread Dmitry Serebrennikov
Ype Kingma wrote: > On Tuesday 15 October 2002 04:16, Dmitry Serebrennikov wrote: >> Greetings, >> I know that the FAQ says that they are, but in at least one instance in my index it appears to be equal to 1.94something. Are the scores guaranteed

Question: using boost for sorting

2002-10-14 Thread Dmitry Serebrennikov
f it would, would I then have to do something at indexing time to set normalization / scoring factors for that field to something or other? Thanks. Dmitry.

Are score values always between 0 and 1?

2002-10-14 Thread Dmitry Serebrennikov
Greetings, I know that the FAQ says that they are, but in at least one instance in my index it appears to be equal to 1.94something. Are the scores guaranteed to be between 0 and 1, and if not, what would it take to make them such? Thanks. Dmitry.
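
Raw scores are not inherently bounded by 1. One common way to present them in (0, 1] is to divide every score by the top score whenever the top score exceeds 1, which leaves the ranking unchanged. The sketch assumes that approach and non-negative scores; it is not a claim about what any particular Lucene release does, and ScoreNormalizer is a hypothetical name.

```java
// Rescales scores by 1/max only when max > 1; relative order is preserved.
final class ScoreNormalizer {
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) max = Math.max(max, s);
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }
}
```

For raw scores 1.94 and 0.97 this yields 1.0 and 0.5; scores that are already at most 1 pass through unchanged.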

RE: batch indexing

2002-08-08 Thread Dmitry Serebrennikov
but I can't see without a detailed study whether the mergeFactor applies only to merging from RAM to disk or to merging on-disk segments as well. If it applies to both, perhaps we could add a different field to the IndexWriter to allow the two values to be different? Am I missing somethin
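
To build intuition for the question, here is an idealized model of a logarithmic merge policy: documents are flushed from RAM in fixed-size batches, and whenever mergeFactor same-sized segments pile up they are merged into one. The buffer size and merge factor are kept as separate parameters so that either reading of the question can be tried. This is a simplification, not Lucene's merge code; MergeSimulation is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Idealized logarithmic merging: each flush of `bufferedDocs` documents adds
// one level-0 segment; `mergeFactor` segments at one level merge into one
// segment at the next level.
final class MergeSimulation {
    // On-disk segment count after adding `docs` documents; documents still in
    // the RAM buffer are not counted.
    static int segmentsOnDisk(int docs, int bufferedDocs, int mergeFactor) {
        if (bufferedDocs < 1 || mergeFactor < 2) throw new IllegalArgumentException();
        List<Integer> perLevel = new ArrayList<>(); // segment count per size level
        for (int flush = 0; flush < docs / bufferedDocs; flush++) {
            int level = 0;
            addAt(perLevel, level);
            // Cascade merges upward while a level is full.
            while (perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);
                level++;
                addAt(perLevel, level);
            }
        }
        int total = 0;
        for (int n : perLevel) total += n;
        return total;
    }

    private static void addAt(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) perLevel.add(0);
        perLevel.set(level, perLevel.get(level) + 1);
    }
}
```

With both parameters at 10, 250 documents leave 7 segments in this model (two of 100 documents and five of 10), 1,000 documents collapse into a single segment, and 9 documents leave nothing on disk.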

List admin request

2002-05-24 Thread Dmitry Serebrennikov
Could someone please change the Reply-To header on the digest messages from the lucene-user list? Right now it goes back to the lucene-user-digest address, which bounces. It's no big deal, but it bites me from time to time, and I imagine a few other people as well... Thanks. Dmitry

Re: Small indexes

2002-05-24 Thread Dmitry Serebrennikov
m. >> It's not a big deal, as my actual document collection is not this small. I'm just curious. >> -- David Elworthy > There is no known problem, but there is buffering where 10 documents are indexed into memory and then are flushed to disk. The

Re: lucene-user Digest 24 May 2002 14:31:22 -0000 Issue 105

2002-05-24 Thread Dmitry Serebrennikov
without knowing more about the PPT file format. If you can find a program or library that will extract text from a PPT file, you then should be able to easily use Lucene to index this text. This might not be as elegant as the final solution of the project I mentioned above, but this is the w

Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-05-01 Thread Dmitry Serebrennikov
Subject: Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException) From: petite_abeille <[EMAIL PROTECTED]> Date: Wed, 1 May 2002 08:37:51 +0200 To: "Lucene Users List" <[EMAIL PROTECTED]> On Wednesday, May 1, 2002, at 12:41 AM, Dmitry Serebrennik

Re: FileNotFoundException: Too many open files

2002-05-01 Thread Dmitry Serebrennikov
le memory in a particular NT kernel memory pool (not just the free memory on the system). The pool size can probably be controlled, but I've found that it is usually generous enough - more so than the Solaris settings. If BSD is like NT in this regard (at least to some degree), the number o

Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-04-30 Thread Dmitry Serebrennikov
IndexWriter called infoStream. If this is set to a PrintStream (such as System.out), various diagnostic messages about the merging process will be printed to that stream. You might find this helpful in tuning the merge parameters. Hope this helps. Good luck. Dmitry.

Re: Getting the terms that matched the HitDoc? & Relevance Feedback

2002-03-31 Thread Dmitry Serebrennikov
> Subject: RE: Relevance Feedback > From: Doug Cutting <[EMAIL PROTECTED]> > Date: Sat, 30 Mar 2002 08:51:39 -0800 > To: Lucene Users List <[EMAIL PROTECTED]> > Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a

Re: Retrieving Field info from an index

2002-03-21 Thread Dmitry Serebrennikov
>>> Lex Lawrence wrote: >>> You miss my point. The value of an "unstored" Field is not stored in the index; however, its name most certainly is. That's what I'm interested in. What I'd like to know is whether there is a way to get the names of all searchable Fields in an index.

RE: Question Deleting/Reindexing Files

2002-03-21 Thread Dmitry Serebrennikov
>> [1] There's no update, so delete and then add is what you want. >> [2] I have had the same problems w/ using an IndexWriter and IndexReader at the same time and getting a locking problem when deleting. I think I sent mail to the list w/ a test case a week ago [disclaimer: this is not
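
Assuming, as the locking problem quoted above suggests, that the deleting IndexReader and the adding IndexWriter cannot both hold the index for modification at once, a delete-then-add update has to finish and release the deleting session before the adding session starts. The toy store below models that ordering with a single write lock; ToyIndex and its methods are hypothetical, not Lucene APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Toy store allowing one modifier at a time, to illustrate lock ordering.
final class ToyIndex {
    private final Map<String, String> docs = new HashMap<>();
    private Object lockHolder; // null when the write lock is free

    synchronized boolean tryLock(Object who) {
        if (lockHolder != null && lockHolder != who) return false;
        lockHolder = who;
        return true;
    }

    synchronized void unlock(Object who) {
        if (lockHolder == who) lockHolder = null;
    }

    synchronized void delete(Object who, String key) { requireLock(who); docs.remove(key); }

    synchronized void add(Object who, String key, String body) { requireLock(who); docs.put(key, body); }

    synchronized String get(String key) { return docs.get(key); }

    private void requireLock(Object who) {
        if (lockHolder != who) throw new IllegalStateException("write lock not held");
    }

    // Safe sequence: complete and release the deleting session, then add.
    static void update(ToyIndex index, String key, String body) {
        Object deleter = new Object();
        if (!index.tryLock(deleter)) throw new IllegalStateException("index locked");
        try { index.delete(deleter, key); } finally { index.unlock(deleter); }

        Object adder = new Object();
        if (!index.tryLock(adder)) throw new IllegalStateException("index locked");
        try { index.add(adder, key, body); } finally { index.unlock(adder); }
    }
}
```

Holding the deleting session open while trying to start the adding one is exactly what makes the second tryLock fail in this model.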

Re: Indexing and Duplication

2002-03-21 Thread Dmitry Serebrennikov
the other hand, if it is just a String or a StringReader, it would consume memory equal to (probably greater than) the size of the data. One way to fix this is to create your own Reader class, say DelayedReader, which does not open a file upon creation, but only upon the first read. That would help sa
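
A minimal sketch of the DelayedReader suggested in the post, assuming Java 7's java.nio.file for opening the file: the constructor only records the path, the file is opened on the first read, and close() releases it. Only the class name comes from the post; the constructor parameters are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Construction opens nothing; the file is opened lazily on the first read().
final class DelayedReader extends Reader {
    private final Path path;
    private final Charset charset;
    private Reader delegate; // null until the first read
    private boolean closed;

    DelayedReader(Path path, Charset charset) {
        this.path = path;
        this.charset = charset;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (closed) throw new IOException("reader closed");
        if (delegate == null) {
            delegate = Files.newBufferedReader(path, charset); // opened lazily
        }
        return delegate.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) delegate.close();
    }
}
```

Because nothing is opened at construction, a batch of these readers can be created up front and handed one at a time to a Reader-valued field; each file is held open only while it is being consumed, provided the reader is closed afterwards.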

Re: many analyzers, same index.

2001-10-20 Thread Dmitry Serebrennikov
> The issue is that the set of features for queries on different types of contextual units (used to define Lucene documents) will be different. An example is that our XML and text documents need fuzzy-matching and Porter stemming capabilities and on others (created and maintained from metad
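
One way to give different document types different analysis in a single index is to pick the analysis chain by type, at both indexing and query time, since a field queried with a different chain than it was indexed with will tend to miss matches. In the sketch, the "stemming" chain is a deliberately crude suffix stripper standing in for a real stemmer; it is NOT the Porter algorithm, and the type names and AnalyzerChooser class are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Chooses an analysis chain per document type; unknown types get the plain chain.
final class AnalyzerChooser {
    static List<String> lowercaseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Crude stand-in for stemming: strips "ing" or a trailing "s". Not Porter.
    static List<String> crudeStemTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : lowercaseTokens(text)) {
            String s = t;
            if (s.length() > 4 && s.endsWith("ing")) s = s.substring(0, s.length() - 3);
            else if (s.length() > 3 && s.endsWith("s")) s = s.substring(0, s.length() - 1);
            out.add(s);
        }
        return out;
    }

    static final Map<String, Function<String, List<String>>> BY_TYPE = new HashMap<>();
    static {
        BY_TYPE.put("xml", AnalyzerChooser::crudeStemTokens);
        BY_TYPE.put("text", AnalyzerChooser::crudeStemTokens);
        BY_TYPE.put("metadata", AnalyzerChooser::lowercaseTokens);
    }

    static List<String> analyze(String docType, String text) {
        return BY_TYPE.getOrDefault(docType, AnalyzerChooser::lowercaseTokens).apply(text);
    }
}
```

Here `analyze("text", "Indexing documents")` yields [index, document], while the "metadata" chain keeps [indexing, documents].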