size and nos of documents in the index

2002-03-12 Thread Parag Dharmadhikari
Hi all, How is indexing affected by the size of documents, and what is the maximum number of documents that can be indexed? Regards, Parag

RE: Deleting documents

2002-03-12 Thread Spencer, Dave
I think I've come across the same problem. If you have an indexer that adds docs and also deletes docs as it goes (use case: it's updating old docs or adding new ones) it seems that you always get an exception like this thrown from IndexReader.delete(). java.io.IOException: Index locked for write

Re: Maximum indexable data

2002-03-12 Thread Kelvin Tan
Ype,
> The 10,000 refers to the maximum nr. of terms per document.
> It's the default, and it's not hardcoded. Simply create an indexwriter
> and change this attribute before adding docs.
Ahhh, my bad. I didn't notice the maxFieldLength field. Guess I'm too used to looking for getters/setters.
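To make the point about the cap concrete, here is a minimal, self-contained sketch of what a per-document term limit does. It does not use Lucene's classes: TermCapSketch and indexableTerms are hypothetical names made up for illustration, and the whitespace tokenizing is an assumption. Only the maxFieldLength name, its 10,000 default, and the fact that it is a plain public field changed before adding docs come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer; NOT Lucene's IndexWriter.
public class TermCapSketch {
    // Public field rather than a getter/setter pair, mirroring the style
    // described in the thread. 10,000 is the default mentioned there.
    public int maxFieldLength = 10000;

    // Returns the terms that would be kept for one document.
    // Assumption: terms are split on whitespace and anything past the
    // cap is silently dropped.
    public List<String> indexableTerms(String text) {
        List<String> kept = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return kept;
        }
        for (String term : trimmed.split("\\s+")) {
            if (kept.size() >= maxFieldLength) {
                break; // cap reached: remaining terms are not indexed
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        TermCapSketch writer = new TermCapSketch();
        writer.maxFieldLength = 3; // change the attribute before adding docs
        System.out.println(writer.indexableTerms("alpha beta gamma delta epsilon"));
        // prints [alpha, beta, gamma]
    }
}
```

With the default left in place, only documents longer than 10,000 terms would lose anything; raising the field before adding docs is what makes it a default rather than a hard limit.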

RE: Search result ordering question

2002-03-12 Thread Spencer, Dave
Is this question still pending? Well, I haven't tried it, but DateFilter might be what you're looking for: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DateFilter.html You could also add a field that's a kind of enumerated value indicating how recent the doc is. You add a field "w

RE: special character handling

2002-03-12 Thread Aruna Raghavan
Hi, I guess my question is really regarding characters like &, %, $, #, - etc. (- is used for exclusion, for example). I remember testing with a standard analyzer and finding that it didn't quite work. Is there any reason these characters won't work with a standard analyzer? The stop table for Standa

RE: special character handling

2002-03-12 Thread Otis Gospodnetic
This is answered in the FAQ: http://jguru.com/faq/view.jsp?EID=538308
--- Aruna Raghavan <[EMAIL PROTECTED]> wrote:
> Otis,
> I am using StandardAnalyzer.
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, March 12, 2002 3:37 PM
> To: Lucene Users Li

Re: jdk 1.1.8?

2002-03-12 Thread ROSHAN NAVENDRA
I can answer question 2. .jj files are JavaCC projects. You can download and install JavaCC from the internet (just run a search for it from Yahoo). Once you install it, simply run the command "javacc x.jj" (from javacc's bin directory) where x.jj is the .jj file you wish to unpack. Thi

RE: special character handling

2002-03-12 Thread Aruna Raghavan
Otis,
I am using StandardAnalyzer.
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 12, 2002 3:37 PM
To: Lucene Users List
Subject: Re: special character handling
It depends on the Analyzer used.
Otis
--- Aruna Raghavan <[EMAIL PROTECTED]> wrot

Re: special character handling

2002-03-12 Thread Otis Gospodnetic
It depends on the Analyzer used.
Otis
--- Aruna Raghavan <[EMAIL PROTECTED]> wrote:
> Hi,
> Does lucene replace all special characters with spaces when it adds
> the document to the index?
> Thanks!

special character handling

2002-03-12 Thread Aruna Raghavan
Hi, Does lucene replace all special characters with spaces when it adds the document to the index? Thanks!

Search result ordering question

2002-03-12 Thread Kent Vilhelmsen
I've been using Lucene a bit, and find it very flexible and fast. However, I need to order search results by date (or, equivalently, document id); I've looked a bit into (re)writing a collect method without any luck. I don't program in Java much, so I'm not getting anywhere with the (few) hints I

jdk 1.1.8?

2002-03-12 Thread Robert A. Decker
I have three questions: 1. The jguru faq says that it's possible to use lucene with the 1.1.8 jdk. However, there are quite a few calls to StringBuffer methods that aren't present at that version of the jdk - things like delete, deleteCharAt, substring... Do people use a different version of Str

Re: Maximum indexable data

2002-03-12 Thread Ype Kingma
Kelvin,
>Actually that's something which I'm not exactly thrilled about. Why is this
>10,000 value hardcoded instead of configurable? Surely it's sufficient to be
>a default instead of a limit.
The 10,000 refers to the maximum nr. of terms per document. It's the default, and it's not hardcoded.

Re: Support for russian morphology in Lucene

2002-03-12 Thread Vadim Solonovich
Hi! Sorry for the delay in answering. Recently I found that a Russian and Ukrainian stemmer for Lucene can be implemented based on Andrew Kovalenko's stem library (http://linguist.nm.ru), which is free. Though it is not a 100% pure Java solution, this library is compiled on multiple platforms and