Limo 0.5

2004-11-22 Thread Chandrashekhar
Hi, With Limo 0.5, can I find out whether a certain word from some Document is indexed or not? With Regards, Chandrashekhar V Deshmukh

Re: Using multiple analysers within a query

2004-11-22 Thread Paul Elschot
On Monday 22 November 2004 05:02, Kauler, Leto S wrote: Hi Lucene list, We have the need for analysed and 'not analysed/not tokenised' clauses within one query. Imagine an unparsed query like: +title:Hello World +path:Resources\Live\1 In the above example we would want the first clause

Re: Using multiple analysers within a query

2004-11-22 Thread Morus Walter
Kauler, Leto S writes: Would anyone have any suggestions on how this could be done? I was thinking maybe the QueryParser would have to be changed/extended to accept a separator other than colon :, something like = for example to indicate this clause is not to be tokenised. I suggested

Re: disadvantages

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 12:36 AM, Luke Francl wrote: Well that really depends on how big your index is and what they search for, now doesn't it? ;) Everything is relative. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Sun 11/21/2004 2:52 PM To: Lucene Users List

Re: Using multiple analysers within a query

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 2:56 AM, Morus Walter wrote: Kauler, Leto S writes: Would anyone have any suggestions on how this could be done? I was thinking maybe the QueryParser would have to be changed/extended to accept a separator other than colon :, something like = for example to indicate this

Re: Using multiple analysers within a query

2004-11-22 Thread Morus Walter
Erik Hatcher writes: If your query isn't entered by users, you shouldn't use query parser in most cases anyway. I'd go even further and say in all cases. If you use Lucene as a search server you have to provide the query somehow. E.g. we have a PHP application that sends queries to a

Question about multi-searching [re-post]

2004-11-22 Thread Cocula Remi
Hi, (First of all: what is the plural of index in English; indexes or indices?) I want to search across several indexes (indices?). For that, I parse a new query using QueryParser or MultiFieldQueryParser. Then I search my indexes using the MultiSearcher class. Ok, but the

Re: How much time indexing doc ??

2004-11-22 Thread Luke Shannon
PDF(s) can definitely slow things down, depending on their size. If there are a few larger PDF documents, that time is definitely possible. Luke - Original Message - From: Miguel Angel [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, November 20, 2004 11:25 AM Subject: How much

Re: Optimized??

2004-11-22 Thread Luke Shannon
As I understand it, optimization is when you merge several segments into one, allowing for faster queries. The FAQs and API have further details. http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q24 Luke - Original Message - From: Miguel Angel [EMAIL

Re: Question about multi-searching [re-post]

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 9:18 AM, Cocula Remi wrote: (First of all: what is the plural of index in English; indexes or indices?) We used "indexes" in Lucene in Action. It's a bit ambiguous in English, but "indexes" sounds less formal and is acceptable. For that, I parse a new query using QueryParser

Too many open files issue

2004-11-22 Thread Neelam Bhatnagar
Hi, I had requested help on an issue we have been facing with the "Too many open files" exception garbling the search indexes and crashing the search on the web site. As a suggestion, you had asked us to look at the articles on O'Reilly Network which had specific context around this exact

Re: Using multiple analysers within a query

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 9:17 AM, Morus Walter wrote: Erik Hatcher writes: If your query isn't entered by users, you shouldn't use query parser in most cases anyway. I'd go even further and say in all cases. If you use lucene as a search server you have to provide the query somehow. E.g. we have an

Re: Limo 0.5

2004-11-22 Thread Luke Francl
On Mon, 2004-11-22 at 02:27, Chandrashekhar wrote: Hi, With Limo 0.5, can I find out whether a certain word from some Document is indexed or not? This feature doesn't exist as such. You could search for it and if results come up, then the word is in the documents it returns. I'll add

RE: Question about multi-searching [re-post]

2004-11-22 Thread Chuck Williams
If you are going to compare scores across multiple indices, I'd suggest considering one of the patches here: http://issues.apache.org/bugzilla/show_bug.cgi?id=31841 Chuck -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Monday, November 22, 2004 6:30 AM

Index in RAM - is it realy worthy?

2004-11-22 Thread iouli . golovatyi
I did the following test: I created the RAM folder on my Red Hat box and copied c. 1 GB of indexes there. I expected the queries to run much quicker. In reality it was sometimes even slower (sic!). Lucene has its own RAM disk functionality. If I implement it, would it bring any benefits? Thanks in

Re: Index in RAM - is it realy worthy?

2004-11-22 Thread Otis Gospodnetic
For the Lucene book I wrote some test cases that compare FSDirectory and RAMDirectory. What I found was that with certain settings FSDirectory was almost as fast as RAMDirectory. Personally, I would push FSDirectory and hope that the OS and the Filesystem do their share of work and caching for

Re: Need help with filtering

2004-11-22 Thread Edwin Tang
Hello again, I've modified DateFilter to filter out document IDs as suggested. All seems to be running well until I tried a specific test case. All my documents have IDs in the 400,000 range. If I set my lower limit to 5, nothing comes back. After examining the code, I found the issue to be at

RE: Need help with filtering

2004-11-22 Thread Chuck Williams
It sounds like you need to pad your numbers with leading zeroes, i.e. use the same type of encoding as is required by RangeQuery. If you query with 05 instead of 5 do you get what you expect? If all your document IDs are fixed length, then string comparison will be isomorphic to integer
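
A minimal sketch of the padding idea above, in plain Java with no Lucene calls. The class name ZeroPad, the method pad, and the width of 6 (picked only because the IDs in this thread are in the 400,000 range) are assumptions for illustration and do not come from the original posts. It shows why an unpadded lower limit of 5 excludes everything when values are compared as strings, and how fixed-width padding restores numeric order.

```java
// Illustrative only: ZeroPad and pad are hypothetical names, not Lucene API.
final class ZeroPad {
    // Assumed fixed width; it must be at least as wide as the largest ID.
    static final int WIDTH = 6;

    // Left-pads a non-negative ID with zeroes so that string order matches numeric order.
    static String pad(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("negative IDs are not handled in this sketch");
        }
        return String.format(java.util.Locale.ROOT, "%0" + WIDTH + "d", id);
    }

    public static void main(String[] args) {
        // Unpadded: "5" sorts after "400000", so a lower limit of "5" excludes it.
        System.out.println("5".compareTo("400000") > 0);
        // Padded: "000005" sorts before "400000", as the integers do.
        System.out.println(pad(5).compareTo(pad(400000)) < 0);
    }
}
```

The same padded form would have to be used both when the ID field is indexed and when the filter bounds are built; mixing padded and unpadded values brings the original problem back.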

auto-generate uid?

2004-11-22 Thread aurora
Is there a way to auto-generate uid in Lucene? Even if it is just a way to query the highest uid and let the application add one to it, that will do. Thanks.

indexing benchmark

2004-11-22 Thread John Wang
Hi folks: Is there an indexing benchmark somewhere? I see a search benchmark on the lucene home site. Thanks -John

Re: auto-generate uid?

2004-11-22 Thread Erik Hatcher
What would the purpose of an auto-generated UID be? But no, Lucene does not generate UIDs for you. Documents are numbered internally by their insertion order. This number changes, however, when documents are deleted in the middle and the index is optimized. Erik On Nov 22, 2004, at

RE: Too many open files issue

2004-11-22 Thread Will Allen
If you are on Linux, the number of file handles for a session is much lower than that for the whole machine. ulimit -n will tell you. There are instructions on the web for changing this setting; it involves editing /etc/security/limits.conf and setting the values for nofile. (bulkadm is my user)
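
As a rough companion to ulimit -n, the limit can also be read from inside the JVM. This is a hedged sketch, not part of the original post: the class name FdLimit and its methods are hypothetical, and it relies on com.sun.management.UnixOperatingSystemMXBean, a JDK-specific interface that is only present on Unix-like platforms and did not exist in the JVMs of 2004. The value is the limit as the JVM process sees it, which may differ from what the shell reports.

```java
// Illustrative only: FdLimit is a hypothetical helper, not Lucene API.
final class FdLimit {
    // Returns the maximum number of file descriptors for this JVM process,
    // or -1 if the platform-specific management bean is unavailable.
    static long maxFileDescriptors() {
        java.lang.management.OperatingSystemMXBean os =
                java.lang.management.ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    // Returns the number of file descriptors currently open, or -1 if unavailable.
    static long openFileDescriptors() {
        java.lang.management.OperatingSystemMXBean os =
                java.lang.management.ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("max file descriptors:  " + maxFileDescriptors());
        System.out.println("open file descriptors: " + openFileDescriptors());
    }
}
```

Logging the open count while a search application runs is one way to see how close it gets to the limit before the exception appears.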

downloading Lucene 1.4.2

2004-11-22 Thread Sullivan, Sean C - MWT
According to the Lucene homepage, Lucene 1.4.2 was released on October 1, 2004. However, the dist on www.apache.org does not have a copy of Lucene 1.4.2: http://www.apache.org/dist/jakarta/lucene/binaries/ Where can I download Lucene 1.4.2? -Sean

Re: downloading Lucene 1.4.2

2004-11-22 Thread Hoss
In the same Lucene News section where the announcement about 1.4.2 is listed, there is a link that says "Binary and source distributions are available here." ... http://cvs.apache.org/dist/jakarta/lucene/v1.4.2/ I got really confused yesterday after I already had the binary version and I was

Re: Index in RAM - is it realy worthy?

2004-11-22 Thread John Wang
In my test, I have 12900 documents. Each document is small: a few discrete fields (Keyword type) and 1 Text field containing only 1 sentence. With both mergeFactor and maxMergeDocs being 1000: using RAMDirectory, the indexing job took about 9.2 seconds; not using RAMDirectory, the indexing job

Re: Index in RAM - is it realy worthy?

2004-11-22 Thread Kevin A. Burton
Otis Gospodnetic wrote: For the Lucene book I wrote some test cases that compare FSDirectory and RAMDirectory. What I found was that with certain settings FSDirectory was almost as fast as RAMDirectory. Personally, I would push FSDirectory and hope that the OS and the Filesystem do their share

Re: downloading Lucene 1.4.2

2004-11-22 Thread Erik Hatcher
Click the "here" link on the Lucene home page. We Lucene committers have been very very lame and have not published the binary distribution appropriately for the mirrors to pick up. One of these days we'll correct this, but for now you can click the link from the announcement on the home page.

Re: auto-generate uid?

2004-11-22 Thread aurora
Just to clarify: I have a Field 'uid' whose value is a unique integer. I use it as a key to the document stored externally. I don't mean Lucene's internal document number. I was wondering if there is a method to query the highest value of a field, perhaps something like:

Re: auto-generate uid?

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 4:39 PM, aurora wrote: Just to clarify: I have a Field 'uid' whose value is a unique integer. I use it as a key to the document stored externally. I don't mean Lucene's internal document number. I was wondering if there is a method to query the highest value of a field,

Re: auto-generate uid?

2004-11-22 Thread Bernhard Messer
Just to clarify: I have a Field 'uid' whose value is a unique integer. I use it as a key to the document stored externally. I don't mean Lucene's internal document number. I was wondering if there is a method to query the highest value of a field, perhaps something like:

Re: auto-generate uid?

2004-11-22 Thread Terry Steichen
Not exactly sure what you're trying to do. You can easily generate a number when you index each Document and insert it in a uid field (which is, BTW, what I do), and if you base it on a timestamp plus some characteristic of the document (which is also what I do), it should always be unique.
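
One way to read the suggestion above, as a hedged sketch in plain Java: combine a timestamp with a hash of some document characteristic. The names UidSketch and makeUid are hypothetical, and using the document's path as the characteristic is an assumption; the post does not say which characteristic is used. Because the hash can collide, two documents indexed in the same millisecond could in principle still share a value, so this is unique in practice rather than guaranteed.

```java
// Illustrative only: UidSketch and makeUid are hypothetical names, not Lucene API.
final class UidSketch {
    // Builds a string usable as the value of a 'uid' field from a timestamp
    // and a hash of some document characteristic (assumed here to be its path).
    static String makeUid(long timestampMillis, String docCharacteristic) {
        return timestampMillis + "-" + Integer.toHexString(docCharacteristic.hashCode());
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Same timestamp, different paths: the hash part keeps the values apart.
        System.out.println(makeUid(now, "/docs/a.pdf"));
        System.out.println(makeUid(now, "/docs/b.pdf"));
    }
}
```

The value is generated by the application at indexing time and stored in the uid field, so nothing has to be queried from the index to produce the next one.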

Re: Too many open files issue

2004-11-22 Thread Dmitry
I'm sorry, I wasn't involved in the original conversation but maybe I can jump in with some info that will help. The number of files depends on the merge factor, number of segments, and number of indexed fields in your index. It also depends on whether you are using compound files or not (this

Re: Too many open files issue

2004-11-22 Thread Chris Lamprecht
A useful resource for increasing the number of file handles on various operating systems is the Volano Report: http://www.volano.com/report/ I had requested help on an issue we have been facing with the "Too many open files" exception garbling the search indexes and crashing the search on the

JDBCDirectory to prevent optimize()?

2004-11-22 Thread Kevin A. Burton
It seems that, when compared to other datastores, Lucene starts to fall down. For example, Lucene doesn't perform online index optimizations, so if you add 10 documents you have to run optimize() again, and this isn't exactly a fast operation. I'm wondering about the potential for a generic

multi-dimensional scaling

2004-11-22 Thread DES
Is it possible to combine Lucene and multi-dimensional scaling in some way?

Numeric Range Restrictions: Queries vs Filters

2004-11-22 Thread Hoss
(NOTE: numbers in [] indicate Footnotes) I'm rather new to Lucene (and this list), so if I'm grossly misunderstanding things, forgive me. One of my main needs as I investigate Search technologies is to restrict results based on Ranges of numeric values. Looking over the archives of this list,

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-22 Thread Hoss
Of course, not only did I manage to forget to include the attachment, but when I sent a reply with the code, mail.apache.org rejected it because it was a ZIP file. So let's see how mail.apache.org feels about 6 separate text files. : Date: Mon, 22 Nov 2004 18:25:24 -0800 (PST) : Subject: