Re: Indexing very large sets (10 million docs)

2003-07-30 Thread Doug Cutting
Roger Ford wrote: I do have another problem: running multi-user tests - four "users" all firing off queries one after the other - I hit this exception at the start of one run: caught a class java.io.IOException with message: Timed out waiting for [EMAIL PROTECTED]:\Lucene_Index\Index0001\comm

Re: Free, medium size, downloadable corpus of newspaper articles?

2003-07-30 Thread Peter Becker
[redirected to lucene-user] Me, too! :-) We are currently playing with the small Reuters collection (about 21,500 news items from the 80s), but I don't know if I am allowed to distribute it and it is too small anyway -- many of the implications we find are based on 1 to 3 documents. I still ha

Re: Indexing very large sets (10 million docs)

2003-07-30 Thread Roger Ford
Doug Cutting wrote: For batched indexing I recommend: (1) increasing mergeFactor somewhat, depending on how many indexed fields you have; (2) adding all of your documents; and (3) optimizing once at the end. Thanks for all the advice on this. I did as Doug suggested, and the indexing completed i

Bug: TermQuery toString - incorrect

2003-07-30 Thread Aviran Mordo
I have a TermQuery object whose term contains a space (two words). But when I call toString() I get a query string that reads as an OR operation. Example: the term +"Small Business" comes back from toString() as +(SocioEconomicInformation:Small Business) And the expected result should be

Re: All or noting hits

2003-07-30 Thread Marie-Hélène Forget
On Wed, 2003-07-30 at 15:15, Lutz Horn wrote: > Hi, > > On Wed, 2003-07-30 at 20:04, Marie-Hélène Forget wrote: > > I search for a word "Qvar" and I have 2 documents containing that exact > > word. I get 2 results representing the 2 documents that I want. > > Everything seems ok there, but I get

Re: All or noting hits

2003-07-30 Thread Lutz Horn
Hi, On Wed, 2003-07-30 at 20:04, Marie-Hélène Forget wrote: > I search for a word "Qvar" and I have 2 documents containing that exact > word. I get 2 results representing the 2 documents that I want. > Everything seems ok there, but I get other results that contain words > that start with Q, P,

All or noting hits

2003-07-30 Thread Marie-Hélène Forget
Hi, I wonder whether the behavior of my application matches Lucene's expected behavior when I perform a search. I search for a word "Qvar" and I have 2 documents containing that exact word. I get 2 results representing the 2 documents that I want. Everything seems ok there, but I get other r

RE: Multiple fields identical terms.

2003-07-30 Thread Gregor Heinrich
Hi. Thanks for your suggestion; I think the storage overhead is bearable. Actually I am doing some sort of forward indexing in addition to the inverted index. I.e., the result will be a meta-search engine that combines the Lucene IR process proper with an aspect model similar to Latent Semantic A

Re: Lucene Index on NFS Server

2003-07-30 Thread Jan Agermose
Which part of the webserver do you expect to fail? The service or the computer? Why would the computer hosting NFS be less likely to fail than your computer hosting the webserver? You could use JMS to communicate updates to the two webservers? Or use a distributed FS on the two computers h

Java.net Lucene article

2003-07-30 Thread Erik Hatcher
A "Lucene Intro" article I recently wrote for java.net has just been published: http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html Erik p.s. Am I an official committer now with the repository enabled for ehatcher? If so, then I'll commit this link to the resources section of the site

Lucene Index on NFS Server

2003-07-30 Thread Morus Walter
Hi, I'm currently planning a web application using Lucene for search. There will be two web server machines responsible for the application and the searches. Two machines basically to be failsafe, load is not expected to be a problem initially, though this might change over time. So scaling is a

Re: Multiple fields identical terms.

2003-07-30 Thread Erik Hatcher
On Wednesday, July 30, 2003, at 06:16 AM, Gregor Heinrich wrote: I would like to have unique term texts in my term enumeration. That is, across all fields there should be no duplicate term text. An easy solution would be to only use one field. But does someone know an alternative way with multipl

Multiple fields identical terms.

2003-07-30 Thread Gregor Heinrich
Hi everyone, my index has a title and an abstract field, both inverted and tokenized. I would like to have unique term texts in my term enumeration. That is, across all fields there should be no duplicate term text. An easy solution would be to only use one field. But does someone know an alter

Re: Different Analyzer for each Field

2003-07-30 Thread Claude Libois
Thanks for your answers, but I found another way to solve my problem. I don't tokenize my field anymore, so it doesn't pass through the analyzer, and it works. Nevertheless, I will certainly use in the future what you told me. On Monday, July 28, 2003, at 02:56 PM, Erik Hatcher wrote: On Monday, Ju

Re: Different Analyzer for each Field

2003-07-30 Thread Claude Libois
Thank you for all your answers. Gregor, I will do what you told me. It's exactly what I need. Claude On Monday, July 28, 2003, at 07:09 PM, Gregor Heinrich wrote: Hi Claude, one solution is to make the tokenStream method in the Analyzer subclass listen to the field name. Example: public T