Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Erik Hatcher
On Mar 9, 2004, at 10:23 PM, Kevin A. Burton wrote: You need to make it a HashSet: table = new HashSet( stopTable.keySet() ); Done. Also... while you're at it... the private variable name is 'table' which this HashSet certainly is *not* ;) Well, depends on your definition of 'table' I suppose
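The one-liner quoted above, building a HashSet from the keys of an existing stop-word table, can be sketched roughly as follows. This is only an illustration and not the actual patch: the class name `StopWordsSketch` and the helper `fromTable` are hypothetical, and generics are used for readability even though the 2004 code base predates them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StopWordsSketch {
    // Hypothetical helper: copy the keys of a legacy stop-word table into a set.
    static Set<String> fromTable(Map<String, String> stopTable) {
        return new HashSet<>(stopTable.keySet());
    }

    public static void main(String[] args) {
        // Legacy-style table where each stop word maps to itself.
        Map<String, String> stopTable = new HashMap<>();
        stopTable.put("the", "the");
        stopTable.put("and", "and");

        Set<String> stopWords = fromTable(stopTable);
        System.out.println(stopWords.contains("the"));    // membership test on the set
        System.out.println(stopWords.contains("lucene"));
    }
}
```

Because the set is a copy of the key view, later changes to the original table do not affect it.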

Re: Storing numbers

2004-03-10 Thread lucene
On Tuesday 09 March 2004 20:51, Timothy Stone wrote: Michael Giles wrote: Tim, Looks like you can only access it with a subscription. :( Sounds good, though. Really? I don't have a subscription. Got to it via the archives actually now that I think about it: Try Volume 7, Issue 12.

Large document collections?

2004-03-10 Thread Mark Devaney
I'm looking for information on the largest document collection that Lucene has been used to index; the biggest benchmark I've been able to find so far is 1MM documents. I'd like to generate some benchmarks for large collections (1-100MM records) and would like to know if this is feasible without

Re: Large document collections?

2004-03-10 Thread Otis Gospodnetic
I think even a 100K or 1MM doc collection will give you an idea about the retrieval time/storage requirements (which, of course, are highly dependent on what you index and how you index it). I know several people have created collections with up to 50MM docs on a single machine (not sure about

RE: Storing numbers

2004-03-10 Thread Olga Dadasheva
Try this link and scroll to top: http://www.sys-con.com/story/?storyid=37296DE=1#RES Thank you, Tim - excellent article. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 10, 2004 10:23 AM To: Lucene Users List Subject: Re: Storing numbers On

Re: Large document collections?

2004-03-10 Thread Paladin
I use several collections: one of 1 200 000 documents, one of 3 800 000, and another of 12 000 000 documents (the biggest), and performance is quite good (except for searches with wildcards). Our machine has 1 gigabyte of memory and 2 CPUs. - Original Message - From: Mark

Re: Large document collections?

2004-03-10 Thread Paladin
Well, usually response times are 5-10 sec max; it depends on the queries (except for queries with a wildcard). I put a timeout of 30 seconds on all queries. Queries with a wildcard can fail because of a java.lang.OutOfMemoryError. You can try it yourself on the website of my company (but

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Kevin A. Burton
Erik Hatcher wrote: Also... while you're at it... the private variable name is 'table' which this HashSet certainly is *not* ;) Well, depends on your definition of 'table' I suppose :) I changed it to a type-agnostic stopWords. Did you know that internally HashSet uses a HashMap? I sure
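On the remark that HashSet uses a HashMap internally: `java.util.HashSet` is indeed backed by a HashMap, with each element stored as a key against a shared placeholder value. The following is a simplified sketch of that idea and not the JDK source; the class name `MapBackedSetSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MapBackedSetSketch<E> {
    // Shared dummy value stored against every key; only the keys matter.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // Returns true if the element was not already present.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSetSketch<String> stopWords = new MapBackedSetSketch<>();
        stopWords.add("the");
        stopWords.add("the"); // duplicate: overwrites the same key, size is unchanged
        System.out.println(stopWords.size());
        System.out.println(stopWords.contains("the"));
    }
}
```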

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Erik Hatcher
On Mar 10, 2004, at 2:59 PM, Kevin A. Burton wrote: I refuse to expose HashSet... sorry! :) But I did wrap what is passed in, like above, in a HashSet in my latest commit. Hm... You're doing this EVEN if the caller passes a HashSet directly?! Well it was in the ctor. But I guess I'm not seeing

1.3-final builds as 1.4-rc1-dev?

2004-03-10 Thread Jeff Wong
Hello, I noticed that Lucene 1.3-final source builds a JAR file whose version number is 1.4-rc1-dev. What does this mean? Will 1.4-final build as 1.5-rc1-dev? Just Curious, Jeff

Re: 1.3-final builds as 1.4-rc1-dev?

2004-03-10 Thread Erik Hatcher
It means we screwed up the timing somehow and changed the build file version after we built the binary version, is my guess. We'll be more careful with the 1.4 release and make sure this doesn't happen then. Erik On Mar 10, 2004, at 8:34 PM, Jeff Wong wrote: Hello, I noticed that Lucene

Re: 1.3-final builds as 1.4-rc1-dev?

2004-03-10 Thread Doug Cutting
Jeff Wong wrote: I noticed that Lucene 1.3-final source builds a JAR file whose version number is 1.4-rc1-dev. What does this mean? Will 1.4-final build as 1.5-rc1-dev? Probably. If you modify the sources of a 1.3-final release, and build them, you're not building 1.3-final, but a derivative.

Re: 1.3-final builds as 1.4-rc1-dev?

2004-03-10 Thread Erik Hatcher
On Mar 10, 2004, at 9:45 PM, Doug Cutting wrote: Jeff Wong wrote: I noticed that Lucene 1.3-final source builds a JAR file whose version number is 1.4-rc1-dev. What does this mean? Will 1.4-final build as 1.5-rc1-dev? Probably. If you modify the sources of a 1.3-final release, and build them,

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Doug Cutting
Erik Hatcher wrote: Also... your HashSet constructor has to copy values from the original HashSet into the new HashSet ... not very clean and this can just be removed by forcing the caller to use a HashSet (which they should). I've caved in and gone HashSet all the way. Did you not see my

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Kevin A. Burton
Doug Cutting wrote: Erik Hatcher wrote: Also... your HashSet constructor has to copy values from the original HashSet into the new HashSet ... not very clean and this can just be removed by forcing the caller to use a HashSet (which they should). I've caved in and gone HashSet all the

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Erik Hatcher
On Mar 10, 2004, at 10:28 PM, Doug Cutting wrote: Erik Hatcher wrote: Also... your HashSet constructor has to copy values from the original HashSet into the new HashSet ... not very clean and this can just be removed by forcing the caller to use a HashSet (which they should). I've caved in

incomplete word match

2004-03-10 Thread Tomcat Programmer
I have a situation where I need to be able to find incomplete word matches, for example a search for the string 'ape' would return matches for 'grapes', 'naples', 'staples', etc. I have been searching the archives of this user list and can't seem to find any example of someone doing this. At one
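To make the requirement concrete, "incomplete word match" here means matching a fragment anywhere inside a term. A brute-force version of that check over a plain list of terms might look like the sketch below. It is an assumption-laden illustration only: it does not use Lucene at all, the class and method names are hypothetical, and the sample terms are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubstringMatchSketch {
    // Return every term that contains the fragment anywhere inside it.
    static List<String> termsContaining(List<String> terms, String fragment) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.contains(fragment)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative terms only; a real index would supply its own term list.
        List<String> terms = Arrays.asList("grapes", "shape", "apple");
        System.out.println(termsContaining(terms, "ape"));
    }
}
```

A linear scan like this visits every term, so its cost grows with the size of the vocabulary.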