Re: Lucille, a (new) Python port of Lucene

2007-08-28 Thread Bill Janssen
Lucille apparently doesn't require gcj. Bill > Why Lucille in light of PyLucene? > > Erik > > > On Aug 28, 2007, at 10:55 AM, Dan Callaghan wrote: > > > Dear list, > > > > I have recently begun a Python port of Lucene, named Lucille. It is > > still very much a work in progress, but I h

Re: Lucille, a (new) Python port of Lucene

2007-08-28 Thread Mike Klaas
Not to mention Lupy. Hasn't it been relatively well-established that trying to create a performant search engine in a dynamic interpreted language is a show-stopper? After several failed ports of Lucene (I can add to this my own, unreleased, attempt) I just don't see the point, except as a

Re: Lucille, a (new) Python port of Lucene

2007-08-28 Thread Erik Hatcher
Why Lucille in light of PyLucene? Erik On Aug 28, 2007, at 10:55 AM, Dan Callaghan wrote: Dear list, I have recently begun a Python port of Lucene, named Lucille. It is still very much a work in progress, but I hope to have a feature-complete release compatible with Lucene 2.1 done i

Re: Indexing time linear?

2007-08-28 Thread Mike Klaas
On 23-Aug-07, at 2:48 AM, Barry Forrest wrote: Hi list, I'm trying to estimate how long it will take to index 10 million documents. If I measure how long it takes to index say 10,000 documents, can I extrapolate? Will it take roughly 1000 times longer to do the whole set? Segment mergin

indexing fields with multiplicity

2007-08-28 Thread Tim Sturge
Hi, I have fields which have high multiplicity; for example I have a topic with 1000 names, 500 of which are "USA" and 200 are "United States of America". Previously I was indexing "USA USA .(500x).. USA United States of America .(200x).. United States of America" as a single field. The pr

Re: Find "latest" document (before a certain date)

2007-08-28 Thread Karl Wettin
On 28 Aug 2007, at 17:48, Per Lindberg wrote: Now, I want to search the content, and return only the LATEST found document with each id. To complicate things a bit, I want the latest before a given date. In other words, for each id pick only the one with the highest date less than x. Given you a
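The selection rule quoted above ("for each id pick only the one with the highest date less than x") can be illustrated with a minimal sketch. This is plain Python over in-memory records, not Lucene or Lucille code; the function name `latest_before` and the sample data are hypothetical, and only the field names "id", "version" and "date" come from the original post.

```python
from datetime import date


def latest_before(docs, cutoff):
    """Return {id: doc}, keeping for each id the doc with the
    greatest date strictly earlier than cutoff (hypothetical helper)."""
    best = {}
    for doc in docs:
        if doc["date"] >= cutoff:
            continue  # not before the cutoff, so never a candidate
        current = best.get(doc["id"])
        if current is None or doc["date"] > current["date"]:
            best[doc["id"]] = doc
    return best


# Made-up sample records standing in for search hits.
docs = [
    {"id": "a", "version": 1, "date": date(2007, 1, 10)},
    {"id": "a", "version": 2, "date": date(2007, 6, 1)},
    {"id": "a", "version": 3, "date": date(2007, 9, 1)},
    {"id": "b", "version": 1, "date": date(2007, 8, 30)},
]
cutoff = date(2007, 8, 28)
result = latest_before(docs, cutoff)
```

With this sample data, id "a" resolves to version 2 and id "b" has no version before the cutoff, so it is absent from the result. Inside a real index the same rule would have to be applied to the hits of a date-restricted query; how best to do that is what the thread itself discusses.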

Find "latest" document (before a certain date)

2007-08-28 Thread Per Lindberg
Hi! I have an index containing the following fields: "id" (not to be confused with the internal Lucene id), "version", and "date". The combination of "id" and "version" is unique, i.e. there may be several versions of each document with the same id. The "date" field indicates when the version

Lucille, a (new) Python port of Lucene

2007-08-28 Thread Dan Callaghan
Dear list, I have recently begun a Python port of Lucene, named Lucille. It is still very much a work in progress, but I hope to have a feature-complete release compatible with Lucene 2.1 done in the near future. The project homepage is at: http://www.djc.id.au/lucille/ Contributions, feedback,

Re: Re: Re: Searching Diacritics

2007-08-28 Thread tom
Tom Roberts is out of the office until 3rd September 2007 and will get back to you on his return. http://www.luxonline.org.uk http://www.lux.org.uk

Re: Searching Diacritics

2007-08-28 Thread anorman
This was the problem, it worked excellent! Thanks for the help! -Albert karl wettin-3 wrote: > > > On 27 Aug 2007, at 20:30, anorman wrote: > >> >> I've tried to implement an analyzer with little different than using: >> >> result = new ISOLatin1AccentFilter(result); in the TokenStream >>