Re: Multi-field distinct query

2007-05-16 Thread Paul Elschot
Terry, On Wednesday 16 May 2007 01:13, dontspamterry wrote: > > ... I played around with caching BitSets for the fields > which I'd like to do a distinct on, but given the amount of data, I run out > of memory. I don't know whether your final solution will require filtering, but if it does, deco
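Terry's quoted remark about caching BitSets per field value is easy to quantify. A `java.util.BitSet` spends one bit per document, rounded up to 64-bit words, so caching one set per distinct value costs roughly `maxDoc / 8` bytes times the number of distinct values. The sketch below assumes exactly one cached BitSet per distinct value and ignores JVM object overhead; the index size and value count in `main()` are hypothetical, not taken from the thread.

```java
import java.util.BitSet;

/** Back-of-the-envelope memory estimate for caching one BitSet per distinct field value. */
public class BitSetCacheEstimate {

    /** Bytes for one BitSet covering maxDoc documents: one bit per doc, rounded up to 64-bit words. */
    static long bytesPerBitSet(long maxDoc) {
        long words = (maxDoc + 63) / 64;  // BitSet is backed by a long[] array
        return words * 8;
    }

    /** Bytes for caching one BitSet per distinct value of a field (object headers ignored). */
    static long cacheBytes(long maxDoc, long distinctValues) {
        return bytesPerBitSet(maxDoc) * distinctValues;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;       // hypothetical number of documents in the index
        long distinctValues = 100_000L;  // hypothetical number of distinct values being cached

        // Cross-check the formula against a real BitSet: size() reports the bits actually allocated.
        long measured = new BitSet((int) maxDoc).size() / 8;
        System.out.println("per BitSet: " + bytesPerBitSet(maxDoc) + " bytes (measured " + measured + ")");
        System.out.println("whole cache: " + cacheBytes(maxDoc, distinctValues) / 1_000_000_000L + " GB");
    }
}
```

Under these assumed numbers the cache needs about 125 GB, which illustrates how quickly a per-value BitSet cache outgrows the heap as either dimension grows.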

Re: Multi-field distinct query

2007-05-16 Thread Steven Rowe
Hi Terry, Why not have another index in which a document has one field for the parent and another field containing all of its children. An OR query over the "children" field would return you exactly what you want - one document for each distinct parent. Steve dontspamterry wrote: > Hi all, > >
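Steve's suggestion works because the secondary index holds exactly one document per parent, so any query over it can return each parent at most once. The sketch below models that idea in plain Java rather than with Lucene classes: `build` groups (parent, child) pairs into one document per parent, and `orQuery` plays the role of an OR query (in Lucene, a `BooleanQuery` of `SHOULD` `TermQuery` clauses) over the "children" field. The class and method names are hypothetical.

```java
import java.util.*;

/** Plain-Java model (not Lucene API) of a one-document-per-parent secondary index. */
public class ParentChildrenIndex {

    /** One "document" per parent: the parent field plus a multi-valued children field. */
    static final class ParentDoc {
        final String parent;
        final Set<String> children;

        ParentDoc(String parent, Collection<String> children) {
            this.parent = parent;
            this.children = new HashSet<>(children);
        }
    }

    /** Groups (parent, child) pairs so that each parent appears in exactly one document. */
    static List<ParentDoc> build(List<String[]> relations) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] rel : relations) {
            grouped.computeIfAbsent(rel[0], k -> new ArrayList<>()).add(rel[1]);
        }
        List<ParentDoc> docs = new ArrayList<>();
        grouped.forEach((parent, children) -> docs.add(new ParentDoc(parent, children)));
        return docs;
    }

    /** OR query over the children field: a document matches if it contains any query term. */
    static List<String> orQuery(List<ParentDoc> index, Set<String> terms) {
        List<String> hits = new ArrayList<>();
        for (ParentDoc doc : index) {
            if (!Collections.disjoint(doc.children, terms)) {
                hits.add(doc.parent);  // one hit per document, hence one per distinct parent
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String[]> relations = Arrays.asList(
            new String[] {"p1", "a"}, new String[] {"p1", "b"},
            new String[] {"p2", "b"}, new String[] {"p3", "c"});
        List<ParentDoc> index = build(relations);
        System.out.println(orQuery(index, new HashSet<>(Arrays.asList("a", "b"))));  // [p1, p2]
    }
}
```

Note that `p1` is returned once even though it matches both `a` and `b`; the trade-off, as Terry points out in the thread, is that a parent with very many children produces a very large document.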

Missing Searcher.search() signature

2007-05-16 Thread Timo Nentwig
Hello everybody, I'm trying to understand some of Lucene's internals in order to solve a problem that I think I could solve by implementing my own HitCollector in conjunction with Lucene's FieldSortedHitQueue (for sorting). However, existing Searcher.search() signatures only accept eithe
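What Timo describes amounts to a collector that applies its own per-hit logic and also keeps the best `nDocs` hits in field-sorted order. The sketch below shows that combination in plain Java: `Collector` is a hypothetical stand-in modeled on the `collect(int doc, float score)` callback of Lucene 2.x's `HitCollector`, a `java.util.PriorityQueue` substitutes for `FieldSortedHitQueue`, and the per-document sort values come from a hypothetical array indexed by doc id. It is not Lucene's implementation.

```java
import java.util.*;

/** Sketch: custom per-hit logic combined with a bounded, field-sorted top-N queue. */
public class SortingCollectorSketch {

    /** Hypothetical stand-in for a hit callback: invoked once per matching document. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Keeps the nDocs hits with the smallest sort value (ascending field sort). */
    static final class TopSortedCollector implements Collector {
        private final int nDocs;
        private final int[] sortValues;  // hypothetical per-document sort key, indexed by doc id
        private final PriorityQueue<Integer> heap;  // max-heap: worst retained hit sits at the head

        TopSortedCollector(int nDocs, int[] sortValues) {
            this.nDocs = nDocs;
            this.sortValues = sortValues;
            this.heap = new PriorityQueue<>(Math.max(1, nDocs),
                (a, b) -> Integer.compare(sortValues[b], sortValues[a]));
        }

        @Override
        public void collect(int doc, float score) {
            if (nDocs <= 0 || score <= 0f) {
                return;  // example of custom per-hit logic: skip non-positive scores
            }
            if (heap.size() < nDocs) {
                heap.add(doc);
            } else if (sortValues[doc] < sortValues[heap.peek()]) {
                heap.poll();  // evict the current worst hit
                heap.add(doc);
            }
        }

        /** Retained doc ids in ascending order of their sort value. */
        List<Integer> topDocs() {
            List<Integer> out = new ArrayList<>(heap);
            out.sort(Comparator.comparingInt(d -> sortValues[d]));
            return out;
        }
    }

    public static void main(String[] args) {
        int[] sortValues = {50, 10, 40, 20, 30};  // hypothetical field values for docs 0..4
        TopSortedCollector collector = new TopSortedCollector(3, sortValues);
        for (int doc = 0; doc < sortValues.length; doc++) {
            collector.collect(doc, 1.0f);  // a searcher would call this for each matching doc
        }
        System.out.println(collector.topDocs());  // [1, 3, 4]
    }
}
```

Bounding the heap at `nDocs` keeps memory proportional to the number of requested hits rather than to the number of matches, which is also why an `nDocs` argument matters in such a signature.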

Re: Multi-field distinct query

2007-05-16 Thread dontspamterry
Hi Steve, We originally had documents which were term-centric, i.e. what you described - document for the parent and all of its children. We changed it to model a single, parent-child relation as one document due to requirements and the fact that we were having memory issues for cases where a par
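With one document per parent-child relation, a "multi-field distinct" becomes a de-duplication over the hits of the tuple of chosen field values. The sketch below does that in plain Java with a `LinkedHashSet`, so its memory grows with the number of distinct tuples rather than with the size of the index. It assumes the hits have already been retrieved as field-name to value maps; the field names `parent` and `child` and the helper names are hypothetical.

```java
import java.util.*;

/** Sketch: distinct combinations of several stored fields over a list of hits. */
public class MultiFieldDistinct {

    /** Returns the distinct value tuples for the given fields, in first-seen order. */
    static Set<List<String>> distinct(List<Map<String, String>> hits, List<String> fields) {
        Set<List<String>> seen = new LinkedHashSet<>();
        for (Map<String, String> doc : hits) {
            List<String> key = new ArrayList<>(fields.size());
            for (String field : fields) {
                key.add(doc.get(field));  // null if the document lacks this field
            }
            seen.add(key);  // List.equals/hashCode make identical tuples collapse
        }
        return seen;
    }

    /** Hypothetical helper: one document per parent-child relation. */
    static Map<String, String> doc(String parent, String child) {
        Map<String, String> d = new HashMap<>();
        d.put("parent", parent);
        d.put("child", child);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = Arrays.asList(
            doc("p1", "a"), doc("p1", "b"), doc("p2", "b"));
        System.out.println(distinct(hits, Arrays.asList("parent")));           // [[p1], [p2]]
        System.out.println(distinct(hits, Arrays.asList("parent", "child")));  // [[p1, a], [p1, b], [p2, b]]
    }
}
```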

Re: Multi-field distinct query

2007-05-16 Thread dontspamterry
Thanks for the tip. I've re-posted on Lucene-Java Users. -Terry Grant Ingersoll-6 wrote: > > I suggest you ask on the user mailing list (java- > [EMAIL PROTECTED]) as you are likely to get a lot more interest > from others. java-dev is for discussing the internals of how Lucene > works.

Re: Missing Searcher.search() signature

2007-05-16 Thread Chris Hostetter
: However, existing Searcher.search() signatures only accept either HitCollector or : Sort, but not both. : : Is there some special reason why this signature doesn't exist or is it just because every : possible permutation would be too much (strictly speaking I miss the nDocs argument as well :)?

Tests, Contribs, and Releases

2007-05-16 Thread Chris Hostetter
Hey everybody, this thread has been sitting in my inbox for a while waiting for me to have a few minutes to look into it... http://www.nabble.com/Packaging-Lucene-2.1.0-for-Debian--found-2-junit-errors-tf3571676.html In a nutshell, when a guy from Debian went looking to package Lucene he noticed

Re: Tests, Contribs, and Releases

2007-05-16 Thread Paul Smith
Does Lucene have a Gump run descriptor? That's quite useful for tracking this sort of thing too. It's very good at nagging! :) The standard Maven assembly packaging runs the unit tests by default too. Changing the Lucene build system to Maven is not something you'd want to jump at withou

Recreating a document from its index

2007-05-16 Thread Stefano Fornari
Hi All, I have a question I could not answer by reading the documentation and searching the mailing list archive: is it possible to recreate a document (or a good approximation of it) from its index? If I understood the docs correctly, the index stores each term, and for each term the positions w
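If positions are available for every term of a document, an approximation can be rebuilt by inverting the term-to-positions mapping and reading the terms back in position order. The sketch below assumes such a per-document map already exists (how to obtain it from a particular index is left open); positions with no term, for example where stop words were dropped at index time, become `_` placeholders. Anything the analyzer changed, such as case or stemming, cannot be recovered this way, which is why the result is only an approximation. Names are hypothetical.

```java
import java.util.*;

/** Sketch: approximate a document's token stream from a term -> positions map. */
public class RebuildFromPositions {

    /** Rebuilds tokens in position order; gaps (e.g. removed stop words) become "_". */
    static String rebuild(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int pos : e.getValue()) {
                // An index may hold several terms at one position (e.g. synonyms);
                // this sketch keeps only one of them.
                byPosition.put(pos, e.getKey());
            }
        }
        if (byPosition.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        int last = byPosition.lastKey();
        for (int pos = 0; pos <= last; pos++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(byPosition.getOrDefault(pos, "_"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical positions for "the quick brown fox" with the stop word "the" removed.
        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("quick", Arrays.asList(1));
        postings.put("brown", Arrays.asList(2));
        postings.put("fox", Arrays.asList(3));
        System.out.println(rebuild(postings));  // _ quick brown fox
    }
}
```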

[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

2007-05-16 Thread Sean O'Connor (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496434 ] Sean O'Connor commented on LUCENE-794: -- Mark, Can you point me in the right direction? I want to find ALL hit

Re: Tests, Contribs, and Releases

2007-05-16 Thread Grant Ingersoll
Yeah, I hate to admit it and start another round of Maven vs. Ant, but Maven does take care of darn near all these issues. :-) To share my experience, we are actually in the process of upgrading from Maven 1 to Maven 2. The docs are good for the basics (i.e. 80-90% of what you need), but

Re: Tests, Contribs, and Releases

2007-05-16 Thread Paul Smith
To answer your question, though, I don't see any reason not to make the changes to make the current process more repeatable. Yeah, modifying the Ant process now is going to be the simpler way to catch the current problem. Still, I'd check the Gump stuff for Lucene, because I'd be surprised that wo