Adding another dimension to Lucene searches

2010-05-07 Thread mark harwood
I have been working on a hierarchical search capability for a while now and wanted to see if there was general interest in adopting some of the thinking into Lucene. The idea needs a little explanation so I've put some slides up here to kick things off: http://www.slideshare.net/MarkHarwood/pr

RE: Adding another dimension to Lucene searches

2010-05-07 Thread Steven A Rowe
Hi Mark, This is extremely cool. The user list regularly gets questions about modeling is-a relations, and as you outline in your presentation, there currently is no (performant) way to do it in the general case. Here's my (non-binding) +1 for inclusion in Lucene. Steve On 05/07/2010 at 12

Re: Adding another dimension to Lucene searches

2010-05-07 Thread Ard Schrijvers
Think this is really interesting for Jackrabbit. I'd really like to see it become part of the Lucene code base (though I am not sure whether you were only polling Lucene devs...) Regards Ard On Fri, May 7, 2010 at 9:04 PM, Steven A Rowe wrote: > Hi Mark, > > This is extremely cool.  The user li

Re: Adding another dimension to Lucene searches

2010-05-07 Thread Earwin Burrfoot
I've used something very similar to fold matching documents by some field value, like author_id. The very same issue with keeping all the parts in the same segment, solved with composite documents that go through the whole pipeline and flushing segments manually. On Fri, May 7, 2010 at 20:25, mark harwo

Re: Adding another dimension to Lucene searches

2010-05-07 Thread Koji Sekiguchi
(10/05/08 1:25), mark harwood wrote: I have been working on a hierarchical search capability for a while now and wanted to see if there was general interest in adopting some of the thinking into Lucene. The idea needs a little explanation so I've put some slides up here to kick things off: h

Re: Adding another dimension to Lucene searches

2010-05-07 Thread Chris Hostetter
: I have been working on a hierarchical search capability for a while now : and wanted to see if there was general interest in adopting some of the : thinking into Lucene. This looks cool ... up to slide #5 i thought you were just proposing something akin to using FieldMaskingSpanQuery, but N

Re: Adding another dimension to Lucene searches

2010-05-08 Thread Andrzej Bialecki
On 2010-05-07 18:25, mark harwood wrote: > I have been working on a hierarchical search capability for a while now and > wanted to see if there was general interest in adopting some of the thinking > into Lucene. > > The idea needs a little explanation so I've put some slides up here to kick >

Re: Adding another dimension to Lucene searches

2010-05-08 Thread Mark Harwood
OK, seems like there is some interest. I'll work on packaging the code/unit tests/demos and make it available. > matching ids ... but I didn't quite catch from the slides how you encode > the parent-child link... is it just "the next docs are sub-documents > until the next parent doc"? Yes - us

Re: Adding another dimension to Lucene searches

2010-05-08 Thread Michael Busch
On 5/8/10 3:10 AM, Mark Harwood wrote: The downside is the need to maintain sequences of related docs in the same segment - something Lucene currently doesn't make easy with its limited control over when segments are flushed. I suspect we'll need some discussion on how best to support this.

Re: Adding another dimension to Lucene searches

2010-05-08 Thread Lance Norskog
There are two separate problems that I know of in indexing parts of PDFs in an overlapping way: 1) block-structured documents of a) the entire PDF file b) chapters c) sections of chapters d.z) 2) Tracking the set of pages that each document contains. As I understand this, LUCENE

Re: Adding another dimension to Lucene searches

2010-05-10 Thread mark harwood
I've put up code, example data and tests for the Nested Document feature here: http://www.inperspective.com/lucene/LuceneNestedDocumentSupport.zip The data used in the unit tests is chosen to illustrate practical use of real-world content. The final unit tests will work on more abstract data for

Re: Adding another dimension to Lucene searches

2010-05-10 Thread Grant Ingersoll
Very cool stuff, Mark. Can you just open a JIRA and attach there? On May 10, 2010, at 8:38 AM, mark harwood wrote: > I've put up code, example data and tests for the Nested Document feature > here: http://www.inperspective.com/lucene/LuceneNestedDocumentSupport.zip > > The data used in the uni

Re: Adding another dimension to Lucene searches

2010-05-10 Thread J. Delgado
Hierarchical documents are a key concept towards a unified structured+unstructured search. It should allow us to fully implement things such as XQuery + Full-Text (http://www.w3.org/TR/xquery-full-text/). Additionally, it solves a century-old problem: how to deal with section/sub-sections in very large

Re: Adding another dimension to Lucene searches

2010-05-10 Thread mark harwood
topic... - Original Message From: J. Delgado To: dev@lucene.apache.org Sent: Mon, 10 May, 2010 16:47:50 Subject: Re: Adding another dimension to Lucene searches Hierarchical documents are a key concept towards a unified structured+unstructured search. It should allow us to fully implement things such as X