I have been working on a hierarchical search capability for a while now and
wanted to see if there was general interest in adopting some of the thinking
into Lucene.
The idea needs a little explanation so I've put some slides up here to kick
things off:
http://www.slideshare.net/MarkHarwood/pr
Hi Mark,
This is extremely cool. The user list regularly gets questions about modeling
is-a relations, and as you outline in your presentation, there currently is no
(performant) way to do it in the general case.
Here's my (non-binding) +1 for inclusion in Lucene.
Steve
On 05/07/2010 at 12
I think this is really interesting for Jackrabbit. I'd really like to
see it become part of the Lucene code base (though I am not sure
whether you were only polling Lucene devs...)
Regards Ard
On Fri, May 7, 2010 at 9:04 PM, Steven A Rowe wrote:
> Hi Mark,
>
> This is extremely cool. The user li
I've used something very similar to fold matching documents by some
field value, like author_id.
I hit the very same issue of keeping all the parts in the same segment, and
solved it with composite documents that go through the whole pipeline, plus
flushing segments manually.
On Fri, May 7, 2010 at 20:25, mark harwo
(10/05/08 1:25), mark harwood wrote:
> I have been working on a hierarchical search capability for a while now and
> wanted to see if there was general interest in adopting some of the thinking
> into Lucene.
> The idea needs a little explanation so I've put some slides up here to kick
> things off:
> h
: I have been working on a hierarchical search capability for a while now
: and wanted to see if there was general interest in adopting some of the
: thinking into Lucene.
This looks cool ... up to slide #5 I thought you were just
proposing something akin to using FieldMaskingSpanQuery, but
N
On 2010-05-07 18:25, mark harwood wrote:
> I have been working on a hierarchical search capability for a while now and
> wanted to see if there was general interest in adopting some of the thinking
> into Lucene.
>
> The idea needs a little explanation so I've put some slides up here to kick
>
OK, seems like there is some interest.
I'll work on packaging the code/unit tests/demos and make it available.
> matching ids ... but I didn't quite catch from the slides how you encode
> the parent-child link... is it just "the next docs are sub-documents
> until the next parent doc"?
Yes - us
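
The layout confirmed above ("the next docs are sub-documents until the next parent doc") can be illustrated with a small, self-contained sketch. It does not use any Lucene API and is not the code from the proposal: the class `NestedDocSketch` and its methods are hypothetical names, and the only assumption is that a parent and its children occupy consecutive doc IDs within one segment, with the parent first.

```java
import java.util.BitSet;

// Illustrative only: models "parent doc followed by its sub-documents"
// using plain doc IDs and a bit set that marks which IDs are parents.
public class NestedDocSketch {

    // Hypothetical helper: record which doc IDs in a segment are parents.
    static BitSet markParents(int... parentDocIds) {
        BitSet parents = new BitSet();
        for (int id : parentDocIds) {
            parents.set(id);
        }
        return parents;
    }

    // The parent of any doc is the nearest marked doc ID at or before it.
    // Returns -1 if no parent precedes the doc.
    static int parentOf(BitSet parents, int docId) {
        return parents.previousSetBit(docId);
    }

    // Children of a parent run from parentDocId + 1 up to (but excluding)
    // the next parent, or maxDoc if this is the last parent in the segment.
    static int childEnd(BitSet parents, int parentDocId, int maxDoc) {
        int next = parents.nextSetBit(parentDocId + 1);
        return next < 0 ? maxDoc : next;
    }

    public static void main(String[] args) {
        // Segment with 7 docs: 0 is a parent with children 1-2,
        // 3 is a parent with children 4-6.
        BitSet parents = markParents(0, 3);
        int maxDoc = 7;

        System.out.println("parent of doc 5: " + parentOf(parents, 5));
        System.out.println("children of doc 0 end before: "
                + childEnd(parents, 0, maxDoc));
    }
}
```

Because the link is implied purely by doc ID order, the sketch only works if a parent and all of its children stay together in one segment, which is why control over segment flushing comes up in this thread.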
On 5/8/10 3:10 AM, Mark Harwood wrote:
> The downside is the need to maintain sequences of related docs in the same
> segment - something Lucene currently doesn't make easy with its limited control
> over when segments are flushed. I suspect we'll need some discussion on how
> best to support this.
There are two separate problems that I know of in indexing parts of
PDFs in an overlapping way:
1) block-structured documents of
   a) the entire PDF file
   b) chapters
   c) sections of chapters
   d.z)
2) Tracking the set of pages that each document contains.
As I understand this, LUCENE
I've put up code, example data and tests for the Nested Document feature here:
http://www.inperspective.com/lucene/LuceneNestedDocumentSupport.zip
The data used in the unit tests is chosen to illustrate practical use of
real-world content.
The final unit tests will work on more abstract data for
Very cool stuff, Mark.
Can you just open a JIRA issue and attach it there?
On May 10, 2010, at 8:38 AM, mark harwood wrote:
> I've put up code, example data and tests for the Nested Document feature
> here: http://www.inperspective.com/lucene/LuceneNestedDocumentSupport.zip
>
> The data used in the uni
Hierarchical documents are a key concept towards unified
structured+unstructured search. They should allow us to fully implement
things such as XQuery + Full-Text
(http://www.w3.org/TR/xquery-full-text/)
Additionally, they solve a century-old problem: how to deal with
section/sub-sections in very large
topic...
- Original Message
From: J. Delgado
To: dev@lucene.apache.org
Sent: Mon, 10 May, 2010 16:47:50
Subject: Re: Adding another dimension to Lucene searches
Hierarchical documents are a key concept towards unified
structured+unstructured search. They should allow us to fully implement
things such as X