Any reason not to use the simplest structure - each page is one Solr document with a book field, a chapter field, and a page text field? You can then use grouping to group results by book (title text) or even chapter (title text and/or number). Maybe initially group by book and then if the user selects a book group you can re-query with the specific book and then group by chapter.
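
A minimal sketch of the two-step grouping described above, assuming each page is one Solr document. The core name `books`, the URL, and the field names `book`, `chapter`, and `page_text` are hypothetical, and `book` and `chapter` are assumed to be single-valued string fields so they can be used with `group.field`. The sketch only builds the request URLs using Solr's standard result-grouping parameters (`group`, `group.field`, `group.limit`); it does not send them.

```python
from urllib.parse import urlencode

# Hypothetical Solr core; adjust host and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/books/select"


def grouped_query_url(text, group_field, book=None, rows=10, group_limit=3):
    """Build a select URL that groups matching pages by `group_field`.

    text        -- user search terms, matched against the (assumed) page_text field
    group_field -- "book" for the first query, "chapter" for the drill-down
    book        -- optional book title used as a filter when drilling down
    """
    params = [
        ("q", f"page_text:({text})"),
        ("group", "true"),                  # turn on result grouping
        ("group.field", group_field),       # one group per distinct field value
        ("group.limit", str(group_limit)),  # pages returned inside each group
        ("rows", str(rows)),                # number of groups returned
        ("wt", "json"),
    ]
    if book is not None:
        # Escape backslashes and quotes so the title is a valid quoted phrase.
        safe = book.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'book:"{safe}"'))
    return SOLR_SELECT + "?" + urlencode(params)


# Step 1: group all matching pages by book.
first = grouped_query_url("whale", "book")

# Step 2: the user picked one book, so re-query and group by chapter.
second = grouped_query_url("whale", "chapter", book="Moby Dick")
```

Keeping the book restriction in `fq` rather than in `q` leaves the relevance scoring of the page text unchanged between the two queries.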
-- Jack Krupansky

On Tue, Mar 1, 2016 at 8:08 AM, Zaccheo Bagnati <zacch...@gmail.com> wrote:

> Original data is quite well structured: it comes in XML with chapters and
> tags to mark the original page breaks on the paper version. In this way we
> have the possibility to restructure it almost as we want before creating
> the SOLR index.
>
> On Tue, Mar 1, 2016 at 14:04, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
> > To start, what is the form of your input data - is it already divided into
> > chapters and pages? Or... are you starting with raw PDF files?
> >
> > -- Jack Krupansky
> >
> > On Tue, Mar 1, 2016 at 6:56 AM, Zaccheo Bagnati <zacch...@gmail.com> wrote:
> >
> > > Hi all,
> > > I'm searching for ideas on how to define the schema and how to perform
> > > queries in this use case: we have to index books, each book is split into
> > > chapters and chapters are split into pages (pages represent the original
> > > page breaks in the printed version). We should show the results grouped by
> > > books and chapters (for the same book) and pages (for the same chapter).
> > > As far as I know, we have 2 options:
> > >
> > > 1. index pages as SOLR documents. In this way we could theoretically
> > >    retrieve chapters (and books?) using grouping but
> > >    a. we will miss matches across two contiguous pages (page breaks are
> > >       only due to typographical needs so concepts could be split... as in
> > >       printed books)
> > >    b. I don't know if it is possible in SOLR to group results on two
> > >       different levels (books and chapters)
> > >
> > > 2. index chapters as SOLR documents. In this case we will have the right
> > >    matches but how to obtain the matching pages? (we need pages because
> > >    the client can only display pages)
> > >
> > > We have been struggling with this problem for a long time and we're not
> > > able to find a suitable solution, so I'm asking whether someone has ideas
> > > or has already solved a similar issue.
> > > Thanks