When it starts getting complex, I usually move to SolrJ. You say
you're loading documents, so I assume Tika is in the mix too.
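Roughly, a custom loader builds one parent per document and attaches each chapter to it as a child. In SolrJ that's SolrInputDocument.addChildDocument(...). This sketch needs nothing but the JDK (11+), so it swaps SolrJ for a plain HTTP POST of the equivalent JSON, using Solr's anonymous `_childDocuments_` key. It assumes the chapters are already split out (you say that part is done) and that the schema has `_root_` (the stock configsets do). The field names `title`, `doc_type`, `chapter_no` and `content`, the `-chN` id scheme, and the core URL are all made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class NestedChapterIndexer {

    // Escapes a Java string as a JSON string literal (quotes, backslashes, control chars).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds one parent document with its chapters as anonymous child documents.
    // Hypothetical field names: title, doc_type, chapter_no, content.
    // doc_type lets block join queries tell parents ("book") from children ("chapter").
    static String buildNestedDoc(String docId, String title, List<String> chapters) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(jsonString(docId))
          .append(",\"title\":").append(jsonString(title))
          .append(",\"doc_type\":\"book\"")
          .append(",\"_childDocuments_\":[");
        for (int i = 0; i < chapters.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Hypothetical child id scheme: <parentId>-ch<n>
            sb.append("{\"id\":").append(jsonString(docId + "-ch" + (i + 1)))
              .append(",\"doc_type\":\"chapter\"")
              .append(",\"chapter_no\":").append(i + 1)
              .append(",\"content\":").append(jsonString(chapters.get(i)))
              .append('}');
        }
        return sb.append("]}").toString();
    }

    // POSTs a JSON array of documents to the core's /update handler and commits.
    static int post(String solrCoreUrl, String jsonDocs) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(solrCoreUrl + "/update?commit=true"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
            .build();
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String doc = buildNestedDoc("doc1", "Manual", List.of("Intro text", "Setup text"));
        System.out.println(doc);
        // Only talk to Solr if a core URL is given, e.g. http://localhost:8983/solr/yourcore
        if (args.length > 0) {
            System.out.println("HTTP status: " + post(args[0], "[" + doc + "]"));
        }
    }
}
```

Without an argument it just prints the payload; pass a core URL (the one in the comment is hypothetical) to actually post it. Since the parents carry doc_type:book, a block join query along the lines of {!parent which="doc_type:book"}content:setup should then return the documents whose chapters match.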

Here's a blog post on the topic so you can see how to get started...

https://lucidworks.com/post/indexing-with-solrj/

Best,
Erick

On Wed, Sep 18, 2019 at 2:56 PM Jörn Franke <jornfra...@gmail.com> wrote:
>
> Hi,
>
> I load a set of documents. Some logic needs to be applied to these documents
> to split them into chapters (this is done). Each whole document is loaded as
> a parent, and its chapters plus metadata should be loaded as child documents
> of that parent.
> I would now like to collect the options for doing this:
> * Use a custom loader - this is possible and works
> * Use DIH and extract the chapters in a ScriptTransformer and add them as
> child documents there. However, the ScriptTransformer receives only a
> HashMap as input, and while that works for transforming field values etc.,
> it does not seem possible to add child documents within the DIH
> ScriptTransformer. I tried adding a JavaArray with SolrInputDocuments, but
> this does not seem to work. I can see in debug/verbose mode that the
> transformer does add them to the HashMap correctly, but they don't end up
> in the document. Maybe this could be done somehow via nested entities?
> * Use DIH + an UpdateProcessor (Script): there I get the SolrInputDocument
> as a parameter, and it seems feasible to extract the chapters and add them
> as child documents.
>
> Thank you.
>
> Best regards
