Hello,
is it possible (and if so, how) to configure the DataImportHandler (DIH)
to build index documents from content that resides in different data
sources?
Here is an example scenario:
Let's assume we have a table T with two columns, ID (which is the
primary key of T) and TITLE. Furthermore, each record in T is assigned a
directory containing text files that were generated from PDF documents
using Tika. A directory is named after the ID of the record in T that
it belongs to, e.g. all text files associated with the record with
id = 101 are stored in directory 101.
Is there a way to configure DIH such that it uses ID, TITLE and the
content of all related text files when building a document (the
documents should have three fields: id, title, and text)?
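
To make the desired result concrete, here is a minimal Python sketch of
the join described above. It runs outside of Solr and is not DIH
configuration; it only illustrates which document should come out for
each record. The function name build_documents, the ".txt" file
extension, and the choice to concatenate files in sorted order are
assumptions made for illustration.

```python
from pathlib import Path


def build_documents(rows, base_dir):
    """Combine (id, title) rows from T with the text files in base_dir/<id>/.

    rows     -- iterable of (id, title) tuples, e.g. fetched from table T
    base_dir -- directory that contains one sub-directory per record id
    """
    docs = []
    for record_id, title in rows:
        record_dir = Path(base_dir) / str(record_id)
        # Assumption: extracted files end in .txt; sort for a stable order.
        files = sorted(record_dir.glob("*.txt")) if record_dir.is_dir() else []
        # All text files of one record are merged into a single "text" field.
        text = "\n".join(f.read_text(encoding="utf-8") for f in files)
        docs.append({"id": record_id, "title": title, "text": text})
    return docs
```

For the record with id = 101, this would yield one document whose text
field holds the content of every file in directory 101. A record without
a directory gets an empty text field.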
This naturally raises a second question: will Solr Cell and DIH be
integrated in an upcoming release, so that it would be possible to use
the PDF documents directly instead of the extracted text files that
were generated outside of Solr?
Best,
Sascha