Hello,

is it possible (and if so, how) to configure DIH to build index documents from content that resides in different data sources?

Here is an example scenario:
Let's assume we have a table T with two columns, ID (the primary key of T) and TITLE. Furthermore, each record in T is assigned a directory containing text files that were generated from PDF documents using Tika. Each directory is named after the ID of its associated record, e.g. all text files associated with the record with ID 101 are stored in directory 101.

Is there a way to configure DIH so that it uses ID, TITLE, and the content of all related text files when building a document (each document should have three fields: id, title, and text)?
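To make the intended mapping concrete, here is a minimal plain-Python sketch of the desired output, independent of DIH (which is what the question is actually about). It assumes the rows come from something like `SELECT ID, TITLE FROM T`, that the extracted files end in `.txt` and are UTF-8 encoded, and that all files of a record are concatenated into one `text` field. The function `build_documents` and the demo data are hypothetical.

```python
import tempfile
from pathlib import Path


def build_documents(rows, base_dir):
    """Yield one document (a dict with id, title, text) per row of table T.

    rows: iterable of (ID, TITLE) tuples, e.g. from "SELECT ID, TITLE FROM T".
    base_dir: hypothetical root folder with one subdirectory per record,
    named after the record's ID (e.g. base_dir/101/*.txt).
    """
    for record_id, title in rows:
        record_dir = Path(base_dir) / str(record_id)
        texts = []
        if record_dir.is_dir():
            # Assumption: extracted files end in .txt and are UTF-8 encoded.
            for txt_file in sorted(record_dir.glob("*.txt")):
                texts.append(txt_file.read_text(encoding="utf-8"))
        # All text files of a record are merged into a single "text" field.
        yield {"id": record_id, "title": title, "text": "\n".join(texts)}


# Demo with made-up data: record 101 has two text files, record 102 has none.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "101").mkdir()
    (Path(root) / "101" / "a.txt").write_text("first page", encoding="utf-8")
    (Path(root) / "101" / "b.txt").write_text("second page", encoding="utf-8")
    docs = list(build_documents([(101, "Report"), (102, "Empty")], root))

for doc in docs:
    print(doc)
```

A record without a directory simply gets an empty `text` field here; whether that is the desired behaviour is a separate design decision.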

Furthermore, as you may have noticed, a second question arises naturally: will Solr Cell and DIH be integrated in an upcoming release, so that the PDF documents could be used directly instead of text files extracted outside of Solr?

Best,
Sascha
