Building documents using content residing both in database tables and text files

Sascha Szott Tue, 11 Aug 2009 07:35:59 -0700

Hello,

Is it possible (and if so, how can I accomplish it) to configure DIH to build index documents from content that resides in different data sources?


Here is an example scenario:

Let's assume we have a table T with two columns, ID (which is the primary key of T) and TITLE. Furthermore, each record in T is assigned a directory containing text files that were generated from PDF documents using Tika. The directory name is the ID of the associated record in T, e.g. all text files associated with the record with id = 101 are stored in directory 101.

Is there a way to configure DIH such that it uses ID, TITLE and the content of all related text files when building a document (the documents should have three fields: id, title, and text)?
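
To make the desired result concrete, here is a minimal sketch in plain Python of the documents I would like to end up with. This is not DIH configuration, only an illustration of the intended join. The function name `build_documents`, the `base_dir` parameter, the `*.txt` file pattern, and the choice to concatenate the files with newlines are all assumptions made for this example.

```python
from pathlib import Path


def build_documents(rows, base_dir):
    """Combine (ID, TITLE) rows with the text files stored in <base_dir>/<ID>.

    `rows` stands in for the result of something like SELECT ID, TITLE FROM T.
    """
    documents = []
    for record_id, title in rows:
        record_dir = Path(base_dir) / str(record_id)
        # Sort for a deterministic order; a missing directory yields no text.
        files = sorted(record_dir.glob("*.txt")) if record_dir.is_dir() else []
        # Assumption: all files of a record are merged into one text field.
        text = "\n".join(f.read_text(encoding="utf-8") for f in files)
        documents.append({"id": record_id, "title": title, "text": text})
    return documents
```

Whether the files of one record should be merged into a single text value, as in this sketch, or kept as multiple values of a multi-valued text field is open; either would work for my use case.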

Furthermore, as you may have noticed, a second question arises naturally: Will there be any integration of Solr Cell and DIH in an upcoming release, so that it would be possible to use the PDF documents directly instead of the extracted text files that were generated outside of Solr?


Best,
Sascha
