I think I've made some progress with this, but I'm now having trouble with
PDF files. The approach that seems to partly solve the problem is to use a
ConvertRecord processor with a scripted reader to place the file content
from disk (as delivered by the GetFile processor) into a record field. I
can then use an UpdateRecord processor to add the other fields. My current
problem, I think, is correctly writing a binary object (e.g. a PDF file)
into that field. Going via strings worked for HTML files but breaks PDFs.
I'm struggling with how to set up the schema correctly from within the
script.
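
For reference, here is a minimal sketch of why the string route probably
works for HTML but not for PDFs. It uses only the JDK, not the NiFi API;
the class and method names (BinaryFieldDemo, viaString, viaBase64) are made
up for illustration, and the sample bytes are a typical PDF header, not a
real file. Decoding arbitrary bytes as UTF-8 replaces invalid sequences, so
the round trip is lossy, whereas keeping the bytes or Base64-encoding them
is not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryFieldDemo {
    // "Going via strings": decode the bytes as UTF-8 text, then encode back.
    // Invalid UTF-8 sequences become U+FFFD, so binary data is altered.
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Base64 maps every byte sequence to plain ASCII text and back unchanged,
    // so it is safe to carry in a string-typed field.
    static byte[] viaBase64(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // A typical PDF header: "%PDF-1.4", then a comment line of high bytes.
        byte[] raw = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n', '%',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};

        System.out.println("string round trip intact: "
                + Arrays.equals(raw, viaString(raw)));
        System.out.println("base64 round trip intact: "
                + Arrays.equals(raw, viaBase64(raw)));
    }
}
```

Whether the content field should then be declared as a byte array or as a
string holding Base64 text depends on what the downstream Tika processor
expects, which I have not confirmed.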

On Tue, Dec 19, 2023 at 12:31 PM Richard Beare <richard.be...@gmail.com>
wrote:

> Hi,
> I've gotten rusty, not having done much NiFi work for a while.
>
> I want to run some tests of the following scenario. I have a workflow that
> takes documents from a DB and feeds them through Tika. I want to test with
> a different document set that currently lives on disk. The Tika (Groovy)
> processor that is my front end expects a record with a number of fields,
> one of which is the document content.
>
> I can simulate the fields (badly, but that doesn't matter at this stage)
> with GenerateRecord, but how do I get the document contents from disk into
> the right place? I've been thinking of using UpdateRecord to modify the
> random records, but can't see how to get the data from GetFile into the
> right place.
>
> Another thought is that perhaps I need to convert the GetFile output into
> the right record structure with ConvertRecord, but then how do I fill the
> other fields?
>
> What am I missing here?
>
