Any insights on this question now that the break is over? I think my problem can be
summarised as: what is the right way to place binary data, stored as an
on-disk file, into a field of an Avro record?
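
As a rough illustration of why going via strings breaks PDFs but not HTML, here is a minimal Python sketch. It assumes the target field is declared with Avro's "bytes" primitive type rather than "string". The schema and the field names document_id and content are hypothetical, not taken from the actual Tika flow, and the NiFi scripted reader itself is not shown. The sample is the start of a typical PDF, whose second line holds bytes that are not valid UTF-8. Decoding them into a string replaces them with U+FFFD, so the bytes that come back out differ. Base64, if the field really must stay a string, round-trips exactly.

```python
import base64
import json

# Hypothetical Avro schema: the field names are illustrative, not taken from
# the original Tika flow. The key point is that the document body uses Avro's
# "bytes" primitive type rather than "string".
SCHEMA = {
    "type": "record",
    "name": "Document",
    "fields": [
        {"name": "document_id", "type": "string"},
        {"name": "content", "type": "bytes"},
    ],
}

# The first bytes of a typical PDF: the header plus a comment line of
# high-bit bytes that is not valid UTF-8.
PDF_SAMPLE = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def via_utf8_string(data: bytes) -> bytes:
    """Round-trip through a text string, as storing into a "string" field would."""
    text = data.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
    return text.encode("utf-8")


def via_base64_string(data: bytes) -> bytes:
    """Round-trip through base64, a fallback if the field must stay a string."""
    return base64.b64decode(base64.b64encode(data).decode("ascii"))


if __name__ == "__main__":
    print(json.dumps(SCHEMA, indent=2))
    print("utf-8 string round trip intact:", via_utf8_string(PDF_SAMPLE) == PDF_SAMPLE)
    print("base64 round trip intact:", via_base64_string(PDF_SAMPLE) == PDF_SAMPLE)
```

Running it prints the schema, then False for the UTF-8 string round trip and True for base64. HTML usually survives the string route only because it is already valid UTF-8 text.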

On Wed, Dec 20, 2023 at 5:06 PM Richard Beare <richard.be...@gmail.com>
wrote:

> I think I've made some progress with this, but I'm now having trouble with
> PDF files. The approach that seems to partly solve the problem is to have a
> ConvertRecord processor with a scripted reader that places the on-disk file
> content (as delivered by the GetFile processor) into a record field. I can
> then use an UpdateRecord processor to add the other fields. My current
> problem, I think, is correctly placing a binary object (e.g. a PDF file)
> into that field. Going via strings worked for HTML files but breaks PDFs.
> I'm struggling with how to correctly set up the schema from within the script.
>
> On Tue, Dec 19, 2023 at 12:31 PM Richard Beare <richard.be...@gmail.com>
> wrote:
>
>> Hi,
>> I've gotten rusty, not having done much NiFi work for a while.
>>
>> I want to run some tests of the following scenario. I have a workflow
>> that takes documents from a DB and feeds them through Tika. I want to test
>> with a different document set that is currently living on disk. The Tika
>> (Groovy) processor that is my front end expects a record with a number
>> of fields, one of which is the document content.
>>
>> I can simulate the fields (badly, but that doesn't matter at this stage)
>> with GenerateRecord, but how do I get the document contents from disk into
>> the right place? I've been thinking of using UpdateRecord to modify the
>> random records, but can't see how to get the data from GetFile into the
>> right place.
>>
>> Another thought is that perhaps I need to convert the GetFile output into
>> the right record structure with ConvertRecord, but then how do I fill the
>> other fields?
>>
>> What am I missing here?
>>
>
