Any insights on this question post break? I think my problem can be summarised as looking for the right way to place binary data, stored as an on-disk file, into a field of an Avro record.
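A minimal sketch of the suspected failure mode, written in Python purely for illustration (the real scripted reader would be Groovy inside NiFi). It assumes the string route decodes the file as UTF-8 text somewhere along the way, which is a guess and not something confirmed in the flow. The schema, the field names `doc_id` and `content`, and the helper functions are all hypothetical; the only firm point is that Avro has a `bytes` primitive type intended for raw binary.

```python
import json

# Hypothetical Avro schema: field names are illustrative, not taken from the real flow.
# "bytes" is the Avro primitive type for raw binary content.
SCHEMA = {
    "type": "record",
    "name": "Document",
    "fields": [
        {"name": "doc_id", "type": "string"},
        {"name": "content", "type": "bytes"},
    ],
}


def via_string(raw: bytes) -> bytes:
    """Round-trip through text (assumed UTF-8); undecodable bytes are replaced, so this is lossy."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def make_record(doc_id: str, raw: bytes) -> dict:
    """Build a record that keeps the file content as raw bytes, with no text decoding."""
    return {"doc_id": doc_id, "content": raw}


# Plain ASCII HTML survives a trip through a string unchanged.
html = b"<html><body>hello</body></html>"

# PDFs usually carry high-bit bytes right after the header; these are not valid UTF-8.
pdf_like = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"

html_survives = via_string(html) == html
pdf_survives = via_string(pdf_like) == pdf_like

record = make_record("example-1", pdf_like)

print(json.dumps(SCHEMA, indent=2))
print("html survives string round-trip:", html_survives)
print("pdf survives string round-trip:", pdf_survives)
print("bytes field intact:", record["content"] == pdf_like)
```

If this is what is happening, the fix would be to declare the content field as `bytes` in the reader's schema and hand the raw byte array to the record without ever constructing a string from it.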
On Wed, Dec 20, 2023 at 5:06 PM Richard Beare <richard.be...@gmail.com> wrote:

> I think I've made some progress with this, but I'm now having trouble with
> PDF files. The approach that seems to partly solve the problem is to have a
> ConvertRecord processor with a scripted reader to place the on-disk file (as
> delivered by the GetFile processor) into a record field. I can then use an
> UpdateRecord to add other fields. My current problem, I think, is correctly
> dealing with dumping a binary object (e.g. a PDF file) into that field.
> Going via strings worked for HTML files but breaks PDFs. I'm struggling
> with how to correctly set up the schema from within the script.
>
> On Tue, Dec 19, 2023 at 12:31 PM Richard Beare <richard.be...@gmail.com>
> wrote:
>
>> Hi,
>> I've gotten rusty, not having done much NiFi work for a while.
>>
>> I want to run some tests of the following scenario. I have a workflow
>> that takes documents from a DB and feeds them through Tika. I want to test
>> with a different document set that is currently living on disk. The Tika
>> (Groovy) processor that is my front end is expecting a record with a number
>> of fields, one of which is the document content.
>>
>> I can simulate the fields (badly, but that doesn't matter at this stage)
>> with GenerateRecord, but how do I get the document contents from disk into
>> the right place? I've been thinking of using UpdateRecord to modify the
>> random records, but can't see how to get the data from GetFile into the
>> right place.
>>
>> Another thought is that perhaps I need to convert the GetFile output into
>> the right record structure with ConvertRecord, but then how do I fill the
>> other fields?
>>
>> What am I missing here?
>>
>