Thank you Andy, thank you again Joe. I'll rethink my approach based on your recommendations. -Jim
On Fri, Nov 3, 2017 at 1:31 PM, Andy LoPresto <alopre...@apache.org> wrote:
> James,
>
> I am not a Python expert, so I’m glad other people could weigh in. As far
> as routing on content type, I agree with Joe’s sentiment that
> IdentifyMimeType and RouteOnAttribute are the correct solutions there. You
> can route on a range of input options (the actual type, detected charset,
> etc.).
>
> I would definitely avoid putting code to handle multiple disparate content
> types (text vs. video, etc.) in the same ExecuteScript processor. This will
> be harder to test, maintain, enhance, etc. You’ll eventually reach a Switch
> Statement of Doom. Instead, approach this as if each ES processor is a black
> box like a Unix tool (it does one thing really well) and chain them
> together. This is the philosophy NiFi is built on, and you’ll have much more
> success swimming with the current than fighting it.
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Nov 3, 2017, at 6:05 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Mime type detection can be difficult business, but I trust Apache Tika
> to do a far better job than I ever could. The result you show for
> JSON appears correct, and I'd simply add that string to the list of
> routing attributes that I treat as text. Or I'd key off the charset
> being provided, as that would tell me enough to know it is text
> or however I wanted to treat it.
>
> Thanks
>
> On Fri, Nov 3, 2017 at 8:24 AM, James McMahon <jsmcmah...@gmail.com> wrote:
>
> I've always found that IdentifyMimeType returns a wide, wide range of values
> for mime.type. It is often ambiguous whether mime.type is a reliable
> indicator of the nature of the content. To illustrate, I've passed file.txt
> into NiFi that contains a string representation of JSON.
> I'd expect this to be handled as textual data, but mime.type gets set to
> application/json;charset=UTF-8.
>
> Perhaps I am misusing the attribute mime.type. How have you worked around
> this challenge, Joe?
>
> On Fri, Nov 3, 2017 at 7:54 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> "How can I discern binary or character content using conditional checks
> to be sure I handle the file properly?"
>
> Use NiFi and the existing processors where able, and extend/script only
> where necessary/critical. For the case you mention, use
> IdentifyMimeType and route appropriate data to the appropriate script
> execution.
>
> Joe
>
> On Fri, Nov 3, 2017 at 7:04 AM, James McMahon <jsmcmah...@gmail.com> wrote:
>
> Andy, regarding the code sample you offered above: doesn't this put
> into text both the attributes metadata and the payload of the flowfile?
>
> If that is the case, how does one modify that to read from the stream
> into variable text only the file payload?
>
> On Fri, Nov 3, 2017 at 5:48 AM, James McMahon <jsmcmah...@gmail.com> wrote:
>
> Thank you Andy. I'd like to ask just a few quick follow-up questions.
>
> 1- My flow content may be textual characters, and it can also be binary:
> jpgs, pngs, and similar. How can I discern binary or character content
> using conditional checks to be sure I handle the file properly? How would I
> alter this
>
>     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>
> to read in the data from the stream as binary data in that case?
>
> 2- In the case where my data in the flowfile payload is binary, do I have
> another version of this....
>
>     outputStream.write(bytearray(reversedText.encode('utf-8')))
>
> ....that omits the encoding, like so:
>
>     outputStream.write(bytearray(some_binary)) ?
>
> Thank you very much in advance. -Jim
>
> On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto <alopre...@apache.org> wrote:
>
> James,
>
> The Python API should be the same as the Java FlowFile.java interface
> [1].
> Matt Burgess’ blog [2] has a good post about using Jython to do flowfile
> content manipulation. Something like:
>
>     flowFile = session.get()
>     if (flowFile != None):
>         flowFile = session.write(flowFile, PyStreamCallback())
>         session.transfer(flowFile, REL_SUCCESS)
>
> With PyStreamCallback declared as a class above that block in the script:
>
>     import java.io
>     from org.apache.commons.io import IOUtils
>     from java.nio.charset import StandardCharsets
>     from org.apache.nifi.processor.io import StreamCallback
>
>     class PyStreamCallback(StreamCallback):
>         def __init__(self):
>             pass
>         def process(self, inputStream, outputStream):
>             text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>             reversedText = text[::-1]
>             outputStream.write(bytearray(reversedText.encode('utf-8')))
>
> In Groovy, you can declare the StreamCallback as an inline closure to
> make this more compact, but I believe in Jython it needs to be a separate
> declaration. Hope this helps.
>
> [1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
> [2] https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>
> On Nov 2, 2017, at 12:53 PM, James McMahon <jsmcmah...@gmail.com> wrote:
>
> In Python, I can use the requests library to post content something like
> this:
>
>     import requests
>     url = "https://abc.test.org"
>     files = {'file': open('/somedir/myfile.txt', 'rb')}
>     r = requests.post(url, files=files)
>
> If I am in a Python stream callback, how can I read the flowfile payload
> in the same way that the open() reads its file from disk?
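Joe's advice in the thread boils down to a small decision rule: treat anything reported as text/*, anything on a short allow-list such as application/json, or any value carrying a charset parameter as text, and everything else as binary. A minimal sketch of that rule in plain Python follows. The function name `is_textual` and the contents of `TEXTUAL_TYPES` are hypothetical, and in an actual flow the same check would more naturally live in a RouteOnAttribute condition on `mime.type` than in script code.

```python
# Hypothetical allow-list of non-"text/*" MIME types to treat as text.
TEXTUAL_TYPES = {"application/json", "application/xml", "application/javascript"}


def is_textual(mime_type):
    """Return True if a mime.type value suggests character (text) content."""
    if not mime_type:
        return False
    # Split e.g. "application/json;charset=UTF-8" into base type and parameters.
    base, _, params = mime_type.partition(";")
    base = base.strip().lower()
    # A charset parameter means the detector decided the bytes are characters.
    if "charset=" in params.lower():
        return True
    return base.startswith("text/") or base in TEXTUAL_TYPES


for value in ["application/json;charset=UTF-8", "text/plain", "image/png"]:
    print(value, "->", "text" if is_textual(value) else "binary")
```

With this rule, the application/json;charset=UTF-8 value Jim saw for file.txt lands on the text path, which matches Joe's suggestion of keying off the charset.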
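On Jim's binary questions: the streams handed to a StreamCallback carry only the flowfile content (attributes are read separately, e.g. with `flowFile.getAttribute(...)`), so handling binary payloads mostly means not decoding them. The sketch below keeps the `process(inputStream, outputStream)` shape of Andy's `PyStreamCallback` but uses `io.BytesIO` as a stand-in for the Java streams so it runs outside NiFi. The class name `PassThroughCallback` and its `textual` flag are hypothetical. Inside ExecuteScript, the `input_stream.read()` call would presumably become commons-io's `IOUtils.toByteArray(inputStream)`, with the resulting byte array written back via `outputStream.write(...)`.

```python
import io


class PassThroughCallback(object):
    """Hypothetical callback: reverses text payloads, copies binary payloads as-is.

    Mirrors the process(inputStream, outputStream) shape of a NiFi
    StreamCallback, but works on Python file-like objects.
    """

    def __init__(self, textual):
        # Decided upstream, e.g. by routing on mime.type.
        self.textual = textual

    def process(self, input_stream, output_stream):
        data = input_stream.read()  # raw bytes, no decoding yet
        if self.textual:
            text = data.decode("utf-8")
            output_stream.write(text[::-1].encode("utf-8"))
        else:
            # Binary (jpg, png, ...): write the bytes back with no encoding step.
            output_stream.write(data)


out = io.BytesIO()
PassThroughCallback(textual=True).process(io.BytesIO(b"hello"), out)
print(out.getvalue())  # b'olleh'

png_header = b"\x89PNG\r\n\x1a\n"
out = io.BytesIO()
PassThroughCallback(textual=False).process(io.BytesIO(png_header), out)
print(out.getvalue() == png_header)  # True
```

Keeping the text and binary branches in separate processors, as Andy recommends, would reduce this to the single branch each processor needs.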
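For Jim's original requests question, one way to hand the payload to requests without a file on disk is to wrap the bytes in an in-memory file object: requests accepts a `(filename, fileobj)` tuple as a value in its `files` mapping. A minimal sketch, assuming the payload has already been read from the content stream into a bytes object and that requests is importable from your script engine (not a given under Jython). The helper `build_files` and its default file name are hypothetical.

```python
import io


def build_files(payload, filename="myfile.txt"):
    """Wrap payload bytes so requests can treat them like a file from open().

    `filename` is a hypothetical default; requests accepts a
    (filename, fileobj) tuple as a value in its `files` mapping.
    """
    return {"file": (filename, io.BytesIO(payload))}


files = build_files(b'{"key": "value"}')
name, fileobj = files["file"]
print(name, fileobj.read())  # myfile.txt b'{"key": "value"}'

# With requests available, the call is then unchanged from the original:
#   r = requests.post(url, files=build_files(payload))
```

Because `io.BytesIO` exposes the same `read()` interface as a file opened with `'rb'`, requests can stream it into the multipart body the same way it does a file from disk.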