Don't forget Clojure as well.

Russell Whitaker
Sent from my iPhone
> On Feb 20, 2016, at 7:44 AM, Matt Burgess <mattyb...@gmail.com> wrote:
>
> I have a blog post on how to do this with NiFi using a Groovy script in the
> ExecuteScript (new in 0.5.0) processor using PDFBox instead of Tika:
>
> http://funnifi.blogspot.com/2016/02/executescript-extract-text-metadata.html?m=1
>
> Jython is also supported but can't yet use Java libraries (it uses Jython
> scripts/modules instead). The other languages (Groovy, Lua, JavaScript,
> JRuby) can use Java libraries like Tika and PDFBox.
>
> Regards,
> Matt
>
> Sent from my iPhone
>
>> On Feb 20, 2016, at 10:31 AM, Ralf Meier <n...@cht3.com> wrote:
>>
>> Hi Everybody,
>>
>> I’m new to NiFi and I want to find out if it is possible to extract content
>> and metadata from PDFs using a library like Tika.
>> My first idea was to use the following processors:
>> - GetFile (watch a specific folder)
>> - IdentifyMimeType (identify whether the file is of type application/pdf)
>> - RouteOnAttribute (if it is a PDF)
>> - ExecuteStreamCommand, where I changed the following settings:
>>   Command Arguments: {flowfilw_contents}
>>   Command Path: tika-python parse all
>>
>> I use the Python Tika wrapper from
>> (https://github.com/chrismattmann/tika-python)
>>
>> But it is not working.
>> Does anybody have an idea how to use Tika to extract the content and the
>> metadata with NiFi, or what I’m doing wrong?
>>
>> Thanks for your help.
>> BR
>> Ralf
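For the ExecuteStreamCommand route Ralf describes, one possible sketch: ExecuteStreamCommand writes the incoming flowfile's content to the command's standard input and uses the command's standard output as the new content, so a small script can read the PDF from stdin instead of expecting it as a command argument. The script below assumes the tika-python wrapper is installed (`from tika import parser`, `parser.from_buffer`) and that it can reach or launch a Tika server (launching one needs Java). The script name `tika_extract.py` and the JSON output shape are hypothetical choices, not anything NiFi or Tika prescribes.

```python
#!/usr/bin/env python
"""Hypothetical tika_extract.py for NiFi's ExecuteStreamCommand (a sketch).

Reads a PDF from stdin, extracts text and metadata with the tika-python
wrapper, and writes a JSON object {"metadata": ..., "content": ...} to stdout.
"""
import json
import sys


def tika_parse(data):
    # Imported lazily so the wrapper is only required when actually parsing.
    # Assumption: tika-python is installed and can reach or start a Tika server.
    from tika import parser
    return parser.from_buffer(data)


def extract(data, parse=tika_parse):
    """Return the document's metadata and trimmed plain-text content."""
    parsed = parse(data)
    content = parsed.get("content") or ""  # Tika may return None for empty docs
    return {
        "metadata": parsed.get("metadata") or {},
        "content": content.strip(),
    }


def main(stdin=None, stdout=None, parse=tika_parse):
    # Defaults to the real process streams; tests can pass in-memory ones.
    stdin = stdin if stdin is not None else sys.stdin.buffer
    stdout = stdout if stdout is not None else sys.stdout
    data = stdin.read()  # the flowfile content piped in by ExecuteStreamCommand
    json.dump(extract(data, parse), stdout)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

With this approach, Command Path would point at the Python interpreter and Command Arguments at the script (for example `/path/to/tika_extract.py`, a placeholder path), rather than passing the content as an argument. Injecting `parse` keeps the stdin/stdout plumbing testable without a running Tika server.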