Yup, Joe beat me to it, but I was going to suggest those options... In the second case, you would probably use SplitText to get each line of the CSV as a FlowFile, then ExtractText to pull each value of the line into attributes, then ReplaceText to construct a JSON document, using expression language to access the attributes set by ExtractText.
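[Editor's note: to make the ExtractText/ReplaceText idea concrete, here is a rough sketch in plain Java, outside NiFi, of what that pair of processors computes for a single CSV line. The three-column layout, the regex, and the name/city/country field names are illustrative assumptions, not taken from the flow in this thread; in NiFi itself the regex would go in a user-defined ExtractText property (say csv.name) and ReplaceText would reference it as ${csv.name} via the Expression Language.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvLineToJson {
    // Illustrative regex: three comma-separated fields, assuming no
    // embedded commas or quotes (real CSV may need a proper parser).
    private static final Pattern CSV_LINE =
        Pattern.compile("^([^,]*),([^,]*),([^,]*)$");

    // Mirrors ExtractText (regex capture groups -> attributes) followed by
    // ReplaceText (expression language -> JSON body) for one line of CSV.
    public static String toJson(String line) {
        Matcher m = CSV_LINE.matcher(line);
        if (!m.matches()) {
            // In NiFi this is the 'unmatched' relationship.
            throw new IllegalArgumentException("unmatched line: " + line);
        }
        return String.format(
            "{\"name\":\"%s\",\"city\":\"%s\",\"country\":\"%s\"}",
            m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(toJson("Ada,London,UK"));
    }
}
```

Doing this per line is why SplitText comes first: each FlowFile then carries one small record instead of the whole file.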
On Mon, Sep 21, 2015 at 4:33 PM, Joe Witt <[email protected]> wrote:
> Adam, Bryan,
>
> Could do the CSV to Avro processor and then follow it with the Avro to
> JSON processor. Alternatively, could use ExtractText to pull the
> fields out as attributes and then use ReplaceText to produce a JSON
> output.
>
> Thanks
> Joe
>
> On Mon, Sep 21, 2015 at 4:21 PM, Adam Williams
> <[email protected]> wrote:
> > Bryan,
> >
> > Thanks for the feedback. I stripped the ExtractText and tried routing all
> > unmatched traffic to Mongo as well, hence the CSV import problems. Off the
> > top of my head I do not think MongoDB allows CSV inserts through the Java
> > client; we've always had to work with the JSON/document model for it. For a
> > CSV format, it would have to be similar to this idea:
> > https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
> >
> > So looking at the other processors in NiFi, is there a way then to move from
> > a CSV format to JSON before putting to Mongo?
> >
> > ________________________________
> > Date: Mon, 21 Sep 2015 16:09:10 -0400
> > Subject: Re: CSV to Mongo
> > From: [email protected]
> > To: [email protected]
> >
> > Adam,
> >
> > I was able to import the full template, thanks. A couple of things...
> >
> > The ExtractText processor works by adding user-defined properties (the +
> > icon in the top-right of the properties window) where the property name is a
> > destination attribute and the value is a regular expression.
> > Right now there aren't any regular expressions defined, so that processor
> > will always route the file to 'unmatched'. Generally you would want
> > to route the matched files to the next processor, and then auto-terminate
> > the unmatched relationship (assuming you want to filter out non-matches).
> >
> > Do you know if MongoDB supports inserting a CSV file through their Java
> > client? Do you have similar code that already does this in Storm?
> >
> > I am honestly not that familiar with MongoDB, but in the PutMongo processor
> > it takes the incoming data and calls:
> >
> >     Document doc = Document.parse(new String(content, charset));
> >
> > Looking at that Document.parse() method, it looks like it expects a JSON
> > document, so I just want to make sure that we expect CSV insertions to work
> > here.
> > In researching this, it looks like Mongo has a bulk import utility that
> > handles CSV [1], but it is a command-line utility.
> >
> > -Bryan
> >
> > [1] http://docs.mongodb.org/manual/reference/program/mongoimport/
> >
> >
> > On Mon, Sep 21, 2015 at 3:19 PM, Adam Williams <[email protected]> wrote:
> >
> > Sorry about that, this should work. I attached the template; the error is below:
> >
> > 2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10]
> > o.a.nifi.processors.mongodb.PutMongo
> > PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert
> > StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407]
> > into MongoDB due to java.lang.NegativeArraySizeException:
> > java.lang.NegativeArraySizeException
> >
> > ________________________________
> > Date: Mon, 21 Sep 2015 15:12:43 -0400
> > Subject: Re: CSV to Mongo
> > From: [email protected]
> > To: [email protected]
> >
> > Adam,
> >
> > I imported the template and it looks like it only captured the PutMongo
> > processor. Can you try deselecting everything on the graph and creating the
> > template again so we can take a look at the rest of the flow? Or, if you have
> > other stuff on your graph, select all of the processors you described so
> > they all get captured.
> >
> > Also, can you provide any of the stack trace for the exception you are
> > seeing?
> > The log is in NIFI_HOME/logs/nifi-app.log
> >
> > Thanks,
> >
> > Bryan
> >
> >
> > On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <[email protected]> wrote:
> >
> > Adam,
> >
> > Thanks for attaching the template, we will take a look and see what is
> > going on.
> >
> > Thanks,
> >
> > Bryan
> >
> >
> > On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams <[email protected]> wrote:
> >
> > Hey Joe,
> >
> > Sure thing. I attached the template; I'm just taking the GDELT data set for
> > the GetFile processor, which works. The error I get is a negative array size.
> >
> >
> >> Date: Mon, 21 Sep 2015 14:24:50 -0400
> >> Subject: Re: CSV to Mongo
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> Adam,
> >>
> >> Regarding moving from Storm to NiFi, I'd say they make better teammates
> >> than competitors. The use case outlined above should be quite easy
> >> for NiFi, but there are analytic/processing functions Storm is probably
> >> a better answer for. We're happy to help explore that with you as you
> >> progress.
> >>
> >> If you ever run into an ArrayIndexOutOfBoundsException, then it will
> >> always be 100% a coding error. Would you mind sending your
> >> flow.xml.gz over or making a template of the flow (assuming it
> >> contains nothing sensitive)? If at all possible, sample data which
> >> exposes the issue would be ideal. As an alternative, can you go ahead
> >> and send us the resulting stack trace/error that comes out?
> >>
> >> We'll get this addressed.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
> >> <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I'm moving from Storm to NiFi and trying to do a simple test: getting a
> >> > large CSV file dumped into MongoDB. The CSV file has a header with column
> >> > names and it is structured; my only problem is dumping it into MongoDB. At
> >> > a high level, do the following processor steps look correct? All I want is
> >> > to just pull the whole CSV file over to MongoDB without a regex or
> >> > anything fancy (yet). I eventually always seem to hit trouble with array
> >> > index problems with the PutMongo processor:
> >> >
> >> > GetFile --> ExtractText --> RouteOnAttribute (not a null line) -->
> >> > PutMongo
> >> >
> >> > Does that seem to be the right way to do this in NiFi?
> >> >
> >> > Thank you,
> >> > Adam
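[Editor's note: one observation on the NegativeArraySizeException in the stack trace above; this is an inference from the numbers, not a confirmed diagnosis of the NiFi code. The failing FlowFile is 6,581,409,407 bytes, which is larger than Integer.MAX_VALUE (2,147,483,647). Java arrays are indexed by int, so any code path that narrows that size to an int in order to buffer the whole file into a byte[] ends up with a negative length:]

```java
public class SizeOverflow {
    public static void main(String[] args) {
        // size= value taken from the PutMongo stack trace in the thread
        long flowFileSize = 6_581_409_407L;
        // what a hypothetical `new byte[(int) size]` allocation would see:
        int narrowed = (int) flowFileSize;
        System.out.println(narrowed); // negative after the long -> int narrowing
        // new byte[narrowed] would throw java.lang.NegativeArraySizeException
    }
}
```

If this is the cause, splitting the file first (e.g. with SplitText, as suggested at the top of the thread) keeps each FlowFile's content well under the 2 GB limit and sidesteps the overflow.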
