Andrew,

If you are interested in the ExtractText + ReplaceText approach, I posted an example template that shows how to convert a line from a CSV file to a JSON document [1].
The first part of the flow is just for testing and generates a flow file with the content set to "a,b,c,d". ExtractText then pulls those values into attributes (csv.1, csv.2, csv.3, csv.4), and ReplaceText uses them to build a JSON document.

-Bryan

[1] https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates (CsvToJson)

On Mon, Sep 21, 2015 at 4:40 PM, Bryan Bende <[email protected]> wrote:
> Yup, Joe beat me to it, but I was going to suggest those options...
>
> In the second case, you would probably use SplitText to get each line of
> the CSV as a FlowFile, then ExtractText to pull every value of the line
> out into attributes, then ReplaceText would construct a JSON document
> using expression language to access the attributes from ExtractText.
>
> On Mon, Sep 21, 2015 at 4:33 PM, Joe Witt <[email protected]> wrote:
>
>> Adam, Bryan,
>>
>> You could do the CSV to Avro processor and then follow it with the Avro
>> to JSON processor. Alternatively, you could use ExtractText to pull the
>> fields out as attributes and then use ReplaceText to produce a JSON
>> output.
>>
>> Thanks
>> Joe
>>
>> On Mon, Sep 21, 2015 at 4:21 PM, Adam Williams
>> <[email protected]> wrote:
>> > Bryan,
>> >
>> > Thanks for the feedback. I stripped the ExtractText and tried routing
>> > all unmatched traffic to Mongo as well, hence the CSV import problems.
>> > Off the top of my head I do not think MongoDB allows CSV inserts
>> > through the Java client; we've always had to work with the
>> > JSON/document model for it. For a CSV format, it would have to be
>> > similar to this idea:
>> > https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
>> >
>> > So looking at the other processors in NiFi, is there a way to move
>> > from a CSV format to JSON before putting to Mongo?
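The ExtractText + ReplaceText trick described above can be sketched outside NiFi. The Python below is a rough simulation, not NiFi code: the regex, the `csv.N` attribute names, and the output field names (`field1`..`field4`) are illustrative assumptions, not taken from the actual wiki template.

```python
import json
import re

def extract_text(content: str) -> dict:
    """Mimic ExtractText: a user-defined property holding the regex below
    yields attributes csv.1 .. csv.4, one per capture group."""
    match = re.match(r"^(.+),(.+),(.+),(.+)$", content)
    if match is None:
        return {}  # in NiFi this flow file would route to 'unmatched'
    return {f"csv.{i}": g for i, g in enumerate(match.groups(), start=1)}

def replace_text(attributes: dict) -> str:
    """Mimic ReplaceText: a replacement value using expression language,
    e.g. {"field1":"${csv.1}", ...}, builds the JSON body."""
    return json.dumps({
        "field1": attributes["csv.1"],
        "field2": attributes["csv.2"],
        "field3": attributes["csv.3"],
        "field4": attributes["csv.4"],
    })

attrs = extract_text("a,b,c,d")
print(replace_text(attrs))
# {"field1": "a", "field2": "b", "field3": "c", "field4": "d"}
```

In NiFi the same two steps are pure configuration: the regex lives in an ExtractText property and the JSON skeleton in the ReplaceText replacement value.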
>> >
>> > ________________________________
>> > Date: Mon, 21 Sep 2015 16:09:10 -0400
>> > Subject: Re: CSV to Mongo
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > Adam,
>> >
>> > I was able to import the full template, thanks. A couple of things...
>> >
>> > The ExtractText processor works by adding user-defined properties (the
>> > + icon in the top-right of the properties window) where the property
>> > name is a destination attribute and the value is a regular expression.
>> > Right now there aren't any regular expressions defined, so that
>> > processor will always route the file to 'unmatched'. Generally you
>> > would want to route the matched files to the next processor, and then
>> > auto-terminate the unmatched relationship (assuming you want to filter
>> > out non-matches).
>> >
>> > Do you know if MongoDB supports inserting a CSV file through their
>> > Java client? Do you have similar code that already does this in Storm?
>> >
>> > I am honestly not that familiar with MongoDB, but the PutMongo
>> > processor takes the incoming data and calls:
>> > Document doc = Document.parse(new String(content, charset));
>> >
>> > Looking at that Document.parse() method, it looks like it expects a
>> > JSON document, so I just want to make sure that we expect CSV
>> > insertions to work here.
>> > In researching this, it looks like Mongo has a bulk import utility
>> > that handles CSV [1], but it is a command-line utility.
>> >
>> > -Bryan
>> >
>> > [1] http://docs.mongodb.org/manual/reference/program/mongoimport/
>> >
>> >
>> > On Mon, Sep 21, 2015 at 3:19 PM, Adam Williams
>> > <[email protected]> wrote:
>> >
>> > Sorry about that, this should work.
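Bryan's Document.parse() point can be made concrete. The sketch below is a rough Python analogue using json.loads in place of the Java driver's parser (Document.parse actually accepts MongoDB Extended JSON, so this is only an approximation): a JSON body parses, a raw CSV line is rejected.

```python
import json

def parse_like_putmongo(content: bytes, charset: str = "utf-8"):
    """Rough analogue of Document.parse(new String(content, charset)):
    the flow file body must be a single JSON document, not CSV."""
    return json.loads(content.decode(charset))

parse_like_putmongo(b'{"name": "a", "value": "b"}')  # JSON parses fine

try:
    parse_like_putmongo(b"a,b,c,d")  # a raw CSV line does not
except json.JSONDecodeError as err:
    print("rejected:", err)
```

This is why the thread converges on converting each CSV line to JSON before PutMongo rather than sending CSV straight in.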
>> > Attached the template and the error below:
>> >
>> > 2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10]
>> > o.a.nifi.processors.mongodb.PutMongo
>> > PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert
>> > StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407]
>> > into MongoDB due to java.lang.NegativeArraySizeException:
>> > java.lang.NegativeArraySizeException
>> >
>> > ________________________________
>> > Date: Mon, 21 Sep 2015 15:12:43 -0400
>> > Subject: Re: CSV to Mongo
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > Adam,
>> >
>> > I imported the template and it looks like it only captured the
>> > PutMongo processor. Can you try deselecting everything on the graph
>> > and creating the template again so we can take a look at the rest of
>> > the flow? Or if you have other stuff on your graph, select all of the
>> > processors you described so they all get captured.
>> >
>> > Also, can you provide any of the stack trace for the exception you are
>> > seeing? The log is in NIFI_HOME/logs/nifi-app.log
>> >
>> > Thanks,
>> >
>> > Bryan
>> >
>> >
>> > On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <[email protected]> wrote:
>> >
>> > Adam,
>> >
>> > Thanks for attaching the template, we will take a look and see what is
>> > going on.
>> >
>> > Thanks,
>> >
>> > Bryan
>> >
>> >
>> > On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams
>> > <[email protected]> wrote:
>> >
>> > Hey Joe,
>> >
>> > Sure thing. I attached the template; I'm just taking the GDELT data
>> > set for the GetFile processor, which works. The error I get is a
>> > negative array size.
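One plausible reading of that NegativeArraySizeException, assuming PutMongo at the time buffered the whole flow file into a single Java byte[] (an assumption; I have not verified this against the 0.x source): the flow file in the stack trace is 6,581,409,407 bytes, larger than Integer.MAX_VALUE (2,147,483,647), so narrowing that long size to an int for an array allocation wraps around to a negative length. A small Python sketch of the 32-bit truncation:

```python
# A Java byte[] is indexed by a signed 32-bit int, so an allocation like
# new byte[(int) flowFile.getSize()] wraps for sizes above 2^31 - 1 and
# throws NegativeArraySizeException.
def to_java_int(value: int) -> int:
    """Truncate an arbitrary Python int to a signed 32-bit Java int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

size = 6581409407          # size= field from the stack trace above
print(to_java_int(size))   # -2008525185, i.e. a negative array length
```

If this reading is right, splitting the file (e.g. with SplitText) before PutMongo would sidestep the overflow regardless of the CSV-vs-JSON question.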
>> >
>> >
>> >> Date: Mon, 21 Sep 2015 14:24:50 -0400
>> >> Subject: Re: CSV to Mongo
>> >> From: [email protected]
>> >> To: [email protected]
>> >>
>> >> Adam,
>> >>
>> >> Regarding moving from Storm to NiFi, I'd say they make better
>> >> teammates than competitors. The use case outlined above should be
>> >> quite easy for NiFi, but there are analytic/processing functions
>> >> Storm is probably a better answer for. We're happy to help explore
>> >> that with you as you progress.
>> >>
>> >> If you ever run into an ArrayIndexOutOfBoundsException, then it will
>> >> always be 100% a coding error. Would you mind sending your
>> >> flow.xml.gz over or making a template of the flow (assuming it
>> >> contains nothing sensitive)? If at all possible, sample data which
>> >> exposes the issue would be ideal. As an alternative, can you go ahead
>> >> and send us the resulting stack trace/error that comes out?
>> >>
>> >> We'll get this addressed.
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
>> >> <[email protected]> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm moving from Storm to NiFi and trying to do a simple test:
>> >> > getting a large CSV file dumped into MongoDB. The CSV file has a
>> >> > header with column names and it is structured; my only problem is
>> >> > dumping it into MongoDB. At a high level, do the following
>> >> > processor steps look correct? All I want is to pull the whole CSV
>> >> > file over to MongoDB without a regex or anything fancy (yet). I
>> >> > eventually always seem to hit trouble with array index problems
>> >> > with the PutMongo processor:
>> >> >
>> >> > GetFile --> ExtractText --> RouteOnAttribute (not a null line) -->
>> >> > PutMongo
>> >> >
>> >> > Does that seem to be the right way to do this in NiFi?
>> >> >
>> >> > Thank you,
>> >> > Adam
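Putting the thread together, the flow it converges on (SplitText -> ExtractText -> ReplaceText -> PutMongo) behaves roughly like the pure-Python sketch below. The header-based field naming and plain comma splitting are simplifying assumptions (no quoting or escaping, unlike a real CSV parser), and the sample data is illustrative.

```python
import json

def csv_to_json_docs(csv_text: str) -> list:
    """Simulate the recommended flow for a CSV with a header row:
    SplitText (one flow file per line), then per-line conversion to a
    JSON document that PutMongo could insert."""
    lines = csv_text.strip().splitlines()   # SplitText
    header = lines[0].split(",")            # column names from the header
    docs = []
    for line in lines[1:]:                  # skip the header line itself
        values = line.split(",")
        if len(values) != len(header):
            continue                        # route to 'unmatched'
        # ExtractText + ReplaceText: values become named JSON fields
        docs.append(json.dumps(dict(zip(header, values))))
    return docs

for doc in csv_to_json_docs("id,name\n1,alpha\n2,beta"):
    print(doc)   # one JSON document per CSV data line
```

Per-line conversion also keeps each PutMongo payload small, avoiding the single multi-gigabyte flow file that triggered the error earlier in the thread.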
