Yup, Joe beat me to it, but I was going to suggest those options...

In the second case, you would probably use SplitText to get each line of
the CSV as a FlowFile, then ExtractText to pull each value of the line out
into attributes, then ReplaceText to construct a JSON document, using
expression language to reference the attributes set by ExtractText.
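As a rough illustration of what that SplitText -> ExtractText -> ReplaceText chain does to one line, here is a plain-Java sketch of the per-line transformation. The column names are invented for the example (a real flow would use the CSV's own header), so treat them as placeholders:

```java
import java.util.StringJoiner;

// Sketch of the per-line CSV -> JSON transformation that the
// ExtractText + ReplaceText pair would perform inside NiFi.
// Column names are hypothetical, not from the actual data set.
public class CsvLineToJson {

    static String toJson(String csvLine) {
        String[] columns = {"id", "name", "city"};   // assumed columns
        String[] values = csvLine.split(",", -1);    // keep empty trailing fields
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (int i = 0; i < columns.length && i < values.length; i++) {
            json.add("\"" + columns[i] + "\":\"" + values[i] + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        // prints {"id":"42","name":"Adam","city":"Boston"}
        System.out.println(toJson("42,Adam,Boston"));
    }
}
```

In the actual flow, ReplaceText would do the same thing declaratively, with a replacement value like `{"id":"${csv.1}","name":"${csv.2}","city":"${csv.3}"}` built from the attributes ExtractText created.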

On Mon, Sep 21, 2015 at 4:33 PM, Joe Witt <[email protected]> wrote:

> Adam, Bryan,
>
> Could do the CSV to Avro processor and then follow it with the Avro to
> JSON processor.  Alternatively, could use ExtractText to pull the
> fields as attributes and then use ReplaceText to produce a JSON
> output.
>
> Thanks
> Joe
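For the first option Joe mentions, the chain would look something like the sketch below. The property name and schema here are from memory and illustrative only; check the processor usage docs for the exact names:

```
GetFile -> ConvertCSVToAvro -> ConvertAvroToJSON -> PutMongo

ConvertCSVToAvro:
  Record Schema: an Avro schema matching the CSV columns, e.g.
    {"type":"record","name":"row","fields":[
      {"name":"id","type":"string"},
      {"name":"name","type":"string"}]}
```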
>
> On Mon, Sep 21, 2015 at 4:21 PM, Adam Williams
> <[email protected]> wrote:
> > Bryan,
> >
> > Thanks for the feedback.  I stripped the ExtractText and tried routing
> > all unmatched traffic to Mongo as well, hence the CSV import problems.
> > Off the top of my head I do not think MongoDB allows CSV inserts through
> > the Java client; we've always had to work with the JSON/document model
> > for it.  For a CSV format, it would have to be similar to this idea:
> >
> > https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
> >
> > So looking at the other processors in NiFi, is there a way then to move
> > from a CSV format to JSON before putting to Mongo?
> >
> > ________________________________
> > Date: Mon, 21 Sep 2015 16:09:10 -0400
> >
> > Subject: Re: CSV to Mongo
> > From: [email protected]
> > To: [email protected]
> >
> > Adam,
> >
> > I was able to import the full template, thanks. A couple of things...
> >
> > The ExtractText processor works by adding user-defined properties (the +
> > icon in the top-right of the properties window) where the property name
> > is a destination attribute and the value is a regular expression.
> > Right now there weren't any regular expressions defined, so that
> > processor will always route the file to 'unmatched'. Generally you
> > would probably want to route the matched files to the next processor,
> > and then auto-terminate the unmatched relationship (assuming you want
> > to filter out non-matches).
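For example, a single dynamic property like the one below (illustrative, assuming three comma-separated columns) would create one attribute per capture group, named `csv.1`, `csv.2`, `csv.3`:

```
# ExtractText dynamic property (name = attribute prefix, value = regex)
csv  =>  ^([^,]*),([^,]*),([^,]*)$
```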
> >
> > Do you know if MongoDB supports inserting a CSV file through their Java
> > client? Do you have similar code that already does this in Storm?
> >
> > I am honestly not that familiar with MongoDB, but the PutMongo processor
> > takes the incoming data and calls:
> > Document doc = Document.parse(new String(content, charset));
> >
> > Looking at that Document.parse() method, it looks like it expects a JSON
> > document, so I just want to make sure that we expect CSV insertions to
> > work here.
> > In researching this, it looks like Mongo has a bulk-import utility that
> > handles CSV [1], but it is a command-line utility.
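For completeness, that route would look something like this (flags from the mongoimport docs; the database and collection names here are invented, and it needs a running mongod to talk to):

```
mongoimport --db gdelt --collection events --type csv --headerline \
    --file GDELT.MASTERREDUCEDV2.TXT
```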
> >
> > -Bryan
> >
> > [1] http://docs.mongodb.org/manual/reference/program/mongoimport/
> >
> >
> > On Mon, Sep 21, 2015 at 3:19 PM, Adam Williams
> > <[email protected]> wrote:
> >
> > Sorry about that, this should work.  I attached the template; the error
> > is below:
> >
> > 2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10]
> > o.a.nifi.processors.mongodb.PutMongo
> > PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert
> > StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407]
> > into MongoDB due to java.lang.NegativeArraySizeException:
> > java.lang.NegativeArraySizeException
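One plausible reading of that error (an assumption on my part; the stack trace would confirm): the FlowFile size in the log, 6,581,409,407 bytes, is larger than Integer.MAX_VALUE, so any code path that casts the content size to int for a byte[] allocation ends up with a negative length:

```java
// Demonstrates how a >2GB size truncates to a negative int when cast.
// This is a guess at the failure mode, not a confirmed trace through PutMongo.
public class NegativeSizeDemo {
    public static void main(String[] args) {
        long flowFileSize = 6_581_409_407L;  // the size= value from the log above
        int truncated = (int) flowFileSize;  // keeps the low 32 bits, signed
        System.out.println(truncated);       // prints -2008525185
        try {
            byte[] content = new byte[truncated];
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```

If that is what is happening, splitting the file into smaller FlowFiles (e.g. with SplitText) before PutMongo would sidestep it.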
> >
> > ________________________________
> > Date: Mon, 21 Sep 2015 15:12:43 -0400
> > Subject: Re: CSV to Mongo
> > From: [email protected]
> > To: [email protected]
> >
> >
> > Adam,
> >
> > I imported the template and it looks like it only captured the PutMongo
> > processor. Can you try deselecting everything on the graph and creating
> > the template again so we can take a look at the rest of the flow? Or, if
> > you have other stuff on your graph, select all of the processors you
> > described so they all get captured.
> >
> > Also, can you provide any of the stacktrace for the exception you are
> > seeing? The log is in NIFI_HOME/logs/nifi-app.log
> >
> > Thanks,
> >
> > Bryan
> >
> >
> > On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <[email protected]> wrote:
> >
> > Adam,
> >
> > Thanks for attaching the template, we will take a look and see what is
> > going on.
> >
> > Thanks,
> >
> > Bryan
> >
> >
> > On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams
> > <[email protected]> wrote:
> >
> > Hey Joe,
> >
> > Sure thing.  I attached the template; I'm just taking the GDELT data set
> > for the GetFile processor, which works.  The error I get is a negative
> > array size.
> >
> >
> >
> >> Date: Mon, 21 Sep 2015 14:24:50 -0400
> >> Subject: Re: CSV to Mongo
> >> From: [email protected]
> >> To: [email protected]
> >>
> >>
> >> Adam,
> >>
> >> Regarding moving from Storm to NiFi, I'd say they make better teammates
> >> than competitors. The use case outlined above should be quite easy
> >> for NiFi, but there are analytic/processing functions Storm is probably
> >> a better answer for. We're happy to help explore that with you as you
> >> progress.
> >>
> >> If you ever run into an ArrayIndexOutOfBoundsException, then it will
> >> always be 100% a coding error. Would you mind sending your
> >> flow.xml.gz over or making a template of the flow (assuming it
> >> contains nothing sensitive)? If at all possible, sample data which
> >> exposes the issue would be ideal. As an alternative, can you go ahead
> >> and send us the resulting stack trace/error that comes out?
> >>
> >> We'll get this addressed.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
> >> <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I'm moving from Storm to NiFi and trying to do a simple test with
> >> > getting a large CSV file dumped into MongoDB. The CSV file has a
> >> > header with column names and it is structured; my only problem is
> >> > dumping it into MongoDB. At a high level, do the following processor
> >> > steps look correct? All I want is to just pull the whole CSV file
> >> > over to MongoDB without a regex or anything fancy (yet). I eventually
> >> > always seem to hit trouble with array index problems with the
> >> > PutMongo processor:
> >> >
> >> > GetFile --> ExtractText --> RouteOnAttribute (not a null line) -->
> >> > PutMongo
> >> >
> >> > Does that seem to be the right way to do this in NiFi?
> >> >
> >> > Thank you,
> >> > Adam
> >
> >
> >
> >
>
