Andrew,

If you are interested in the ExtractText+ReplaceText approach, I posted an
example template that shows how to convert a line from a CSV file to a JSON
document [1].

The first part of the flow is just for testing: it generates a flow file
with the content set to "a,b,c,d". ExtractText then pulls those values
into attributes (csv.1, csv.2, csv.3, csv.4), and ReplaceText uses them to
build a JSON document.
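Outside NiFi, the same line-to-JSON transformation can be sketched in plain Python. The regex with four capture groups mirrors the kind of pattern an ExtractText property would use, and the assembled document mirrors what the ReplaceText template produces; the field names here are illustrative, not taken from the template:

```python
import json
import re

# ExtractText equivalent: a user-defined property whose regex has four
# capture groups; capture group N becomes attribute csv.N on the flow file.
line = "a,b,c,d"
match = re.match(r"([^,]+),([^,]+),([^,]+),([^,]+)", line)
attributes = {f"csv.{i}": match.group(i) for i in range(1, 5)}

# ReplaceText equivalent: build a JSON document from those attributes, the
# way expression language references like ${csv.1} would in the template.
doc = {
    "field1": attributes["csv.1"],
    "field2": attributes["csv.2"],
    "field3": attributes["csv.3"],
    "field4": attributes["csv.4"],
}
print(json.dumps(doc))
```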

-Bryan

[1]
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
 (CsvToJson)


On Mon, Sep 21, 2015 at 4:40 PM, Bryan Bende <[email protected]> wrote:

> Yup, Joe beat me to it, but I was going to suggest those options...
>
> In the second case, you would probably use SplitText to get each line of
> the CSV as a FlowFile, then ExtractText to pull out every value of the line
> into attributes, then ReplaceText would construct a JSON document using
> expression language to access the attributes from ExtractText.
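That SplitText → ExtractText → ReplaceText chain can be sketched in plain Python as follows; the sample content and header handling are illustrative assumptions, not part of the actual flow:

```python
import csv
import io
import json

# SplitText equivalent: break the CSV content into one record per line.
content = "name,city\nalice,paris\nbob,tokyo\n"
rows = list(csv.reader(io.StringIO(content)))
header, lines = rows[0], rows[1:]

# ExtractText + ReplaceText equivalent: map each line's values to named
# fields and emit one JSON document per line (what PutMongo would insert).
docs = [json.dumps(dict(zip(header, row))) for row in lines]
for d in docs:
    print(d)
```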
>
> On Mon, Sep 21, 2015 at 4:33 PM, Joe Witt <[email protected]> wrote:
>
>> Adam, Bryan,
>>
>> Could do the CSV to Avro processor and then follow it with the Avro to
>> JSON processor.  Alternatively, could use ExtractText to pull the
>> fields as attributes and then use ReplaceText to produce a JSON
>> output.
>>
>> Thanks
>> Joe
>>
>> On Mon, Sep 21, 2015 at 4:21 PM, Adam Williams <[email protected]> wrote:
>> > Bryan,
>> >
>> > Thanks for the feedback.  I stripped the ExtractText and tried routing all
>> > unmatched traffic to Mongo as well, hence the CSV import problems.  Off the
>> > top of my head I do not think MongoDB allows CSV inserts through the Java
>> > client; we've always had to work with the JSON/document model for it.  For a
>> > CSV format, it would have to be similar to this idea:
>> >
>> > https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
>> >
>> > So looking at the other processors in NiFi, is there a way then to move from
>> > a CSV format to JSON before putting to Mongo?
>> >
>> > ________________________________
>> > Date: Mon, 21 Sep 2015 16:09:10 -0400
>> >
>> > Subject: Re: CSV to Mongo
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > Adam,
>> >
>> > I was able to import the full template, thanks. A couple of things...
>> >
>> > The ExtractText processor works by adding user-defined properties (the +
>> > icon in the top-right of the properties window) where the property name is a
>> > destination attribute and the value is a regular expression.
>> > Right now there weren't any regular expressions defined, so that processor
>> > will always route the file to 'unmatched'. Generally you would probably want
>> > to route the matched files to the next processor, and then auto-terminate
>> > the unmatched relationship (assuming you want to filter out non-matches).
>> >
>> > Do you know if MongoDB supports inserting a CSV file through their Java
>> > client? Do you have similar code that already does this in Storm?
>> >
>> > I am honestly not that familiar with MongoDB, but the PutMongo processor
>> > takes the incoming data and calls:
>> > Document doc = Document.parse(new String(content, charset));
>> >
>> > Looking at that Document.parse() method, it looks like it expects a JSON
>> > document, so I just want to make sure that we expect CSV insertions to work
>> > here.
>> > In researching this, it looks like Mongo has a bulk import utility
>> > (mongoimport) that handles CSV [1], but it is a command-line utility.
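The Document.parse() expectation explains the failure mode: a raw CSV line is not a valid JSON document, so any JSON parser rejects it. A quick illustration, with Python's json module standing in for the Mongo driver's parser:

```python
import json

# A raw CSV line is not a valid JSON document, so a JSON parser rejects it;
# this is roughly the situation PutMongo is in when handed CSV content.
csv_line = "a,b,c,d"
try:
    json.loads(csv_line)
    parsed_as_json = True
except ValueError:
    parsed_as_json = False

# The same values reshaped into a JSON document parse fine.
doc = json.loads('{"f1": "a", "f2": "b", "f3": "c", "f4": "d"}')
```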
>> >
>> > -Bryan
>> >
>> > [1] http://docs.mongodb.org/manual/reference/program/mongoimport/
>> >
>> >
>> > On Mon, Sep 21, 2015 at 3:19 PM, Adam Williams <[email protected]> wrote:
>> >
>> > Sorry about that, this should work.  Attached the template and the below
>> > error:
>> >
>> > 2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10]
>> > o.a.nifi.processors.mongodb.PutMongo
>> > PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert
>> > StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407]
>> > into MongoDB due to java.lang.NegativeArraySizeException:
>> > java.lang.NegativeArraySizeException
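One plausible cause, offered as an assumption rather than something confirmed against the NiFi source: the flow file is size=6581409407 bytes (about 6.5 GB), which exceeds Integer.MAX_VALUE. If that length is narrowed to a Java int to allocate a byte[] for the entire content (PutMongo builds a String from the full content), the value wraps negative, which is exactly what a NegativeArraySizeException reports. The wraparound itself is easy to check:

```python
import struct

size = 6581409407  # flow file size from the log line above (~6.5 GB)

# Narrow the 64-bit length to a signed 32-bit int, as a Java (int) cast
# would when sizing a byte[] for the entire flow file content.
(as_int32,) = struct.unpack(">i", struct.pack(">q", size)[-4:])
print(as_int32)  # a negative length, so allocating byte[as_int32] throws
```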
>> >
>> > ________________________________
>> > Date: Mon, 21 Sep 2015 15:12:43 -0400
>> > Subject: Re: CSV to Mongo
>> > From: [email protected]
>> > To: [email protected]
>> >
>> >
>> > Adam,
>> >
>> > I imported the template and it looks like it only captured the PutMongo
>> > processor. Can you try deselecting everything on the graph and creating the
>> > template again so we can take a look at the rest of the flow? Or if you have
>> > other stuff on your graph, select all of the processors you described so
>> > they all get captured.
>> >
>> > Also, can you provide any of the stack trace for the exception you are
>> > seeing? The log is in NIFI_HOME/logs/nifi-app.log
>> >
>> > Thanks,
>> >
>> > Bryan
>> >
>> >
>> > On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <[email protected]> wrote:
>> >
>> > Adam,
>> >
>> > Thanks for attaching the template, we will take a look and see what is
>> > going on.
>> >
>> > Thanks,
>> >
>> > Bryan
>> >
>> >
>> > On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams <[email protected]> wrote:
>> >
>> > Hey Joe,
>> >
>> > Sure thing.  I attached the template; I'm just taking the GDELT data set
>> > for the GetFile processor, which works.  The error I get is a negative
>> > array size exception.
>> >
>> >
>> >
>> >> Date: Mon, 21 Sep 2015 14:24:50 -0400
>> >> Subject: Re: CSV to Mongo
>> >> From: [email protected]
>> >> To: [email protected]
>> >
>> >>
>> >> Adam,
>> >>
>> >> Regarding moving from Storm to NiFi, I'd say they make better teammates
>> >> than competitors. The use case outlined above should be quite easy
>> >> for NiFi, but there are analytic/processing functions Storm is probably
>> >> a better answer for. We're happy to help explore that with you as you
>> >> progress.
>> >>
>> >> If you ever run into an ArrayIndexOutOfBoundsException, it will
>> >> always be 100% a coding error. Would you mind sending your
>> >> flow.xml.gz over or making a template of the flow (assuming it
>> >> contains nothing sensitive)? If at all possible, sample data which
>> >> exposes the issue would be ideal. As an alternative, can you go ahead
>> >> and send us the resulting stack trace/error that comes out?
>> >>
>> >> We'll get this addressed.
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams <[email protected]> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm moving from Storm to NiFi and trying to do a simple test with
>> >> > getting a large CSV file dumped into MongoDB. The CSV file has a
>> >> > header with column names and it is structured; my only problem is
>> >> > dumping it into MongoDB. At a high level, do the following processor
>> >> > steps look correct? All I want is to just pull the whole CSV file
>> >> > over to MongoDB without a regex or anything fancy (yet). I eventually
>> >> > always seem to hit trouble with array index problems with the
>> >> > PutMongo processor:
>> >> >
>> >> > GetFile --> ExtractText --> RouteOnAttribute (not a null line) -->
>> >> > PutMongo
>> >> >
>> >> > Does that seem to be the right way to do this in NiFi?
>> >> >
>> >> > Thank you,
>> >> > Adam
>> >
>> >
>> >
>> >
>>
>
>
