Bryan,
Thanks for the feedback.  I stripped the ExtractText processor and tried routing 
all unmatched traffic to Mongo as well, hence the CSV import problems.  Off the 
top of my head I do not think MongoDB allows CSV inserts through the Java 
client; we've always had to work with the JSON/document model for it.  For a CSV 
format, it would have to be similar to this idea: 
https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
So looking at the other processors in NiFi, is there a way to move from a 
CSV format to JSON before putting to Mongo?
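Until someone points out a built-in processor for this, the per-record conversion could be sketched in plain Java along these lines (the class name, header, and sample row are hypothetical; a real flow would also need proper CSV quoting rules and type handling):

```java
// Hypothetical sketch: pair a CSV header's column names with one record's
// values and emit a flat JSON object string that PutMongo could parse.
public class CsvToJson {

    // Escape backslashes and double quotes so values embed safely in JSON.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Convert a single CSV record to a flat JSON object using the header names.
    static String toJson(String[] header, String[] values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < header.length; i++) {
            if (i > 0) sb.append(",");
            sb.append("\"").append(escape(header[i])).append("\":")
              .append("\"").append(escape(values[i])).append("\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        String[] header = "GLOBALEVENTID,SQLDATE,Actor1Name".split(",");
        String[] row = "498765,20150921,UNITED STATES".split(",");
        // prints {"GLOBALEVENTID":"498765","SQLDATE":"20150921","Actor1Name":"UNITED STATES"}
        System.out.println(toJson(header, row));
    }
}
```

Everything comes out as a string field here; a real importer would want to detect numeric columns, much like the ImportGeoData example linked above does.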

Date: Mon, 21 Sep 2015 16:09:10 -0400
Subject: Re: CSV to Mongo
From: [email protected]
To: [email protected]

Adam,
I was able to import the full template, thanks. A couple of things...
The ExtractText processor works by adding user-defined properties (the + icon 
in the top-right of the properties window) where the property name is a 
destination attribute and the value is a regular expression. Right now there 
aren't any regular expressions defined, so that processor will always route the 
file to 'unmatched'. Generally you would want to route the matched files to the 
next processor, and then auto-terminate the unmatched relationship (assuming 
you want to filter out non-matches).
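For reference, a hypothetical ExtractText configuration for a three-column comma-separated line might look like the following (property name on the left, regex value on the right; if I recall the processor's behavior correctly, capture groups land in indexed attributes such as csv.1, csv.2, csv.3):

```
csv  =  ^([^,]+),([^,]+),(.*)$
```

Anything that doesn't match the expression would still go to 'unmatched'.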
Do you know if MongoDB supports inserting a CSV file through its Java client? 
Do you have similar code that already does this in Storm?
I am honestly not that familiar with MongoDB, but the PutMongo processor 
takes the incoming data and calls: Document doc = Document.parse(new 
String(content, charset));
Looking at that Document.parse() method, it looks like it expects a JSON 
document, so I just want to make sure that we expect CSV insertions to work 
here. In researching this, it looks like Mongo has a bulk-import utility 
that handles CSV [1], but it is a command-line utility.
-Bryan
[1] http://docs.mongodb.org/manual/reference/program/mongoimport/


On Mon, Sep 21, 2015 at 3:19 PM, Adam Williams <[email protected]> 
wrote:



Sorry about that, this should work.  I attached the template; the error is below:
2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10] 
o.a.nifi.processors.mongodb.PutMongo 
PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert 
StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407]
 into MongoDB due to java.lang.NegativeArraySizeException: 
java.lang.NegativeArraySizeException

Date: Mon, 21 Sep 2015 15:12:43 -0400
Subject: Re: CSV to Mongo
From: [email protected]
To: [email protected]

Adam, 
I imported the template and it looks like it only captured the PutMongo 
processor. Can you try deselecting everything on the graph and creating the 
template again so we can take a look at the rest of the flow? Or, if you have 
other stuff on your graph, select all of the processors you described so they 
all get captured.
Also, can you provide any of the stacktrace for the exception you are seeing? 
The log is in NIFI_HOME/logs/nifi-app.log
Thanks,
Bryan

On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <[email protected]> wrote:
Adam,
Thanks for attaching the template, we will take a look and see what is going on.
Thanks,
Bryan

On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams <[email protected]> 
wrote:



Hey Joe,
Sure thing.  I attached the template; I'm just feeding the GDELT data set to 
the GetFile processor, which works.  The error I get is a NegativeArraySizeException.


> Date: Mon, 21 Sep 2015 14:24:50 -0400
> Subject: Re: CSV to Mongo
> From: [email protected]
> To: [email protected]
> 
> Adam,
> 
> Regarding moving from Storm to NiFi, I'd say they make better teammates
> than competitors.  The use case outlined above should be quite easy
> for NiFi, but there are analytic/processing functions Storm is probably
> a better answer for.  We're happy to help explore that with you as you
> progress.
> 
> If you ever run into an ArrayIndexOutOfBoundsException, then it will
> always be 100% a coding error.  Would you mind sending your
> flow.xml.gz over or making a template of the flow (assuming it
> contains nothing sensitive)?  If at all possible, sample data which
> exposes the issue would be ideal.  As an alternative, can you go ahead
> and send us the resulting stack trace/error that comes out?
> 
> We'll get this addressed.
> 
> Thanks
> Joe
> 
> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
> <[email protected]> wrote:
> > Hello,
> >
> > I'm moving from Storm to NiFi and trying to do a simple test with getting a
> > large CSV file dumped into MongoDB.  The CSV file has a header with column
> > names and it is structured; my only problem is dumping it into MongoDB.  At
> > a high level, do the following processor steps look correct?  All I want is
> > to pull the whole CSV file over to MongoDB without a regex or anything
> > fancy (yet).  I eventually always seem to hit trouble with array index
> > problems with the PutMongo processor:
> >
> > GetFile --> ExtractText --> RouteOnAttribute (not a null line) --> PutMongo
> >
> > Does that seem to be the right way to do this in NiFi?
> >
> > Thank you,
> > Adam