There aren't any plans, but it is an awesome idea and a great JIRA.

Thanks
Joe

On Sep 22, 2015 9:31 AM, "Jonathan Lyons" <[email protected]> wrote:
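For reference, the kind of per-column schema inference that spark-csv added (and that the thread below discusses adding to NiFi) could be sketched roughly as follows. This is an illustrative Python sketch, not NiFi or spark-csv code; the function names and the int/float/str candidate order are assumptions for the example.

```python
import csv
import io

def infer_type(values):
    """Return the narrowest type (int, float, or str) that fits every value."""
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)  # raises ValueError if the value does not fit
            return candidate
        except ValueError:
            continue
    return str

def infer_schema(csv_text):
    """Map each header column to an inferred type, spark-csv style."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into per-column value lists
    return {name: infer_type(col) for name, col in zip(header, columns)}
```

For example, `infer_schema("id,score,name\n1,3.5,a\n2,4,b")` infers `int` for `id`, `float` for `score`, and `str` for `name`.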
> Speaking of CSV to JSON conversion, is there any interest in implementing
> schema inference in general, and specifically schema inference for CSV
> files? This is something that was added to spark-csv recently
> (https://github.com/databricks/spark-csv/pull/93). Any thoughts?
>
> On Tue, Sep 22, 2015 at 9:16 AM, Bryan Bende <[email protected]> wrote:
>
>> Andrew,
>>
>> If you are interested in the ExtractText + ReplaceText approach, I posted
>> an example template that shows how to convert a line from a CSV file to a
>> JSON document [1].
>>
>> The first part of the flow is just for testing and generates a flow file
>> with the content set to "a,b,c,d", then ExtractText pulls those values
>> into attributes (csv.1, csv.2, csv.3, csv.4) and ReplaceText uses them to
>> build a JSON document.
>>
>> -Bryan
>>
>> [1] https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
>> (CsvToJson)
>>
>> On Mon, Sep 21, 2015 at 4:40 PM, Bryan Bende <[email protected]> wrote:
>>
>>> Yup, Joe beat me to it, but I was going to suggest those options...
>>>
>>> In the second case, you would probably use SplitText to get each line of
>>> the CSV as a FlowFile, then ExtractText to pull every value of the line
>>> into attributes, then ReplaceText would construct a JSON document using
>>> expression language to access the attributes from ExtractText.
>>>
>>> On Mon, Sep 21, 2015 at 4:33 PM, Joe Witt <[email protected]> wrote:
>>>
>>>> Adam, Bryan,
>>>>
>>>> You could do the CSV to Avro processor and then follow it with the Avro
>>>> to JSON processor. Alternatively, you could use ExtractText to pull the
>>>> fields out as attributes and then use ReplaceText to produce a JSON
>>>> output.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Mon, Sep 21, 2015 at 4:21 PM, Adam Williams
>>>> <[email protected]> wrote:
>>>> > Bryan,
>>>> >
>>>> > Thanks for the feedback.
>>>> > I stripped the ExtractText and tried routing all unmatched traffic
>>>> > to Mongo as well, hence the CSV import problems. Off the top of my
>>>> > head I do not think MongoDB allows CSV inserts through the Java
>>>> > client; we've always had to work with the JSON/document model for
>>>> > it. For a CSV format, it would have to be similar to this idea:
>>>> > https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
>>>> >
>>>> > So looking at the other processors in NiFi, is there a way then to
>>>> > move from a CSV format to JSON before putting to Mongo?
>>>> >
>>>> > ________________________________
>>>> > Date: Mon, 21 Sep 2015 16:09:10 -0400
>>>> > Subject: Re: CSV to Mongo
>>>> > From: [email protected]
>>>> > To: [email protected]
>>>> >
>>>> > Adam,
>>>> >
>>>> > I was able to import the full template, thanks. A couple of things...
>>>> >
>>>> > The ExtractText processor works by adding user-defined properties
>>>> > (the + icon in the top-right of the properties window) where the
>>>> > property name is a destination attribute and the value is a regular
>>>> > expression. Right now there weren't any regular expressions defined,
>>>> > so that processor will always route the file to 'unmatched'.
>>>> > Generally you would probably want to route the matched files to the
>>>> > next processor, and then auto-terminate the unmatched relationship
>>>> > (assuming you want to filter out non-matches).
>>>> >
>>>> > Do you know if MongoDB supports inserting a CSV file through their
>>>> > Java client? Do you have similar code that already does this in Storm?
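As a rough illustration of the ExtractText + ReplaceText pattern discussed in this thread, here is a Python sketch of the equivalent transformation: a regex with capture groups plays the role of the ExtractText dynamic properties (yielding attributes like csv.1 through csv.4 for a line "a,b,c,d"), and a dict filled from those captures plays the role of the ReplaceText JSON template. The `field1`..`field4` names are invented for the example, not part of any NiFi template.

```python
import json
import re

# ExtractText analogue: one capture group per CSV value, so a line like
# "a,b,c,d" yields "attributes" csv.1 .. csv.4.
LINE_PATTERN = re.compile(r"^([^,]*),([^,]*),([^,]*),([^,]*)$")

def csv_line_to_json(line):
    """Mimic ExtractText + ReplaceText: capture each field from the line,
    then fill a JSON document template with the captured values."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # analogous to routing to the 'unmatched' relationship
    # ReplaceText analogue; the field names here are placeholders.
    doc = {
        "field1": match.group(1),
        "field2": match.group(2),
        "field3": match.group(3),
        "field4": match.group(4),
    }
    return json.dumps(doc)
```

In a real flow, SplitText would feed one line at a time into this kind of extraction, and the resulting JSON document would be what PutMongo receives.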
>>>> >
>>>> > I am honestly not that familiar with MongoDB, but in the PutMongo
>>>> > processor it takes the incoming data and calls:
>>>> >
>>>> > Document doc = Document.parse(new String(content, charset));
>>>> >
>>>> > Looking at that Document.parse() method, it looks like it expects a
>>>> > JSON document, so I just want to make sure that we expect CSV
>>>> > insertions to work here. In researching this, it looks like Mongo
>>>> > has a bulk import utility (mongoimport) that handles CSV [1], but
>>>> > this is a command line utility.
>>>> >
>>>> > -Bryan
>>>> >
>>>> > [1] http://docs.mongodb.org/manual/reference/program/mongoimport/
>>>> >
>>>> > On Mon, Sep 21, 2015 at 3:19 PM, Adam Williams
>>>> > <[email protected]> wrote:
>>>> >
>>>> > Sorry about that, this should work. Attached the template and the
>>>> > below error:
>>>> >
>>>> > 2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10]
>>>> > o.a.nifi.processors.mongodb.PutMongo
>>>> > PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert
>>>> > StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407]
>>>> > into MongoDB due to java.lang.NegativeArraySizeException:
>>>> > java.lang.NegativeArraySizeException
>>>> >
>>>> > ________________________________
>>>> > Date: Mon, 21 Sep 2015 15:12:43 -0400
>>>> > Subject: Re: CSV to Mongo
>>>> > From: [email protected]
>>>> > To: [email protected]
>>>> >
>>>> > Adam,
>>>> >
>>>> > I imported the template and it looks like it only captured the
>>>> > PutMongo processor. Can you try deselecting everything on the graph
>>>> > and creating the template again so we can take a look at the rest of
>>>> > the flow? Or if you have other stuff on your graph, select all of
>>>> > the processors you described so they all get captured.
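For the command-line route Bryan mentions, a typical mongoimport invocation for a CSV file with a header row looks like the following. This is a config fragment, not something from the thread; the database and collection names are placeholders.

```shell
# Import a headered CSV; --headerline takes field names from the first row.
# "mydb" and "gdelt" are placeholder database/collection names.
mongoimport --db mydb --collection gdelt \
    --type csv --headerline \
    --file GDELT.MASTERREDUCEDV2.TXT
```

Note that mongoimport runs outside NiFi, so using it gives up the flow-level provenance and routing that the processor-based approach provides.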
>>>> >
>>>> > Also, can you provide any of the stack trace for the exception you
>>>> > are seeing? The log is in NIFI_HOME/logs/nifi-app.log
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Bryan
>>>> >
>>>> > On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <[email protected]>
>>>> > wrote:
>>>> >
>>>> > Adam,
>>>> >
>>>> > Thanks for attaching the template, we will take a look and see what
>>>> > is going on.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Bryan
>>>> >
>>>> > On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams
>>>> > <[email protected]> wrote:
>>>> >
>>>> > Hey Joe,
>>>> >
>>>> > Sure thing. I attached the template; I'm just taking the GDELT data
>>>> > set for the GetFile processor, which works. The error I get is a
>>>> > negative array.
>>>> >
>>>> >> Date: Mon, 21 Sep 2015 14:24:50 -0400
>>>> >> Subject: Re: CSV to Mongo
>>>> >> From: [email protected]
>>>> >> To: [email protected]
>>>> >>
>>>> >> Adam,
>>>> >>
>>>> >> Regarding moving from Storm to NiFi, I'd say they make better
>>>> >> teammates than competitors. The use case outlined above should be
>>>> >> quite easy for NiFi, but there are analytic/processing functions
>>>> >> Storm is probably a better answer for. We're happy to help explore
>>>> >> that with you as you progress.
>>>> >>
>>>> >> If you ever run into an ArrayIndexOutOfBoundsException, then it
>>>> >> will always be 100% a coding error. Would you mind sending your
>>>> >> flow.xml.gz over or making a template of the flow (assuming it
>>>> >> contains nothing sensitive)? If at all possible, sample data which
>>>> >> exposes the issue would be ideal. As an alternative, can you go
>>>> >> ahead and send us the resulting stack trace/error that comes out?
>>>> >>
>>>> >> We'll get this addressed.
>>>> >>
>>>> >> Thanks
>>>> >> Joe
>>>> >>
>>>> >> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
>>>> >> <[email protected]> wrote:
>>>> >> > Hello,
>>>> >> >
>>>> >> > I'm moving from Storm to NiFi and trying to do a simple test with
>>>> >> > getting a large CSV file dumped into MongoDB. The CSV file has a
>>>> >> > header with column names and it is structured; my only problem is
>>>> >> > dumping it into MongoDB. At a high level, do the following
>>>> >> > processor steps look correct? All I want is to just pull the
>>>> >> > whole CSV file over to MongoDB without a regex or anything fancy
>>>> >> > (yet). I eventually always seem to hit trouble with array index
>>>> >> > problems with the PutMongo processor:
>>>> >> >
>>>> >> > GetFile --> ExtractText --> RouteOnAttribute (not a null line) -->
>>>> >> > PutMongo
>>>> >> >
>>>> >> > Does that seem to be the right way to do this in NiFi?
>>>> >> >
>>>> >> > Thank you,
>>>> >> > Adam
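Bryan's observation above, that PutMongo's `Document.parse()` expects a JSON document, explains why sending raw CSV content straight to PutMongo cannot work. The point can be illustrated with an analogous parse in Python; this is a sketch, not NiFi code, and the `f1`..`f4` field names are invented for the example.

```python
import json

csv_line = "a,b,c,d"

# A raw CSV line is not valid JSON, so a JSON parser rejects it outright --
# analogous to PutMongo's Document.parse() failing on CSV content.
try:
    json.loads(csv_line)
    parsed = True
except json.JSONDecodeError:
    parsed = False

# After converting the line to a JSON document (the job of the
# ExtractText/ReplaceText or CSV-to-Avro-to-JSON steps), parsing succeeds.
converted = json.dumps(dict(zip(["f1", "f2", "f3", "f4"], csv_line.split(","))))
doc = json.loads(converted)
```

So regardless of which conversion route is chosen, the content handed to PutMongo must already be a JSON document, one per flow file.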
