That worked. Thank you. On Thu, Apr 21, 2016 at 5:26 PM, Joe Witt <joe.w...@gmail.com> wrote:
> Run the output through UpdateAttribute and put a property on that
> processor with a name of 'filename' and a value of
> '$(unknown).yourextension'
>
> Thanks
> Joe
>
> On Thu, Apr 21, 2016 at 5:24 PM, Igor Kravzov <igork.ine...@gmail.com> wrote:
> > Thanks guys. I think it will work.
> > One thing: the merged file comes out without an extension. How do I add an
> > extension to a merged file?
> >
> > On Thu, Apr 21, 2016 at 4:42 PM, Simon Ball <sb...@hortonworks.com> wrote:
> >>
> >> For most Hive JSON serdes you are going to want what some people call the
> >> JSON record format. This is essentially a text file with one JSON document
> >> per line, where each line represents a record with a reasonably consistent
> >> structure. You can achieve this by ensuring your JSON is not pretty-printed
> >> (one document per line) and then just using binary concatenation in the
> >> MergeContent processor Bryan mentioned.
> >>
> >> Simon
> >>
> >> On 21 Apr 2016, at 22:38, Bryan Bende <bbe...@gmail.com> wrote:
> >>
> >> Also, this blog has a picture of what I described with MergeContent:
> >>
> >> https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
> >>
> >> -Bryan
> >>
> >> On Thu, Apr 21, 2016 at 4:37 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >>>
> >>> Hi Igor,
> >>>
> >>> I don't know that much about Hive, so I can't really say what format it
> >>> needs to be in for Hive to understand it.
> >>>
> >>> If it needs to be a valid array of JSON documents, then in MergeContent
> >>> change the Delimiter Strategy to "Text", which means it will use whatever
> >>> values you type directly into Header, Footer, and Demarcator; specify
> >>> [ ] , respectively as the values.
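[Editor's note: the Header/Footer/Demarcator advice above can be sketched with a small Python snippet that mimics what MergeContent's "Text" delimiter strategy concatenates. The document contents are made-up examples; note that joining this way leaves no comma after the last document.]

```python
import json

# Contents of the flow files coming into MergeContent
# (each one a compact JSON document).
docs = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

# With Delimiter Strategy = "Text", MergeContent emits the Header,
# then the documents separated by the Demarcator, then the Footer.
header, footer, demarcator = "[", "]", ","
merged = header + demarcator.join(docs) + footer

print(merged)  # → [{"id": 1},{"id": 2},{"id": 3}]

# The merged content is a valid JSON array.
assert json.loads(merged) == [{"id": 1}, {"id": 2}, {"id": 3}]
```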
> >>>
> >>> That will get you something like this, where {...} are the incoming
> >>> documents:
> >>>
> >>> [
> >>> {...},
> >>> {...},
> >>> ]
> >>>
> >>> -Bryan
> >>>
> >>> On Thu, Apr 21, 2016 at 4:06 PM, Igor Kravzov <igork.ine...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi Bryan,
> >>>>
> >>>> I am aware of this example, but I want to store the JSON as it is and
> >>>> create an external table, like in this example:
> >>>> http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/
> >>>> What I don't know is how to properly merge multiple JSON documents into
> >>>> one file so that Hive can read it properly.
> >>>>
> >>>> On Thu, Apr 21, 2016 at 2:33 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I believe this example shows an approach to do it (it includes Hive,
> >>>>> even though the title is Solr/Banana):
> >>>>> https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
> >>>>>
> >>>>> The short version is that it extracts several attributes from each
> >>>>> tweet using EvaluateJsonPath, then uses ReplaceText to replace the
> >>>>> FlowFile content with a pipe-delimited string of those attributes, and
> >>>>> then creates a Hive table that knows how to handle that delimiter. With
> >>>>> this approach you don't need to set the header, footer, and demarcator
> >>>>> in MergeContent.
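[Editor's note: the pipe-delimited approach Bryan describes can be sketched in Python. The tweet fields below are illustrative assumptions, not the exact attributes the linked article extracts; the column order follows the tweets_text_partition table Bryan shows.]

```python
import json

# A toy tweet; these field names are illustrative, not the full Twitter schema.
tweet = {
    "id": 725000000000000000,
    "timestamp_ms": "1461271920000",
    "created_at": "Thu Apr 21 20:52:00 +0000 2016",
    "user": {"name": "igor"},
    "text": "hello nifi",
}

# EvaluateJsonPath would pull these values into flow file attributes;
# ReplaceText would then rewrite the flow file content as one
# pipe-delimited line in the table's column order.
fields = [
    str(tweet["id"]),       # tweet_id
    tweet["timestamp_ms"],  # created_unixtime
    tweet["created_at"],    # created_time
    tweet["user"]["name"],  # displayname
    tweet["text"],          # msg
    json.dumps(tweet),      # fulltext
]
line = "|".join(fields)
print(line.split("|")[3])  # → igor
```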
> >>>>>
> >>>>> create table if not exists tweets_text_partition(
> >>>>>   tweet_id bigint,
> >>>>>   created_unixtime bigint,
> >>>>>   created_time string,
> >>>>>   displayname string,
> >>>>>   msg string,
> >>>>>   fulltext string
> >>>>> )
> >>>>> row format delimited fields terminated by "|"
> >>>>> location "/tmp/tweets_staging";
> >>>>>
> >>>>> -Bryan
> >>>>>
> >>>>> On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <igork.ine...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi guys,
> >>>>>>
> >>>>>> I want to create the following workflow:
> >>>>>>
> >>>>>> 1. Fetch tweets using the GetTwitter processor.
> >>>>>> 2. Merge tweets into a bigger file using the MergeContent processor.
> >>>>>> 3. Store the merged files in HDFS.
> >>>>>> 4. On the Hadoop/Hive side, create an external table based on these
> >>>>>>    tweets.
> >>>>>>
> >>>>>> There are examples of how to do this, but what I am missing is how to
> >>>>>> configure the MergeContent processor: what to set as the header, footer,
> >>>>>> and demarcator. And what to use on the Hive side as the separator so
> >>>>>> that it will split the merged tweets into rows. I hope I described
> >>>>>> myself clearly.
> >>>>>>
> >>>>>> Thanks in advance.
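[Editor's note: Simon's one-document-per-line "JSON record format" from earlier in the thread can be sketched as follows; the tweet contents are made up for illustration.]

```python
import json

# Three tweets serialized compactly (json.dumps does not pretty-print
# by default), so each document occupies exactly one line.
tweets = [{"id": i, "text": f"tweet {i}"} for i in range(3)]
flowfiles = [json.dumps(t) for t in tweets]

# Binary concatenation with a newline between documents gives the
# one-record-per-line file that per-line Hive JSON serdes expect.
merged = "\n".join(flowfiles) + "\n"

# Each line parses independently, the way a per-line serde would read it.
records = [json.loads(line) for line in merged.splitlines()]
print(records[2]["text"])  # → tweet 2
```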