Hello, I believe this example shows one approach (it covers Hive even though the title says Solr/Banana): https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
The short version is that it extracts several attributes from each tweet using EvaluateJsonPath, then uses ReplaceText to replace the FlowFile content with a pipe-delimited string of those attributes, and then creates a Hive table that knows how to handle that delimiter. With this approach you don't need to set the header, footer, and demarcator in MergeContent.

create table if not exists tweets_text_partition(
  tweet_id bigint,
  created_unixtime bigint,
  created_time string,
  displayname string,
  msg string,
  fulltext string
)
row format delimited
fields terminated by "|"
location "/tmp/tweets_staging";

-Bryan

On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <igork.ine...@gmail.com> wrote:
> Hi guys,
>
> I want to create the following workflow:
>
> 1. Fetch tweets using the GetTwitter processor.
> 2. Merge tweets into a bigger file using the MergeContent processor.
> 3. Store the merged files in HDFS.
> 4. On the Hadoop/Hive side, create an external table based on these
> tweets.
>
> There are examples of how to do this, but what I am missing is how to
> configure the MergeContent processor: what to set as the header, footer,
> and demarcator. And what to use on the Hive side as a separator so that
> it will split the merged tweets into rows. I hope I described myself
> clearly.
>
> Thanks in advance.
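
P.S. For illustration, the ReplaceText "Replacement Value" would be a pipe-delimited string built from the attributes that EvaluateJsonPath extracted, using NiFi expression language. The attribute names below are assumptions matching the table columns above, not necessarily what the linked article uses:

    ${tweet_id}|${created_unixtime}|${created_time}|${displayname}|${msg}|${fulltext}

With Replacement Strategy set to "Always Replace", each FlowFile's content becomes exactly one pipe-delimited row, so the Hive table's "fields terminated by" clause can split it back into columns.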