Hello,

I believe this example shows one way to do it (it covers Hive even
though the title only mentions Solr/banana):
https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html

The short version: it extracts several attributes from each tweet using
EvaluateJsonPath, uses ReplaceText to replace the FlowFile content with a
pipe-delimited string of those attributes, and then creates a Hive table
that splits fields on that delimiter. With this approach you don't need to
set the header, footer, and demarcator in MergeContent.

create table if not exists tweets_text_partition(
  tweet_id bigint,
  created_unixtime bigint,
  created_time string,
  displayname string,
  msg string,
  fulltext string
)
row format delimited fields terminated by "|"
location "/tmp/tweets_staging";
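
To make the pipe-delimited step concrete, here is a rough Python sketch of
what the EvaluateJsonPath + ReplaceText pair produces for one tweet. It is
only an illustration, not the flow from the article: the JSON field names
(id, timestamp_ms, created_at, user.screen_name, text) and the mapping to
the table columns above are my assumptions, and "fulltext" simply reuses
the tweet text as a placeholder. The clean() helper is also an addition of
mine: a delimited text table breaks on a "|" or a newline inside a value,
so the sketch replaces them with spaces.

```python
import json

# Column order of the Hive table above.
COLUMNS = ["tweet_id", "created_unixtime", "created_time",
           "displayname", "msg", "fulltext"]


def extract_attributes(raw_json):
    """Pull the fields out of one tweet (assumed JSON paths, see lead-in)."""
    tweet = json.loads(raw_json)
    return {
        "tweet_id": tweet["id"],
        "created_unixtime": tweet["timestamp_ms"],
        "created_time": tweet["created_at"],
        "displayname": tweet["user"]["screen_name"],
        "msg": tweet["text"],
        "fulltext": tweet["text"],  # placeholder, not the article's mapping
    }


def clean(value):
    """Replace characters that would split a field or a row in Hive."""
    text = str(value)
    for ch in ("|", "\r", "\n"):
        text = text.replace(ch, " ")
    return text


def to_pipe_row(attrs):
    """Join the attributes in table-column order with a pipe delimiter."""
    return "|".join(clean(attrs[name]) for name in COLUMNS)


sample = json.dumps({
    "id": 1,
    "timestamp_ms": "1461261120000",
    "created_at": "Thu Apr 21 17:52:00 +0000 2016",
    "user": {"screen_name": "nifi_user"},
    "text": "hello | world\nbye",
})

row = to_pipe_row(extract_attributes(sample))
print(row)
```

Each tweet ends up as a single line with six fields, which is the shape
the "fields terminated by" clause in the table definition expects.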

-Bryan


On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <igork.ine...@gmail.com>
wrote:

> Hi guys,
>
> I want to create the following workflow:
>
> 1. Fetch tweets using the GetTwitter processor.
> 2. Merge tweets into a bigger file using the MergeContent processor.
> 3. Store the merged files in HDFS.
> 4. On the Hadoop/Hive side, create an external table based on
> these tweets.
>
> There are examples of how to do this, but what I am missing is how to
> configure the MergeContent processor: what to set as header, footer, and
> demarcator, and what to use on the Hive side as the separator so that it
> will split the merged tweets into rows. I hope I described this clearly.
>
> Thanks in advance.
>
