Hi Bryan,

I am aware of this example, but I want to store the JSON as-is and create an
external table over it, like in this example:
http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/
What I don't know is how to properly merge multiple JSON documents into one
file so that Hive can read them properly.
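
From what I've read so far, my guess is that keeping one tweet JSON document
per line would be enough: MergeContent with a newline as the demarcator and no
header or footer, and then a JSON SerDe on the Hive side. Something like this
is what I have in mind (the hcatalog JsonSerDe and the column subset are just
my assumptions, not taken from the article):

create external table if not exists tweets_raw (
id bigint,
created_at string,
`text` string,
`user` struct<screen_name:string, name:string>
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/tmp/tweets_staging';

(I assume this needs the hive-hcatalog-core jar added to Hive, and that each
merged JSON document stays on a single line so the SerDe can parse it.)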

On Thu, Apr 21, 2016 at 2:33 PM, Bryan Bende <bbe...@gmail.com> wrote:

> Hello,
>
> I believe this example shows an approach to do it (it includes Hive even
> though the title is Solr/banana):
>
> https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
>
> The short version is that it extracts several attributes from each tweet
> using EvaluateJsonPath, then uses ReplaceText to replace the FlowFile
> content with a pipe delimited string of those attributes, and then creates
> a Hive table that knows how to handle that delimiter. With this approach
> you don't need to set the header, footer, and demarcator in MergeContent.
>
> create table if not exists tweets_text_partition(
> tweet_id bigint,
> created_unixtime bigint,
> created_time string,
> displayname string,
> msg string,
> fulltext string
> )
> row format delimited fields terminated by "|"
> location "/tmp/tweets_staging";
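>
> For example, if the attributes extracted in EvaluateJsonPath were named
> twitter.tweet_id, twitter.unixtime, and so on (substitute whatever names you
> actually use), the ReplaceText replacement value would just be an Expression
> Language string that lines up with those columns, something like:
>
> ${twitter.tweet_id}|${twitter.unixtime}|${twitter.created_time}|${twitter.displayname}|${twitter.msg}|${twitter.fulltext}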
>
> -Bryan
>
>
> On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <igork.ine...@gmail.com>
> wrote:
>
>> Hi guys,
>>
>> I want to create the following workflow:
>>
>> 1. Fetch tweets using the GetTwitter processor.
>> 2. Merge tweets into a bigger file using the MergeContent processor.
>> 3. Store the merged files in HDFS.
>> 4. On the Hadoop/Hive side, create an external table based on these
>> tweets.
>>
>> There are examples of how to do this, but what I am missing is how to
>> configure the MergeContent processor: what to set as the header, footer,
>> and demarcator. And what to use on the Hive side as a separator so that it
>> will split the merged tweets into rows. Hope I described myself clearly.
>>
>> Thanks in advance.
>>
>
>
