Hi Bryan,

I am aware of this example, but I want to store the JSON as is and create an external table on top of it, as in this example:

http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/

What I don't know is how to merge multiple JSON documents into one file so that Hive can read it properly.
On Thu, Apr 21, 2016 at 2:33 PM, Bryan Bende <bbe...@gmail.com> wrote:

> Hello,
>
> I believe this example shows an approach to do it (it includes Hive even
> though the title is Solr/banana):
>
> https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
>
> The short version is that it extracts several attributes from each tweet
> using EvaluateJsonPath, then uses ReplaceText to replace the FlowFile
> content with a pipe-delimited string of those attributes, and then creates
> a Hive table that knows how to handle that delimiter. With this approach
> you don't need to set the header, footer, and demarcator in MergeContent.
>
> create table if not exists tweets_text_partition(
>   tweet_id bigint,
>   created_unixtime bigint,
>   created_time string,
>   displayname string,
>   msg string,
>   fulltext string
> )
> row format delimited fields terminated by "|"
> location "/tmp/tweets_staging";
>
> -Bryan
>
> On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <igork.ine...@gmail.com>
> wrote:
>
>> Hi guys,
>>
>> I want to create the following workflow:
>>
>> 1. Fetch tweets using the GetTwitter processor.
>> 2. Merge tweets into a bigger file using the MergeContent processor.
>> 3. Store the merged files in HDFS.
>> 4. On the Hadoop/Hive side, create an external table based on
>>    these tweets.
>>
>> There are examples of how to do this, but what I am missing is how to
>> configure the MergeContent processor: what to set as header, footer, and
>> demarcator, and what to use on the Hive side as a separator so that it
>> will split the merged tweets into rows. I hope I described this clearly.
>>
>> Thanks in advance.
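
On the merge question in this thread: a layout that Hive's text-based JSON SerDes generally accept is one JSON object per physical line, with no wrapping array and no separators other than a newline. In MergeContent terms that would probably mean leaving Header and Footer empty and using a newline as the Demarcator, but that mapping is an assumption and is not confirmed anywhere in this thread. The Python sketch below only illustrates the target file layout; `merge_json_records` and the sample tweets are hypothetical names and data, not part of NiFi or Hive.

```python
import json


def merge_json_records(records):
    """Return the records as newline-delimited JSON, one object per line."""
    # json.dumps without an indent argument emits no raw newlines; a newline
    # inside a string value is escaped as the two characters "\" and "n",
    # so every record stays on a single physical line.
    lines = [json.dumps(record) for record in records]
    return "\n".join(lines) + "\n"


# Hypothetical sample data standing in for tweets from GetTwitter.
tweets = [
    {"id": 1, "text": "first tweet"},
    {"id": 2, "text": "second\ntweet"},
]

merged = merge_json_records(tweets)
print(merged, end="")
```

Each line of the merged output can be parsed on its own, which is what a line-oriented reader needs in order to treat one tweet as one row.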