This is also a good option. With respect to Hive transactional tables: I do not think they have been designed for massive numbers of single-item inserts. Then again, you would not insert a lot of events using single-row inserts in a relational database either. The same restrictions apply; it is not the pattern you want to implement.
> On 24 Aug 2016, at 13:55, Kit Menke <[email protected]> wrote:
>
> Joel,
> Another option you have is to use the Storm HDFS bolt to stream data into Hive external tables. The external tables then get loaded into ORC history tables for long-term storage. We use this in an HDP cluster with a similar load, so I know it works. :)
>
> I'm with Jörn on this one. My impression of Hive transactions is that they are a new feature, not totally ready for production.
> Thanks,
> Kit
>
>
>> On Aug 24, 2016 3:07 AM, "Joel Victor" <[email protected]> wrote:
>> @Jörn: If I understood correctly, even later versions of Hive won't be able to handle these kinds of workloads?
>>
>>> On Wed, Aug 24, 2016 at 1:26 PM, Jörn Franke <[email protected]> wrote:
>>> I think Hive, especially these old versions, has not been designed for this. Why not store the events in HBase and regularly run an Oozie job that puts them all into Hive (ORC or Parquet) in a bulk job?
>>>
>>>> On 24 Aug 2016, at 09:35, Joel Victor <[email protected]> wrote:
>>>>
>>>> Currently I am using Apache Hive 0.14, which ships with HDP 2.2. We are trying to perform streaming ingestion with it.
>>>> We are using the Storm Hive bolt, and we have 7 tables into which we are inserting. The RPS (requests per second) of our bolts ranges from 5000 to 7000, and our commit policies are configured accordingly, i.e. 100k events or 15 seconds.
>>>>
>>>> We see many commitTxn exceptions due to serialization errors in the metastore (we are using PostgreSQL 9.5 as the metastore database).
>>>> These serialization errors cause the topology to lag in terms of events processed, because it reprocesses the batches that failed.
>>>>
>>>> I have already backported HIVE-10500 to 0.14, but there isn't much improvement.
>>>> I went through most of the JIRAs about transactions and found the following: HIVE-11948, HIVE-13013. I would like to backport them to 0.14.
>>>> Going through the patches gives me the impression that I mostly need to update the queries and transaction isolation levels.
>>>> Do these patches also require me to update the schema in the metastore?
>>>> Please also let me know if there are any other patches that I missed.
>>>>
>>>> I would also like to know whether Apache Hive 1.2.1 or later can handle concurrent inserts into the same or different tables from multiple clients without many serialization errors in the Hive metastore.
>>>>
>>>> -Joel
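To put Joel's commit policy (100k events or 15 seconds) into numbers, here is a minimal sketch. It assumes a simplified "whichever threshold is reached first" trigger at a steady event rate; `CountOrTimePolicy` and its methods are hypothetical names for illustration, not part of the Storm Hive bolt API, and the real bolt's batching may behave differently.

```python
from dataclasses import dataclass


@dataclass
class CountOrTimePolicy:
    """Hypothetical commit trigger: commit when either threshold is hit first."""
    max_events: int = 100_000   # count threshold from the thread (100k events)
    max_seconds: float = 15.0   # time threshold from the thread (15 seconds)

    def seconds_until_commit(self, events_per_second: float) -> float:
        # Time needed to accumulate max_events at a steady rate.
        time_to_fill = self.max_events / events_per_second
        # Whichever trigger fires first determines the commit interval.
        return min(time_to_fill, self.max_seconds)

    def events_per_commit(self, events_per_second: float) -> int:
        # Events buffered when the commit fires, capped by the count threshold.
        return min(self.max_events, int(events_per_second * self.max_seconds))

    def commits_per_minute(self, events_per_second: float) -> float:
        # Each commit is one transaction commit against the metastore.
        return 60.0 / self.seconds_until_commit(events_per_second)


if __name__ == "__main__":
    policy = CountOrTimePolicy()
    for rps in (5000, 7000):
        print(
            f"{rps} events/s: commit every {policy.seconds_until_commit(rps):.2f} s, "
            f"{policy.events_per_commit(rps)} events per commit, "
            f"{policy.commits_per_minute(rps):.2f} commits/min"
        )
```

Under these assumptions, at 5000 events/s the 15-second timer fires first (75,000 events per commit), while at 7000 events/s the 100k count is reached after roughly 14.3 seconds. Either way, each writer issues about 4 commits per minute, so the metastore load grows with the number of concurrent writers (at least one per table, possibly more with parallel bolt tasks).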
