Thanks, Kit! We are trying the HDFS bolt with external tables now.

On Wed, Aug 24, 2016 at 5:25 PM, Kit Menke <[email protected]> wrote:
> Joel,
> Another option which you have is to use the Storm HDFS bolt to stream data
> into Hive external tables. The external tables then get loaded into ORC
> history tables for long-term storage. We use this in an HDP cluster with
> similar load so I know it works. :)
>
> I'm with Jörn on this one. My impression of Hive transactions is that it
> is a new feature not totally ready for production.
> Thanks,
> Kit
>
> On Aug 24, 2016 3:07 AM, "Joel Victor" <[email protected]> wrote:
>
>> @Jörn: If I understood correctly even later versions of Hive won't be
>> able to handle these kinds of workloads?
>>
>> On Wed, Aug 24, 2016 at 1:26 PM, Jörn Franke <[email protected]>
>> wrote:
>>
>>> I think Hive, especially these old versions, has not been designed for
>>> this. Why not store them in HBase and run an Oozie job regularly that puts
>>> them all into Hive/ORC or Parquet in a bulk job?
>>>
>>> On 24 Aug 2016, at 09:35, Joel Victor <[email protected]> wrote:
>>>
>>> Currently I am using Apache Hive 0.14 that ships with HDP 2.2. We are
>>> trying to perform streaming ingestion with it.
>>> We are using the Storm Hive bolt and we have 7 tables in which we are
>>> trying to insert. The RPS (requests per second) of our bolts ranges from
>>> 7000 to 5000 and our commit policies are configured accordingly, i.e. 100k
>>> events or 15 seconds.
>>>
>>> We see that there are many commitTxn exceptions due to serialization
>>> errors in the metastore (we are using PostgreSQL 9.5 as metastore).
>>> The serialization errors will cause the topology to start lagging in
>>> terms of events processed as it will try to reprocess the batches that have
>>> failed.
>>>
>>> I have already backported HIVE-10500
>>> <https://issues.apache.org/jira/browse/HIVE-10500> to 0.14 and there
>>> isn't much improvement.
>>> I went through most of the JIRAs about transactions and I found the
>>> following: HIVE-11948 <https://issues.apache.org/jira/browse/HIVE-11948>,
>>> HIVE-13013 <https://issues.apache.org/jira/browse/HIVE-13013>. I
>>> would like to backport them to 0.14.
>>> Going through the patches gives me the impression that I mostly need to
>>> update the queries and transaction levels.
>>> Do these patches also require me to update the schema in the metastore?
>>> Please also let me know if there are any other patches that I missed.
>>>
>>> I would also like to know whether Apache Hive can handle inserts to the
>>> same/different tables concurrently from multiple clients in 1.2.1 or later
>>> versions without many serialization errors in the Hive metastore?
>>>
>>> -Joel
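
For reference, the commit policy described in the original message (commit after 100k events or 15 seconds, whichever comes first) can be sketched roughly as below. This is only an illustration in Python, not the Storm Hive bolt's or Hive's actual implementation; the class name `BatchCommitPolicy` and its methods are hypothetical, and the injectable `clock` is there just to make the behaviour easy to check.

```python
import time


class BatchCommitPolicy:
    """Hypothetical count-or-time commit policy (not a Storm or Hive API)."""

    def __init__(self, max_events=100_000, max_seconds=15.0, clock=time.monotonic):
        self.max_events = max_events      # commit once this many events are buffered
        self.max_seconds = max_seconds    # or once the batch has been open this long
        self.clock = clock                # injectable time source, in seconds
        self.pending = 0
        self.batch_started = None

    def record(self, n=1):
        """Register n buffered events; return True if the batch should commit now."""
        if self.pending == 0:
            # The timer starts with the first event of a new batch.
            self.batch_started = self.clock()
        self.pending += n
        return self.should_commit()

    def should_commit(self):
        """True if either the event limit or the time limit has been reached."""
        if self.pending == 0:
            return False
        elapsed = self.clock() - self.batch_started
        return self.pending >= self.max_events or elapsed >= self.max_seconds

    def reset(self):
        """Call after a successful commit to start a fresh batch."""
        self.pending = 0
        self.batch_started = None
```

If a single batch really received the rates quoted above, the time limit would fire first at 5000 events per second (15 s x 5000 = 75,000 events), while at 7000 events per second the 100k limit would be reached after roughly 14.3 seconds, just before the timer.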
