I think Hive, especially these old versions, was not designed for this. Why 
not store the events in HBase and run an Oozie job regularly that loads them 
into Hive (ORC or Parquet) as a bulk job?

> On 24 Aug 2016, at 09:35, Joel Victor <[email protected]> wrote:
> 
> Currently I am using Apache Hive 0.14 that ships with HDP 2.2. We are trying 
> to perform streaming ingestion with it.
> We are using the Storm Hive bolt and we have 7 tables into which we are 
> trying to insert. The RPS (requests per second) of our bolts ranges from 5000 
> to 7000 and our commit policies are configured accordingly, i.e. 100k events 
> or 15 seconds.
> 
> We see that there are many commitTxn exceptions due to serialization errors 
> in the metastore (we are using PostgreSQL 9.5 as the metastore database).
> The serialization errors cause the topology to lag in terms of events 
> processed, as it tries to reprocess the batches that have failed.
> 
> I have already backported HIVE-10500 to 0.14 and there isn't much 
> improvement.
> I went through most of the JIRAs about transactions and found the following: 
> HIVE-11948, HIVE-13013. I would like to backport them to 0.14.
> Going through the patches gives me the impression that I mostly need to 
> update the queries and transaction isolation levels.
> Do these patches also require me to update the schema in the metastore? 
> Please also let me know if there are any other patches that I missed.
> 
> I would also like to know whether Apache Hive 1.2.1 or later can handle 
> concurrent inserts to the same or different tables from multiple clients 
> without many serialization errors in the Hive metastore.
> 
> -Joel
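
For illustration only, the commit policy described in the quoted message (commit after 100k events or 15 seconds, whichever comes first) might be sketched roughly as below. This is a hypothetical Python sketch, not the Storm Hive bolt's actual implementation: the class name BatchCommitPolicy, its methods, and the injectable clock are all assumptions made for the example.

```python
import time


class BatchCommitPolicy:
    """Hypothetical count-or-time commit policy (not the Storm Hive bolt API)."""

    def __init__(self, max_events=100_000, max_seconds=15.0, clock=time.monotonic):
        self.max_events = max_events      # commit once this many events are buffered
        self.max_seconds = max_seconds    # ...or once the batch is this old
        self.clock = clock                # injectable so the policy can be tested
        self.pending = 0
        self.batch_started = clock()

    def record(self, n=1):
        """Count n newly buffered events; return True when a commit is due."""
        self.pending += n
        return self.should_commit()

    def should_commit(self):
        # An empty batch is never committed, even if the time limit has passed.
        if self.pending == 0:
            return False
        elapsed = self.clock() - self.batch_started
        return self.pending >= self.max_events or elapsed >= self.max_seconds

    def reset(self):
        """Call after a successful commit to start a new batch."""
        self.pending = 0
        self.batch_started = self.clock()
```

With a policy like this, a failed commit leaves the whole batch pending, so larger batches mean more work to replay whenever the metastore rejects a transaction.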
