Hi Prasanth,

I tried to run your test example but got errors, and I submitted an issue: https://github.com/prasanthj/culvert/issues/1. I am using Hive 3.1.1.
Thanks, Lei [email protected] 发件人: Prasanth Jayachandran 发送时间: 2020-03-20 15:41 收件人: [email protected] 主题: Re: Can hive bear high throughput streaming data ingest? Use higher transaction batch size? Begin transaction opens a file, commit transaction writes intermediate footer but the file is kept open until the entire batch completes. So bigger batch size with less frequent commits can avoid creating too many small files in hdfs. Here is a test application for hive streaming v2 https://github.com/prasanthj/culvert/blob/v2/README.md that injected ~1.5 million rows/sec with 64 threads and 100K row commit interval in hdfs. https://github.com/prasanthj/culvert/blob/v2/report.txt Thanks Prasanth From: [email protected] <[email protected]> Sent: Friday, March 20, 2020 12:30:07 AM To: user <[email protected]> Subject: Can hive bear high throughput streaming data ingest? https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2 I want to stream my app log to Hive using flume on the edge app server. Since HDFS is not friendly to frequently write, I am afraid this way can not bear high throuthput. Any suggesions on this? Thanks, Lei [email protected]
