Have you tried a larger transaction batch size? Beginning a transaction opens a file, and committing a transaction writes an intermediate footer, but the file stays open until the entire batch completes. So a bigger batch size with less frequent commits avoids creating too many small files in HDFS.

Here is a test application for Hive streaming v2, https://github.com/prasanthj/culvert/blob/v2/README.md, that ingested ~1.5 million rows/sec into HDFS with 64 threads and a 100K-row commit interval. Report: https://github.com/prasanthj/culvert/blob/v2/report.txt
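
To make the small-files point concrete, here is a rough back-of-the-envelope sketch in Java. It is not the Hive streaming API. It is only a simplified model of the behaviour described above, assuming a single writer keeps one file open per transaction batch. The class name, method names, and the 10-million-row figure are made up for illustration.

```java
// Simplified model only: assumes one writer and one open file per transaction batch.
// CommitBatchModel and its methods are hypothetical names, not part of Hive.
final class CommitBatchModel {

    /** Commits needed to ingest `rows` rows when committing every `rowsPerCommit` rows. */
    static long commits(long rows, long rowsPerCommit) {
        return (rows + rowsPerCommit - 1) / rowsPerCommit; // ceiling division
    }

    /** Files opened if a file stays open for a whole batch of `txnBatchSize` transactions. */
    static long filesOpened(long rows, long rowsPerCommit, long txnBatchSize) {
        long txns = commits(rows, rowsPerCommit);          // one transaction per commit
        return (txns + txnBatchSize - 1) / txnBatchSize;   // ceiling division
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // illustrative volume for one writer

        // Frequent commits, one transaction per batch: many small files.
        System.out.println(filesOpened(rows, 1_000L, 1L));    // 10000
        // 100K-row commit interval, still one transaction per batch.
        System.out.println(filesOpened(rows, 100_000L, 1L));  // 100
        // Same commit interval, 10 transactions per batch.
        System.out.println(filesOpened(rows, 100_000L, 10L)); // 10
    }
}
```

Under this model the file count drops with both knobs: a longer commit interval means fewer transactions, and a larger batch size means more of those transactions share one file. With several writer threads the totals would scale by the thread count.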

Thanks
Prasanth

________________________________
From: wangl...@geekplus.com.cn <wangl...@geekplus.com.cn>
Sent: Friday, March 20, 2020 12:30:07 AM
To: user <user@hive.apache.org>
Subject: Can hive bear high throughput streaming data ingest?

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2

I want to stream my app log to Hive using Flume on the edge app server. Since HDFS is not friendly to frequent writes, I am afraid this approach cannot sustain high throughput. Any suggestions on this?

Thanks,
Lei

________________________________
wangl...@geekplus.com.cn