Re:Re: StreamingFileWriter 压测性能

kandy.wang Wed, 16 Sep 2020 05:16:25 -0700

场景很简单，就是kafka2hive 
--5min入仓Hive

INSERT INTO  hive.temp_.hive_5min


SELECT

 arg_service,

time_local

.....

FROM_UNIXTIME((UNIX_TIMESTAMP()/300 * 300) ,'yyyyMMdd'), 
FROM_UNIXTIME((UNIX_TIMESTAMP()/300 * 300) ,'HHmm')  5min产生一个分区

FROM hive.temp_.kafka_source_pageview/*+ 
OPTIONS('properties.group.id'='kafka_hive_test', 
'scan.startup.mode'='earliest-offset') */;



--kafka source表定义

CREATE TABLE hive.temp_vipflink.kafka_source_pageview (

arg_service string COMMENT 'arg_service',

....

)WITH (

  'connector' = 'kafka',

  'topic' = '...',

  'properties.bootstrap.servers' = '...',

  'properties.group.id' = 'flink_etl_kafka_hive',

  'scan.startup.mode' = 'group-offsets',

  'format' = 'json',

  'json.fail-on-missing-field' = 'false',

  'json.ignore-parse-errors' = 'true'

);
--sink hive表定义
CREATE TABLE temp_vipflink.vipflink_dm_log_app_pageview_5min (
....
)
PARTITIONED BY (dt string , hm string) stored as orc location 
'hdfs://ssdcluster/....._5min' TBLPROPERTIES(
  'sink.partition-commit.trigger'='process-time',
  'sink.partition-commit.delay'='0 min',
  'sink.partition-commit.policy.class'='...CustomCommitPolicy',
  'sink.partition-commit.policy.kind'='metastore,success-file,custom',
  'sink.rolling-policy.check-interval' ='30s',
  'sink.rolling-policy.rollover-interval'='10min',
  'sink.rolling-policy.file-size'='128MB'
);
初步看下来，感觉瓶颈在写hdfs，hdfs 这边已经是ssd hdfs了，kafka的分区数=40 
，算子并行度=40，tps也就达到6-7万这样子，并行度放大，性能并无提升。
就是flink sql可以 
改局部某个算子的并行度，想单独改一下StreamingFileWriter算子的并行度，有什么好的办法么？然后StreamingFileWriter 
这块，有没有什么可以提升性能相关的优化参数？




在 2020-09-16 19:29:50，"Jingsong Li" <jingsongl...@gmail.com> 写道：
>Hi,
>
>可以分享下具体的测试场景吗？有对比吗？比如使用手写的DataStream作业来对比下，性能的差距？
>
>另外，压测时是否可以看下jstack？
>
>Best,
>Jingsong
>
>On Wed, Sep 16, 2020 at 2:03 PM kandy.wang <kandy1...@163.com> wrote:
>
>> 压测下来，发现streaming方式写入hive StreamingFileWriter ，在kafka partition=40 ，source
>> writer算子并行度 =40的情况下，kafka从头消费，tps只能达到 7w
>> 想了解一下，streaming方式写Hive 这块有压测过么？性能能达到多少
>
>
>
>-- 
>Best, Jingsong Lee

Re:Re: StreamingFileWriter 压测性能

回复