Hi Nick,

It seems to me that the slow part of the whole pipeline is the Derby sink.
Could you change it into other sinks (for example, csv sink or even a
"discard everything" sink) and see if the throughput improves?

If this is the case, are you using the JDBC connector? If yes, you might
consider calling the `JDBCOutputFormatBuilder#setBatchInterval` method and
make the batch interval to a larger value.

Thanks

Nicholas Walton <nwal...@me.com> 于2019年11月28日周四 上午1:57写道:

> Hi,
>
> I’ve been working with a pipleline that was initially aimed at processing
> high speed sensor data, but for a proof of concept I’m feeding simulated
> data from a CSV file. Each row of the file is a sample across a number of
> time series, and I’ve been using the streaming environment to process each
> column of the file in parallel. The columns are each around 6 million
> samples in length. The outputs are being sunk into a Derby database table.
>
> However, I’m not getting a very high throughput even running across two
> 32G Dell laptops with the database on a 16Mb Macbook. According to the
> metrics it seems I’m only dropping records on to the database at a rate of
> around 20-30 per second (I assume per parallel pipe of which I’m running 16
> across the two laptops). The pipeline runs a couple of windowing operations
> one of length 10 and one of length 100 with a small amount of computation,
> but it does yield a considerable amount of output a 10G CSV file yielding a
> database of around 100Gb+.
>
> I’m thinking that the slow rate is due to using stream process to process
> a batch. So I’ve I’ve been looking at the batch support in Flink (1.8)
> intending to move the code over from stream to batch execution. However, I
> can’t make head’n tail of the batch DataSet documentation. For example, in
> the streaming environment I was setting a watermark after I read each line
> of the file to keep the time series in order, and using keyBy to split up
> the individual time series by column number. I can’t find the equivalent
> operations in the batch interface.
>
> Could someone guide me to some relevant online documentation,and before
> anyone says I have chatted with Dr Google for a good while to no
> satisfactory outcome
>
> TIA
>
> Nick Walton

Reply via email to