[ https://issues.apache.org/jira/browse/NIFI-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Turcsanyi updated NIFI-8532:
----------------------------------
    Description: 
{{ConsumeKinesisStream}} was added in NIFI-2892 with some seemingly sensible 
default property settings that are passed to the Kinesis Client Library.

Some performance testing/experimentation should be conducted for different NiFi 
setups (e.g. single instance, remote instance, cluster) to determine whether 
other settings should be used by default.

For example (from comments in PR 
[#4822|https://github.com/apache/nifi/pull/4822]):
{quote}
[~turcsanyip] ran some performance tests with 1 million messages in 12 
shards, running NiFi on [his] local machine (no cluster). It took 64-74 
seconds with default settings (no additional dynamic properties).

The interesting thing is that [he] could not see any difference when [he] set 
Checkpoint Interval to 0 sec (that is, checkpointing "synchronously" after each 
batch of messages received in the IRecordProcessor.processRecords() callback). 
It seems there is no significant overhead in checkpointing more frequently (and 
it has the advantage of fewer duplicated messages in case of a restart).

It can be investigated further in a follow-up ticket with more sophisticated 
performance tests (NiFi cluster, non-local machine, tuning KCL properties, etc.) 
and the default can be adjusted if that is reasonable.
{quote}
(this is *that* ticket)

  was:
{{ConsumeKinesisStream}} was added in NIFI-2982 with some seemingly sensible 
default property settings that are passed to the Kinesis Client Library.

Some performance testing/experimentation should be conducted for different NiFi 
setups (e.g. single instance, remote instance, cluster) to determine whether 
other settings should be used by default.

For example (from comments in PR 
[#4822|https://github.com/apache/nifi/pull/4822]):
{quote}
[~turcsanyip] did some kind of performance tests with 1 Mio messages in 12 
shards and running NiFi on [his] local machine (no cluster). It took 64-74 
seconds with default settings (no additional dynamic properties).

The interesting thing that [he] could not see any difference when I set 
Checkpoint Interval to 0 sec (that is checkpointing "synchronously" after each 
bunch of messages received in IRecordProcessor.processRecords() callback). It 
seems there is no significant overhead of checkpointing more frequently (and it 
has the advantage of having fewer duplicated messages in case of restart).

It can be investigated further in a follow-up ticket with more sophisticated 
performance tests (nifi cluster, non-local machine, tuning KCL properties, etc) 
and the default can be adjusted if that is reasonable.
{quote}
(this is *that* ticket)


> ConsumeKinesisStream tuning/performance testing
> -----------------------------------------------
>
>                 Key: NIFI-8532
>                 URL: https://issues.apache.org/jira/browse/NIFI-8532
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Chris Sampson
>            Priority: Major
>
> {{ConsumeKinesisStream}} was added in NIFI-2892 with some seemingly sensible 
> default property settings that are passed to the Kinesis Client Library.
> Some performance testing/experimentation should be conducted for different 
> NiFi setups (e.g. single instance, remote instance, cluster) to determine 
> whether other settings should be used by default.
> For example (from comments in PR 
> [#4822|https://github.com/apache/nifi/pull/4822]):
> {quote}
> [~turcsanyip] ran some performance tests with 1 million messages in 12 
> shards, running NiFi on [his] local machine (no cluster). It took 64-74 
> seconds with default settings (no additional dynamic properties).
> The interesting thing is that [he] could not see any difference when [he] set 
> Checkpoint Interval to 0 sec (that is, checkpointing "synchronously" after 
> each batch of messages received in the IRecordProcessor.processRecords() 
> callback). It seems there is no significant overhead in checkpointing more 
> frequently (and it has the advantage of fewer duplicated messages in 
> case of a restart).
> It can be investigated further in a follow-up ticket with more sophisticated 
> performance tests (NiFi cluster, non-local machine, tuning KCL properties, 
> etc.) and the default can be adjusted if that is reasonable.
> {quote}
> (this is *that* ticket)
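
The checkpoint-interval behaviour discussed in the quote can be illustrated with a small sketch. This is not the ConsumeKinesisStream or KCL implementation; {{CheckpointPolicy}} and its methods are hypothetical names, and the 3-second interval is only an example value. It assumes the processor asks, once per processRecords() batch, whether a checkpoint is due. With an interval of zero every batch is checkpointed; with a longer interval fewer checkpoints are issued, so more records could be replayed after a restart.

```java
import java.time.Duration;

/** Hypothetical helper (not part of NiFi or the KCL): decides when a checkpoint is due. */
final class CheckpointPolicy {
    private final long intervalMillis;
    private long nextCheckpointAtMillis;

    CheckpointPolicy(Duration interval, long nowMillis) {
        this.intervalMillis = interval.toMillis();
        this.nextCheckpointAtMillis = nowMillis + intervalMillis;
    }

    /** Intended to be called once per batch; returns true when the interval has elapsed. */
    boolean shouldCheckpoint(long nowMillis) {
        if (nowMillis < nextCheckpointAtMillis) {
            return false;
        }
        // Schedule the next checkpoint relative to the current batch.
        nextCheckpointAtMillis = nowMillis + intervalMillis;
        return true;
    }

    /** Counts how many checkpoints would be issued for batches arriving at the given times. */
    static int countCheckpoints(Duration interval, long[] batchTimesMillis) {
        CheckpointPolicy policy = new CheckpointPolicy(interval, 0L);
        int checkpoints = 0;
        for (long batchTime : batchTimesMillis) {
            if (policy.shouldCheckpoint(batchTime)) {
                checkpoints++;
            }
        }
        return checkpoints;
    }
}
```

For six batches arriving one second apart, {{countCheckpoints(Duration.ZERO, times)}} checkpoints after every batch, while {{countCheckpoints(Duration.ofSeconds(3), times)}} checkpoints only on every third batch. The testing proposed in this ticket would measure whether the extra checkpoints have a noticeable cost.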



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
