If you want to manage batching yourself, you can use the manual flush mode
(MANUAL_FLUSH). The easiest option is the auto flush background mode
(AUTO_FLUSH_BACKGROUND).
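
To illustrate why the flush mode matters so much here, below is a minimal toy model in plain Java. It is not the Kudu client: ModelSession, roundTripsFor, and the batch size of 1000 are hypothetical names and values chosen only for illustration. It simply counts simulated server round trips when every apply is flushed on its own (as with the Java client's default AUTO_FLUSH_SYNC) versus when rows are buffered and flushed in batches. In the real client, the mode is chosen with KuduSession.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH) or AUTO_FLUSH_BACKGROUND, and in manual mode the caller is responsible for calling flush().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT the Kudu client API. It counts simulated round trips.
class FlushModel {
    /** Hypothetical stand-in for a session that sends buffered rows to a server. */
    static final class ModelSession {
        private final int batchSize;                 // rows per flush; 1 mimics a flush on every apply
        private final List<String> buffer = new ArrayList<>();
        private long roundTrips = 0;

        ModelSession(int batchSize) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
        }

        /** Buffer one row and flush once the buffer reaches batchSize. */
        void apply(String row) {
            buffer.add(row);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Send everything buffered so far as a single simulated round trip. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            roundTrips++;
            buffer.clear();
        }

        long roundTrips() {
            return roundTrips;
        }
    }

    /** Number of simulated round trips needed to write `rows` rows with the given batch size. */
    static long roundTripsFor(int rows, int batchSize) {
        ModelSession session = new ModelSession(batchSize);
        for (int i = 0; i < rows; i++) {
            session.apply("row-" + i);
        }
        session.flush(); // send the final partial batch, if any
        return session.roundTrips();
    }

    public static void main(String[] args) {
        int rows = 30_000; // roughly one second of input at the rate quoted in this thread
        System.out.println("flush per row:   " + roundTripsFor(rows, 1));
        System.out.println("batches of 1000: " + roundTripsFor(rows, 1000));
    }
}
```

The model ignores network latency, buffer limits, and error handling. It only shows that the number of round trips falls in proportion to the batch size, which is the effect the batching flush modes are meant to achieve.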

Todd

On Oct 30, 2017 11:10 PM, "Chao Sun" <sunc...@uber.com> wrote:

> Hi Todd,
>
> Thanks for the reply! I used a single Kafka consumer to pull the data.
> For Kudu, I was doing something very simple that basically just follows the
> example here
> <https://github.com/cloudera/kudu-examples/blob/master/java/java-sample/src/main/java/org/kududb/examples/sample/Sample.java>.
> Specifically:
>
> loop {
>   Insert insert = kuduTable.newInsert();
>   PartialRow row = insert.getRow();
>   // fill the columns
>   kuduSession.apply(insert);
> }
>
> I didn't specify the flush mode, so will it pick up AUTO_FLUSH_SYNC as the
> default? Should I use MANUAL_FLUSH?
>
> Thanks,
> Chao
>
> On Mon, Oct 30, 2017 at 10:39 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hey Chao,
>>
>> Nice to hear you are checking out Kudu.
>>
>> What are you using to consume from Kafka and write to Kudu? Is it
>> possible that it is Java code and you are using the SYNC flush mode? That
>> would result in a separate round trip for each record and thus very low
>> throughput.
>>
>> Todd
>>
>> On Oct 30, 2017 10:23 PM, "Chao Sun" <sunc...@uber.com> wrote:
>>
>> Hi,
>>
>> We are evaluating Kudu (version kudu 1.3.0-cdh5.11.1, revision
>> af02f3ea6d9a1807dcac0ec75bfbca79a01a5cab) on an 8-node cluster.
>> The data are coming from Kafka at a rate of around 30K / sec, and the table is
>> hash partitioned into 128 buckets. However, with default settings, Kudu can only
>> consume the topics at a rate of around 1.5K / second. This is a direct
>> ingest with no transformation on the data.
>>
>> Could this be because I was using the default configurations? Also, we are
>> using Kudu on HDD; could that also be related?
>>
>> Any help would be appreciated. Thanks.
>>
>> Best,
>> Chao
>>
>>
>>
>
