Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread Xiaobin She
Todd, Thank you very much for your reply. My understanding of RAID 10 was wrong. I understand that one cannot get absolutely sequential disk access even on a single disk; the reason I'm interested in this question is that the Kafka design document emphasizes that Kafka takes advantage of the s

Thread safety of encoders

2014-10-23 Thread Rajiv Kurian
Are encoders only ever called from a single thread? I have a stateful utility class that I use to encode my objects. Is it safe to only create a single instance? Something like this: MyObjectEncoder public class MyObjectEncoder implements Encoder { private final MyObjectEncoderHelper helper
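The thread does not say whether the encoder may be called from more than one thread. If it can be, one defensive option is to keep the stateful helper per thread with `ThreadLocal`, so a single encoder instance can be shared safely. This is a minimal sketch under assumptions: `Encoder<T>` below is a local stand-in modeled on the single-method shape of Kafka 0.8's `kafka.serializer.Encoder` (`byte[] toBytes(T)`), and `MyObject` and `MyObjectEncoderHelper` are hypothetical placeholders for the poster's own classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Local stand-in modeled on the single-method shape of Kafka 0.8's Encoder.
interface Encoder<T> {
    byte[] toBytes(T t);
}

// Hypothetical domain object.
final class MyObject {
    final long id;
    final String name;

    MyObject(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical stateful helper: it reuses one scratch buffer, so a single
// instance must NOT be shared between threads.
// Assumes an encoded object fits in 1024 bytes.
final class MyObjectEncoderHelper {
    private final ByteBuffer scratch = ByteBuffer.allocate(1024);

    byte[] encode(MyObject o) {
        scratch.clear();
        byte[] nameBytes = o.name.getBytes(StandardCharsets.UTF_8);
        scratch.putLong(o.id);              // 8 bytes: id
        scratch.putInt(nameBytes.length);   // 4 bytes: name length
        scratch.put(nameBytes);             // name bytes
        scratch.flip();
        byte[] out = new byte[scratch.remaining()];
        scratch.get(out);
        return out;
    }
}

// Safe to share: each calling thread lazily gets its own helper instance.
final class MyObjectEncoder implements Encoder<MyObject> {
    private final ThreadLocal<MyObjectEncoderHelper> helper =
            ThreadLocal.withInitial(MyObjectEncoderHelper::new);

    @Override
    public byte[] toBytes(MyObject o) {
        return helper.get().encode(o);
    }
}
```

If the encoder turns out to be confined to one thread, the `ThreadLocal` costs little; if it is not, a shared helper with a reused buffer would corrupt output under concurrent calls.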

Re: Kafka sending messages with zero copy

2014-10-23 Thread Rajiv Kurian
I want to avoid allocations since I am using Java in a C-like style. Even though creating objects is a mere thread-local pointer bump in Java, freeing them is not so cheap and causes uncontrollable jitter. The second motivation is to avoid copying data. Since I have objects which really look like C s

Re: a questions about 0.8.1 async publishing

2014-10-23 Thread Jun Rao
The new Java producer to be released in 0.8.2 supports a callback on each message sent asynchronously. Thanks, Jun. On Thu, Oct 23, 2014 at 3:26 PM, Libo Yu wrote: > Hi, If I use async publishing plus message acknowledgement, is there any API to tell when all the messages in the queue
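With a per-message completion callback, "are all messages sent?" reduces to counting outstanding sends: increment before each send, decrement in the callback (on success or failure), and have a shutdown hook wait until the count reaches zero. The sketch below is a hedged illustration, not Kafka code: `PendingSends` and `PendingSendsDemo` are hypothetical names, and an `ExecutorService` stands in for the producer's asynchronous acknowledgements.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical tracker of sends handed to a producer but not yet acknowledged.
final class PendingSends {
    private long pending;

    // Call just before handing a message to the producer.
    synchronized void beforeSend() {
        pending++;
    }

    // Call from the per-message completion callback, whether it succeeded or failed.
    synchronized void onComplete() {
        pending--;
        if (pending == 0) {
            notifyAll();
        }
    }

    synchronized long pending() {
        return pending;
    }

    // Blocks until every tracked send has completed or the timeout elapses.
    // Returns true if all sends completed in time.
    synchronized boolean awaitDrained(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pending > 0) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }
}

// Simulation: a thread pool plays the role of asynchronous acknowledgements.
final class PendingSendsDemo {
    static boolean run() {
        PendingSends tracker = new PendingSends();
        ExecutorService io = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < 100; i++) {
                tracker.beforeSend();
                io.submit(tracker::onComplete); // stand-in for an async send + callback
            }
            return tracker.awaitDrained(5_000) && tracker.pending() == 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            io.shutdown();
        }
    }
}
```

In a real process, `awaitDrained` would be called from a thread registered with `Runtime.getRuntime().addShutdownHook(...)`, with the timeout bounding how long shutdown may stall if acknowledgements never arrive.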

Re: Kafka sending messages with zero copy

2014-10-23 Thread Rajiv Kurian
My use case is that, though I am using Java, I want to work in as close to a zero-garbage environment as possible. All my internal data structures are based on ByteBuffers or buffers allocated using Unsafe. Thus my objects are already in a state where they can be transmitted without any serialization step, i.e. my

Re: Kafka sending messages with zero copy

2014-10-23 Thread Jay Kreps
It sounds like you are primarily interested in optimizing the producer? There is no way to produce data without any allocation being done, and I think getting to that would be pretty hard and lead to bad APIs, but avoiding memory allocation entirely shouldn't be necessary. Small transient objects i

Re: Kafka 0.9

2014-10-23 Thread Rajiv Kurian
Thanks for the reply. I guess I'll just stick to the current API then. The API for 0.9 looks really good though. Looking forward to it. On Thu, Oct 23, 2014 at 5:03 PM, Guozhang Wang wrote: > Hi Rajiv, We are currently working on checking in KAFKA-1583, which is a step 0 for the new consum

Re: Kafka sending messages with zero copy

2014-10-23 Thread Guozhang Wang
Rajiv, Could you let me know your use case? Are you sending a very large file, and hence would prefer to stream it rather than send it as messages? Guozhang. On Thu, Oct 23, 2014 at 4:03 PM, Rajiv Kurian wrote: > I have a flyweight style protocol that I use for my messages. Thus they require no serial

Re: Kafka 0.9

2014-10-23 Thread Guozhang Wang
Hi Rajiv, We are currently working on checking in KAFKA-1583, which is step 0 for the new consumer implementation. Once that is done, we will be in full swing on the coding. That said, it is unfortunately not going to happen by the end of this year. The hope is to have a unit testable consumer by the end

Re: Kafka 0.9

2014-10-23 Thread Rajiv Kurian
Is there an estimate of when it will be done? I saw (I don't remember where) a date of December 2014. Maybe I am mistaken. On Thu, Oct 23, 2014 at 4:32 PM, Neha Narkhede wrote: > The new consumer development hasn't started yet. But we have a very detailed design doc <https://cwiki.apache.or

Re: Performance issues

2014-10-23 Thread Mohit Anchlia
By increasing partitions and using Kafka from the master branch, I was able to cut the response times in half. But they still seem high, and it looks like there is still a delay between a successful post and the first time the message is seen by the consumers. There are plenty of resources available.
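One way to put a number on "the delay between a successful post and the first time the message is seen" is to stamp each payload with its send time and subtract at the consumer. The sketch below assumes a hypothetical framing (an 8-byte millisecond timestamp prefixed to the body) that is not part of any Kafka API; it also assumes producer and consumer clocks are synchronized, since skew between hosts shows up directly in the result.

```java
import java.nio.ByteBuffer;

// Hypothetical payload framing: [8-byte send timestamp in ms][body bytes].
final class LatencyStamp {
    // Producer side: prefix the body with the send time.
    static byte[] stamp(long sendTimeMillis, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + body.length);
        buf.putLong(sendTimeMillis).put(body);
        return buf.array();
    }

    // Consumer side: end-to-end latency relative to the embedded send time.
    static long latencyMillis(byte[] payload, long receiveTimeMillis) {
        long sent = ByteBuffer.wrap(payload).getLong();
        return receiveTimeMillis - sent;
    }

    // Consumer side: recover the original body.
    static byte[] body(byte[] payload) {
        byte[] out = new byte[payload.length - Long.BYTES];
        System.arraycopy(payload, Long.BYTES, out, 0, out.length);
        return out;
    }
}
```

Logging `latencyMillis` per message (or a histogram of it) separates producer-side response time from the time until consumers actually see the data.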

Re: Kafka 0.9

2014-10-23 Thread Neha Narkhede
The new consumer development hasn't started yet. But we have a very detailed design doc and JIRA plan, if you'd like to contribute. The new client will be protoco

Kafka 0.9

2014-10-23 Thread Rajiv Kurian
I really like the Kafka 0.9 consumer API. I want to start using it. Is it available on Maven or maybe as a downloadable jar? If not, what is the best way to get it? I also wanted to ask if the new client protocol is compatible with the 0.8 broker. Thanks!

Kafka sending messages with zero copy

2014-10-23 Thread Rajiv Kurian
I have a flyweight-style protocol that I use for my messages, so they require no serialization/deserialization to be processed. The messages are just (offset, length) pairs within a ByteBuffer. Is there a producer and consumer API that forgoes allocation? I just want to give the Kafka producer off
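To make the "(offset, length) pairs within a ByteBuffer" idea concrete: a flyweight is a single reusable view object that is re-pointed at different regions of a shared buffer, so reading a message allocates and copies nothing. This is a generic sketch with a hypothetical `MessageView` class; whether a Kafka producer can accept such a view without first copying it into a `byte[]` is exactly what the question asks, and the sketch does not answer that.

```java
import java.nio.ByteBuffer;

// Hypothetical flyweight: one reusable view over (offset, length) regions of a shared buffer.
final class MessageView {
    private ByteBuffer backing;
    private int offset;
    private int length;

    // Re-points this view at a new region: no bytes copied, no allocation.
    MessageView wrap(ByteBuffer backing, int offset, int length) {
        this.backing = backing;
        this.offset = offset;
        this.length = length;
        return this;
    }

    int length() {
        return length;
    }

    // Absolute reads: the backing buffer's position and limit are never touched.
    byte byteAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return backing.get(offset + i);
    }

    int intAt(int i) {
        if (i < 0 || i + Integer.BYTES > length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return backing.getInt(offset + i);
    }
}
```

Because only absolute `get` calls are used, many regions of the same buffer can be read in sequence through one `MessageView` without disturbing any shared buffer state.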

a questions about 0.8.1 async publishing

2014-10-23 Thread Libo Yu
Hi, If I use async publishing plus message acknowledgement, is there any API to tell when all the messages in the queue have been sent out? This will be used in the shutdown hook to make sure that when the process goes down there are no pending messages left to be published. Thanks, Libo

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread István
RAID has nothing to do with the overall availability of your system; it just increases per-node reliability. Regards, Istvan. On Wed, Oct 22, 2014 at 11:01 AM, Gwen Shapira wrote: > RAID-10? Interesting choice for a system where the data is already replicated between nodes. Is it to

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread István
This is actually a very vague statement and does not cover every use case. Having a RAID 10 array of 6x250 GB SSDs is very different from having 4x1 TB spinning drives. In my experience, rebuilding a RAID 10 array made of several smaller SSDs is hardly noticeable from the service's point of view, beca
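A worked comparison of the two arrays mentioned above, assuming standard RAID 10 ($N$ disks of size $S$ arranged as $N/2$ mirrored pairs, striped) and assuming a rebuild copies exactly one failed disk's contents from its mirror partner:

```latex
\begin{aligned}
C_{\text{usable}} &= \tfrac{N}{2}\,S, \qquad D_{\text{rebuild}} = S \\
6 \times 250\,\text{GB (SSD)}:\quad C_{\text{usable}} &= 3 \times 250\,\text{GB} = 750\,\text{GB}, \qquad D_{\text{rebuild}} = 250\,\text{GB} \\
4 \times 1\,\text{TB (HDD)}:\quad C_{\text{usable}} &= 2 \times 1\,\text{TB} = 2\,\text{TB}, \qquad D_{\text{rebuild}} = 1\,\text{TB} = 4 \times 250\,\text{GB}
\end{aligned}
```

Under these assumptions the spinning-disk array must copy four times as much data per rebuild, on media with much lower throughput, which is consistent with the point that the SSD rebuild is barely noticeable.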

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Gwen Shapira
While I agree with Mark that testing the end-to-end pipeline is critical, note that in terms of performance, whatever you write to hook up Teradata to Kafka is unlikely to be as fast as the Teradata connector for Sqoop (especially the newer one). Quite a lot of optimization by Teradata engineers went

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Mark Roberts
If you use Kafka for the first bulk load, you will test your new Teradata->Kafka->Hive pipeline, as well as have the ability to blow away the data in Hive and reflow it from Kafka without an expensive full re-export from Teradata. As for whether Kafka can handle hundreds of GB of data: Yes, absolu

Re: Reusable consumer across consumer groups

2014-10-23 Thread Neha Narkhede
I'm wondering how much of this can be done using careful system design vs building it within the consumer itself. You could distribute the several consumer instances across machines since it is built for distributed load balancing. That will sufficiently isolate the resources required to run the va

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread Todd Palino
I've mentioned this a couple of times in discussions recently as well. We were discussing the concept of infinite retention for a certain type of service, and how it might be accomplished. My suggestion was to have a combination of storage types and the ability for Kafka to look for segments in two di
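The "look for segments in two directories" idea can be sketched as an offset lookup that merges segment listings from a fast tier and a slow tier. The sketch relies on Kafka's naming of log segment files by their zero-padded base offset with a `.log` suffix (for example `00000000000000000000.log`); everything else is hypothetical: the class name, the tier labels, and the rule that a fast-tier copy wins when a segment exists in both.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-tier segment lookup: recent segments on fast storage,
// older ones on slow storage.
final class TieredSegmentIndex {
    // Segment base offset -> tier label ("fast" or "slow").
    private final TreeMap<Long, String> segments = new TreeMap<>();

    TieredSegmentIndex(List<String> fastFiles, List<String> slowFiles) {
        addAll(slowFiles, "slow");
        addAll(fastFiles, "fast"); // added last, so a fast copy wins if both exist
    }

    private void addAll(List<String> fileNames, String tier) {
        for (String name : fileNames) {
            if (!name.endsWith(".log")) {
                continue; // skip index and other non-segment files
            }
            long baseOffset = Long.parseLong(name.substring(0, name.length() - ".log".length()));
            segments.put(baseOffset, tier);
        }
    }

    // Tier of the segment with the greatest base offset <= targetOffset, or null
    // if the offset precedes every known segment. Does not check the segment's end.
    String tierFor(long targetOffset) {
        Map.Entry<Long, String> entry = segments.floorEntry(targetOffset);
        return entry == null ? null : entry.getValue();
    }
}
```

A `TreeMap` keyed by base offset mirrors how a single-directory log finds the segment for an offset; the only change is that the map is populated from two listings.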

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread Todd Palino
Your understanding of RAID 10 is slightly off. Because it is a combination of striping and mirroring, trying to say that there are 4000 open files per pair of disks is not accurate. The disk, as far as the system is concerned, is the entire RAID. Files are striped across all mirrors, so any open fi
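Todd's point that files are striped across all mirrors can be illustrated with a toy model: $N$ disks form $N/2$ mirrored pairs, and consecutive stripe units of the array's address space rotate across the pairs. It is a deliberate simplification (it assumes a file occupies a contiguous logical range and ignores filesystem allocation), and `Raid10Layout` is a hypothetical name, but it shows why a sequential file touches every pair and why counting "open files per pair" is not meaningful.

```java
// Toy model of RAID 10 placement: disks 2k and 2k+1 mirror each other,
// and logical data is striped across the N/2 pairs in fixed-size units.
final class Raid10Layout {
    private final int mirrorPairs;
    private final long stripeUnitBytes;

    Raid10Layout(int disks, long stripeUnitBytes) {
        if (disks < 4 || disks % 2 != 0) {
            throw new IllegalArgumentException("RAID 10 needs an even number of disks, at least 4");
        }
        if (stripeUnitBytes <= 0) {
            throw new IllegalArgumentException("stripe unit must be positive");
        }
        this.mirrorPairs = disks / 2;
        this.stripeUnitBytes = stripeUnitBytes;
    }

    // Mirror pair (0-based) holding the given logical byte offset.
    int pairFor(long logicalOffset) {
        return (int) ((logicalOffset / stripeUnitBytes) % mirrorPairs);
    }

    // Both physical disks of that pair hold a copy of the byte.
    int[] disksFor(long logicalOffset) {
        int pair = pairFor(logicalOffset);
        return new int[] {2 * pair, 2 * pair + 1};
    }
}
```

With 6 disks and a 64 KiB stripe unit, the first four 64 KiB chunks of a file land on pairs 0, 1, 2, and then 0 again, so a single sequentially written log segment is spread over all three pairs.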

Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Po Cheung
Hello, We are planning to set up a data pipeline and send periodic, incremental updates from DW to Hadoop via Kafka. For a large DW table with hundreds of GB of data, is it okay to use Kafka for the initial bulk data load? Thanks, Po

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread svante karlsson
Both variants will work well (if your Kafka cluster can handle the full volume of the transmitted data for the duration of the TTL on each topic). I would run the whole thing through Kafka since you will be "stress testing" your production flow: consider if you at some later time lost your destina
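A rough sizing check for "can the cluster hold the full volume for the duration of the TTL": the data retained within the retention window, times the replication factor, must fit on the brokers' disks. The numbers below are hypothetical (a 500 GB bulk load, replication factor $R = 3$, $B = 5$ brokers with partitions spread evenly) and ignore compression, index files, and headroom:

```latex
\begin{aligned}
D_{\text{cluster}} &\approx V_{\text{retained}} \times R, \qquad
D_{\text{broker}} \approx \frac{V_{\text{retained}} \times R}{B} \\
V_{\text{retained}} = 500\,\text{GB},\; R = 3,\; B = 5:\quad
D_{\text{cluster}} &\approx 1500\,\text{GB} = 1.5\,\text{TB}, \qquad
D_{\text{broker}} \approx 300\,\text{GB}
\end{aligned}
```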

Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Po Cheung
Hello, We are planning to set up a data pipeline and send periodic, incremental updates from Teradata to Hadoop via Kafka. For a large DW table with hundreds of GB of data, is it okay (in terms of performance) to use Kafka for the initial bulk data load? Or will Sqoop with Teradata connector

Re: Reusable consumer across consumer groups

2014-10-23 Thread Stevo Slavić
Imagine exposing Kafka over various remoting protocols, where incoming poll/read requests may come in concurrently for different consumer groups, especially in a case with lots of different consumer groups. If you create and destroy KafkaConsumer for each such request, response times and throughput
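The cost described here comes from creating and destroying a consumer per request. A common alternative is to keep one long-lived client per consumer group and reuse it. The sketch below is generic and hypothetical (`PerGroupClientCache` is not a Kafka class, and the client type is any `AutoCloseable`). As far as I know the new consumer was designed for single-threaded use, so concurrent requests for the same group would still need to be serialized, for example by synchronizing on the cached instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache: one long-lived client per consumer group,
// instead of one created and destroyed per incoming request.
final class PerGroupClientCache<C extends AutoCloseable> implements AutoCloseable {
    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    PerGroupClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    // Creates the client for a group on first use; later requests reuse it.
    C forGroup(String groupId) {
        return clients.computeIfAbsent(groupId, factory);
    }

    int size() {
        return clients.size();
    }

    // Closes every cached client, e.g. when the remoting service shuts down.
    @Override
    public void close() throws Exception {
        for (C client : clients.values()) {
            client.close();
        }
        clients.clear();
    }
}
```

With many groups, an eviction policy for idle groups would be needed so the cache does not grow without bound; that is left out of the sketch.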