Birds of a Feather Kafka table

2012-10-24 Thread Felix GV
For the people at Hadoop World, we'll meet at table 9, during today's lunch. Looking forward to meeting some of you :) ! -- Félix

Re: Manage Kafka using supervisor, failed to restart

2012-10-22 Thread Felix GV
I think you need to make sure that the Zookeeper sessions have enough time to time out, so if you restart a broker too fast, it can't start properly because of that. If this is the situation you're in, then the message you would get in the Kafka broker's log should tell you pretty explicitly / clea

Re: Kafka's agent sit on localhost to forward message to broker?

2012-10-18 Thread Felix GV
Yeah, the console producer should be able to do this out of the box. The only concern would be to make sure log rotation is handled properly, but I'm sure that wouldn't be too hard... -- Felix On Wed, Oct 10, 2012 at 1:15 PM, Zsolt Dollenstein wrote: > You mean the producer? > > On Wed, Oct 1

Re: What are the most significant differences between kestrel and kafka?

2012-10-18 Thread Felix GV
Rumor has it that Twitter runs a Kafka cluster of just four nodes that can ingest all of the fire hose, all of their clicks and other smaller things ;) ... -- Felix On Wed, Oct 10, 2012 at 1:07 AM, howard chen wrote: > Hi, > > On Wed, Oct 10, 2012 at 1:33 AM, Jun Rao wrote: > > One of the ke

Re: Strata/Hadoop World NYC

2012-10-16 Thread Felix GV
s we can just choose any table and post an update to this thread to give an indication of how to find that table... -- Felix On Tue, Oct 16, 2012 at 12:54 PM, Joe Stein wrote: > I think that is a great idea, if other agree lets do it Wednesday? > > On Mon, Oct 15, 2012 at 6:03 PM, Felix

Re: Error Handling and Acknowledgements in Kafka Consumers

2012-10-15 Thread Felix GV
One thing you could do is the following: Every time you consume a message, you send it asynchronously to a pool of actors that will process the message. When the actors fail to process a message (or if it takes them longer than a certain arbitrary time-out period), they republish it to another top
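A minimal sketch of the pattern described in this preview, under several assumptions: the preview is cut off, so treating the republish target as a separate retry destination is a guess; a thread pool stands in for the actor pool; the consuming loop (rather than the workers themselves) does the republishing; and plain Python lists stand in for Kafka. The names `process`, `consume`, and `retry_topic` are hypothetical and do not come from any Kafka client API.

```python
import concurrent.futures as cf

TIMEOUT_SECONDS = 0.5  # arbitrary time-out period, chosen only for illustration


def process(message: str) -> str:
    """Hypothetical message handler; fails on messages containing 'bad'."""
    if "bad" in message:
        raise ValueError(f"cannot process {message!r}")
    return message.upper()


def consume(messages, retry_topic, timeout=TIMEOUT_SECONDS):
    """Hand each message to a worker pool; republish failures and time-outs.

    `retry_topic` is a plain list standing in for a second destination.
    """
    processed = []
    with cf.ThreadPoolExecutor(max_workers=4) as pool:
        # Submit every message asynchronously, keeping the original alongside its future.
        futures = [(m, pool.submit(process, m)) for m in messages]
        for message, future in futures:
            try:
                processed.append(future.result(timeout=timeout))
            except Exception:
                # Covers both a processing error and a time-out waiting for the result.
                retry_topic.append(message)
    return processed


retry_topic = []
done = consume(["click:1", "bad:2", "view:3"], retry_topic)
print(done, retry_topic)
```

In a real deployment the retry destination would need its own consumer and some bound on the number of retries, neither of which is shown here.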

Re: Strata/Hadoop World NYC

2012-10-15 Thread Felix GV
We could do a "Birds of a Feather"-style Kafka table during lunch on one of the two conference days. -- Felix On Mon, Oct 15, 2012 at 1:50 PM, Murtaza Doctor wrote: > I am also attending the conference would be great to have a Kafka meetup > or if we don't have critical mass then just a sync-u

Re: Strata/Hadoop World NYC

2012-10-14 Thread Felix GV
I'll be there for the conference only (24th/25th). It'd be interesting to meet up during or around HW :) -- Félix On 2012-10-13, at 22:18, Jonathan Creasy wrote: > I think a Kafka meetup would be great, let me know if there is > anything I can do to help. > > -Jonathan > > On Tue, Oct 9, 201

Re: Getting fixed amount of messages using Zookeeper based consumer

2012-09-14 Thread Felix GV
Hello, Sorry for doing thread necromancy on this one, but I have a little question hehe... Can you confirm whether my understanding, below, is correct please? 1. Every time I extract a message from a KafkaMessageStream, it sets my consumer offset to the offset of the beginning of the messag

Re: 7.1 support for List

2012-08-22 Thread Felix GV
calability :) -- Felix On Tue, Aug 21, 2012 at 10:41 AM, Felix GV wrote: > What I meant is that Kafka has been designed first and foremost as a > high-throughput system, and it is achieving that with a couple techniques, > but mainly by batching a bunch of events together so that it can

Re: 7.1 support for List

2012-08-21 Thread Felix GV
nomy from the calling code's > designed behavior. > > regards > > > On 08/20/2012 02:39 PM, Felix GV wrote: > >> I think the difference is merely that async publishing is a non-blocking >> call, whereas sync publishing is a blocking call, meaning that the co

Re: 7.1 support for List

2012-08-20 Thread Felix GV
ing a composite of any type containing either > Message or String. I can batch myself, but doubt this is what any of us > think is the design goal? > > > > On Mon, Aug 20, 2012 at 1:06 PM, Felix GV wrote: > > > This may not be entirely related to what you're talki

Re: 7.1 support for List

2012-08-20 Thread Felix GV
This may not be entirely related to what you're talking about, but why would an async producer not be able to meet your throughput needs, and a sync producer be able to? Both sync and async producers can be configured to batch more than one message together, and that's pretty much the main thing t

Re: Hadoop-consumer & partition question

2012-08-13 Thread Felix GV
I haven't used this script in a while, but if I remember correctly, you should have a different offset file for each broker/partition combination... In any case, the article you linked to is an outdated version of that script (as mentioned in the block at the very beginning of the post, BTW). A q

Re: Hadoop Consumer

2012-07-04 Thread Felix GV
umer > > +1 This surely sounds interesting. > > On 7/3/12 10:05 AM, "Felix GV" wrote: > > >Hmm that's surprising. I didn't know about that...! > > > >I wonder if it's a new feature... Judging from your email, I assume you're > >using

Re: Hadoop Consumer

2012-07-03 Thread Felix GV
Hmm that's surprising. I didn't know about that...! I wonder if it's a new feature... Judging from your email, I assume you're using CDH? What version? Interesting :) ... -- Felix On Tue, Jul 3, 2012 at 12:34 PM, Sybrandy, Casey < casey.sybra...@six3systems.com> wrote: > >> - Is there a vers

Re: Hadoop Consumer

2012-07-03 Thread Felix GV
Answer inlined... -- Felix On Fri, Jun 29, 2012 at 9:24 PM, Murtaza Doctor wrote: > Had a few questions around the Hadoop Consumer. > > - We have event data under the topic "foo" written to the kafka > Server/Broker in avro format and want to write those events to HDFS. Does > the Hadoop consu

Re: Kafka user group meeting

2012-06-14 Thread Felix GV
Cool :) ! Will anyone else be commuting from the Hadoop Summit to LinkedIn tomorrow evening...? It'd be nice to have bus buddies, or to carpool, or share a cab :) ! -- Felix On Wed, Jun 13, 2012 at 1:38 PM, Jun Rao wrote: > Hi, everyone, > > Here is the tentative agenda for tomorrow's Kafka

Re: Kafka user group meeting

2012-06-11 Thread Felix GV
> Thanks, >>> >>> Jun >>> >>> On Thu, Jun 7, 2012 at 4:20 PM, Jun Rao wrote: >>> >>>> How about having a Kafka user group meeting on Jun 14, say from 7:00pm >>>> to 8:30pm at LinkedIn? That way, people flying over for Hado

Re: Kafka user group meeting

2012-06-07 Thread Felix GV
> I will be at the Hadoop Summit. > > -Jay > > On Thu, Jun 7, 2012 at 10:48 AM, Felix GV wrote: > > > I'd be interested :) > > > > I'm going to the Hadoop Summit next week (Wednesday and Thursday) and > I'll > > be in the Bay area that whol

Re: Kafka user group meeting

2012-06-07 Thread Felix GV
I'd be interested :) I'm going to the Hadoop Summit next week (Wednesday and Thursday) and I'll be in the Bay area that whole week. It's pretty short notice so I have my doubts that it would work out, but if it does that'd be cool. BTW, are any of you going to the Hadoop Summit as well :) ? -- F

Re: consumer offset reset use case

2012-05-10 Thread Felix GV
You should be able to replace AsyncValue[Boolean] with an AtomicBoolean. As for the ZK client, maybe I don't understand your question correctly, but I think this code is simply relying on the Zookeeper cl

Re: kafka and ruby client, beginner questions

2012-05-08 Thread Felix GV
The way I understand it, if you batch your messages (by default, the setting is still 1, I think, so no batching) and you have compression enabled, then each valid offset does correspond to a whole batch of messages, which is what you were referring to, I think. Keep in mind that I have not played

Re: Replication questions

2012-05-01 Thread Felix GV
e log will be > byte-for-byte identical across all servers including both the contents > and the ordering of messages. > > -Jay > > On Tue, May 1, 2012 at 9:24 AM, Felix GV wrote: > > Hmm... interesting! > > > > So, if I understand correctly, what you

Re: Replication questions

2012-05-01 Thread Felix GV
han > doing a local disk flush (< 1ms versus >= 10ms). In our own usage > desire for this kind of low-latency consumption is not common, but I > understand that this is a common need for messaging. > > -Jay > > On Thu, Apr 26, 2012 at 2:03 PM, Felix GV wrote: > >

Re: Replication questions

2012-04-26 Thread Felix GV
Thanks Jun :) -- Felix On Thu, Apr 26, 2012 at 3:26 PM, Jun Rao wrote: > Some comments inlined below. > > Thanks, > > Jun > > On Thu, Apr 26, 2012 at 10:27 AM, Felix GV wrote: > > > Cool :) Thanks for those insights :) ! > > > > I changed the sub

Replication questions

2012-04-26 Thread Felix GV
=0), > > the broker will ack the producer after the message is written to acks > > replicas. Currently, acks=0 is treated the same as acks=1. > > > > Thanks, > > > > Jun > > > > On Wed, Apr 25, 2012 at 10:39 AM, Felix GV wrote: > > >

Re: relation between async mode and compression

2012-04-25 Thread Felix GV
Also, compression ratios are usually better on larger payloads, so compression should, in most cases, be more effective when combined with async because the batching of messages results in larger payloads, and thus better compression ratios. -- Felix On Wed, Apr 25, 2012 at 8:14 AM, Joel Koshy
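The point about payload size can be checked with a small experiment. This sketch uses Python's standard `zlib` module rather than any Kafka compression codec, and the message contents and field names are invented for illustration; it only shows that compressing many similar messages together tends to yield fewer bytes than compressing each one separately.

```python
import json
import zlib


def compressed_size(payload: bytes) -> int:
    """Return the size in bytes of the zlib-compressed payload."""
    return len(zlib.compress(payload))


# Hypothetical event messages; the field names are made up for this example.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "url": "/products/item"}).encode("utf-8")
    for i in range(200)
]

raw_total = sum(len(m) for m in messages)

# Each message compressed on its own, as when sending one message at a time.
individual_total = sum(compressed_size(m) for m in messages)

# The same messages compressed together as one batch.
batched_total = compressed_size(b"\n".join(messages))

print(f"raw bytes:               {raw_total}")
print(f"compressed individually: {individual_total}")
print(f"compressed as one batch: {batched_total}")
```

Very small payloads can even grow when compressed on their own because of the fixed header and checksum overhead, which is one reason batching and compression work well together.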

Re: Kafka mirroring and zookeeper

2012-04-25 Thread Felix GV
Just curious, but if I remember correctly from the time I read KAFKA-50 and the related JIRA issues, you guys plan to implement sync AND async replication, right? -- Felix On Tue, Apr 24, 2012 at 4:42 PM, Jay Kreps wrote: > Right now we do sloppy failover. That is when a broker goes down > tr

Re: Kafka 155

2012-04-11 Thread Felix GV
pick up the new config > > Thanks, > Neha > > > On Wed, Apr 11, 2012 at 2:02 PM, Felix GV wrote: > > Intra cluster replication is great and would alleviate (or probably > > eliminate) the need to have graceful decommission. > > > > But that still does not answ

Re: Kafka 155

2012-04-11 Thread Felix GV
Intra cluster replication is great and would alleviate (or probably eliminate) the need to have graceful decommission. But that still does not answer the question: if one had to gracefully decommission a broker today in 0.7 (or in trunk or w/ patches), how would one do it? How can we make a broke

Re: Dynamic weblog processing

2012-04-05 Thread Felix GV
As Hisham mentioned, what I've been working on is your option #2, and that can be done by using the Kafka APIs... Currently, the easiest way to get this up and running quickly would probably be your option #1, using the kafka-console-producer that was added by the kind Kafka folks in KAFKA-130 :)

Re: Kafka in AWS?

2012-03-20 Thread Felix GV
The primary use case for Kafka is to use it on AWS...??? Sorry if I put words you didn't intend in your mouth :P ... I just thought that sounded funny ;) Sorry for being off-topic. Carry on :/ ! -- Felix On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney wrote: > Yeah, that is the part I am hop

Re: Kafka+Avro+Hadoop

2012-02-14 Thread Felix GV
Indeed, there has been no mention that the LinkedIn Kafka/Hadoop ETL code has been released. I'm glad to see that the little script I made is useful for others :) ... If you want to consume Binary Avro and write it straight into Hadoop, you should be able to use the regular hadoop-consumer contri

Re: Is there any web console to monitor what's the status about broker?

2012-02-07 Thread Felix GV
The scala version of Play! is really cool! And it seems like a fitting choice considering that the core Kafka code base is in scala :) -- Felix On Tue, Feb 7, 2012 at 1:01 PM, Neha Narkhede wrote: > Evan, > > That is a good idea. Would you mind filing a JIRA so that the > community can help d

Re: Incremental Hadoop + SimpleKafkaETLJob

2012-01-25 Thread Felix GV
On 24 Jan 2012, at 19:05, Felix GV wrote: > > > Hello :) > > > > For question 1: > > > > The hadoop consumer in the contrib directory has almost everything it > needs > > to do distributed incremental imports out of the box, but it requires a > bit >

Re: Incremental Hadoop + SimpleKafkaETLJob

2012-01-24 Thread Felix GV
:) -- Felix On Tue, Jan 24, 2012 at 5:12 PM, Richard Park wrote: > Yeah, sorry about missing the promise to release code. > I'll talk to someone about releasing what we have. > > On Tue, Jan 24, 2012 at 11:05 AM, Felix GV wrote: > > > Hello :) > > > > For

Re: Incremental Hadoop + SimpleKafkaETLJob

2012-01-24 Thread Felix GV
Hello :) For question 1: The hadoop consumer in the contrib directory has almost everything it needs to do distributed incremental imports out of the box, but it requires a bit of hand holding. I've created two scripts to automate the process. One of them generates initial offset files, and the

Re: Kafka/ZK Cluster Example

2012-01-12 Thread Felix GV
for new requests. Adding > more mirrors doesn't alleviate this problem. > > Jun > > On Wed, Jan 11, 2012 at 3:50 PM, Felix GV wrote: > > > We've been thinking about this stuff a lot recently, at work. > > > > We've had some HD failures in our Kafka cl

Re: Kafka/ZK Cluster Example

2012-01-11 Thread Felix GV
owever, in terms of protection against data loss from HD failures, it seems like the best option for now, no? It doesn't feel right to just throw more hardware at problems hehe... but I guess sometimes it's the only choice :) ... Please tell me if that makes sense! -- Felix On Wed,

Re: Kafka/ZK Cluster Example

2012-01-11 Thread Felix GV
As I understand it, you cannot use a mirrored Kafka cluster as a hot fail-over. You could probably use it as a manual fail-over, but I don't know the complexity involved in doing that. Also, if your source cluster fails while producers were putting data into it, there will be an "unconsumed windo

Re: slow producing on ec2

2012-01-10 Thread Felix GV
On Tue, Jan 10, 2012 at 5:30 PM, Pierre-Yves Ritschard wrote: > Surely, > > I just didn't expect such dramatically low numbers > > On Tue, Jan 10, 2012 at 11:27 PM, Felix GV wrote: > > Maybe I'm overlooking something, but the first thing that came to my mind >

Re: slow producing on ec2

2012-01-10 Thread Felix GV
Maybe I'm overlooking something, but the first thing that came to my mind is: wouldn't you get no network latency at all on your local box if everything runs on the same machine? On EC2, the network latency would bring your overall throughput down, especially with a sync producer, wouldn't it? --

Re: hadoop-consumer never finishing

2011-11-07 Thread Felix GV
I think I've had the same bug. It's a known issue that is fixed in the trunk. You should check out Kafka from the (Apache) trunk and use the hadoop consumer provided there in the contrib directory. If I'm not mistaken, that version is more up to date than the one you mentioned on github... -- Fel

Re: Meet around Hadoop World

2011-11-07 Thread Felix GV
I'm on my way to NYC now and I'll try to join the HBase meeting tonight, however I'm leaving Wednesday night so I probably won't make it to the Hive/Kafka meet up, unfortunately... Do come talk to me if you see me during Hadoop World though :) ! I should be fairly easy to recognize, as I'll be wea

Re: How to use the hadoop consumer in distributed mode?

2011-10-26 Thread Felix GV
p but we hope to get something out there soon. > > On Wed, Oct 26, 2011 at 1:10 PM, Felix GV wrote: > > > Hi, > > > > I wanted to give a little update on this topic. > > > > I was able to make hadoop-consumer work with a kafka cluster. > > > > What

Re: How to use the hadoop consumer in distributed mode?

2011-10-26 Thread Felix GV
Hi, I wanted to give a little update on this topic. I was able to make hadoop-consumer work with a kafka cluster. What I did is: 1. I generated a .properties file for one of the kafka brokers I wanted to connect to. 2. I ran the DataGenerator program by passing the .properties file as