Re: Kafka java consumer processes duplicate messages

2016-08-01 Thread Amit K
Thanks for the reply. On the producer side, I have acks set to all with 3 retries; the rest are mostly default properties. With a replication factor of 2, I believe the messages from the partitions of the downed broker will be read by the other one, but I doubt that would lead to duplicate reading to such a high extent w

Partition assigner that takes current assignment into account

2016-08-01 Thread Kanak Biscuitwala
Hi, I wanted to highlight an alternative algorithm that we have written to generate JSON compatible with kafka-reassign-partitions.sh. It's novel for the following reasons: - Like the existing algorithm, it's rack-aware, balances partitions assigned to hosts, and number of leader replicas
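For readers unfamiliar with the input that kafka-reassign-partitions.sh expects, the following is a minimal Python sketch of an assignment pass that takes the current assignment into account. It is not the algorithm announced in this post (the post is truncated here and its details are not shown); it is a much simpler greedy stand-in that ignores racks and leader balance. The function name `reassign` and the example topic name are hypothetical.

```python
import json


def reassign(current, live_brokers, topic):
    """Keep replicas that sit on live brokers; fill gaps with the least-loaded live broker.

    current:      {partition id: [broker ids]} as assigned today
    live_brokers: iterable of broker ids that may hold replicas
    Returns a dict in the JSON layout read by kafka-reassign-partitions.sh.
    """
    live = sorted(set(live_brokers))
    # Count how many replicas each live broker already holds.
    load = {broker: 0 for broker in live}
    for replicas in current.values():
        for broker in replicas:
            if broker in load:
                load[broker] += 1

    partitions = []
    for partition in sorted(current):
        wanted = len(current[partition])
        # Replicas on live brokers stay where they are, so no data moves for them.
        replicas = [b for b in current[partition] if b in load]
        while len(replicas) < wanted:
            candidates = [b for b in live if b not in replicas]
            if not candidates:
                break  # fewer live brokers than the replication factor
            pick = min(candidates, key=lambda b: (load[b], b))
            replicas.append(pick)
            load[pick] += 1
        partitions.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    plan = reassign({0: [1, 2], 1: [2, 3], 2: [3, 1]}, live_brokers=[1, 2, 4], topic="events")
    print(json.dumps(plan, indent=2))
```

Only the replicas that lived on the removed broker move in this sketch; everything else stays in place, which is the property the subject line refers to.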

Re: Kafka ETL for Parquet

2016-08-01 Thread Kidong Lee
Thanks for your interest Shikhar, Actually, I have questioned and discussed in the thread: https://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3CCAE1jLMOnYb2ScNweoBdsXRHOxjYLe=ha-6igldntl95abuy...@mail.gmail.com%3E The problem was for me that it was not easy to understand the connec

Consumer poll in 0.9.0.1 hanging

2016-08-01 Thread Carlos Rodriguez Fernandez
Hi, When using Apache Camel Kafka to consume messages, I notice that when the topic is not created the fetching here: org.apache.camel.component.kafka.KafkaConsumer.run.. ConsumerRecords records = consumer.poll(Long.MAX_VALUE); ... just hangs forever, even if I create the topic and publish m

Re: Kafka ETL for Parquet

2016-08-01 Thread Shikhar Bhushan
Er, mislinked HDFS connector :) https://github.com/confluentinc/kafka-connect-hdfs On Mon, Aug 1, 2016 at 3:39 PM Shikhar Bhushan wrote: > Hi Kidong, > > That's pretty cool! I'm curious what this offers over the Confluent HDFS > connector , though

Re: Kafka ETL for Parquet

2016-08-01 Thread Shikhar Bhushan
Hi Kidong, That's pretty cool! I'm curious what this offers over the Confluent HDFS connector , though. The README mentions not depending on the Schema Registry, and that the schema can be retrieved via the classpath and Consul. This functionality s

Re: Kafka 0.9.0.1 failing on new leader election

2016-08-01 Thread Gwen Shapira
This looks correct. Sorry, not sure what else it could be. On Sat, Jul 30, 2016 at 4:24 AM, Sean Morris (semorris) wrote: > Kafka 0.9.0.1 > Zookeeper 3.4.6 > Zkclient 0.7 > > I have verified I only have one zkclient.jar in my class path. > > Thanks, > Sean > > > > > On 7/29/16, 9:35 PM, "Gwen Sha

Re: Kafka java consumer processes duplicate messages

2016-08-01 Thread R Krishna
What about failed async commits in this case due to downed broker? Can it not cause consumer to read it again as offsets may not be successfully updated? On Mon, Aug 1, 2016 at 11:35 AM, Tauzell, Dave wrote: > If you kill a broker, then any uncommitted messages will be replayed. > > -Dave >

Re: newbie does python del, gc.collect() release all resources?

2016-08-01 Thread Dana Powers
Are you asking about a Kafka python driver? Or are you referring to pyspark? On Aug 1, 2016 10:03, "Andy Davidson" wrote: > I am new to python. > > I find myself working with several data frames at the same time. I have > run > into some driver memory problems and want to make sure I release all > r

Kafka Consumer poll

2016-08-01 Thread sat
Hi, I am new to Kafka. We are planning to use Kafka messaging for our application. I was playing with Kafka version 0.9.0.1 and I have the following queries. Sorry for asking basic questions. 1) I have instantiated Kafka Consumer and invoked consumer.poll(Long.MAX_VALUE). Although I have specified t

Re: Kafka java consumer processes duplicate messages

2016-08-01 Thread Tauzell, Dave
If you kill a broker, then any uncommitted messages will be replayed. -Dave From: R Krishna Sent: Monday, August 1, 2016 1:32 PM To: users@kafka.apache.org Subject: Re: Kafka java consumer processes duplicate messages Remember reading about these options

Re: Kafka java consumer processes duplicate messages

2016-08-01 Thread R Krishna
Remember reading about these options for higher consumer guarantees: unclean.leader.election.enable = false; enable.auto.commit = false on the consumer side; commit after processing; call commitSync() regularly. What about your producer, does it wait until it reaches all replicas in ISR, i.e., acks=all or none? Not
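The commit-after-processing pattern discussed in this thread can be illustrated without a broker. The Python sketch below is a simulation only, not Kafka client code: a list stands in for a partition, a dict stands in for the committed offset, and all names (`run_consumer`, `make_idempotent`, `SimulatedCrash`) are hypothetical. It shows why a record that was processed but not yet committed is delivered again after a restart, and how a handler that remembers offsets can skip the repeat.

```python
class SimulatedCrash(Exception):
    """Raised to mimic a consumer dying between processing and committing."""


def run_consumer(log, store, handle, crash_at=None):
    """Process records from the committed offset, committing after each one."""
    while store["committed"] < len(log):
        offset = store["committed"]
        handle(offset, log[offset])
        if offset == crash_at:
            # The record was handled, but the commit below never happens.
            raise SimulatedCrash(offset)
        store["committed"] = offset + 1  # stands in for a synchronous commit


def make_idempotent(handle):
    """Wrap a handler so that a redelivered offset is ignored."""
    seen = set()

    def wrapper(offset, value):
        if offset in seen:
            return
        seen.add(offset)
        handle(offset, value)

    return wrapper


def replay_demo(handler_factory=lambda h: h):
    """Run once with a crash at offset 1, restart, and return what was handled."""
    log, store, handled = ["a", "b", "c"], {"committed": 0}, []
    handle = handler_factory(lambda offset, value: handled.append((offset, value)))
    try:
        run_consumer(log, store, handle, crash_at=1)
    except SimulatedCrash:
        pass  # the restart below resumes from the last committed offset
    run_consumer(log, store, handle)
    return handled
```

Calling `replay_demo()` handles offset 1 twice, while `replay_demo(make_idempotent)` handles each offset once. In a real deployment the deduplication state would have to survive restarts, which this in-memory sketch does not attempt.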

Kafka java consumer processes duplicate messages

2016-08-01 Thread Amit K
Hi, I am kind of new to Kafka. I have set up a 3-node Kafka cluster (1 broker per machine) with a 3-node ZooKeeper cluster. I am using Kafka version 0.9.0.0. The setup works fine: from my single producer I am pushing a JSON string to a Kafka topic with 3 partitions and replication factor o

newbie does python del, gc.collect() release all resources?

2016-08-01 Thread Andy Davidson
I am new to python. I find myself working with several data frames at the same time. I have run into some driver memory problems and want to make sure I release all resources as soon as possible. 1. should I be calling del and gc.collect() ? 2. If a dataframe was cached do I need to explicitly ca
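As background for the first question, here is a small Python sketch of what `del` and `gc.collect()` do to ordinary objects. It assumes CPython (reference counting plus a cycle collector) and uses a hypothetical `Frame` class as a stand-in; it says nothing about memory held by Spark for cached data frames, which is outside what plain Python garbage collection controls.

```python
import gc
import weakref


class Frame:
    """Stand-in for a large in-memory object (hypothetical, not a Spark DataFrame)."""


def freed_by_del():
    """With no other references, del alone releases the object in CPython."""
    frame = Frame()
    probe = weakref.ref(frame)
    del frame
    return probe() is None


def cycle_needs_collect():
    """An object that references itself survives del until the cycle collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-experiment
    try:
        frame = Frame()
        frame.me = frame  # reference cycle
        probe = weakref.ref(frame)
        del frame
        alive_before = probe() is not None
        gc.collect()
        freed_after = probe() is None
        return alive_before, freed_after
    finally:
        if was_enabled:
            gc.enable()
```

In this sketch `del` is enough when the name was the last reference, and `gc.collect()` only matters for objects caught in reference cycles.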

Re: Burrow E-Mail to GMail

2016-08-01 Thread Todd Palino
So I can’t speak for general Gmail, but we have been using it through Gmail internally here for a while. Just watch out for those rate limits, because Burrow can get noisy (depending on your clusters and consumers)! -Todd On Mon, Aug 1, 2016 at 7:30 AM, Brian Dennis wrote: > Burrow users, > >

Re: Too Many Open Files

2016-08-01 Thread Thakrar, Jayesh
What are the producers/consumers for the Kafka cluster? Remember that it's not just files but also sockets that add to the count. I had seen issues when we had a network switch problem and had Storm consumers. The switch would cause issues in connectivity between Kafka brokers, zookeepers and clie

Error: LEADER NOT AVAILABLE

2016-08-01 Thread Benny Ho
Hello, I'm receiving an error while publishing messages to a kafka topic, the steps I took were: 1. Starting zookeeper server 2. Starting kafka server 3. Sending messages to kafka topic with a Kafka Producer 8105 [kafka-producer-network-thread | producer-2] DEBUG org.apache.kafka.clients.NetworkClie

Kafka High Level Consumer OOME

2016-08-01 Thread 张学文
Hi, Our Kafka consumer application has been running for a week without any problems. But today I ran into an OOME while trying to consume from one topic with 100 partitions using 100 consumers. The configurations for the consumers are as follows: zookeeper.session.timeout.ms = 1 zookeeper.sync.time.ms = 200 au

Burrow E-Mail to GMail

2016-08-01 Thread Brian Dennis
Burrow users, Before I head down the rathole of fighting with SMTP servers, does anyone have positive or negative confirmation that Burrow can reliably send e-mail notifications to a GMail account? On reading the docs, it seems quite possible, but I've been bitten in the past trying to automate le

Re: Too Many Open Files

2016-08-01 Thread Scott Thibault
Did you verify that the process has the correct limit applied? cat /proc/<pid>/limits --Scott Thibault On Sun, Jul 31, 2016 at 4:14 PM, Kessiler Rodrigues wrote: > I’m still experiencing this issue… > > Here are the kafka logs. > > [2016-07-31 20:10:35,658] ERROR Error while accepting connection >
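The check suggested here can also be scripted. The Python sketch below assumes a Linux host where the per-process limits file has a line beginning with "Max open files" followed by the soft and hard values; the function names are hypothetical. It returns `None` for a value reported as "unlimited".

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of a Linux /proc/<pid>/limits file."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' line found")


def _to_limit(value):
    """Convert a limit column to int, or None when it reads 'unlimited'."""
    return None if value == "unlimited" else int(value)


def max_open_files_for(pid):
    """Read the limits actually applied to a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())
```

Comparing the result for the broker's pid with the value configured in the service manager shows whether the configured limit reached the process.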

Re: Error: Leader Not Available

2016-08-01 Thread Tom Crayford
Hi there, What version of Kafka are you using? Can you share your config files and any sample code? Thanks Tom Crayford Heroku Kafka On Monday, 1 August 2016, Benny Ho wrote: > Hello, > I'm receiving an error while publishing messages to a kafka topic, the > steps I took were:1. Starting zook

Re: Too Many Open Files

2016-08-01 Thread Kessiler Rodrigues
Hey guys, I got a solution for this. The kafka process wasn’t getting the limits config because I was running it under supervisor. I changed it and right now I’m using systemd to put kafka up and running! On systemd services you can set up your FD limit using a property called “LimitNOFILE”. Th

Kafka ETL for Parquet

2016-08-01 Thread Kidong Lee
Hi, I have written a simple Kafka ETL which consumes Avro-encoded data from Kafka and saves it to Parquet on HDFS: https://github.com/mykidong/kafka-etl-consumer It is implemented with the Kafka Consumer API and the Parquet Writer API. - Kidong Lee.

Re: Too Many Open Files

2016-08-01 Thread Anirudh P
I agree with Steve. We had a similar problem where we set the ulimit to a certain value but it was getting overridden. It only worked when we set the ulimit after logging in as root. You might want to give that a try if you have not done so already - Anirudh On Mon, Aug 1, 2016 at 1:19 PM, Steve

A specific use case

2016-08-01 Thread Hamza HACHANI
Good morning, I'm working on a specific use case. I'm receiving messages from an operator network and trying to compute statistics on their number per minute, per hour, per day, and so on. I would like to create a broker that receives the messages and generates a message every minute. These produced
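The counting part of this use case amounts to grouping messages into fixed time windows. The Python sketch below shows only that arithmetic, assuming each message carries a timestamp in seconds; the function name `count_per_window` is hypothetical, and how the timestamps arrive from Kafka is left out.

```python
from collections import Counter


def count_per_window(timestamps, window_seconds=60):
    """Count messages per fixed window, keyed by the window's start time in seconds."""
    counts = Counter(
        int(ts) // window_seconds * window_seconds  # round down to the window start
        for ts in timestamps
    )
    return dict(sorted(counts.items()))
```

For example, `count_per_window([0, 10, 59, 60, 61, 185])` groups the first three messages under window 0, the next two under window 60, and the last under window 180; passing `window_seconds=3600` or `86400` gives hourly or daily counts from the same function.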

Re: Too Many Open Files

2016-08-01 Thread Steve Miller
Can you run lsof -p (pid) for whatever the pid is for your Kafka process? For the fd limits you've set, I don't think subtlety is required: if there's a millionish lines in the output, the fd limit you set is where you think it is, and if it's a lot lower than that, the limit isn't being applied