Re: spark streaming 1.3 kafka error

2015-08-22 Thread Cody Koeninger
To be perfectly clear, the direct kafka stream will also recover from any failures, because it does the simplest thing possible - fail the task and let spark retry it. If you're consistently having socket closed problems on one task after another, there's probably something else going on in your e
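
The "fail the task and let Spark retry it" model can be pictured with a toy sketch. This is not Spark's scheduler code. The names `run_with_retries` and `FlakyFetch` are hypothetical; the only thing borrowed from Spark is the default of 4 for the `spark.task.maxFailures` setting, after which the job is aborted.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Same value as the default of Spark's spark.task.maxFailures setting.
MAX_FAILURES = 4


def run_with_retries(task: Callable[[], T], max_failures: int = MAX_FAILURES) -> T:
    """Run `task`; rerun it after each exception until it has failed `max_failures` times."""
    failures = 0
    while True:
        try:
            return task()
        except Exception as exc:
            failures += 1
            if failures >= max_failures:
                # Toy stand-in for the scheduler giving up on the whole job.
                raise RuntimeError(f"task failed {failures} times; aborting job") from exc


class FlakyFetch:
    """Hypothetical task that hits a closed socket on its first `bad` attempts."""

    def __init__(self, bad: int) -> None:
        self.bad = bad
        self.attempts = 0

    def __call__(self) -> str:
        self.attempts += 1
        if self.attempts <= self.bad:
            raise EOFError("socket has likely been closed")
        return "batch-ok"


if __name__ == "__main__":
    occasional = FlakyFetch(bad=2)
    print(run_with_retries(occasional), "after", occasional.attempts, "attempts")
```

An occasional failure is absorbed by the retries, while a task that fails on every attempt exhausts the limit and takes the job down, which is why consistent socket errors point to a problem outside the retry mechanism.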

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Dibyendu Bhattacharya
I think you can also try this consumer ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer ) in your environment. It has been running fine for topics with a large number of Kafka partitions ( > 200 ), like yours, without any issue. There is no issue with connections, as this consumer re-use

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
When I try the consumer with no external connections, or with a low number of external connections, it works fine, so my question is how the socket got closed: 15/08/21 08:54:54 ERROR executor.Executor: Exception in task 262.0 in stage 130.0 (TID 16332) java.io.EOFException: Received -1 when reading from

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Akhil Das
Can you try some other consumer and see if the issue still exists? On Aug 22, 2015 12:47 AM, "Shushant Arora" wrote: > The exception comes when the client also has many connections to another external server. So I think the exception is caused by a client-side issue only; server side ther

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
When I try the consumer with no external connections, or with a low number of external connections, it works fine, so my question is how the socket got closed: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. On Sat, Aug 22, 2015 at 7:24 PM, Akhil Das wrote:
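
For context on the message itself: in Java, a read that returns -1 means end-of-stream, i.e. the other end of the connection (or something in between) closed it while the reader still expected bytes. The sketch below reproduces the same situation with Python's standard library, where the equivalent signal is `recv` returning empty bytes. It is only an illustration of the condition, not Kafka's client code; `read_exact` and `demo` are hypothetical names.

```python
import socket


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # b"" is Python's counterpart of a Java read returning -1.
            raise EOFError("peer closed the connection before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def demo() -> str:
    """Have one side send a partial message and close; the reader then sees EOF."""
    client, server = socket.socketpair()
    try:
        server.sendall(b"abc")  # only 3 of the 8 bytes the reader expects
        server.close()          # the remote side goes away mid-message
        try:
            read_exact(client, 8)
        except EOFError as exc:
            return f"EOF: {exc}"
        return "complete"
    finally:
        client.close()


if __name__ == "__main__":
    print(demo())
```

The reader cannot tell from this signal alone why the connection was closed, so the cause has to be found elsewhere, for example in logs on the other side.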

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
The exception comes when the client also has many connections to another external server. So I think the exception is caused by a client-side issue only; on the server side there is no issue. I want to understand: is the executor (simple consumer) not making a new connection to the Kafka broker at the start of each tas

Re: spark streaming 1.3 kafka error

2015-08-21 Thread Shushant Arora
It comes at the start of each task when new data is inserted into Kafka (very little data is inserted). The Kafka topic has 300 partitions, and the data inserted is ~10 MB. Tasks fail and are retried, and the retries succeed, but after a certain number of failed tasks the job is killed. On Sat, Aug 22, 2015 at 2:08 AM, A

Re: spark streaming 1.3 kafka error

2015-08-21 Thread Akhil Das
That looks like you are choking your Kafka machine. Run top on the Kafka machines and check the workload; you may be spending too much time on disk I/O etc. On Aug 21, 2015 7:32 AM, "Cody Koeninger" wrote: > Sounds like that's happening consistently, not an occasional network prob

Re: spark streaming 1.3 kafka error

2015-08-21 Thread Cody Koeninger
Sounds like that's happening consistently, not an occasional network problem? Look at the Kafka broker logs. Make sure you've configured the correct kafka broker hosts / ports (note that direct stream does not use zookeeper host / port). Make sure that host / port is reachable from your driver an
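
One way to check the host / port advice is a plain TCP connection test, run on whichever machine needs to reach the brokers. The sketch below uses only Python's standard library. The helper names and the broker addresses `kafka1:9092,kafka2:9092` are hypothetical placeholders; substitute the same comma-separated broker list you pass to the stream (the broker addresses, not the ZooKeeper ones).

```python
import socket
from typing import List, Tuple


def parse_broker_list(brokers: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port,host:port' string into (host, port) pairs."""
    result = []
    for entry in brokers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        result.append((host, int(port)))
    return result


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical broker list; replace with your own.
    for host, port in parse_broker_list("kafka1:9092,kafka2:9092"):
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

A successful connect only shows that something is listening on that port; it does not prove the listener is a healthy Kafka broker, so the broker logs are still worth reading.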