Agreed. I did not see that they were using the same group name.

Sent from Outlook Mail for Windows 10 phone
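The following is a simplified Scala model of why a shared group name matters. It shows how the receiver-based consumer hands a topic's partitions to the members of one consumer group. That consumer is Kafka 0.8's high-level consumer, and its default partition assignment strategy is "range". This is a sketch under that assumption, not Kafka's actual code. The object name and the job-1 ... job-8 consumer ids are hypothetical.

The model also shows one refinement of the "subset of messages" point in the quoted thread: the split happens per partition, not per message. On a 1-partition topic, only one of the 8 same-group jobs owns the partition at a time. The others receive nothing unless a rebalance moves ownership to them.

```scala
// Simplified model (assumption): the "range" partition assignment used by
// default by the Kafka 0.8 high-level consumer. Within ONE consumer group,
// consumer ids are sorted and each consumer gets a contiguous block of
// partitions; consumers beyond the partition count get none.
object RangeAssignmentSketch {
  def assign(numPartitions: Int, consumers: Seq[String]): Map[String, Seq[Int]] = {
    val sorted = consumers.sorted
    val perConsumer = numPartitions / sorted.size // base share for every consumer
    val withExtra = numPartitions % sorted.size   // the first `withExtra` consumers get one more
    sorted.zipWithIndex.map { case (consumer, i) =>
      val start = perConsumer * i + math.min(i, withExtra)
      val count = perConsumer + (if (i < withExtra) 1 else 0)
      consumer -> (start until start + count)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical ids for the 8 streaming jobs.
    val jobs = (1 to 8).map(i => s"job-$i")

    // All 8 jobs in ONE group on a 1-partition topic: only one job owns partition 0.
    val shared = assign(1, jobs)
    println(shared.count(_._2.nonEmpty)) // 1

    // One group PER job: each group's single consumer owns partition 0.
    val separate = jobs.map(j => assign(1, Seq(j))(j))
    println(separate.forall(_ == Seq(0))) // true
  }
}
```

With eight distinct group names, each group has exactly one member, so every job is assigned the whole topic.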
From: PhuDuc Nguyen
Sent: Friday, December 25, 2015 3:35 PM
To: vivek.meghanat...@wipro.com
Cc: user@spark.apache.org
Subject: Re: Spark Streaming + Kafka + scala job message read issue

Vivek,

Did you say you have 8 Spark jobs that are consuming from the same topic, and that all jobs are using the same consumer group name? If so, each job would get only a subset of the messages from that Kafka topic, i.e. each job would get 1 out of 8 messages from that topic. Is that your intent?

Regards,
Duc

On Thu, Dec 24, 2015 at 7:20 AM, <vivek.meghanat...@wipro.com> wrote:

We are using the older receiver-based approach. The number of partitions is 1 (we have a single-node Kafka) and we use a single thread per topic, but we still have the problem. Please see the API we use below. All 8 Spark jobs use the same group name. Is that a problem?

    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap  // number of threads used here is 1
    val searches = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(line => parse(line._2).extract[Search])

Regards,
Vivek M

From: Bryan [mailto:bryan.jeff...@gmail.com]
Sent: 24 December 2015 17:20
To: Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; user@spark.apache.org
Subject: RE: Spark Streaming + Kafka + scala job message read issue

Are you using a direct stream consumer, or the older receiver-based consumer? If the latter, does the number of partitions you’ve specified for your topic match the number of partitions in the topic on Kafka? That would be a possible cause, as you might receive all data from a given partition while missing data from the other partitions.

Regards,
Bryan Jeffrey

From: vivek.meghanat...@wipro.com
Sent: Thursday, December 24, 2015 5:22 AM
To: user@spark.apache.org
Subject: Spark Streaming + Kafka + scala job message read issue

Hi All,

We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform.
Our Spark Streaming job (consumer) is not receiving all the messages sent to the specific topic. It receives 1 out of ~50 messages (we added logging in the job stream to identify this). We are not seeing any errors in the Kafka logs and are unable to debug further from the Kafka layer. The console consumer shows that messages on the INPUT topic are received, but they are not reaching the Spark-Kafka integration stream. Any thoughts on how to debug this issue? Another topic works fine in the same setup. We also tried Spark 1.3.0 with Kafka 0.8.1.1, which has the same issue. All these jobs work fine on our local lab servers.

Regards,
Vivek M

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com
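This thread points to the shared consumer group as the cause. A minimal sketch of the fix is to give each of the 8 jobs its own consumer group, so that every job independently receives all messages on the topic. PerJobKafkaConfig, groupIdFor, and the base group "search-consumers" are hypothetical names. topicMap mirrors the map built in the thread (topic name to number of consumer threads), with whitespace trimming added as an assumption.

```scala
// Hypothetical helper (not from the thread): builds the group id and topic map
// that each streaming job would pass to the receiver-based Kafka stream, so
// that each job consumes the topic under its own consumer group.
object PerJobKafkaConfig {
  // Unique group id per job, e.g. "search-consumers-job-3" (naming scheme is an assumption).
  def groupIdFor(baseGroup: String, jobName: String): String =
    s"$baseGroup-$jobName"

  // Same shape as the thread's topicMap: topic name -> number of consumer threads.
  def topicMap(topics: String, numThreads: Int): Map[String, Int] =
    topics.split(",").map(_.trim).filter(_.nonEmpty).map(t => t -> numThreads).toMap

  def main(args: Array[String]): Unit = {
    val jobs = (1 to 8).map(i => s"job-$i")
    val groups = jobs.map(j => groupIdFor("search-consumers", j))
    println(groups.distinct.size) // 8: no two jobs share a consumer group
    println(topicMap("INPUT", 1))
  }
}
```

In each job, the resulting id would replace the shared group as the groupId argument of KafkaUtils.createStream(ssc, zkQuorum, groupId, topicMap). A brand-new group has no committed offsets, so where it starts reading is governed by the consumer's auto.offset.reset setting. In Kafka 0.8 that setting defaults to "largest", meaning only new messages are read.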