Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
Right, well I don’t think the issue is with how you’re compiling the Scala. I think it’s a conflict between different versions of several libs. I had similar issues with my Spark modules. You need to make sure you’re not loading a different version of the same lib that is clobbering another dependency…
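To pin down that kind of clash, one small pure-JVM check (no Spark involved; `WhichJar` is just a name for this sketch) is to ask which jar actually supplied a given class at runtime:

```scala
// Report where the JVM loaded a class from -- if two versions of a lib are on
// the classpath, this shows which jar won.
object WhichJar {
  def locationOf(cls: Class[_]): String =
    Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<bootstrap classpath>")
}
```

For example, `WhichJar.locationOf(Class.forName("scala.Predef$"))` reports the path of the scala-library jar actually in use, which you can compare against what your build declares.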

Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Siva
Hi Everyone, All of a sudden we are encountering the below error from one of the Spark consumers. It used to work before without any issues. When I restart the consumer with the latest offsets, it works fine for some time (it executed a few batches) and then fails again; this issue is intermittent. Did…

Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Ted Yu
KafkaLZ4BlockOutputStream is in the kafka-clients jar:

$ jar tvf kafka-clients-0.8.2.0.jar | grep KafkaLZ4BlockOutputStream
 1609 Wed Jan 28 22:30:36 PST 2015 org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$BD.class
 2918 Wed Jan 28 22:30:36 PST 2015 org/apache/kafka/common/message/KafkaL…

Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Pankaj Wahane
Next thing you may want to check is whether the jar has been provided to all the executors in your cluster. Most of the class-not-found errors got resolved for me after making the required jars available in the SparkContext. Thanks. From: Ted Yu <yuzhih...@gmail.com> Date: Saturday, 12 March 2016…
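As a hedged sketch of that advice (the class name and paths below are placeholders; only the kafka-clients jar name comes from the thread), the jar can be shipped to every executor at submit time:

```shell
# Ship the jar with the job so every executor gets a copy.
# Path and main class are hypothetical -- substitute your own.
spark-submit \
  --jars /path/to/kafka-clients-0.8.2.0.jar \
  --class com.example.StreamingConsumer \
  my-app.jar
```

Calling `sc.addJar("/path/to/kafka-clients-0.8.2.0.jar")` from the driver achieves the same distribution for an already-running SparkContext.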

Spark Serializer VS Hadoop Serializer

2016-03-11 Thread Fei Hu
Hi, I am trying to migrate a program from Hadoop to Spark, but I have hit a problem with serialization. In the Hadoop program, the key and value classes implement org.apache.hadoop.io.WritableComparable, which handles the serialization. Now in the Spark program, I used newAPIHadoopRDD to read…
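The usual sticking point here is that Writable types are not java.io.Serializable, so they cannot travel through Spark's default closure/shuffle serialization; the common fix is to map them to plain (case class) copies right after newAPIHadoopRDD, or to register a Kryo serializer for them. A self-contained sketch of the distinction (`LegacyKey` merely stands in for a Writable; no Hadoop or Spark needed):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for a Hadoop key type: fine in MapReduce, but NOT java.io.Serializable.
class LegacyKey(var id: Int)

// Plain case class copy: Serializable by default, safe to shuffle/cache in Spark.
case class PortableKey(id: Int)

object SerDemo {
  // Returns true if `value` survives Java serialization.
  def serializable(value: Any): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With a real RDD the conversion would be something like `rdd.map { case (k, v) => (PortableKey(k.get), ...) }` applied before any shuffle or cache (again, a sketch, not the thread's actual types).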

Spark session dies in about 2 days: HDFS_DELEGATION_TOKEN token can't be found

2016-03-11 Thread Ruslan Dautkhanov
The Spark session dies after ~40 hours when running against a Hadoop Secure cluster. spark-submit has --principal and --keytab, so Kerberos ticket renewal works fine according to the logs. Something happens with the HDFS connection? These messages come up every second. See the complete stack: http://pastebi…

Spark with Yarn Client

2016-03-11 Thread Divya Gehlot
Hi, I am trying to understand the behaviour/configuration of Spark with YARN client mode on a Hadoop cluster. Can somebody help me, or point me to documents/blogs/books that give a deeper understanding of the above two? Thanks, Divya

Re: Spark with Yarn Client

2016-03-11 Thread Alexander Pivovarov
Check the doc - http://spark.apache.org/docs/latest/running-on-yarn.html - also you can start an EMR-4.2.0 or 4.3.0 cluster with the Spark app and see how it's configured. On Fri, Mar 11, 2016 at 7:50 PM, Divya Gehlot wrote: > Hi, > I am trying to understand behaviour /configuration of spark with yarn > clie…

Re: Repeating Records w/ Spark + Avro?

2016-03-11 Thread Peyman Mohajerian
Here is the reason for the behavior. Note: Because Hadoop's RecordReader class re-uses the same Writable object for each record, directly caching the returned RDD or directly passing it to an aggregation or shuffle operation will create many references to the same object. If you plan to directly…
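The reuse pitfall can be shown without Spark at all. A minimal simulation (`MutableRecord` stands in for a reused Writable): materializing the raw references keeps only the last value, while copying the payload per record preserves them all.

```scala
// Stand-in for a Hadoop Writable that the reader mutates in place.
class MutableRecord(var value: String)

object ReuseDemo {
  // Mimics a RecordReader: one shared buffer, rewritten for every record.
  def records(data: Seq[String]): Iterator[MutableRecord] = {
    val shared = new MutableRecord("")
    data.iterator.map { s => shared.value = s; shared }
  }

  // Like caching the raw RDD: every element is a reference to the same buffer.
  def cachedRefs(data: Seq[String]): List[String] =
    records(data).toList.map(_.value)

  // Like mapping to a copy before caching: extract the value while iterating.
  def copiedFirst(data: Seq[String]): List[String] =
    records(data).map(_.value).toList
}
```

Here `cachedRefs(List("a", "b", "c"))` yields `List(c, c, c)` while `copiedFirst` yields `List(a, b, c)`; with Avro the analogous fix is mapping each record to an immutable copy before cache() or any shuffle.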

spark-submit with cluster deploy mode fails with ClassNotFoundException (jars are not passed around properley?)

2016-03-11 Thread Hiroyuki Yamada
Hi, I am trying to work with spark-submit in cluster deploy mode on a single node, but I keep getting a ClassNotFoundException as shown below (in this case, snakeyaml.jar is not found by the Spark cluster).

===
16/03/12 14:19:12 INFO Remoting: Starting remoting
16/03/12 14:19:12 INFO Remoting: R…
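One hedged sketch of a fix (master URL, paths, and class name are all placeholders): in cluster deploy mode the driver itself runs on a worker node, so extra jars must be reachable from there, e.g. present at the same local path on every node or on HDFS:

```shell
# Cluster mode: the driver is launched on a worker, so --jars paths must be
# resolvable from that node. Putting the jars on HDFS is one way to ensure it
# (all names here are hypothetical).
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --jars hdfs:///libs/snakeyaml.jar \
  --class com.example.Main \
  hdfs:///apps/my-app.jar
```

In client deploy mode the same --jars with local paths usually works, because the driver runs on the submitting machine.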

Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Siva
Thanks a lot Ted and Pankaj for your responses. Changing the classpath to the correct version of the kafka jars resolved the issue. Thanks, Sivakumar Bhavanari. On Fri, Mar 11, 2016 at 5:59 PM, Pankaj Wahane wrote: > Next thing you may want to check is if the jar has been provided to all > the execut…

NullPointerException

2016-03-11 Thread Saurabh Guru
I am seeing the following exception in my Spark cluster every few days in production:

2016-03-12 05:30:00,541 - WARN TaskSetManager - Lost task 0.0 in stage 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us-west-1.compute.internal): java.lang.NullPointerException at…

Correct way to use spark streaming with apache zeppelin

2016-03-11 Thread trung kien
Hi all, I've just viewed some of Zeppelin's videos. The integration between Zeppelin and Spark is really amazing and I want to use it for my application. In my app, I will have a Spark streaming app to do some basic real-time aggregation (intermediate data). Then I want to use Zeppelin to do some…

Re: NullPointerException

2016-03-11 Thread Prabhu Joseph
Looking at ExternalSorter.scala line 192:

189  while (records.hasNext) {
       addElementsRead()
       kv = records.next()
       map.changeValue((getPartition(kv._1), kv._1), update)
       maybeSpillCollection(usingMap = true)
     }

On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru wrote: > I am seeing the following exception…

Re: Correct way to use spark streaming with apache zeppelin

2016-03-11 Thread Mich Talebzadeh
Hi, I use Zeppelin as well, and in the notebook mode you can do analytics much like what you do in spark-shell. You can store your intermediate data in Parquet if you wish and then analyse the data the way you like. What is your use case here? Zeppelin, as I use it, is a web UI to your spark-shell, acc…

Re: NullPointerException

2016-03-11 Thread Prabhu Joseph
Looking at ExternalSorter.scala line 192, I suspect some input record has a null key.

189  while (records.hasNext) {
190    addElementsRead()
191    kv = records.next()
192    map.changeValue((getPartition(kv._1), kv._1), update)

On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph wrote: > Looking…
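If that suspicion is right, a defensive filter on the pair stream before the keyed stage sidesteps the NPE. A plain-Scala sketch of the guard (on an RDD the equivalent would be `rdd.filter(_._1 != null)` before the shuffle; whether a null key is really present is an assumption here):

```scala
object NullKeyGuard {
  // Drop pairs whose key is null before any keyed/shuffled step, so nothing
  // downstream ever hashes or compares a null key.
  def dropNullKeys[K, V](pairs: Seq[(K, V)]): Seq[(K, V)] =
    pairs.filter { case (k, _) => k != null }
}
```

Logging the dropped records instead of silently filtering them would also reveal which upstream source is producing the null keys.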

Re: NullPointerException

2016-03-11 Thread Ted Yu
Which Spark release do you use? I wonder if the following may have fixed the problem: SPARK-8029 (Robust shuffle writer). JIRA is down, cannot check now. On Fri, Mar 11, 2016 at 11:01 PM, Saurabh Guru wrote: > I am seeing the following exception in my Spark Cluster every few days in > production…

Re: NullPointerException

2016-03-11 Thread Saurabh Guru
I am using the following versions:

org.apache.spark : spark-streaming_2.10 : 1.6.0
org.apache.spark : spark-streaming-kafka_2.10 : …
