Re: Concatenate a string to a Column of type string in DataFrame

2015-12-13 Thread Yanbo Liang
Sorry, it was added in 1.5.0. 2015-12-13 2:07 GMT+08:00 Satish : > Hi, > Will the below-mentioned snippet work for Spark 1.4.0? > > Thanks for your inputs > > Regards, > Satish > -- > From: Yanbo Liang > Sent:
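
For reference, on Spark 1.5+ the concatenation can be done with concat and lit from org.apache.spark.sql.functions; a minimal sketch (the column name and suffix are hypothetical):

    import org.apache.spark.sql.functions.{concat, lit}

    // Append a constant suffix to a string column (requires Spark >= 1.5).
    // "name" and "_suffix" are placeholders for the real column and string.
    val result = df.withColumn("name_tagged", concat(df("name"), lit("_suffix")))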

Graphx Spark Accumulator

2015-12-13 Thread prasad223
Hi All, I'm new to Spark and Scala, and I'm unable to create an array of Integer accumulators in Spark. val diameterAccumulator = sparkContext.accumulator(Array.fill(maxDegree)(0))(Array(maxDegree)[AccumulatorParam[Int]]) Can anyone give me a simple example of how to create an array of Int
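
The snippet above doesn't compile because the second argument list expects an AccumulatorParam instance, not a type expression. A working pattern on the Spark 1.x API, assuming an element-wise sum is the intended merge, is a custom AccumulatorParam over Array[Int]; a sketch (maxDegree comes from the original code):

    import org.apache.spark.AccumulatorParam

    // Merges two Int arrays by element-wise addition (Spark 1.x accumulator API).
    object IntArrayParam extends AccumulatorParam[Array[Int]] {
      def zero(initial: Array[Int]): Array[Int] = Array.fill(initial.length)(0)
      def addInPlace(a: Array[Int], b: Array[Int]): Array[Int] = {
        for (i <- a.indices) a(i) += b(i)
        a
      }
    }

    val diameterAccumulator =
      sparkContext.accumulator(Array.fill(maxDegree)(0))(IntArrayParam)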

Re: Questions on Kerberos usage with YARN and JDBC

2015-12-13 Thread Mike Wright
Kerberos seems to be working otherwise ... for example, we're using it successfully to control access to HDFS, and it's linked to AD ... we're using Ranger, if that helps. I'm not a systems admin guy, so this is really not my area of expertise. ___ *Mike Wright* Principal Architect,

Use of rdd.zipWithUniqueId() in DStream

2015-12-13 Thread Sourav Mazumder
Hi All, I'm trying to use the zipWithUniqueId() function of RDD via the transform function of DStream. It does generate unique IDs, always starting from 0 and in sequence. However, I'm not sure whether this is reliable behavior that is always guaranteed to generate sequence numbers starting from 0. Can
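
For what it's worth, zipWithUniqueId only guarantees uniqueness, not consecutiveness: with n partitions, items in partition k receive ids k, n+k, 2n+k, and so on. If consecutive indices starting from 0 are required, zipWithIndex provides that guarantee at the cost of an extra job to compute partition offsets; a sketch (dstream is hypothetical):

    // zipWithIndex guarantees consecutive 0-based indices per batch RDD;
    // it triggers an extra Spark job to count elements per partition.
    val indexed = dstream.transform(rdd => rdd.zipWithIndex())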

How to save Multilayer Perceptron Classifier model.

2015-12-13 Thread Vadim Gribanov
Hey everyone! I'm new to Spark and Scala. I looked at the examples in the user guide and didn't find how to save a Multilayer Perceptron Classifier model to HDFS. The obvious model.save(sc, "NNModel") didn't work for me. Help me please.

How to unpack the values of an item in an RDD so I can create an RDD with multiple items?

2015-12-13 Thread Abhishek Shivkumar
Hi, I have an RDD of many items. Each item has a key, and its value is a list of elements. I want to unpack the elements of the item so that I can create a new RDD, with each of its items being the original key and one single element. I tried doing RDD.flatmap(lambda line: [ (line[0], v) for v

Make Spark Streaming DataFrame a SQL table

2015-12-13 Thread Karthikeyan Muthukumar
Hi, The aim here is as follows: - read data from a socket using Spark Streaming every N seconds - register the received data as a SQL table - more data will be read from HDFS etc. as reference data, and it will also be registered as SQL tables - the idea is to perform arbitrary SQL queries on the
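
A common pattern on Spark 1.x is to convert each micro-batch to a DataFrame inside foreachRDD and re-register it as a temp table; a minimal sketch, assuming lines is the socket DStream[String] and Record is a hypothetical schema for it:

    import org.apache.spark.sql.SQLContext

    // Hypothetical one-field schema for the incoming lines.
    case class Record(word: String)

    lines.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._
      // Re-register every batch so SQL queries always see the latest data.
      rdd.map(Record(_)).toDF().registerTempTable("stream_table")
    }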

Re: How to unpack the values of an item in an RDD so I can create an RDD with multiple items?

2015-12-13 Thread Nick Pentreath
The function name is flatMap - with a capital M, to match the Scala API. — Sent from Mailbox On Sun, Dec 13, 2015 at 7:40 PM, Abhishek Shivkumar wrote: > Hi, > I have an RDD of many items. > Each item has a key and its value is a list of elements. > I want to
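
In Scala the same unpacking looks like this (PySpark's method is also spelled flatMap); rdd here is a hypothetical RDD of (key, list-of-values) pairs:

    // Expand (key, List(v1, v2, ...)) into (key, v1), (key, v2), ...
    val flat = rdd.flatMap { case (key, values) => values.map(v => (key, v)) }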

Re: Multiple drivers, same worker

2015-12-13 Thread Ted Yu
Just got back from my trip - I couldn't access Gmail from my laptop. I took a look at the stack trace. I saw a few Jetty threads getting blocked but don't have much of a clue yet. Will look at the stack some more. On Wed, Dec 9, 2015 at 1:21 PM, andresb...@gmail.com wrote: > Ok,

Re: Inconsistent data in Cassandra

2015-12-13 Thread Gerard Maas
Hi Padma, Have you considered reducing the dataset before writing it to Cassandra? It looks like this consistency problem could be avoided by cleaning the dataset of unnecessary records before persisting it: val onlyMax = rddByPrimaryKey.reduceByKey{ case (x, y) => Max(x, y) } // your max function
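
Filling in the placeholder max function, assuming the values are numeric and keyed by the Cassandra primary key:

    // Keep only the maximum value per primary key before writing to Cassandra.
    val onlyMax = rddByPrimaryKey.reduceByKey((x, y) => math.max(x, y))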

comment on table

2015-12-13 Thread Jung
Hi, My question is how to leave a comment on tables. Sometimes users, including me, create a lot of temporary and managed tables and want to leave a short comment explaining what a table means without checking its records. Is there a way to do this? Or a suggested alternative would be very

Why does "cache table a as select * from b" do a shuffle and create 2 stages?

2015-12-13 Thread ant2nebula
Why does "cache table a as select * from b" do a shuffle and create 2 stages? Example: table "ods_pay_consume" comes from "KafkaUtils.createDirectStream". hiveContext.sql("cache table dwd_pay_consume as select * from ods_pay_consume") This code creates 2 stages in the DAG
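
One thing worth noting when comparing: CACHE TABLE ... AS SELECT is eager, so it runs a job to materialize the data immediately. A lazy alternative on the 1.x API, which defers materialization until the first query touches the table, is a sketch like:

    // Register the query result under a name, then mark it cached lazily;
    // nothing is computed until the first action reads "dwd_pay_consume".
    hiveContext.sql("select * from ods_pay_consume").registerTempTable("dwd_pay_consume")
    hiveContext.cacheTable("dwd_pay_consume")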

Re: How to save Multilayer Perceptron Classifier model.

2015-12-13 Thread Yanbo Liang
Hi Vadim, Save/load is not supported for the Multilayer Perceptron model currently; you can track the issue at SPARK-11871. Yanbo 2015-12-14 2:31 GMT+08:00 Vadim Gribanov : > Hey everyone! I'm new to Spark and
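
Until SPARK-11871 lands, one stopgap (an unsupported sketch, not an official API, and not a stable on-disk format across Spark versions) is plain JVM serialization of the fitted model object, which is Serializable:

    import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

    // Write the fitted model to a local file (path is hypothetical).
    val oos = new ObjectOutputStream(new FileOutputStream("/tmp/mlp.model"))
    oos.writeObject(model)
    oos.close()

    // Read it back later, assuming the same Spark and Scala versions.
    val ois = new ObjectInputStream(new FileInputStream("/tmp/mlp.model"))
    val restored = ois.readObject()
      .asInstanceOf[org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel]
    ois.close()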

Re: comment on table

2015-12-13 Thread Ted Yu
Please take a look at SPARK-5196. Cheers On Sun, Dec 13, 2015 at 8:18 PM, Jung wrote: > Hi, > My question is how to leave a comment on tables. > Sometimes users, including me, create a lot of temporary and managed > tables and want to leave a short comment explaining what
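
For Hive-backed tables, the DDL already supports comments, which may be enough in the meantime; a sketch through HiveContext (table and column are hypothetical):

    // COMMENT clauses are standard Hive DDL; DESCRIBE FORMATTED shows them later.
    hiveContext.sql(
      """CREATE TABLE user_events (id INT COMMENT 'event id')
        |COMMENT 'raw click events, loaded daily'""".stripMargin)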

[SparkR] Any reason why saveDF's mode is append by default ?

2015-12-13 Thread Jeff Zhang
It is inconsistent with the Scala API, which is "error" by default. Any reason for that? Thanks -- Best Regards Jeff Zhang
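
For comparison, the Scala DataFrameWriter defaults to SaveMode.ErrorIfExists; passing the mode explicitly on either side avoids surprises. A sketch of the Scala side (output path is hypothetical):

    // Explicit mode: fail if the output already exists (the Scala default).
    df.write.mode("error").parquet("/tmp/out")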

RE: Re: Spark assembly in Maven repo?

2015-12-13 Thread Xiaoyong Zhu
Thanks! Do you mean something like this (for example, for 1.5.1 using Scala 2.10)? https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.10/1.5.1/ Xiaoyong From: Sean Owen [mailto:so...@cloudera.com] Sent: Saturday, December 12, 2015 12:45 AM To: Xiaoyong Zhu

Re: IP error on starting spark-shell on windows 7

2015-12-13 Thread Akhil Das
It's a warning, not an error. What happens when you don't specify SPARK_LOCAL_IP at all? If it is able to bring up the Spark shell, then try *netstat -np* and see which address the driver is binding to. Thanks Best Regards On Thu, Dec 10, 2015 at 9:49 AM, Stefan Karos