[SPARK-21190] SPIP: Vectorized UDFs in Python

2017-06-23 Thread Reynold Xin
Welcome to the first real SPIP. SPIP: Vectorized UDFs for Python https://issues.apache.org/jira/browse/SPARK-21190 Background and Motivation: Python is one of the most popular programming languages among Spark users. Spark currently exposes a row-at-a-time interface for defining and

Re: An Update on Spark on Kubernetes [Jun 23]

2017-06-23 Thread Reynold Xin
Thanks, Anirudh. This is super helpful! On Fri, Jun 23, 2017 at 9:50 AM, Anirudh Ramanathan wrote: > *Project Description:* Kubernetes cluster manager integration that enables native support for submitting Spark applications to a Kubernetes cluster. The submitted

An Update on Spark on Kubernetes [Jun 23]

2017-06-23 Thread Anirudh Ramanathan
*Project Description:* Kubernetes cluster manager integration that enables native support for submitting Spark applications to a Kubernetes cluster. The submitted applications can make use of Kubernetes native constructs. *JIRA*: 18278 *Upstream

Re: A question about rdd transformation

2017-06-23 Thread Wenchen Fan
The exception message should include the lineage of the un-serializable object, can you post that too? On 23 Jun 2017, at 11:23 AM, Lionel Luffy wrote: add dev list. Who can help on below question? Thanks & Best Regards, LL -- Forwarded message -- From: Lionel

Re: Handling nulls in vector columns is non-trivial

2017-06-23 Thread Franklyn D'souza
As a reference, this is what is required to coalesce a vector column in pyspark. df = sc.sql.createDataFrame([(SparseVector(10, {1: 44}),), (None,), (SparseVector(10, {1: 23}),), (None,), (SparseVector(10, {1: 35}),)], schema=schema) empty_vector = sc.sql.createDataFrame([(SparseVector(10, {}),)],