Re: Serialization or internal functions?

2020-04-04 Thread Som Lima
If you want to measure optimisation in terms of time taken, then here is an idea :)

    public class MyClass {
        public static void main(String args[]) throws InterruptedException {
            long start = System.currentTimeMillis();
            // replace with your add column code
            // enough data ...
            long end = System.currentTimeMillis();
            System.out.println("elapsed: " + (end - start) + " ms");
        }
    }
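One caveat worth noting: Spark transformations are lazy, so the timed block must end in an action, or the measurement captures only query-plan construction. A minimal Scala sketch of the same timing idea (the withColumn call and the count() action are illustrative stand-ins, not code from this thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val data = Seq("A", "b", "c").toDS()

    val start = System.currentTimeMillis()
    val result = data.withColumn("valueconcat", lit("x")) // code under test (illustrative)
    result.count() // an action forces the plan to actually execute
    println(s"elapsed: ${System.currentTimeMillis() - start} ms")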

(float(9)/5)*x + 32 when x = 12.8

2020-04-04 Thread jane thorpe
PLATFORM: Zeppelin 0.9
SPARK_HOME = spark-3.0.0-preview2-bin-hadoop2.7

    %spark.ipyspark
    # work around
    sc.setJobGroup("a", "b")
    tempc = sc.parallelize([12.8])
    tempf = tempc.map(lambda x: (float(9)/5)*x + 32)
    tempf.collect()

OUTPUT: [55.046]

    %spark.ipyspark
    # work around
    sc.setJobG…
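For comparison, the same Celsius-to-Fahrenheit map in Scala, as a minimal sketch (assumes a live SparkContext named sc, as in the Zeppelin cell above):

    val tempc = sc.parallelize(Seq(12.8))
    // Exact arithmetic: (9/5) * 12.8 + 32 = 23.04 + 32 = 55.04; the
    // collected double may differ in the last digits due to
    // floating-point rounding.
    val tempf = tempc.map(x => (9.0 / 5) * x + 32)
    tempf.collect()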

Serialization or internal functions?

2020-04-04 Thread email
Dear Community,

Recently I had to solve the following problem: "for every entry of a Dataset[String], concat a constant value". To solve it, I used built-in functions:

    val data = Seq("A", "b", "c").toDS

    scala> data.withColumn("valueconcat", concat(col(data.columns.head), lit(" "), li…
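The snippet is cut off in this digest; a minimal self-contained sketch of the comparison being drawn (a built-in Catalyst concat versus a typed map whose lambda must be serialized to the executors) could look like the following, where the separator and the "suffix" constant are illustrative, not the sender's actual values:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, concat, lit}

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val data = Seq("A", "b", "c").toDS()

    // Built-in functions: Catalyst plans the concat; no user code is serialized
    val viaBuiltins = data.withColumn("valueconcat",
      concat(col(data.columns.head), lit(" "), lit("suffix")))

    // Typed alternative: this closure is serialized and shipped to the executors
    val viaMap = data.map(s => s + " " + "suffix")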

Re: spark-submit exit status on k8s

2020-04-04 Thread Masood Krohy
I'm not on the Spark dev team, so I cannot tell you why that priority was chosen for the JIRA issue, or whether anyone is about to finish the work on it; I'll let others jump in if they know. I just wanted to offer a potential solution so that you can move ahead in the meantime.

Masood
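The potential solution itself is cut off in this digest. One common workaround pattern (an assumption on my part, not necessarily what Masood proposed) is to treat spark-submit's exit code as unreliable and instead check the driver pod's terminal phase through the Kubernetes API, for example with the fabric8 client that Spark itself uses. The namespace and driver pod name below are hypothetical:

    import io.fabric8.kubernetes.client.DefaultKubernetesClient

    object DriverPhaseCheck {
      def main(args: Array[String]): Unit = {
        val client = new DefaultKubernetesClient() // picks up the local kubeconfig
        try {
          // Hypothetical namespace and driver pod name
          val pod = client.pods()
            .inNamespace("spark-jobs")
            .withName("my-app-driver")
            .get()
          val phase = pod.getStatus.getPhase // "Succeeded", "Failed", "Running", ...
          println(s"driver pod phase: $phase")
          sys.exit(if (phase == "Succeeded") 0 else 1)
        } finally {
          client.close()
        }
      }
    }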

RE: spark-submit exit status on k8s

2020-04-04 Thread Marshall Markham
Thank you very much, Masood, for your fast response. Last question: is the current status in JIRA representative of the ticket's status within the project team? This seems like a big deal for the K8s implementation, and we were surprised to find it marked as low priority. Is there any discussi…