If you want to measure the optimisation in terms of time taken, here is
an idea :)
public class MyClass {
    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // replace with your add-column code, run over enough data
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
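Since the rest of this thread works in PySpark, the same wall-clock idea can be sketched in Python; the summation below is only a hypothetical stand-in for the add-column work being measured:

```python
import time

start = time.perf_counter()  # monotonic clock, suited to interval timing
# stand-in workload: replace with your add-column code over enough data
total = sum(i * i for i in range(1_000_000))
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.3f} s")
```

On the JVM side, System.nanoTime() is the analogous monotonic choice for interval timing.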
PLATFORM
Zeppelin 0.9
SPARK_HOME = spark-3.0.0-preview2-bin-hadoop2.7
%spark.ipyspark
# workaround: set a job group explicitly before running Spark code
sc.setJobGroup("a", "b")
tempc = sc.parallelize([12.8])
tempf = tempc.map(lambda x: (float(9)/5)*x + 32)
tempf.collect()
OUTPUT
[55.04]
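As a quick sanity check, the lambda above is just the Celsius-to-Fahrenheit formula F = (9/5)·C + 32, which can be verified in plain Python without Spark (12.8 °C is exactly 55.04 °F):

```python
# same conversion as the tempc.map(...) lambda above
def to_fahrenheit(c):
    return (float(9) / 5) * c + 32

assert abs(to_fahrenheit(12.8) - 55.04) < 1e-9
assert abs(to_fahrenheit(0) - 32.0) < 1e-12
```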
Dear Community,

Recently, I had to solve the following problem: "for every entry of a
Dataset[String], concat a constant value". To solve it, I used built-in
functions:
val data = Seq("A", "b", "c").toDS

scala> data.withColumn("valueconcat", concat(col(data.columns.head), lit(" "), lit("myconst"))).show() // "myconst": placeholder constant
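For readers less familiar with the DataFrame API, the per-row effect of that concat is easy to sketch in plain Python (the constant "myconst" here is a hypothetical stand-in):

```python
# plain-Python sketch of the row-wise concat done by the Spark code above
rows = ["A", "b", "c"]
constant = "myconst"  # hypothetical stand-in for the constant being appended
valueconcat = [s + " " + constant for s in rows]
print(valueconcat)  # ['A myconst', 'b myconst', 'c myconst']
```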
I'm not on the Spark dev team, so I can't tell you why that priority was
chosen for the JIRA issue, or whether anyone is about to finish the work on
it; I'll let others jump in if they know.
Just wanted to offer a potential solution so that you can move ahead in
the meantime.
Masood
Thank you very much, Masood, for your fast response. One last question: is the
current status in JIRA representative of the ticket's status within the
project team? This seems like a big deal for the K8s implementation, and we were
surprised to find it marked as low priority. Is there any discussi