Re: DataFrame.withColumn very slow when used iteratively?

2015-06-02 Thread Reynold Xin
.select itself is the bulk add, right?
On Tue, Jun 2, 2015 at 5:32 PM, Andrew Ash wrote:
> Would it be valuable to create a .withColumns([colName], [ColumnObject])
> method that adds in bulk rather than iteratively?
>
> Alternatively effort might be better spent in making .withColumn()
> singular
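A minimal sketch of the point that .select already acts as the bulk add, assuming a Spark SQL dependency on the classpath. The helper name addLiteralColumns and the "col" + i naming are hypothetical, chosen only to mirror the loop from the original question; this is an illustration, not code posted in the thread.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical helper (not part of Spark): adds n integer literal columns
// named col0 .. col(n-1) in a single projection, instead of calling
// withColumn n times in a loop.
def addLiteralColumns(df: DataFrame, n: Int): DataFrame = {
  // Build all new columns up front.
  val newCols: Seq[Column] = (0 until n).map(i => lit(i).as("col" + i))
  // Keep every existing column ("*") and append the new ones in one select.
  df.select((col("*") +: newCols): _*)
}
```

For example, addLiteralColumns(df, 100) would replace 100 separate withColumn calls with one select. Whether this is faster in a given Spark version is something to measure rather than assume.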

Re: DataFrame.withColumn very slow when used iteratively?

2015-06-02 Thread Andrew Ash
Would it be valuable to create a .withColumns([colName], [ColumnObject]) method that adds in bulk rather than iteratively? Alternatively effort might be better spent in making .withColumn() singular faster.
On Tue, Jun 2, 2015 at 3:46 PM, Reynold Xin wrote:
> We improved this in 1.4. Adding 100

Re: DataFrame.withColumn very slow when used iteratively?

2015-06-02 Thread Reynold Xin
We improved this in 1.4. Adding 100 columns took 4s on my laptop. https://issues.apache.org/jira/browse/SPARK-7276 Still not the fastest, but much faster.
scala> Seq((1, 2)).toDF("a", "b")
res6: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala>
scala> val start = System.nanoTime
start: L

DataFrame.withColumn very slow when used iteratively?

2015-06-02 Thread zsampson
Hey, I'm seeing extreme slowness in withColumn when it's used in a loop. I'm running this code:
for (int i = 0; i < NUM_ITERATIONS; ++i) {
    df = df.withColumn("col"+i, new Column(new Literal(i, DataTypes.IntegerType)));
}
where df is initially a trivial dataframe. Here are the results of runni