.select itself is the bulk add, right?
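For concreteness, a minimal sketch of what a select-based bulk add could look like (assuming a Spark 1.4-era DataFrame `df`; `lit` is from org.apache.spark.sql.functions — this is an illustration, not code from the thread):

```scala
import org.apache.spark.sql.functions.lit

// Rather than n withColumn calls, each of which builds and re-analyzes
// a new logical plan, construct all the new Column expressions up front
// and add them in a single select, so the plan is analyzed once.
val newCols = (0 until 100).map(i => lit(i).as("col" + i))
val result  = df.select(df.columns.map(df(_)) ++ newCols: _*)
```

The existing columns are carried over explicitly via `df.columns.map(df(_))`, since select replaces the projection rather than appending to it.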
On Tue, Jun 2, 2015 at 5:32 PM, Andrew Ash wrote:
> Would it be valuable to create a .withColumns([colName], [ColumnObject])
> method that adds in bulk rather than iteratively?
>
> Alternatively effort might be better spent in making .withColumn()
> singular faster.
Would it be valuable to create a .withColumns([colName], [ColumnObject])
method that adds in bulk rather than iteratively?
Alternatively effort might be better spent in making .withColumn() singular
faster.
On Tue, Jun 2, 2015 at 3:46 PM, Reynold Xin wrote:
> We improved this in 1.4. Adding 100 columns took 4s on my laptop.
We improved this in 1.4. Adding 100 columns took 4s on my laptop.
https://issues.apache.org/jira/browse/SPARK-7276
Still not the fastest, but much faster.
scala> Seq((1, 2)).toDF("a", "b")
res6: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala>
scala> val start = System.nanoTime
start: Long = ...
Hey,
I'm seeing extreme slowness in withColumn when it's used in a loop. I'm
running this code:
for (int i = 0; i < NUM_ITERATIONS; ++i) {
    df = df.withColumn("col" + i, new Column(new Literal(i, DataTypes.IntegerType)));
}
where df is initially a trivial dataframe. Here are the results of running