yeah that could work, since i should know (or be able to find out) all the input columns
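
a minimal sketch of how that could look, looking up the input columns with df.columns instead of listing them by hand. this assumes Spark 2.x or later (SparkSession is not available in 1.6), and the object name AddTwoColumns is only illustrative; func, "r", "x" and "y" follow the names used in the quoted messages below. whether the optimizer keeps the udf to a single evaluation is not guaranteed here and is worth checking with result.explain().

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical wrapper object, only here to make the sketch self-contained.
object AddTwoColumns {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-two-columns")
      .getOrCreate()
    import spark.implicits._

    // A udf that returns a Tuple2, which Spark exposes as a struct with fields _1 and _2.
    val func = udf((i: Int) => (i, i))
    val df = Seq((1, 0), (2, 5)).toDF("a", "b")

    // Look up every existing column so none of them is dropped by the select.
    val inputCols = df.columns.map(col)

    val result = df
      // First select: all input columns plus the struct produced by the udf.
      // Assumption: the input has no column already named "r".
      .select(inputCols :+ func($"a").as("r"): _*)
      // Second select: all input columns plus the two struct fields as top-level columns.
      .select(inputCols ++ Seq($"r._1".as("x"), $"r._2".as("y")): _*)

    result.show()
    spark.stop()
  }
}
```

the result keeps "a" and "b" and adds "x" and "y", with no temporary column left behind.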

On Thu, May 26, 2016 at 11:30 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Couldn't you do it like this?
>
> --
> val func = udf((i: Int) => Tuple2(i, i))
> val df = Seq((1, ..., 0), (2, ..., 5)).toDF("input", "c0", "c1", ....other needed columns...., "cX")
> df.select(func($"input").as("r"), $"c0", $"c1", ....$"cX").select($"r._1", $"r._2", $"c0", $"c1", ....$"cX")
>
> // maropu
>
>
> On Fri, May 27, 2016 at 12:15 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> yes, but i also need all the columns (plus of course the 2 new ones) in
>> my output. your select operation drops all the input columns.
>> best, koert
>>
>> On Thu, May 26, 2016 at 11:02 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>
>>> Couldn't you include all the needed columns in your input dataframe?
>>>
>>> // maropu
>>>
>>> On Fri, May 27, 2016 at 1:46 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> that is nice and compact, but it does not add the columns to an
>>>> existing dataframe
>>>>
>>>> On Wed, May 25, 2016 at 11:39 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How about this?
>>>>> --
>>>>> val func = udf((i: Int) => Tuple2(i, i))
>>>>> val df = Seq((1, 0), (2, 5)).toDF("a", "b")
>>>>> df.select(func($"a").as("r")).select($"r._1", $"r._2")
>>>>>
>>>>> // maropu
>>>>>
>>>>>
>>>>> On Thu, May 26, 2016 at 5:11 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>
>>>>>> hello all,
>>>>>>
>>>>>> i have a single udf that creates 2 outputs (so a tuple 2). i would
>>>>>> like to add these 2 columns to my dataframe.
>>>>>>
>>>>>> my current solution is along these lines:
>>>>>> df
>>>>>>   .withColumn("_temp_", udf(inputColumns))
>>>>>>   .withColumn("x", col("_temp_")("_1"))
>>>>>>   .withColumn("y", col("_temp_")("_2"))
>>>>>>   .drop("_temp_")
>>>>>>
>>>>>> this works, but it's not pretty with the temporary field stuff.
>>>>>>
>>>>>> i also tried this:
>>>>>> val tmp = udf(inputColumns)
>>>>>> df
>>>>>>   .withColumn("x", tmp("_1"))
>>>>>>   .withColumn("y", tmp("_2"))
>>>>>>
>>>>>> this also works, but unfortunately the udf is evaluated twice
>>>>>>
>>>>>> is there a better way to do this?
>>>>>>
>>>>>> thanks! koert
>>>>>>
>>>>>
>>>>> --
>>>>> ---
>>>>> Takeshi Yamamuro