Hi Rishi,

TL;DR: In Scala, this works:

import org.apache.spark.sql.functions.{col, lit}

df.withColumn("derived1", lit("something")).withColumn("derived2",
  col("derived1") === "something")

Note that I used three equals signs (===) instead of two (==). That should be
enough; if you want to understand why, read on.

The == operator returns a plain Scala Boolean, which is not what you want:
withColumn expects an org.apache.spark.sql.Column. That is also why you wrap
the string "something" in lit() in the first withColumn call. lit() turns
your String into an org.apache.spark.sql.Column, which withColumn accepts.

Alternatively, lit(col("derived1") == "something") would compile and run
without errors, but it would always be false. You are not checking the
values in the column derived1; you are merely testing whether
col("derived1"), which is an org.apache.spark.sql.Column, is equal to
"something", which is a String. That is obviously false.
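To make the type difference concrete, here is a minimal sketch, assuming spark-sql is on the classpath (for example, pasted into spark-shell). The val names are hypothetical. Each type annotation is checked by the Scala compiler, so the code only compiles because == really yields a Boolean and === really yields a Column:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit}

// lit() wraps a plain Scala String in a Column (a literal expression).
val literal: Column = lit("something")

// == is ordinary Scala equality: it compares the Column object itself with
// a String and immediately produces a plain Boolean on the driver.
val plainEquality: Boolean = col("derived1") == "something"

// === builds a new Column (an equality expression) that Spark evaluates
// later, once per row, against the values in "derived1".
val columnEquality: Column = col("derived1") === "something"

println(plainEquality) // false
println(columnEquality)
```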

Below is the output from my spark-shell:
scala> col("asdf") == col("asdf")
res5: Boolean = true

scala> col("derived1") == "something"
res6: Boolean = false
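A runnable sketch of the same contrast inside a DataFrame, assuming a local Spark installation; paste it into spark-shell with :paste (the leading-dot lines need paste mode). The two-row input and the "id" and "wrong" column names are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// In spark-shell a session called `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder().master("local[*]").appName("derived-columns").getOrCreate()
import spark.implicits._

// Hypothetical input; the "id" column only exists to give df some rows.
val df = Seq(1, 2).toDF("id")

val result = df
  .withColumn("derived1", lit("something"))
  // Wrong: == runs once on the driver and yields the constant false,
  // which lit() then turns into a column of false values.
  .withColumn("wrong", lit(col("derived1") == "something"))
  // Right: === builds an expression that compares derived1's values per row.
  .withColumn("derived2", col("derived1") === "something")

result.show()

// Pull the two Boolean columns back to the driver for inspection.
val checks = result.select("wrong", "derived2").collect()
```

Every row ends up with wrong = false and derived2 = true, because derived1 holds "something" everywhere.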

What you want is an expression that returns an org.apache.spark.sql.Column.
Take a look at the Column API docs here and scroll down to the "===" method;
you'll see that it returns an org.apache.spark.sql.Column:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column@===(other:Any):org.apache.spark.sql.Column

The docs don't say so explicitly, but with === you actually compare the
values in column "derived1" against the string "something", row by row.
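To see the row-by-row comparison, the sketch below uses hypothetical data in which "derived1" differs between rows (unlike lit("something"), which is the same everywhere). It assumes a local Spark installation and can be pasted into spark-shell:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// In spark-shell a session called `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder().master("local[*]").appName("column-equality").getOrCreate()
import spark.implicits._

// Hypothetical data: the values of derived1 differ per row.
val df = Seq("something", "something else").toDF("derived1")

// === is evaluated per row, so derived2 is true only where derived1 matches.
val compared = df.withColumn("derived2", col("derived1") === "something")
compared.show()

// Map each derived1 value to its comparison result for easy inspection.
val byValue: Map[String, Boolean] =
  compared.collect().map(r => r.getString(0) -> r.getBoolean(1)).toMap
```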

Hope this helps.

Regards

On Mon, Apr 22, 2019 at 8:56 AM Shraddha Shah <shah.shraddha...@gmail.com>
wrote:

> Also the same thing for groupby agg operation, how can we use one
> aggregated result (say min(amount)) to derive another aggregated column?
>
> On Sun, Apr 21, 2019 at 11:24 PM Rishi Shah <rishishah.s...@gmail.com>
> wrote:
>
>> Hello All,
>>
>> How can we use a derived column1 for deriving another column in the same
>> dataframe operation statement?
>>
>> something like:
>>
>> df = (df.withColumn('derived1', lit('something'))
>>       .withColumn('derived2', col('derived1') == 'something'))
>>
>> --
>> Regards,
>>
>> Rishi Shah
>>
>
