You could use a UDF pretty easily; something like this should work. The
lastElement function can be changed to do pretty much any string manipulation
you want.

import org.apache.spark.sql.functions.udf
import sqlContext.implicits._  // for the $"col1" column syntax on Spark 1.6

def lastElement(input: String) = input.split("/").last

val lastElementUdf = udf(lastElement(_: String))

df.select(lastElementUdf($"col1")).show()
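Worth noting: on Spark 1.5+ the built-in org.apache.spark.sql.functions.substring_index($"col1", "/", -1) should do the same thing without a UDF. Either way, the plain string logic the UDF wraps can be sanity-checked without a SparkContext (the sample path below is a made-up example):

```scala
// The core string logic the UDF applies, checked in plain Scala.
def lastElement(input: String): String = input.split("/").last

// Made-up sample value mirroring the format from the original question.
val sample = "/client/service/version/method"
val method = lastElement(sample)  // "method"
```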

Ewan


From: Bharathi Raja [mailto:raja...@yahoo.com.INVALID]
Sent: 12 May 2016 11:40
To: Raghavendra Pandey <raghavendra.pan...@gmail.com>; Bharathi Raja 
<raja...@yahoo.com.invalid>
Cc: User <user@spark.apache.org>
Subject: RE: Spark 1.6.0: substring on df.select

Thanks Raghav.

I have 5+ million records. I feel creating multiple columns is not an optimal way.

Please suggest any other alternate solution.
Can’t we do any string operation in DF.Select?

Regards,
Raja

From: Raghavendra Pandey<mailto:raghavendra.pan...@gmail.com>
Sent: 11 May 2016 09:04 PM
To: Bharathi Raja<mailto:raja...@yahoo.com.invalid>
Cc: User<mailto:user@spark.apache.org>
Subject: Re: Spark 1.6.0: substring on df.select


You can create a column with the count of "/" characters, take the max of it,
and create that many columns for every row, with null fillers.
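The pad-to-max-columns idea above can be sketched on plain Scala collections (the sample rows are made up; in Spark this would be done with split() on the column and a fixed number of getItem() selections):

```scala
// Made-up sample rows with a varying number of "/" separators.
val rows = Seq("/client/service/version/method", "/client/service/method")

// Maximum number of "/"-separated fields across all rows.
val maxFields = rows.map(_.split("/").length).max

// Split each row and pad the shorter ones with nulls so every
// row ends up with the same number of fields.
val padded = rows.map { r =>
  val parts = r.split("/")
  parts ++ Array.fill[String](maxFields - parts.length)(null)
}
```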

Raghav
On 11 May 2016 20:37, "Bharathi Raja" 
<raja...@yahoo.com.invalid<mailto:raja...@yahoo.com.invalid>> wrote:
Hi,

I have a dataframe column col1 with values like
“/client/service/version/method”. The number of “/” separators is not constant.
Could you please help me to extract the method (the last element) from column col1?

In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).
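For reference, the equivalent of that Pig SUBSTRING/LAST_INDEX_OF expression in plain Scala (java.lang.String methods; the sample value below is made up):

```scala
// Made-up sample value in the format described above.
val col1 = "/client/service/version/method"

// Everything after the last "/", as SUBSTRING(col1, LAST_INDEX_OF(col1, '/') + 1)
// would give in Pig.
val method = col1.substring(col1.lastIndexOf("/") + 1)  // "method"
```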

Thanks in advance.
Regards,
Raja
