For example, I have a DataFrame with three columns: URL, START, and END.
For each URL in the URL column, I want to extract the substring that
starts at START and ends at END.
+--------------+-----+---+
|URL           |START|END|
+--------------+-----+---+
|www.amazon.com|4    |14 |
|www.yahoo.com |4    |13 |
|www.amazon.com|4    |14 |
|www.google.com|4    |14 |
+--------------+-----+---+

I have UDF1:

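import org.apache.spark.sql.functions.udf

// Plain Scala function wrapped as a Spark UDF; String.substring takes 0-based indices.
val getSubString = (input: String, start: Int, end: Int) => {
   input.substring(start, end)
}
val udf1 = udf(getSubString)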

and another one, UDF2, written against Column:

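import org.apache.spark.sql.Column

// Built only from Column expressions; substr takes a start position and a length.
def getColSubString(c1: Column, c2: Column, c3: Column): Column = {
   c1.substr(c2, c3 - c2)
}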
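
To make the comparison concrete, here is a minimal sketch of how both could be applied to the sample data and how their physical plans could be printed side by side. The local SparkSession, the object name SubstringComparison, and the output column name SUB are assumptions made for illustration only; they are not part of my actual job.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object SubstringComparison {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, only for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("substring-comparison")
      .getOrCreate()
    import spark.implicits._

    // The sample rows from the table above.
    val df = Seq(
      ("www.amazon.com", 4, 14),
      ("www.yahoo.com", 4, 13),
      ("www.amazon.com", 4, 14),
      ("www.google.com", 4, 14)
    ).toDF("URL", "START", "END")

    // UDF1: a Scala function wrapped with udf().
    val getSubString = (input: String, start: Int, end: Int) => input.substring(start, end)
    val udf1 = udf(getSubString)

    // UDF2: a helper that only composes built-in Column expressions.
    def getColSubString(c1: Column, c2: Column, c3: Column): Column =
      c1.substr(c2, c3 - c2)

    val viaUdf = df.withColumn("SUB", udf1(col("URL"), col("START"), col("END")))
    val viaColumn = df.withColumn("SUB", getColSubString(col("URL"), col("START"), col("END")))

    // Print the physical plan of each variant so they can be compared.
    viaUdf.explain()
    viaColumn.explain()

    spark.stop()
  }
}
```

One caveat with this sketch: String.substring is 0-based while Column.substr is 1-based, so on real data the two variants may differ by one position unless START is adjusted.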

Let's assume both produce the result I want. From a performance
perspective, is there any difference between these two UDFs?
