For example, I have a dataframe with 3 columns: URL, START, END. For each URL in the URL column, I want to fetch the substring that starts at START and ends at END.

    +------------------------+----------+-----+
    |URL                     |START     |END  |
    +------------------------+----------+-----+
    |www.amazon.com          |4         |14   |
    |www.yahoo.com           |4         |13   |
    |www.amazon.com          |4         |14   |
    |www.google.com          |4         |14   |
    +------------------------+----------+-----+

I have UDF1:

    def getSubString = (input: String, start: Int, end: Int) => {
      input.substring(start, end)
    }
    val udf1 = udf(getSubString)

and another, UDF2:

    def getColSubString()(c1: Column, c2: Column, c3: Column): Column = {
      c1.substr(c2, c3 - c2)
    }

Let's assume they can both generate the result I want. But from a performance perspective, is there any difference between these two UDFs?
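
To make the comparison concrete, here is a minimal sketch that applies both versions to the sample data and prints the physical plan for each with `explain()`. The local `SparkSession`, the object name `SubstringComparison`, and the output column name `SUB` are assumptions made for illustration only; UDF2 is written without its empty parameter list to keep the call site short.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object SubstringComparison {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("substring-comparison")
      .getOrCreate()
    import spark.implicits._

    // The sample rows from the table above.
    val df = Seq(
      ("www.amazon.com", 4, 14),
      ("www.yahoo.com", 4, 13),
      ("www.amazon.com", 4, 14),
      ("www.google.com", 4, 14)
    ).toDF("URL", "START", "END")

    // UDF1: a Scala function wrapped with udf(); it uses String.substring,
    // which takes 0-based begin and end indices.
    val getSubString = (input: String, start: Int, end: Int) => input.substring(start, end)
    val udf1 = udf(getSubString)

    // UDF2: a plain method that builds a Column expression; Column.substr
    // takes a 1-based start position and a length.
    def getColSubString(c1: Column, c2: Column, c3: Column): Column =
      c1.substr(c2, c3 - c2)

    val viaUdf = df.withColumn("SUB", udf1(col("URL"), col("START"), col("END")))
    val viaColumn = df.withColumn("SUB", getColSubString(col("URL"), col("START"), col("END")))

    // Print the physical plan of each variant for side-by-side inspection.
    viaUdf.explain()
    viaColumn.explain()

    spark.stop()
  }
}
```

Because `String.substring` is 0-based and `Column.substr` is 1-based, the two variants may return strings shifted by one character on this data; the sketch is meant for comparing the plans, not the values.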