Hi,

For this particular case I'd use Column.substr (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column), e.g.

scala> val ns = Seq(("hello world", 1, 5)).toDF("w", "b", "e")

scala> ns.select($"w".substr($"b", $"e" - $"b" + 1) as "demo").show
+-----+
| demo|
+-----+
|hello|
+-----+

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Tue, May 14, 2019 at 5:08 PM Qian He <hq.ja...@gmail.com> wrote:

> For example, I have a dataframe with 3 columns: URL, START, END. For each
> url from URL column, I want to fetch a substring of it starting from START
> and ending at END.
> +------------------------+----------+-----+
> |URL                     |START     |END  |
> +------------------------+----------+-----+
> |www.amazon.com          |4         |14   |
> |www.yahoo.com           |4         |13   |
> |www.amazon.com          |4         |14   |
> |www.google.com          |4         |14   |
> +------------------------+----------+-----+
>
> I have UDF1:
>
> def getSubString = (input: String, start: Int, end: Int) => {
>   input.substring(start, end)
> }
> val udf1 = udf(getSubString)
>
> and another UDF2:
>
> def getColSubString()(c1: Column, c2: Column, c3: Column): Column = {
>   c1.substr(c2, c3-c2)
> }
>
> Let's assume they can both generate the result I want. But, from performance
> perspective, is there any difference between those two UDFs?
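
On the performance question quoted above, here is a rough sketch (not a benchmark) for comparing the two approaches yourself. It assumes Spark 2.x or later with spark-sql on the classpath and a local master; the object name SubstrDemo and the helper names sample, viaUdf and viaColumn are made up for illustration. Strictly speaking only the first variant is a UDF. The second just builds a Column expression, which Spark can see into and optimize, whereas a Scala UDF is generally opaque to the optimizer. Note also the differing index conventions: String.substring is 0-based with an exclusive end, while Column.substr is 1-based and takes a length, so the arguments below are adjusted to make both return the same text.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object SubstrDemo {
  // Assumption: a local session is good enough for illustration.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("substr-demo")
    .getOrCreate()
  import spark.implicits._

  // Two rows taken from the table in the question.
  def sample: DataFrame =
    Seq(("www.amazon.com", 4, 14), ("www.yahoo.com", 4, 13)).toDF("URL", "START", "END")

  // A real UDF: String.substring is 0-based with an exclusive end index.
  val udf1 = udf((input: String, start: Int, end: Int) => input.substring(start, end))

  def viaUdf(df: DataFrame): DataFrame =
    df.select(udf1($"URL", $"START", $"END") as "sub")

  // A plain Column expression: substr is 1-based and takes (position, length).
  def viaColumn(df: DataFrame): DataFrame =
    df.select($"URL".substr($"START" + 1, $"END" - $"START") as "sub")

  def main(args: Array[String]): Unit = {
    val df = sample
    // Compare the physical plans of the two variants.
    viaUdf(df).explain()
    viaColumn(df).explain()
    // Both variants should print the same substrings.
    viaUdf(df).show()
    viaColumn(df).show()
    spark.stop()
  }
}
```

In the explain() output I would expect the first plan to contain an opaque UDF call and the second a built-in substring expression, but the exact text depends on the Spark version, so check it on your own setup.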