Stephen Boesch created SPARK-2686:
-------------------------------------

             Summary: Add Length support to Spark SQL and HQL and Strlen support to SQL
                 Key: SPARK-2686
                 URL: https://issues.apache.org/jira/browse/SPARK-2686
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.0.0, 0.9.1, 0.9.2, 1.1.0, 1.1.1
         Environment: all
            Reporter: Stephen Boesch
            Priority: Minor
             Fix For: 1.1.1


Syntactic, parsing, and operational support have been added for LEN(GTH) and 
STRLEN functions.

Examples:

SQL:

import org.apache.spark.sql._
case class TestData(key: Int, value: String)
val sqlc = new SQLContext(sc)
import sqlc._
val testData: SchemaRDD = sqlc.sparkContext.parallelize(
  (1 to 100).map(i => TestData(i, i.toString)))
testData.registerAsTable("testData")
sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect
res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])
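
The result above can be cross-checked without Spark. The following is a minimal plain-Scala sketch, assuming (as the output suggests, but the issue does not state) that length applied to an integer column measures the length of its string form. The object name LengthCheck is made up for illustration.

```scala
// Plain Scala, no Spark required. Mirrors the query:
//   select length(key) as key_len from testData order by key_len desc limit 5
// Assumption: length(key) on an Int measures the decimal string form of the key.
object LengthCheck {
  // Keys 1..100, as in the TestData example.
  val keyLens: Seq[Int] = (1 to 100).map(i => i.toString.length)

  // "order by key_len desc limit 5"
  val top5: Seq[Int] = keyLens.sorted(Ordering[Int].reverse).take(5)

  def main(args: Array[String]): Unit =
    println(top5.mkString("[", ", ", "]")) // prints [3, 2, 2, 2, 2]
}
```

Only the key 100 has three digits, so a single 3 followed by 2s matches the rows returned by the query.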

HQL:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc._
hql("select length(grp) from simplex").collect
res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])


The codebase changes were deliberately modeled on those made for adding 
SUBSTR(ING) on July 17:

SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the main 
classes changed. The test suites affected are ConstantFolding and 
ExpressionEvaluation.
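
To illustrate how an expression node, its evaluation, and constant folding fit together, here is a toy sketch in plain Scala. It is not the Catalyst code from the patch: Expr, Literal, Attribute, Length, and ToyOptimizer are hypothetical stand-ins, and both the null handling and the use of the string form for non-string input are assumptions.

```scala
// Toy model only: these names are hypothetical and are NOT the Catalyst classes.
sealed trait Expr {
  def foldable: Boolean                  // true if the value does not depend on a row
  def eval(row: Map[String, Any]): Any   // evaluate against one input row
}

case class Literal(value: Any) extends Expr {
  val foldable: Boolean = true
  def eval(row: Map[String, Any]): Any = value
}

case class Attribute(name: String) extends Expr {
  val foldable: Boolean = false
  def eval(row: Map[String, Any]): Any = row(name)
}

// Length of the string form of the child (assumed semantics); null in, null out.
case class Length(child: Expr) extends Expr {
  def foldable: Boolean = child.foldable
  def eval(row: Map[String, Any]): Any = child.eval(row) match {
    case null => null
    case v    => v.toString.length
  }
}

object ToyOptimizer {
  // Constant folding: replace a row-independent subtree by its computed value.
  def fold(e: Expr): Expr = e match {
    case l: Literal      => l
    case x if x.foldable => Literal(x.eval(Map.empty))
    case Length(c)       => Length(fold(c))
    case other           => other
  }
}
```

In this sketch, ToyOptimizer.fold(Length(Literal("abc"))) collapses to Literal(3), while Length(Attribute("key")) is left unchanged because its value depends on the input row. These are the two kinds of behavior that a constant-folding test and an expression-evaluation test would each exercise.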

In addition, some ad-hoc testing was done, as shown in the examples above.



