Stephen Boesch created SPARK-2686:
-------------------------------------

             Summary: Add Length support to Spark SQL and HQL and Strlen support to SQL
                 Key: SPARK-2686
                 URL: https://issues.apache.org/jira/browse/SPARK-2686
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.0.0, 0.9.1, 0.9.2, 1.1.0, 1.1.1
         Environment: all
            Reporter: Stephen Boesch
            Priority: Minor
             Fix For: 1.1.1

Syntactic, parsing, and operational support have been added for the LEN(GTH) and STRLEN functions.

Examples:

SQL:

    import org.apache.spark.sql._
    case class TestData(key: Int, value: String)
    val sqlc = new SQLContext(sc)
    import sqlc._
    val testData: SchemaRDD = sqlc.sparkContext.parallelize(
      (1 to 100).map(i => TestData(i, i.toString)))
    testData.registerAsTable("testData")
    sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect

    res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])

HQL:

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    import hc._
    hc.hql("select length(grp) from simplex").collect

    res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])

As for the codebase changes: they were purposefully modeled on the ones made for adding SUBSTR(ING) on July 17. SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the main classes changed. The test suites affected are ConstantFolding and ExpressionEvaluation. In addition, some ad hoc testing was done, as shown in the examples.
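
To illustrate what the ConstantFolding and ExpressionEvaluation suites exercise for a function like this, here is a toy sketch in plain Scala. It is not the Catalyst code changed for this issue: the names Expr, Literal, Column, Length, and LengthSketch are hypothetical, and both the NULL handling and the use of toString for non-string input are assumptions chosen only to match the length(key) result in the SQL example.

```scala
// Toy model only: these are NOT the Catalyst classes touched by this issue.
sealed trait Expr
case class Literal(value: Any) extends Expr
case class Column(name: String) extends Expr
case class Length(child: Expr) extends Expr

object LengthSketch {
  // Evaluate an expression against one row (column name -> value).
  // Assumption: non-string input is measured via toString, which is
  // consistent with length(key) returning 3 for the Int key 100.
  def eval(e: Expr, row: Map[String, Any]): Any = e match {
    case Literal(v) => v
    case Column(n)  => row.getOrElse(n, null)
    case Length(c) =>
      eval(c, row) match {
        case null => null // assumption: NULL in, NULL out
        case v    => v.toString.length
      }
  }

  // Constant folding: Length over a literal does not depend on any row,
  // so it can be replaced by its value before execution.
  def fold(e: Expr): Expr = e match {
    case Length(child) =>
      fold(child) match {
        case lit: Literal => Literal(eval(Length(lit), Map.empty))
        case other        => Length(other)
      }
    case other => other
  }
}
```

With these definitions, LengthSketch.eval(Length(Column("key")), Map("key" -> 100)) evaluates to 3, and LengthSketch.fold(Length(Literal("abc"))) rewrites the expression to Literal(3), while a Length over a column is left unfolded.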