Nicholas Chammas created SPARK-21110: ----------------------------------------
Summary: Structs should be orderable Key: SPARK-21110 URL: https://issues.apache.org/jira/browse/SPARK-21110 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.1 Reporter: Nicholas Chammas Priority: Minor It seems like a missing feature that you can't compare structs in a filter on a DataFrame. Here's a simple demonstration of a) where this would be useful and b) how it's different from simply comparing each of the components of the structs. {code} import pyspark from pyspark.sql.functions import col, struct, concat spark = pyspark.sql.SparkSession.builder.getOrCreate() df = spark.createDataFrame( [ ('Boston', 'Bob'), ('Boston', 'Nick'), ('San Francisco', 'Bob'), ('San Francisco', 'Nick'), ], ['city', 'person'] ) pairs = ( df.select( struct('city', 'person').alias('p1') ) .crossJoin( df.select( struct('city', 'person').alias('p2') ) ) ) print("Everything") pairs.show() print("Comparing parts separately (doesn't give me what I want)") (pairs .where(col('p1.city') < col('p2.city')) .where(col('p1.person') < col('p2.person')) .show()) print("Comparing parts together with concat (gives me what I want but is hacky)") (pairs .where(concat('p1.city', 'p1.person') < concat('p2.city', 'p2.person')) .show()) print("Comparing parts together with struct (my desired solution but currently yields an error)") (pairs .where(col('p1') < col('p2')) .show()) {code} The last query yields the following error in Spark 2.1.1: {code} org.apache.spark.sql.AnalysisException: cannot resolve '(`p1` < `p2`)' due to data type mismatch: '(`p1` < `p2`)' requires (boolean or tinyint or smallint or int or bigint or float or double or decimal or timestamp or date or string or binary) type, not struct<city:string,person:string>;; 'Filter (p1#5 < p2#8) +- Join Cross :- Project [named_struct(city, city#0, person, person#1) AS p1#5] : +- LogicalRDD [city#0, person#1] +- Project [named_struct(city, city#0, person, person#1) AS p2#8] +- LogicalRDD [city#0, person#1] {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org