Hi,
you are right, that is an interesting question.
Looks like GROUP BY is doing something funny / magic here (spark-shell
3.3.1 and 3.5.0-SNAPSHOT):
With an alias, it behaves as you have pointed out:
spark.range(3).createTempView("ids_without_dots")
spark.sql("SELECT * FROM ids_without_dots").show()
// works
spark.sql("SELECT id AS `an.id` FROM ids_without_dots GROUP BY
an.id").show()
// fails
spark.sql("SELECT id AS `an.id` FROM ids_without_dots GROUP BY
`an.id`").show()
Without an alias, it behaves as expected, which is the opposite of above
(a column with a dot exists, no alias used in SELECT):
spark.range(3).select($"id".as("an.id")).createTempView("ids_with_dots")
spark.sql("SELECT `an.id` FROM ids_with_dots").show()
// works
spark.sql("SELECT `an.id` FROM ids_with_dots GROUP BY `an.id`").show()
// fails
spark.sql("SELECT `an.id` FROM ids_with_dots GROUP BY an.id").show()
With a struct column, it also behaves as expected:
spark.range(3).select(struct($"id").as("an")).createTempView("ids_with_struct")
spark.sql("SELECT an.id FROM ids_with_struct").show()
// works
spark.sql("SELECT an.id FROM ids_with_struct GROUP BY an.id").show()
// fails
spark.sql("SELECT `an.id` FROM ids_with_struct GROUP BY an.id").show()
spark.sql("SELECT an.id FROM ids_with_struct GROUP BY `an.id`").show()
spark.sql("SELECT `an.id` FROM ids_with_struct GROUP BY `an.id`").show()
This does not feel very consistent.
Enrico
Am 28.01.23 um 00:34 schrieb Kohki Nishio:
this SQL works
select 1 as *`data.group`* from tbl group by *data.group*
Since there's no such field as *data,* I thought the SQL has to look
like this
select 1 as *`data.group`* from tbl group by `*data.group`*
But that gives and error (cannot resolve '`data.group`') ... I'm no
expert in SQL, but feel like it's a strange behavior... does anybody
have a good explanation for it ?
Thanks
--
Kohki Nishio