[ 
https://issues.apache.org/jira/browse/SPARK-32630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178624#comment-17178624
 ] 

Simeon Simeonov commented on SPARK-32630:
-----------------------------------------

[~rxin] FYI, this is one of the subtle issues that add friction for new users 
trying to be safely productive on the platform.

> Reduce user confusion and subtle bugs by optionally preventing date & 
> timestamp comparison
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32630
>                 URL: https://issues.apache.org/jira/browse/SPARK-32630
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Simeon Simeonov
>            Priority: Major
>              Labels: comparison, sql, timestamps
>
> https://issues.apache.org/jira/browse/SPARK-23549 made Spark's handling of 
> date vs. timestamp comparison consistent with SQL, which, unfortunately, 
> isn't consistent with common sense.
> When dates are compared with timestamps, they are promoted to timestamps at 
> midnight of the date, in the server timezone, which is almost always UTC. 
> This only works well if all timestamps in the data are logically time 
> instants as opposed to dates + times, which only become instants with a known 
> timezone.
> The fundamental issue is that dates are a human time concept and instants are 
> a machine time concept. While we can technically promote one to the other, 
> logically, it only works 100% if midnight for all dates in the system is in 
> the server timezone. 
> Every major modern platform offers a clear distinction between machine time 
> (instants) and human time (an instant with a timezone, UTC offset, etc.), 
> because we have learned the hard way that date & time handling is a 
> never-ending source of confusion and bugs. SQL, being an ancient language 
> (40+ years old), is well behind software engineering best practices; using it 
> as a guiding light is necessary for Spark to win market share, but 
> unfortunate in every other way.
> For example, Java has:
>  * java.time.LocalDate
>  * java.time.Instant
>  * java.time.ZonedDateTime
>  * java.time.OffsetDateTime
> I am not suggesting we add new data types to Spark. I am suggesting we go to 
> the heart of the matter, which is that most date vs. time handling issues are 
> the result of confusion or carelessness.
> What about introducing a new setting that makes comparisons between dates and 
> timestamps illegal, preferably with a helpful exception message?
> If it existed, I would certainly make it the default for all our clusters. 
> The minor coding convenience that comes from being able to compare dates & 
> timestamps with an automatic type promotion pales in comparison with the risk 
> of subtle bugs that remain undetected for a long time.
>  
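
To make the promotion rule in the quoted description concrete, here is a minimal sketch in plain Python (not Spark code). It assumes a hypothetical helper, promote_to_timestamp, that models "promote the date to midnight in the server timezone", and it uses a fixed UTC-7 offset as a stand-in for a Pacific-time data source. The dates and times are made up for illustration.

```python
from datetime import date, datetime, time, timedelta, timezone


def promote_to_timestamp(d: date, server_tz: timezone) -> datetime:
    """Hypothetical model of the promotion: midnight of `d` in `server_tz`."""
    return datetime.combine(d, time.min, tzinfo=server_tz)


UTC = timezone.utc
# Assumption: a fixed UTC-7 offset stands in for the zone the data was recorded in.
PACIFIC = timezone(timedelta(hours=-7))

# An event at 20:00 on 2020-08-15, local (Pacific) time, stored as an instant.
event = datetime(2020, 8, 15, 20, 0, tzinfo=PACIFIC)
cutoff = date(2020, 8, 16)

# "Did the event happen before 2020-08-16?" depends on whose midnight is used.
before_cutoff_utc = event < promote_to_timestamp(cutoff, UTC)
before_cutoff_pacific = event < promote_to_timestamp(cutoff, PACIFIC)

print(before_cutoff_utc, before_cutoff_pacific)
```

The same event falls on either side of the cutoff depending on which zone's midnight the date is promoted to, which is the kind of silent discrepancy the issue describes.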


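The behavior proposed in the quoted issue (rejecting date vs. timestamp comparisons with a helpful message) could look roughly like the following. This is a plain-Python model of the idea, not a Spark API or setting; the names DateTimestampComparisonError and strict_less_than are hypothetical.

```python
from datetime import date, datetime


class DateTimestampComparisonError(TypeError):
    """Hypothetical error for mixed date/timestamp comparisons."""


def _kind(value) -> str:
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, date):
        return "date"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def strict_less_than(left, right) -> bool:
    """Compare two values only if both are dates or both are timestamps."""
    left_kind, right_kind = _kind(left), _kind(right)
    if left_kind != right_kind:
        raise DateTimestampComparisonError(
            f"Cannot compare a {left_kind} with a {right_kind}; "
            "convert one side explicitly, choosing the time zone on purpose."
        )
    return left < right
```

With a guard like this, the caller has to state which midnight a date refers to before comparing it with an instant, instead of inheriting the server's zone implicitly.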

