Re: [DISCUSS] upper/lower of special characters

2018-09-21 Thread seancxmao
Hi, Sean. After a brief investigation, I found that there are already some tickets/PRs about this issue. I just wasn't aware of them.
https://issues.apache.org/jira/browse/SPARK-20156
https://github.com/apache/spark/pull/17527
https://github.com/apache/spark/pull/17655
I have carefully read the

Re: [DISCUSS] upper/lower of special characters

2018-09-21 Thread seancxmao
Hi, Reynold. Sorry for the slow response. Thanks for your suggestion. I'd like to document this in the API docs for the SQL built-in functions. BTW, this is a real case we ran into in production: the Turkish data comes from other systems through ETL. As you mentioned, we use UDFs to avoid

Re: [DISCUSS] upper/lower of special characters

2018-09-19 Thread Sean Owen
I don't have the details in front of me, but I recall we explicitly overhauled locale-sensitive toUpper and toLower in the code for this exact situation. The current behavior should be on purpose. I believe user data strings are handled in a case-sensitive way but things like reserved words in SQL

Re: [DISCUSS] upper/lower of special characters

2018-09-19 Thread Reynold Xin
I'd just document it as a known limitation and move on for now, until there are enough end users that need this. Spark is also very powerful with UDFs and end users can easily work around this using UDFs.
-- excuse the brevity and lower case due to wrist injury
On Tue, Sep 18, 2018 at 11:14 PM
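
For readers following the thread, here is a minimal sketch of the kind of workaround logic the UDF suggestion points at. It is plain Python, not code from Spark or from anyone on this thread: the helper names `turkish_upper` and `turkish_lower` are hypothetical, and the only assumption about the data is that it follows the Turkish dotted/dotless i convention (i/İ and ı/I).

```python
# Hypothetical helpers for illustration only; these names are not part of Spark.
# Python's str.upper()/str.lower() are locale-independent, so the Turkish
# dotted/dotless i pairs (i <-> İ, ı <-> I) have to be mapped by hand.


def turkish_upper(text: str) -> str:
    # Map the Turkish-specific lowercase letters first, then let the
    # locale-independent upper() handle every other character.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    # Map the Turkish-specific uppercase letters first, then let the
    # locale-independent lower() handle every other character.
    return text.replace("İ", "i").replace("I", "ı").lower()


if __name__ == "__main__":
    print("istanbul".upper())         # ISTANBUL (locale-independent result)
    print(turkish_upper("istanbul"))  # İSTANBUL
    print(turkish_lower("ISPARTA"))   # ısparta
```

In a PySpark job, functions like these could presumably be wrapped with `pyspark.sql.functions.udf` and applied only to columns known to hold Turkish text; that wiring is left out here and should be treated as an assumption rather than a recommendation from the thread.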