[ https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056758#comment-16056758 ]
Cam Quoc Mach commented on SPARK-21158:
---------------------------------------

Can I take this task?

> SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21158
>                 URL: https://issues.apache.org/jira/browse/SPARK-21158
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Windows 10
> IntelliJ
> Scala
>            Reporter: Kathryn McClintic
>            Priority: Minor
>              Labels: easyfix, features, sparksql, windows
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When working with SQL table names in Spark SQL, we have noticed some issues with case-sensitivity.
>
> If you set spark.sql.caseSensitive to true, SparkSQL stores table names exactly as they were provided. This is correct.
> If you set spark.sql.caseSensitive to false, SparkSQL stores table names in lower case.
>
> We then use sqlContext.tableNames() to get all the tables in our DB and check whether this list contains(<"string of table name">) to determine whether we have already created a table. If case-sensitivity is turned off (false), this check should find the table name in the list regardless of case.
> However, it only matches names identical to the stored lower-case version. Therefore, if you pass in a camel-case or upper-case table name, the check returns false even though the table exists.
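A minimal user-side workaround, assuming the lower-casing behaviour described above: compare names case-insensitively instead of relying on a plain contains. TableNameLookup.containsTable is a hypothetical helper written for illustration, not part of Spark's API, and the caller is assumed to pass in the session's current spark.sql.caseSensitive value.

```scala
import java.util.Locale

// Hypothetical helper (not a Spark API) for the membership check described in the report.
object TableNameLookup {
  // Returns true if `name` is in `tableNames`. With caseSensitive = false, both
  // sides are lower-cased with Locale.ROOT first, so "CarNamesAndModels" matches
  // a stored "carnamesandmodels" regardless of the JVM's default locale.
  def containsTable(tableNames: Seq[String], name: String, caseSensitive: Boolean): Boolean =
    if (caseSensitive) tableNames.contains(name)
    else {
      val wanted = name.toLowerCase(Locale.ROOT)
      tableNames.exists(_.toLowerCase(Locale.ROOT) == wanted)
    }
}
```

In a Spark job, tableNames could be sqlContext.tableNames().toSeq, with caseSensitive taken from the session's spark.sql.caseSensitive setting. When case-sensitivity is on, the helper falls back to the exact comparison, so it never reports a match that the catalog itself would reject.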
> The root cause of this issue is in the function SparkSession.Catalog.ListTables().
>
> For example:
> In your SQL context you have five tables, and you have chosen spark.sql.caseSensitive=false, so the table names are stored in lower case:
> carnames
> carmodels
> carnamesandmodels
> users
> dealerlocations
>
> When running your pipeline, you want to check whether you have already created the temporary join table 'carnamesandmodels'. However, you have stored its name as a constant that reads CarNamesAndModels, for readability.
> So you call sqlContext.tableNames().contains("CarNamesAndModels").
> This should return true, because the table has already been created, but it currently returns false since CarNamesAndModels is not in lower case.
> The responsibility for lower-casing the name passed to .contains should not fall on the Spark user. SparkSQL should handle this when case-sensitivity is turned off.
>
> Proposed solutions:
> - Setting case-insensitivity in the SQL context should make the SQL context case-agnostic without changing how table names are stored.
> - There should be a custom contains method for ListTables() that converts the table name to lower case before checking.
> - SparkSession.Catalog.ListTables() should return the list of tables in the format they were supplied, instead of all in lower case.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
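The first and third proposals can be sketched as a toy registry that keeps each table name exactly as it was registered while making lookups case-agnostic when case-sensitivity is off. ToyTableRegistry is hypothetical and only models the proposed behaviour; it is not how Spark's catalog is implemented.

```scala
import java.util.Locale
import scala.collection.mutable

// Toy model (hypothetical, not Spark's catalog) of the first and third proposals:
// store names as supplied, but resolve lookups case-agnostically when caseSensitive is false.
final class ToyTableRegistry(caseSensitive: Boolean) {
  // Lookup key -> table name exactly as originally registered, in insertion order.
  private val tables = mutable.LinkedHashMap.empty[String, String]

  // The key is the name itself when case-sensitive, otherwise its lower-case form.
  private def key(name: String): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

  // Registers a table; if an entry with the same key exists, the first spelling is kept.
  def register(name: String): Unit = {
    val k = key(name)
    if (!tables.contains(k)) tables(k) = name
  }

  // Returns names in the format they were supplied (third proposal).
  def listTables(): Seq[String] = tables.values.toSeq

  // Case-agnostic membership test when caseSensitive is false (first proposal).
  def contains(name: String): Boolean = tables.contains(key(name))
}
```

Keying on the lower-cased name means that, with case-sensitivity off, CarNamesAndModels and carnamesandmodels refer to the same table; this sketch keeps whichever spelling was registered first and reports that spelling from listTables().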