Tejas Patil created SPARK-19618:
-----------------------------------

             Summary: Inconsistency wrt max. buckets allowed from Dataframe API vs SQL
                 Key: SPARK-19618
                 URL: https://issues.apache.org/jira/browse/SPARK-19618
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Tejas Patil


A high number of buckets is allowed when creating a table via a SQL query:

{code}
sparkSession.sql("""
CREATE TABLE bucketed_table(col1 INT) USING parquet 
CLUSTERED BY (col1) SORTED BY (col1) INTO 147483647 BUCKETS
""")

sparkSession.sql("DESC FORMATTED bucketed_table").collect.foreach(println)
....
[Num Buckets:,147483647,]
[Bucket Columns:,[col1],]
[Sort Columns:,[col1],]
....
{code}

Trying the same via the DataFrame API fails:

{code}
df.write.format("orc").bucketBy(147483647, "j", "k").sortBy("j", "k").saveAsTable("bucketed_table")

java.lang.IllegalArgumentException: requirement failed: Bucket number must be greater than 0 and less than 100000.
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:293)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:291)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.DataFrameWriter.getBucketSpec(DataFrameWriter.scala:291)
  at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:429)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:410)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:365)
  ... 50 elided
{code}
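
The {{require}} that rejects this lives only in {{DataFrameWriter.getBucketSpec}} (visible in the stack trace above); the SQL CREATE TABLE path never applies it. One way to keep the two entry points consistent would be to enforce the invariant in a single shared bucketing spec, roughly like the sketch below (the class shape here is illustrative only, not Spark's actual internals):

{code}
// Hypothetical sketch: enforce the bucket-count invariant in one shared place
// so the SQL path and the DataFrameWriter path cannot diverge.
case class BucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // Same invariant DataFrameWriter.getBucketSpec enforces on its own today.
  require(numBuckets > 0 && numBuckets < 100000,
    s"Bucket number must be greater than 0 and less than 100000. Got $numBuckets")
}
{code}

With the check in the spec itself, both CREATE TABLE ... CLUSTERED BY and {{df.write.bucketBy(...)}} would either both accept or both reject a given bucket count.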



