Tejas Patil created SPARK-19618:
-----------------------------------

             Summary: Inconsistency wrt max. buckets allowed from Dataframe API vs SQL
                 Key: SPARK-19618
                 URL: https://issues.apache.org/jira/browse/SPARK-19618
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Tejas Patil
A very high number of buckets is allowed when creating a table via a SQL query:

{code}
sparkSession.sql("""
CREATE TABLE bucketed_table(col1 INT)
USING parquet
CLUSTERED BY (col1) SORTED BY (col1) INTO 147483647 BUCKETS
""")

sparkSession.sql("DESC FORMATTED bucketed_table").collect.foreach(println)
....
[Num Buckets:,147483647,]
[Bucket Columns:,[col1],]
[Sort Columns:,[col1],]
....
{code}

Trying the same via the Dataframe API fails:

{code}
df.write.format("orc").bucketBy(147483647, "j","k").sortBy("j","k").saveAsTable("bucketed_table")

java.lang.IllegalArgumentException: requirement failed: Bucket number must be greater than 0 and less than 100000.
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:293)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:291)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.DataFrameWriter.getBucketSpec(DataFrameWriter.scala:291)
  at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:429)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:410)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:365)
  ... 50 elided
{code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
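One possible direction for a fix (purely a sketch, not an actual patch): hoist the bound check that the stack trace shows inside {{DataFrameWriter.getBucketSpec}} into a shared helper that the SQL DDL path calls as well, so both entry points enforce the same limit. The object and method names below are assumptions for illustration; only the error message and the 100000 bound come from the report above.

{code}
// Sketch: shared bucket-count validation (names are hypothetical).
// Both DataFrameWriter.getBucketSpec and the SQL CREATE TABLE ...
// CLUSTERED BY ... INTO n BUCKETS analysis path would call this,
// making the two APIs consistent.
object BucketingValidation {
  // Upper bound taken from the Dataframe-API error message above.
  val MaxBuckets = 100000

  def validateNumBuckets(numBuckets: Int): Int = {
    require(numBuckets > 0 && numBuckets < MaxBuckets,
      s"Bucket number must be greater than 0 and less than $MaxBuckets")
    numBuckets
  }
}
{code}

With such a helper in place, the SQL example above ({{INTO 147483647 BUCKETS}}) would fail at analysis time with the same {{IllegalArgumentException}} the Dataframe API raises, instead of silently creating a table with an unusable bucket count.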