Questions about bucketing in Spark

2016-08-31 Thread Tejas Patil
Hi everyone, I am working towards making Spark's Sort Merge join on par with Hive's Sort-Merge-Bucket join to use sorted. So far I have identified these main items to be addressed: 1. Make the query planner use `sorted`ness information for sort merge join (SPARK-15453, SPARK-17271) 2.
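Item 1 matters because once both tables are bucketed and sorted on the join key, each pair of matching buckets can be merged directly, with no sort step in between. Below is a minimal pure-Python sketch of that merge. It assumes each bucket is a list of `(key, value)` pairs already sorted by key and that both sides use the same number of buckets. The function names are hypothetical, and this is not Spark's or Hive's implementation.

```python
from typing import Any, Iterator, List, Tuple

Row = Tuple[Any, Any]  # (join_key, payload)


def sort_merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[Any, Any, Any]]:
    """Inner join of two inputs that are ALREADY sorted by join key (hypothetical helper).

    No sorting happens here. Ordered input is assumed, which is the work
    sorted buckets would save.
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of right-side rows with this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, r_val in right[j:j_end]:
                    yield (lk, left[i][1], r_val)
                i += 1
            j = j_end


def bucketed_join(left_buckets: List[List[Row]],
                  right_buckets: List[List[Row]]) -> Iterator[Tuple[Any, Any, Any]]:
    """Join bucket k only with bucket k; assumes identical bucketing on both sides."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    for lb, rb in zip(left_buckets, right_buckets):
        yield from sort_merge_join(lb, rb)
```

In this model, bucket k on one side can only match bucket k on the other. Neither a shuffle nor a sort is needed between the scans and the merge, and that is the property the planner would have to recognise from the tables' bucketing and sort metadata.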

Re: [question] Why Spark SQL grammar allows : ?

2016-09-29 Thread Tejas Patil
because we use the same rule to parse top level and nested data
>> fields. For example:
>>
>> create table tbl_x(
>>   id bigint,
>>   nested struct<col1:string,col2:string>
>> )
>>
>> Shows both syntaxes. We should split this rule in a top-level and nes
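The reply proposes splitting the rule in two: top-level columns written as `name type`, and struct fields written as `name:type`. The sketch below stands in for those two ANTLR rules with a pair of hypothetical regular expressions. It only illustrates the idea. It is not Spark's parser, and it does not handle structs nested inside structs.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical top-level rule: identifier, whitespace, data type (no ':').
TOP_LEVEL_COL = re.compile(r"^\s*(\w+)\s+(\w+(?:<.*>)?)\s*$")
# Hypothetical nested rule: identifier ':' data type.
NESTED_FIELD = re.compile(r"^\s*(\w+)\s*:\s*(\w+)\s*$")


def parse_top_level(text: str) -> Optional[Tuple[str, str]]:
    """Return (name, type) for a top-level column definition, else None."""
    m = TOP_LEVEL_COL.match(text)
    return (m.group(1), m.group(2)) if m else None


def parse_nested(text: str) -> Optional[Tuple[str, str]]:
    """Return (name, type) for a struct field definition, else None."""
    m = NESTED_FIELD.match(text)
    return (m.group(1), m.group(2)) if m else None


def parse_struct_fields(struct_type: str) -> List[Optional[Tuple[str, str]]]:
    """Parse the fields of a flat struct<...> type with the nested rule only."""
    inner = struct_type[len("struct<"):-1]  # strip "struct<" and the closing ">"
    return [parse_nested(field) for field in inner.split(",")]
```

With the split, `parse_top_level("column1:INT")` is rejected while `parse_nested("col1:string")` is accepted, so each syntax is valid only at the level where it belongs.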

[question] Why Spark SQL grammar allows : ?

2016-09-29 Thread Tejas Patil
Is there any reason why Spark SQL allows a `:` between the column name and its type when specifying columns? E.g., `sql("CREATE TABLE t1 (column1:INT)")` works fine. Here is the relevant snippet in the grammar [0]:

```
colType
    : identifier ':'? dataType (COMMENT STRING)?
    ;
```

I do not see MySQL[1], Hive[2], Presto[3] and
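`column1:INT` is accepted because `':'?` makes the colon token optional, so `column1 INT` and `column1:INT` produce the same parse. The small sketch below mimics that rule over a crude token stream. The lexer is a simplifying assumption rather than Spark's lexer, and the `COMMENT STRING` clause is left out.

```python
import re
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    # Crude lexer (assumption): word tokens (identifiers, type names) and ':'.
    return re.findall(r"\w+|:", text)


def parse_col_type(text: str) -> Optional[Tuple[str, str]]:
    """Mimic `colType : identifier ':'? dataType`, ignoring COMMENT."""
    toks = tokenize(text)
    if len(toks) < 2 or toks[0] == ":":
        return None
    name, rest = toks[0], toks[1:]
    if rest[0] == ":":  # the optional ':' is consumed if present
        rest = rest[1:]
    if len(rest) == 1 and rest[0] != ":":
        return (name, rest[0])
    return None
```

Removing the `?` would make the colon mandatory, and dropping `':'` from the rule would forbid it. As long as nested struct fields share this rule, either change would affect them too.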

Re: `Project` not preserving child partitioning ?

2016-10-12 Thread Tejas Patil
e). It would be better
> (safer) to move the output partitioning definition into each of the
> operator and remove it from UnaryExecNode.
>
> Would you be interested in submitting the patch?
>
> On Wed, Oct 12, 2016 at 10:26 AM, Tejas Patil <tejas.patil...@gmail.com>
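The suggestion has two parts: remove the inherited default from `UnaryExecNode`, and define output partitioning explicitly in every operator. It can be modelled with an abstract method, so that a new operator that forgets to declare its partitioning fails loudly. The Python sketch below uses a simplified, hypothetical model in which a partitioning is just the tuple of hash-partitioning columns (`None` when unknown). The class names only loosely mirror Spark's physical operators.

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence, Tuple

# Hypothetical model: the columns rows are hash-partitioned by; None = unknown.
Partitioning = Optional[Tuple[str, ...]]


class PlanNode(ABC):
    @abstractmethod
    def output_partitioning(self) -> Partitioning:
        ...


class Scan(PlanNode):
    def __init__(self, partitioning: Partitioning) -> None:
        self._partitioning = partitioning

    def output_partitioning(self) -> Partitioning:
        return self._partitioning


class UnaryNode(PlanNode):
    # Deliberately no default that forwards the child's partitioning:
    # every concrete unary operator has to define it explicitly.
    def __init__(self, child: PlanNode) -> None:
        self.child = child


class Filter(UnaryNode):
    def output_partitioning(self) -> Partitioning:
        # Dropping rows never moves the remaining rows between partitions.
        return self.child.output_partitioning()


class Exchange(UnaryNode):
    def __init__(self, child: PlanNode, keys: Sequence[str]) -> None:
        super().__init__(child)
        self.keys = tuple(keys)

    def output_partitioning(self) -> Partitioning:
        # A shuffle establishes a new partitioning regardless of the child's.
        return self.keys
```

Instantiating a `UnaryNode` subclass that does not define `output_partitioning` raises `TypeError`. This is the kind of safety the reply describes: no operator silently inherits a partitioning that may be wrong for it.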

`Project` not preserving child partitioning ?

2016-10-12 Thread Tejas Patil
See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L80
The Project operator preserves its child's sort ordering, but it does not preserve its child's output partitioning. I don't see any way a projection would alter the partitioning of the
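A projection never moves rows between partitions. Whether it "preserves" the child's partitioning therefore comes down to whether the partitioning columns are still visible in its output, possibly under a new name. The sketch below works in a hypothetical model. The child is hash-partitioned by a tuple of column names, and the projection is a list of `(output_name, source_column)` pairs, with `None` marking a computed expression. It is not Spark's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: None = partitioning cannot be named via output columns.
Partitioning = Optional[Tuple[str, ...]]


def project_output_partitioning(
    child_keys: Partitioning,
    projection: List[Tuple[str, Optional[str]]],
) -> Partitioning:
    """Translate the child's hash-partitioning keys through a projection.

    The physical layout is unchanged; the question is only whether it can
    still be described in terms of the projection's output columns.
    """
    if child_keys is None:
        return None
    # First output name that is a plain reference to each child column.
    renamed: Dict[str, str] = {}
    for out_name, source in projection:
        if source is not None and source not in renamed:
            renamed[source] = out_name
    if all(key in renamed for key in child_keys):
        return tuple(renamed[key] for key in child_keys)
    # A key column was dropped or survives only inside a computed expression.
    return None
```

In this model, plain column selections and renames keep the partitioning (renamed where needed). It becomes unnameable only when a key column is projected away.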

Re: Welcoming Tejas Patil as a Spark committer

2017-10-02 Thread Tejas Patil
017 9:58 pm, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
>
> Hi all,
>
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing especially on scalability i

Re: Distinct on Map data type -- SPARK-19893

2018-01-16 Thread Tejas Patil
There is a JIRA for making Map types orderable: https://issues.apache.org/jira/browse/SPARK-18134
Given that this is a non-trivial change, it will take time.

On Sat, Jan 13, 2018 at 9:50 PM, ckhari4u wrote:
> Wan, Thanks a lot,! I see the issue now.
>
> Do we have any
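One way to make Map values comparable, and hence usable in DISTINCT, is to compare them by their entries sorted by key, which removes any dependence on insertion order. The pure-Python sketch below shows that idea. It assumes keys are mutually orderable and values are hashable and comparable. It is only an illustration and not necessarily the ordering SPARK-18134 will settle on. The function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Entries = Tuple[Tuple[Any, Any], ...]


def canonical_map(m: Dict[Any, Any]) -> Entries:
    # Sort entries by key (keys are unique, so values never break ties
    # inside one map); the result is hashable and comparable.
    return tuple(sorted(m.items()))


def distinct_maps(maps: List[Dict[Any, Any]]) -> List[Dict[Any, Any]]:
    """Keep the first occurrence of each map, treating maps as equal by content."""
    seen = set()
    out = []
    for m in maps:
        key = canonical_map(m)
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out


def sort_maps(maps: List[Dict[Any, Any]]) -> List[Dict[Any, Any]]:
    """Order maps lexicographically by their key-sorted entries."""
    return sorted(maps, key=canonical_map)
```

Part of what makes the real change non-trivial is that an engine has to define and implement this ordering for every possible key and value type, including nulls.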