Hi everyone,
I am working towards bringing Spark's sort merge join on par with Hive's
Sort-Merge-Bucket join in its use of sortedness. So far I have identified these main
items to be addressed:
1. Make the query planner use `sorted`ness information for sort merge join
(SPARK-15453, SPARK-17271)
2.
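Item 1 is what Hive's Sort-Merge-Bucket join exploits: when both tables are bucketed on the join key into (in the simplest case) the same number of buckets, and every bucket is sorted on that key, bucket i of one table only has to be merged with bucket i of the other, with no shuffle and no extra sort. Below is a minimal Python sketch of that idea under stated assumptions, not Spark's or Hive's code: `key % num_buckets` is an assumed stand-in for the real bucketing hash, and rows are hypothetical `(key, payload)` tuples.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # hypothetical row shape: (join_key, payload)
Joined = Tuple[int, str, str]  # (join_key, left_payload, right_payload)


def bucket_id(key: int, num_buckets: int) -> int:
    # Assumed stand-in for the real bucketing hash; both sides must use the same one.
    return key % num_buckets


def write_bucketed(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Simulate writing a bucketed table: group rows by bucket, sort each bucket by key."""
    buckets: Dict[int, List[Row]] = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_id(row[0], num_buckets)].append(row)
    for rows_in_bucket in buckets.values():
        rows_in_bucket.sort(key=lambda r: r[0])
    return buckets


def merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Inner merge join of two inputs that are already sorted by key."""
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows with this key, then pair it with every left row of that key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def smb_join(left: Dict[int, List[Row]], right: Dict[int, List[Row]]) -> List[Joined]:
    """Join bucket b with bucket b only: no shuffle and no extra sort are needed."""
    if left.keys() != right.keys():
        raise ValueError("both sides must have the same number of buckets")
    out: List[Joined] = []
    for b in sorted(left):
        out.extend(merge_join(left[b], right[b]))
    return out


left = write_bucketed([(1, "a"), (2, "b"), (5, "c"), (1, "d")], num_buckets=4)
right = write_bucketed([(1, "x"), (5, "y"), (3, "z")], num_buckets=4)
print(smb_join(left, right))  # [(1, 'a', 'x'), (1, 'd', 'x'), (5, 'c', 'y')]
```

The planner-side point is that the sort inside `write_bucketed` happens once, at write time; if the planner knows each bucket is already sorted, the join itself only pays for the linear merge.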
because we use the same rule to parse top level and nested data
>> fields. For example:
>>
>> create table tbl_x(
>> id bigint,
>> nested struct<col1:string,col2:string>
>> )
>>
>> This shows both syntaxes. We should split this rule into a top-level and nes
Is there any reason why Spark SQL supports `:` while specifying columns? E.g. `sql("CREATE TABLE t1 (column1:INT)")`
works fine.
Here is the relevant snippet in the grammar [0]:
```
colType
: identifier ':'? dataType (COMMENT STRING)?
;
```
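The optional `':'?` in `colType` is why both `column1:INT` and `column1 INT` are accepted, and the top-level/nested split suggested in the quoted reply would make the colon illegal at the top level but mandatory for struct fields. The toy Python parser below is only an illustration of that behavior, not Spark's ANTLR-generated parser; the `colon` modes "forbidden" and "required" are hypothetical names for the two split rules, and nested types such as `struct<...>` are not handled.

```python
import re
from typing import List, Optional, Tuple

MODES = ("optional", "forbidden", "required")


def tokenize(spec: str) -> List[str]:
    # Very simplified lexer: identifiers/keywords and ':' tokens only.
    return re.findall(r"\w+|:", spec)


def parse_col_type(spec: str, colon: str = "optional") -> Optional[Tuple[str, str]]:
    """Parse 'name [:] type' and return (name, type), or None if it does not match.

    colon="optional"  mirrors the current rule:      identifier ':'? dataType
    colon="forbidden" a hypothetical top-level rule: identifier dataType
    colon="required"  a hypothetical nested rule:    identifier ':' dataType
    """
    if colon not in MODES:
        raise ValueError(f"colon must be one of {MODES}")
    toks = tokenize(spec)
    if len(toks) == 2 and ":" not in toks and colon != "required":
        return toks[0], toks[1]
    if len(toks) == 3 and toks[1] == ":" and ":" not in (toks[0], toks[2]) and colon != "forbidden":
        return toks[0], toks[2]
    return None


# Current rule: both spellings are accepted.
print(parse_col_type("column1:INT"), parse_col_type("column1 INT"))
# prints: ('column1', 'INT') ('column1', 'INT')

# Split rules: the top level rejects the colon, nested struct fields require it.
print(parse_col_type("column1:INT", colon="forbidden"), parse_col_type("col1:string", colon="required"))
# prints: None ('col1', 'string')
```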
I do not see MySQL [1], Hive [2], Presto [3] and
e). It would be better
> (safer) to move the output partitioning definition into each of the
> operators and remove it from UnaryExecNode.
>
> Would you be interested in submitting the patch?
>
> On Wed, Oct 12, 2016 at 10:26 AM, Tejas Patil <tejas.patil...@gmail.com>
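The quoted suggestion is to stop giving every unary operator a default output partitioning via UnaryExecNode and make each operator state its own. A minimal Python sketch of why that is "safer", using `abc` as a stand-in for Scala's abstract members; all class names here (`PlanNode`, `UnaryNode`, `Filter`, `ForgetfulOperator`, `HashPartitioning`) are hypothetical and are not Spark's classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Hypothetical stand-in for a physical partitioning description."""
    keys: Tuple[str, ...]
    num_partitions: int


class PlanNode(ABC):
    @abstractmethod
    def output_partitioning(self) -> HashPartitioning:
        """Every concrete operator has to say how its output is partitioned."""


class Scan(PlanNode):
    def __init__(self, partitioning: HashPartitioning):
        self._partitioning = partitioning

    def output_partitioning(self) -> HashPartitioning:
        return self._partitioning


class UnaryNode(PlanNode):
    # Deliberately NO default such as "return self.child.output_partitioning()":
    # an operator that forgets to declare its partitioning fails loudly
    # instead of silently inheriting the child's.
    def __init__(self, child: PlanNode):
        self.child = child


class Filter(UnaryNode):
    def output_partitioning(self) -> HashPartitioning:
        # Filtering removes rows but never moves a row to another partition.
        return self.child.output_partitioning()


class ForgetfulOperator(UnaryNode):
    """Hypothetical operator that forgot to override output_partitioning."""


scan = Scan(HashPartitioning(("id",), 8))
print(Filter(scan).output_partitioning())  # HashPartitioning(keys=('id',), num_partitions=8)
# ForgetfulOperator(scan) would raise TypeError: it cannot be instantiated
# because output_partitioning is still abstract.
```

The trade-off is a little more boilerplate per operator in exchange for never getting a wrong partitioning by accident.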
See
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L80
The Project operator preserves its child's sort ordering but not its output
partitioning. I don't see any way a projection would alter the
partitioning of the
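A projection rewrites columns row by row and never moves rows between partitions, so the child's partitioning still holds afterwards; the only subtlety is whether the partitioning keys are still visible in the output (possibly under an alias). The Python function below is a sketch of one conservative pass-through rule under that assumption, not Spark's ProjectExec code; `HashPartitioning`, the `projection` mapping, and returning `None` for "unknown" are all hypothetical choices, and computed expressions are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Hypothetical stand-in: rows are hash-partitioned on `keys`."""
    keys: Tuple[str, ...]
    num_partitions: int


def project_output_partitioning(
    child: Optional[HashPartitioning], projection: Dict[str, str]
) -> Optional[HashPartitioning]:
    """Partitioning after a projection; None means "unknown".

    `projection` maps each output column to the child column it passes through
    unchanged (either as-is or under an alias).
    """
    if child is None:
        return None
    # Rows stay in the same partitions, so the child's partitioning still holds.
    # It can only be expressed on the output if every key column is still visible.
    output_name: Dict[str, str] = {}
    for out_col, src_col in projection.items():
        output_name.setdefault(src_col, out_col)
    if all(k in output_name for k in child.keys):
        return HashPartitioning(tuple(output_name[k] for k in child.keys), child.num_partitions)
    return None  # a key column was projected away


child = HashPartitioning(("id",), 8)
print(project_output_partitioning(child, {"id": "id", "v": "value"}))  # keys ('id',) kept
print(project_output_partitioning(child, {"user_id": "id"}))          # keys renamed to ('user_id',)
print(project_output_partitioning(child, {"v": "value"}))             # None
```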
017 9:58 pm, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
>
> Hi all,
>
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing especially on scalability i
There is a JIRA for making Map types orderable:
https://issues.apache.org/jira/browse/SPARK-18134
Given that this is a non-trivial change, it will take time.
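Part of what makes this non-trivial is that two equal maps can hold their entries in different orders, so sorting, grouping, or comparing them needs a canonical form first. The sketch below shows one possible ordering (sort each map's entries by key, then compare the entry lists lexicographically); it is an assumption for illustration, not necessarily the design in SPARK-18134, and Python dicts stand in for Spark map values.

```python
from typing import Dict, List, Tuple


def map_sort_key(m: Dict[str, int]) -> List[Tuple[str, int]]:
    # Two equal maps may store entries in different orders, so normalize first:
    # sort entries by key (keys must themselves be orderable).
    return sorted(m.items())


def compare_maps(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Return -1, 0 or 1 by comparing the key-sorted entry lists lexicographically."""
    ka, kb = map_sort_key(a), map_sort_key(b)
    return (ka > kb) - (ka < kb)


print(compare_maps({"b": 1, "a": 2}, {"a": 2, "b": 1}))  # 0: same entries, different insertion order
print(compare_maps({"a": 1}, {"a": 2}))                  # -1
print(sorted([{"b": 1}, {"a": 5}], key=map_sort_key))    # [{'a': 5}, {'b': 1}]
```

Note the cost: every comparison pays for normalizing both maps, which is one reason an engine would want to normalize once rather than per comparison.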
On Sat, Jan 13, 2018 at 9:50 PM, ckhari4u wrote:
> Wan, thanks a lot! I see the issue now.
>
> Do we have any