Re: data source api v2 refactoring

2018-09-07 Thread Thakrar, Jayesh
Ryan et al., Wondering whether the existing Spark-based data sources (e.g. for HDFS, Kafka) have been ported to V2. I remember reading threads discussing the inefficiency/overhead of converting from Row to InternalRow, which was holding back some of the porting effort. I ask

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Russell Spitzer
That's my understanding :) doExecute is for non-codegen, while doProduce and doConsume are for generating code. On Fri, Sep 7, 2018 at 2:59 PM Jacek Laskowski wrote: > Hi Devs, > > Sorry for bothering you with my questions (and concerns), but I really > need to understand this piece of code (= my
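Russell's one-line answer can be made concrete with a toy model. The sketch below is plain Python, not Spark code, and every class and method name in it is hypothetical. Each operator can either run interpreted, pulling rows through nested iterators (the role `SparkPlan.doExecute` plays), or emit source code for its part of the pipeline (the role of `CodegenSupport.doProduce`/`doConsume`), which the root splices into a single fused loop and runs. It assumes single-integer rows and Python expression strings with a `{v}` placeholder; Spark itself generates Java source and compiles it with Janino.

```python
from typing import Iterator, List, Optional


def indent(code: str) -> str:
    """Indent every line of a generated fragment by four spaces."""
    return "".join("    " + line + "\n" for line in code.splitlines())


class Operator:
    """Hypothetical base: an interpreted path and a code-generating path."""

    parent: Optional["Operator"] = None

    def do_execute(self) -> Iterator[int]:
        """Interpreted path (cf. doExecute): pull rows through nested iterators."""
        raise NotImplementedError

    def do_produce(self) -> str:
        """Codegen path (cf. doProduce): emit code that generates rows."""
        raise NotImplementedError

    def do_consume(self, var: str) -> str:
        """Codegen path (cf. doConsume): emit code handling one row held in `var`."""
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, rows: List[int]) -> None:
        self.rows = rows

    def do_execute(self) -> Iterator[int]:
        return iter(self.rows)

    def do_produce(self) -> str:
        # The leaf owns the loop; every operator above it is inlined into the body.
        return "for row in rows:\n" + indent(self.parent.do_consume("row"))


class Filter(Operator):
    def __init__(self, child: Operator, predicate: str) -> None:
        # `predicate` is Python source with `{v}` standing for the row variable.
        self.child, self.predicate = child, predicate
        child.parent = self

    def do_execute(self) -> Iterator[int]:
        test = eval("lambda x: " + self.predicate.format(v="x"))
        return (r for r in self.child.do_execute() if test(r))

    def do_produce(self) -> str:
        return self.child.do_produce()  # delegate down to the leaf

    def do_consume(self, var: str) -> str:
        return f"if {self.predicate.format(v=var)}:\n" + indent(self.parent.do_consume(var))


class Project(Operator):
    def __init__(self, child: Operator, expr: str) -> None:
        self.child, self.expr = child, expr  # `expr` uses the same `{v}` placeholder
        child.parent = self

    def do_execute(self) -> Iterator[int]:
        fn = eval("lambda x: " + self.expr.format(v="x"))
        return (fn(r) for r in self.child.do_execute())

    def do_produce(self) -> str:
        return self.child.do_produce()

    def do_consume(self, var: str) -> str:
        out = var + "_p"  # fresh variable for the projected value
        return f"{out} = {self.expr.format(v=var)}\n" + self.parent.do_consume(out)


class Collect(Operator):
    """Pipeline root, loosely playing the role of a whole-stage wrapper."""

    def __init__(self, child: Operator) -> None:
        self.child = child
        child.parent = self

    def do_execute(self) -> Iterator[int]:
        return self.child.do_execute()

    def do_consume(self, var: str) -> str:
        return f"out.append({var})"

    def generate(self) -> str:
        return self.child.do_produce()

    def run_codegen(self) -> List[int]:
        leaf = self.child
        while not isinstance(leaf, Scan):  # find the input rows for the generated loop
            leaf = leaf.child
        namespace = {"rows": leaf.rows, "out": []}
        exec(self.generate(), namespace)  # compile and run the fused loop
        return namespace["out"]
```

For example, `Collect(Project(Filter(Scan([1, 2, 3, 4, 5, 6]), "{v} % 2 == 0"), "{v} * 10"))` returns `[20, 40, 60]` from both `do_execute()` and `run_codegen()`. The point the toy tries to show: on the codegen path the leaf owns the loop and each parent's `do_consume` output is inlined into it, so no per-row iterator calls remain between the fused operators.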

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Jacek Laskowski
Hi Devs, Sorry for bothering you with my questions (and concerns), but I really need to understand this piece of code (= my personal challenge :)). Is it true that SparkPlan.doExecute (to "execute" a physical operator) is only used when whole-stage code gen is disabled (which is not by

Re: data source api v2 refactoring

2018-09-07 Thread Ryan Blue
There are a few v2-related changes that we can work on in parallel, at least for reviews: * SPARK-25006, #21978: Add catalog to TableIdentifier - this proposes how to incrementally add multi-catalog support without breaking existing code paths *

Re: Branch 2.4 is cut

2018-09-07 Thread shane knapp
++joshrosen (thanks for the help w/deploying the jenkins configs) the basic 2.4 builds are deployed and building! i haven't created (a) build(s) yet for scala 2.12... i'll be coordinating this w/the databricks folks next week. On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun wrote: > Thank

Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Jacek Laskowski
Hi Spark Devs, I really need your help understanding the relationship between HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap. While exploring UnsafeFixedWidthAggregationMap and how it's used I've noticed that it's for HashAggregateExec and

Re: Branch 2.4 is cut

2018-09-07 Thread Dongjoon Hyun
Thank you, Shane! :D Bests, Dongjoon. On Fri, Sep 7, 2018 at 9:51 AM shane knapp wrote: > i'll try and get to the 2.4 branch stuff today... > >

Re: Branch 2.4 is cut

2018-09-07 Thread shane knapp
i'll try and get to the 2.4 branch stuff today... On Fri, Sep 7, 2018 at 8:15 AM, Wenchen Fan wrote: > I've reached out to Shane, but he has been busy recently. I'll figure it out with > Josh soon. Will post an update to this thread later. > > Thanks, > Wenchen > > On Fri, Sep 7, 2018 at 11:01 PM Sean Owen

Re: Branch 2.4 is cut

2018-09-07 Thread Sean Owen
I'm just using 3.0, but it would not hurt to create 2.5.0. If 2.5 doesn't happen, then we just move those to 3.0.0 later. On Fri, Sep 7, 2018, 9:40 AM Holden Karau wrote: > Was doing my weekly code review > and went to close an issue, > but since it

Re: Branch 2.4 is cut

2018-09-07 Thread Holden Karau
Was doing my weekly code review and went to close an issue, but since it wasn't one of the categories listed, it wasn't going to be merged into the 2.4 branch, but we need a new version in JIRA for us to close issues to that are going to merge into master but

Re: Spark data quality bug when reading parquet files from hive metastore

2018-09-07 Thread Long, Andrew
Thanks Fokko, I will definitely take a look at this. Cheers Andrew From: "Driesprong, Fokko" Date: Friday, August 24, 2018 at 2:39 AM To: "reubensaw...@hotmail.com" Cc: "dev@spark.apache.org" Subject: Re: Spark data quality bug when reading parquet files from hive metastore Hi Andrew,

Re: Branch 2.4 is cut

2018-09-07 Thread Wenchen Fan
I've reached out to Shane, but he has been busy recently. I'll figure it out with Josh soon. Will post an update to this thread later. Thanks, Wenchen On Fri, Sep 7, 2018 at 11:01 PM Sean Owen wrote: > CC Shane who might have the permission to do so. > > If master is going to be 3.0, we can remove the

Re: Branch 2.4 is cut

2018-09-07 Thread Sean Owen
CC Shane who might have the permission to do so. If master is going to be 3.0, we can remove the Hadoop 2.6 builds soon for master, note. We could remove them now, honestly. On Thu, Sep 6, 2018 at 10:09 AM Dongjoon Hyun wrote: > Great for branch cut and Scala 2.12 build. > > We also need to

Re: data source api v2 refactoring

2018-09-07 Thread Wenchen Fan
Hi Ryan, You are right that the `LogicalWrite` mirrors the read-side API. I just don't have a good name for it yet, and write-side changes will be a different PR. Hi Hyukjin, That's my expectation; otherwise we'd keep rebasing the refactor PR and never get it done. On Fri, Sep 7, 2018 at 3:02 PM

Re: data source api v2 refactoring

2018-09-07 Thread Hyukjin Kwon
BTW, just for clarification: do we hold Datasource V2-related PRs for now until we finish this refactoring? On Fri, Sep 7, 2018 at 12:52 AM Ryan Blue wrote: > Wenchen, > > I'm not really sure what you're proposing here. What is a `LogicalWrite`? > Is it something that mirrors the read side in your PR?