Fwd: CRAN submission SparkR 2.3.3

2019-02-24 Thread Shivaram Venkataraman
FYI, here is the note from CRAN on the 2.3.3 submission. There were some minor issues with the package description file in our CRAN submission. We are discussing this with the CRAN team, and Felix also has a patch to address it in upcoming releases. One thing I was wondering is that if there

Re: [DISCUSS][SQL][PySpark] Column name support for SQL functions

2019-02-24 Thread Reynold Xin
The challenge with the Scala/Java API in the past is that when there are multiple parameters, it'd lead to an explosion of function overloads. On Sun, Feb 24, 2019 at 3:22 PM, Felix Cheung < felixcheun...@hotmail.com > wrote: > > I hear three topics in this thread > > > 1. I don’t think we s
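
The overload concern in the message above can be made concrete with a little arithmetic. The sketch below assumes each parameter independently accepts either a column name (String) or a Column, so a function with n such parameters would need 2^n signatures. The function name `f` and the helper `overload_signatures` are hypothetical illustrations, not part of any Spark API.

```python
from itertools import product


def overload_signatures(name, n_params, accepted=("String", "Column")):
    """Enumerate every signature needed when each of the n_params
    parameters may independently be any type listed in `accepted`.

    `name` is a hypothetical function name used only for display.
    """
    return [
        f"{name}({', '.join(combo)})"
        for combo in product(accepted, repeat=n_params)
    ]


if __name__ == "__main__":
    # The count doubles with each added parameter: 2, 4, 8, 16.
    for n in range(1, 5):
        print(n, len(overload_signatures("f", n)))
```

With two accepted types the count doubles for every extra parameter, which is presumably why accepting both forms everywhere is cheap in Python (one dynamically typed signature) but costly in a statically overloaded Scala/Java API.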

Re: [DISCUSS] SPIP: Relational Cache

2019-02-24 Thread Reynold Xin
How is this different from materialized views? On Sun, Feb 24, 2019 at 3:44 PM Daoyuan Wang wrote: > Hi everyone, > > We'd like to discuss our proposal of Spark relational cache in this > thread. Spark has a native command for RDD caching, but the use of the CACHE > command in Spark SQL is limited, as

[DISCUSS] SPIP: Relational Cache

2019-02-24 Thread Daoyuan Wang
Hi everyone, We'd like to discuss our proposal of Spark relational cache in this thread. Spark has a native command for RDD caching, but the use of the CACHE command in Spark SQL is limited, as we cannot use the cache across sessions, and we have to rewrite queries ourselves to make use of e

Re: [DISCUSS][SQL][PySpark] Column name support for SQL functions

2019-02-24 Thread Felix Cheung
I hear three topics in this thread 1. I don’t think we should remove string. Column and string can both be “type safe”. And I would agree we don’t *need* to break API compatibility here. 2. Gaps in python API. Extending on #1, definitely we should be consistent and add string as param where it

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-24 Thread Sean Owen
Sure, I don't read anyone making these statements though? Let's assume good intent, that "foo should happen" as "my opinion as a member of the community, which is not solely up to me, is that foo should happen". I understand it's possible for a person to make their opinion over-weighted; this whole

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-24 Thread Mark Hamstra
> > I’m not quite sure what you mean here. > I'll try to explain once more, then I'll drop it since continuing the rest of the discussion in this thread is more important than getting side-tracked. There is nothing wrong with individuals advocating for what they think should or should not be in S

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-24 Thread Ryan Blue
Thanks to Matt for his philosophical take. I agree. The intent is to set a common goal, so that we work toward getting v2 in a usable state as a community. Part of that is making choices to get it done on time, which we have already seen on this thread: setting out more clearly what we mean by “DS

Re: [DISCUSS][SQL][PySpark] Column name support for SQL functions

2019-02-24 Thread Sean Owen
I just commented on the PR -- I personally don't think it's worth removing support for, say, max("foo") over max(col("foo")) or max($"foo") in Scala. We can make breaking changes in Spark 3 but this seems like it would unnecessarily break a lot of code. The string arg is more concise in Python and

[DISCUSS][SQL][PySpark] Column name support for SQL functions

2019-02-24 Thread André Mello
# Context This comes from [SPARK-26979], which became PR #23879 and then PR #23882. The following reflects all the findings made so far. # Description Currently, in the Scala API, some SQL functions have two overloads, one taking a string that names the column to be operated on, the other taking