Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/23173#discussion_r238077135
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
---
@@ -171,15 +171,21 @@ private[csv] class
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/23173#discussion_r238068538
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
---
@@ -171,15 +171,21 @@ private[csv] class
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/23173#discussion_r237716913
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/OutputWriter.scala
---
@@ -57,6 +57,9 @@ abstract class
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/23173#discussion_r237687091
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1987,6 +1987,18 @@ class CSVSuite extends
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/23173#discussion_r237663865
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1987,6 +1987,18 @@ class CSVSuite extends
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/23052
it is pretty common for us to write empty dataframe to parquet and later
read it back in
same for writing to csv with header and reading it back in (with type
inference disabled, we assume
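A minimal round-trip sketch of the use case described above (paths and schema are invented for illustration; the empty-csv read-back is exactly what SPARK-26208 is about):
```
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// an empty DataFrame with a known schema
val schema = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))
val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// parquet round trip: the schema lives in the file footer, so it survives zero rows
empty.write.mode("overwrite").parquet("/tmp/empty_parquet")
spark.read.parquet("/tmp/empty_parquet").printSchema()

// csv round trip: column names come from the header line; with inferSchema left off,
// every column is read back as string, which is the assumption described above
empty.write.mode("overwrite").option("header", "true").csv("/tmp/empty_csv")
spark.read.option("header", "true").csv("/tmp/empty_csv").printSchema()
```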
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/23173#discussion_r237579324
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/OutputWriter.scala
---
@@ -57,6 +57,9 @@ abstract class
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/23173
i was not aware of SPARK-15473. thanks. let me look at @HyukjinKwon pullreq
and mark my jira as a duplicate.
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/23173
[SPARK-26208][SQL] add headers to empty csv files when header=true
## What changes were proposed in this pull request?
Add headers to empty csv files when header=true, because
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21273
it would provide a workaround i think, yes.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21273
@HyukjinKwon see https://github.com/apache/spark/pull/22312
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/22312
[SPARK-17916][SQL] Fix new behavior when quote is set and fix old behavior
when quote is unset
## What changes were proposed in this pull request?
1) Set nullValue to quoted empty
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21273
i would suggest at least that when the quote character is changed that the
empty value should change accordingly. an empty value of ```""``` makes no
sense if the quote charac
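A hedged sketch of the mismatch being pointed out (the option values are purely illustrative, not what the PR implements): once the quote character is something other than ", an empty value written as "" no longer lines up with the quoting style.
```
// assumes spark.implicits._ is in scope
val df = Seq(("a", ""), ("b", "x")).toDF("k", "v")

df.write
  .option("quote", "'")        // quote character changed from the default "
  .option("emptyValue", "''")  // illustrative: keep the empty marker consistent with the quote char
  .csv("/tmp/quoted_csv")
```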
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/22123#discussion_r211309642
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1603,6 +1603,39 @@ class CSVSuite extends
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21345
we are testing spark 2.4 internally and had some unit tests break because
of this change i believe.
i am not suggesting this should be changed or undone, just wanted to point
out that
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21273
@HyukjinKwon see the jira for the example code that reproduces the issue.
let me know if you need anything else. best, koert
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21273
to summarize my findings from jira:
this breaks any usage without quoting. for example we remove all characters
from our values that need to be quoted (delimiters, newlines) so we know we
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/22123
```
Test Result (1 failure / +1)
org.apache.spark.sql.streaming.FlatMapGroupsWithStateSuite.flatMapGroupsWithState
- streaming with processing time timeout - state format version 1
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/22123#discussion_r210801081
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1603,6 +1603,44 @@ class CSVSuite extends
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/22123
[SPARK-25134][SQL] Csv column pruning with checking of headers throws
incorrect error
## What changes were proposed in this pull request?
When column pruning is turned on the
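A hedged repro sketch of the scenario the PR describes (the path, schema, and column names are invented): with column pruning on and enforceSchema=false, selecting a subset of columns made the header check compare against the wrong (pruned) schema and report a misleading error.
```
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", "true")

val df = spark.read
  .option("header", "true")
  .option("enforceSchema", "false")        // ask Spark to verify the header against the schema
  .schema("id INT, name STRING, age INT")
  .csv("/tmp/people.csv")                  // file assumed to start with "id,name,age"

df.select("name").show()                   // the pruned read is where the incorrect error surfaced
```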
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/21296
if i do not select a schema (and i use inferSchema), and i do a select for
only a few column, does this push down the column selection into the reading of
data (for schema inference and for
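The scenario behind the question, as a short sketch (path and column names invented):
```
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")   // schema inference itself needs a pass over the data
  .csv("/tmp/wide.csv")

df.select("a", "b").show()         // does the column selection get pushed into that pass too?
```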
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/18714
@cloud-fan i created
[SPARK-24860](https://issues.apache.org/jira/browse/SPARK-24860) for this
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/21818
[SPARK-24860][SQL] Support setting of partitionOverWriteMode in output
options for writing DataFrame
## What changes were proposed in this pull request?
Besides spark setting
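A hedged sketch of the per-write option the PR proposes (path and data invented), as opposed to setting spark.sql.sources.partitionOverwriteMode session-wide:
```
// assumes spark.implicits._ is in scope
val df = Seq(("2018-01-01", 1), ("2018-01-02", 2)).toDF("date", "n")

df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")  // per write, instead of a global conf
  .partitionBy("date")
  .parquet("/tmp/events")
```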
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/18714
@cloud-fan OK, that works just as well
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/18714
should this be exposed per write instead of as a global variable?
e.g.
dataframe.write.csv.partitionMode(Dynamic).partitionBy(...).save
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/609
```OPTS+=" --driver-java-options \"-Da=b -Dx=y\""```
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/609
@ganeshm25 it seems to work in newer spark versions. i haven't tried in
spark 1.4.2. however it's still very tricky to get it right and i would prefer a
simpler solution.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/17660
@cloud-fan switching to lazy vals to avoid these predicates being evaluated
when they are not used seems to work.
so i think this is a better (more targeted) solution for now, and i removed
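A minimal sketch (not the actual optimizer code) of why switching to lazy vals helps here: the check that can blow up on an all-null row is only evaluated when a rule actually reads it.
```
class OuterJoinInfo(evalAgainstNullRow: () => Boolean) {
  lazy val leftHasNonNullPredicate: Boolean = evalAgainstNullRow()  // deferred until first access
}

val info = new OuterJoinInfo(() => throw new NullPointerException("boom"))
// constructing info is safe; the failure can only surface if leftHasNonNullPredicate is read
```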
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/17660
I see. let me check if making leftHasNonNullPredicate and
rightHasNonNullPredicate lazy solves it then
On Apr 17, 2017 23:44, "Wenchen Fan" wrote:
> I t
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/17660#discussion_r111842598
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -124,7 +125,15 @@ case class EliminateOuterJoin(conf
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/17660
[SPARK-20359][SQL] catch NPE in EliminateOuterJoin optimization
catch NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm
NPE is no longer thrown
## What changes were
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/17639
@cloud-fan thanks for doing this
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/16889
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/16889
i am going to close this for now since i dont think this is an optimal
solution
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/16889
[SPARK-17668][SQL] Use Expressions for conversions to/from user types in
UDFs
## What changes were proposed in this pull request?
do not merge
this is a first attempt at trying
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/9565
i think this would be very helpful. the difference in behaviour of scala
udfs and scala functions used in dataset transformations is a constant source
of confusion for my users.
for
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/16479
i will just copy the conversion code over for now thx
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/16479
how "internal" are these interfaces really? every time a change like this
is made spark-avro breaks
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/16143
thanks for getting this fixed so fast
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15979
admittedly the result looks weird. it really should be:
+-----+--------+
|  key|count(1)|
+-----+--------+
| null|       1|
|[1,1]|       1
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15979
spark 2.0.x does not have mapValues. but this works:
scala> Seq(("a", Some((1, 1))), ("a", None)).toDS.groupByKey(_._2).count.show
+-----+--------+
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15979
Yes it worked before
On Dec 4, 2016 02:33, "Wenchen Fan" wrote:
> val x: Dataset[String, Option[(String, String)]] = ...
> x.groupByKey(_._1).mapValues(_
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15979
this means anything that uses an encoder can no longer use Option[_ <:
Product].
encoders are not just used for the top level Dataset creation.
Dataset.groupByKey[K] requires
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/15979#discussion_r90770855
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
---
@@ -47,16 +47,26 @@ object ExpressionEncoder
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/15979#discussion_r90770824
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
---
@@ -47,16 +47,26 @@ object ExpressionEncoder
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15918
It can be done with shapeless (which perhaps uses macros under hood, I
don't know).
On Dec 1, 2016 19:56, "Michael Armbrust" wrote:
I don't thi
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15918
if we do a flag i would also prefer it if the current implicits are more
narrow if the flag is not set, if possible.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15918
@srowen and @rxin what is the default behavior that is changed here? i see
a current situation where an implicit encoder is provided that simply cannot
handle the task at hand and this leads
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
if they chain like that then i think i know how to do the optimization.
but do they? look for example at dataset.groupByKey(...).mapValues(...)
Dataset[T].groupByKey[K] uses
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
@cloud-fan that makes sense to me, but its definitely not a quick win to
create that optimization.
let me think about it some more
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
@cloud-fan i can try to optimize
```grouped.mapValues(...).mapValues(...)``` but it's a bit of an anti-pattern
(there should be no need to do mapValues twice) so i don't think there is much
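A hedged illustration of the chained-mapValues pattern in question (assumes spark.implicits._ is in scope): the caller can always fuse the two projections by hand, which is why optimizing the chain is low value.
```
val grouped = Seq(("a", 1), ("b", 2)).toDS().groupByKey(_._1)

val chained = grouped.mapValues(_._2).mapValues(_ + 1)  // two value projections back to back
val fused   = grouped.mapValues(kv => kv._2 + 1)        // the equivalent single projection
```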
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
@rxin i can give it a try (the optimizer rule)
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/15382#discussion_r83921525
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -741,7 +741,7 @@ private[sql] class SQLConf extends Serializable
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15382
i don't think there is such a thing as an HDFS working directory, but that
probably means it just uses the home dir on hdfs (/user/) for any
relative paths
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/15382
i think working dir makes more sense than home dir. but could this catch
people by surprise because we now expect write permission in the working dir?
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13868#discussion_r82216818
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -55,7 +56,7 @@ object SQLConf {
val WAREHOUSE_PATH
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan i thought about this a little more, and my suggested changes to
the Aggregator api does not allow one to use a different encoder when applying
a typed operation on Dataset. so i do
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/14576#discussion_r75186632
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala
---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/14576#discussion_r75152186
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala
---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/14576#discussion_r74361702
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/14576#discussion_r74316735
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/14576#discussion_r74314375
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/14576#discussion_r74313912
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/14222
there is a usefulness to this `ReduceAggregator` beyond `.reduceGroups`.
basically you can take any Aggregator without a zero and turn it into a valid
Aggregator, with the caveat being that
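A minimal sketch (not Spark's internal ReduceAggregator) of the idea: a reduce function with no zero becomes a valid Aggregator by tracking in the buffer whether any input has been seen; the caveat is that empty groups have no answer.
```
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

class ReduceLike[T: Encoder](f: (T, T) => T) extends Aggregator[T, (Boolean, T), T] {
  def zero: (Boolean, T) = (false, null.asInstanceOf[T])  // "nothing seen yet" instead of a real zero
  def reduce(b: (Boolean, T), a: T): (Boolean, T) =
    if (b._1) (true, f(b._2, a)) else (true, a)
  def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) =
    if (!b1._1) b2 else if (!b2._1) b1 else (true, f(b1._2, b2._2))
  def finish(b: (Boolean, T)): T =
    if (b._1) b._2 else throw new IllegalStateException("empty input")  // the caveat
  def bufferEncoder: Encoder[(Boolean, T)] =
    Encoders.tuple(Encoders.scalaBoolean, implicitly[Encoder[T]])
  def outputEncoder: Encoder[T] = implicitly[Encoder[T]]
}
```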
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13526#discussion_r71042267
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
---
@@ -312,6 +312,17 @@ class DatasetSuite extends QueryTest with
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13526#discussion_r71041725
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -65,6 +65,46 @@ class KeyValueGroupedDataset[K, V] private
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13532#discussion_r69397207
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ---
@@ -305,4 +305,13 @@ class DatasetAggregatorSuite extends
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13933
For parquet, json etc. path not being put in options is not an issue since
they don't retrieve it from the options
On Jun 29, 2016 2:31 AM, "Xiao Li" wrote:
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13727#discussion_r68672691
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13727#discussion_r68645998
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13727#discussion_r68624316
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/8416
this patch should not have broken reading files that include comma.
i also added unit test for this:
https://github.com/apache/spark/pull/8416/files#diff
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
could we "rewind"/undo the append for the key and change it to a map that
inserts new values and key? so remove one append and replace it with another
operation?
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
the tricky part with that is that (ds: Dataset[(K, V)]).groupByKey(_._1).mapValues(_._2)
should return a KeyValueGroupedDataset[K, V]
On Tue, Jun 7, 2016 at 8:22 PM, Wenchen Fan
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
```
scala> val x = Seq(("a", 1), ("b", 2)).toDS
x: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
scala> x.groupByKey(_._1).ma
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
ok i will study the physical plans for both and try to understand why one
would be slower
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
can you explain a bit what is inefficient and would need an optimizer rule?
is it mapValues being called twice? once for the key and then for the new
values?
thanks!
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
see this conversation:
https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3ccaaswr-7kqfmxd_cpr-_wdygafh+rarecm9olm5jkxfk14fc...@mail.gmail.com%3E
mapGroups is not a
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13532#discussion_r65986613
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -51,7 +52,8 @@ object
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13526#discussion_r65972115
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -65,6 +65,44 @@ class KeyValueGroupedDataset[K, V] private
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/13532
[SPARK-15204][SQL] improve nullability inference for Aggregator
## What changes were proposed in this pull request?
TypedAggregateExpression sets nullable based on the schema of the
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
for example with this branch you can do:
```
val df3 = Seq(("a", "x", 1), ("a", "y", 3), ("b", "x", 3)).toDF("i"
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
well that was sort of what i was trying to achieve. the unit tests i added
were for using Aggregator for untyped grouping(```groupBy```).
and i think for it to be useful within that
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/13512
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
If Aggregator is designed for typed Dataset only then that is a bit of a
shame, because it's an elegant and generic api that should be useful for
DataFrame too. this causes fragmentation
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/13526
[SPARK-15780][SQL] Support mapValues on KeyValueGroupedDataset
## What changes were proposed in this pull request?
Add mapValues to KeyValueGroupedDataset
## How was this
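A short usage sketch of the proposed API (assumes spark.implicits._ is in scope):
```
val ds = Seq(("a", 1), ("a", 3), ("b", 3)).toDS()

ds.groupByKey(_._1)      // KeyValueGroupedDataset[String, (String, Int)]
  .mapValues(_._2)       // KeyValueGroupedDataset[String, Int]
  .reduceGroups(_ + _)   // Dataset[(String, Int)]
  .show()
```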
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan i am running into some trouble updating my branch to the latest
master. i get errors in tests due to Analyzer.validateTopLevelTupleFields
the issue seems to be that in
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan from the (added) unit tests:
```
val df2 = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDF("i", "j")
checkAnswer(df2.grou
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
**[Test build #5 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
for PR 13512 at commit
[`077f782`](https://github.com/apache/spark
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/5/
Test FAILed
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
Build finished. Test FAILed.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
**[Test build #5 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
for PR 13512 at commit
[`077f782`](https://github.com/apache/spark
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/13512
[SPARK-15769][SQL] Add Encoder for input type to Aggregator
## What changes were proposed in this pull request?
Aggregator also has an Encoder for the input type
## How was this
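For context, a hedged sketch of the 2.x Aggregator contract this PR builds on: buffer and output encoders are declared on the Aggregator, but there is no encoder for the input type, which is what the PR proposes to add.
```
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

object SumAgg extends Aggregator[Int, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Int): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

// typed usage works today; untyped DataFrame usage is where an input encoder would help
Seq(1, 2, 3).toDS().select(SumAgg.toColumn).show()   // assumes spark.implicits._ is in scope
```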
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/11980
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/13012#issuecomment-218053856
blackbox transformations infer nullable=false when you return a primitive.
for example:
```
scala> sc.parallelize(List(1,2,3)).toDS.map(i => i * 2).
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12877#issuecomment-216678197
yup needs to be transient, will fix
On Tue, May 3, 2016 at 5:58 PM, andrewor14 wrote:
> I think it's OK for it to be lazy; just w
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12877#issuecomment-216675245
if a SparkSession sits inside a Dataset does that mean _wrapped is always
already initialized (because you cannot have a Dataset without a
SparkContext)? if
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12877#issuecomment-216670925
i made it lazy val since SparkSession.wrapped is effectively lazy too:
protected[sql] def wrapped: SQLContext = {
if (_wrapped == null
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12877#issuecomment-216670423
oh since since sparkSession is just a normal val i guess it can also be
On Tue, May 3, 2016 at 5:25 PM, andrewor14 wrote:
> Looks good otherw
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/12877
[SPARK-15097][SQL] make Dataset.sqlContext a stable identifier for imports
## What changes were proposed in this pull request?
Make Dataset.sqlContext a lazy val so that it's a stable
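A short sketch of why the stable identifier matters: importing implicits through a Dataset only compiles when sqlContext is a (lazy) val, not a def.
```
import org.apache.spark.sql.Dataset

def summarize(ds: Dataset[Int]): Unit = {
  import ds.sqlContext.implicits._   // needs ds.sqlContext to be a stable identifier (a val)
  val doubled = Seq(1, 2, 3).toDS().map(_ * 2)
  doubled.show()
}
```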