Github user Fokko closed the pull request at:
https://github.com/apache/spark/pull/22992
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/22992
[SPARK-24229] Update to Apache Thrift 0.10.0
The CVE detector is complaining about a vulnerability:
https://nvd.nist.gov/vuln/detail/CVE-2016-5397#vulnCurrentDescriptionTitle
## What
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@gatorsmile Are you completely sure about that? The Spark REST API `null`
values are always omitted. Where is this behaviour different?
`mapper.setSerializationInclusion(Include.NON_ABSENT
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22757
@srowen I wasn't aware of the Kinesis SDK. Thanks for patching this, looks
good.
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22637#discussion_r223196616
--- Diff:
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
---
@@ -154,6 +154,7 @@ public synchronized void start
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22637#discussion_r223187949
--- Diff:
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatchSuite.java
---
@@ -152,31 +151,27 @@ public void
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22637
Thanks @dongjoon-hyun. I've fixed the indentation issues.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22637
Valid points. Personally I'm a fan of explicit final, instead of implicit.
But that's a matter of taste :-)
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22637#discussion_r222893178
--- Diff:
common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleIntegrationSuite.java
---
@@ -133,37 +133,38 @@ private
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
@HyukjinKwon I've opened a new PR under
https://github.com/apache/spark/pull/22637. Would be nice if you can trigger
Travis
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/22637
Spark 25408
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
The tests passed earlier. How is it possible that they would fail on
master?
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
Thanks @srowen for pointing out the errors. Weird that it did not come up
as a merge conflict. Let me open a new PR
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Rebased onto master. @HyukjinKwon can we target this for 2.5?
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
Rebased onto master
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r219480345
--- Diff:
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
---
@@ -146,16 +146,11 @@ public UserGroupInformation getHttpUGI
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
Cool, makes sense. Thanks for the clarification @srowen
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@HyukjinKwon Any idea if this has any chance of getting merged? The 2.4
branch was cut a while ago.
The number of merge conflicts has been minimal every time I've rebased
onto master
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
@srowen Any incentive to move this forward? Or are PRs like these not
appreciated? Let me know.
Most of the changes are cosmetic, but having https://github.com/apache/spark/pull/22399/files
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r218678719
--- Diff:
common/network-common/src/test/java/org/apache/spark/network/ChunkFetchIntegrationSuite.java
---
@@ -143,37 +143,38 @@ public void releaseBuffers
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r218532105
--- Diff:
common/network-common/src/test/java/org/apache/spark/network/ChunkFetchIntegrationSuite.java
---
@@ -143,37 +143,38 @@ public void releaseBuffers
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I've rebased onto master
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
Addressed comments and rebased onto master, no merge conflicts.
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r218078332
--- Diff:
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatchSuite.java
---
@@ -356,49 +335,45 @@ public void
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22399
Thanks for the feedback guys, I was relying too much on Scalastyle :)
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r216962424
--- Diff: core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java ---
@@ -39,30 +39,27 @@ public void setUp() throws ClassNotFoundException,
SQLException
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r216922969
--- Diff:
launcher/src/main/java/org/apache/spark/launcher/AbstractAppHandle.java ---
@@ -72,11 +74,7 @@ public void stop() {
@Override
public
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r216921874
--- Diff:
common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleIntegrationSuite.java
---
@@ -133,37 +133,37 @@ private
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/22399#discussion_r216917780
--- Diff:
common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java
---
@@ -383,7 +383,7 @@ public void testRefWithIntNaturalKey
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@HyukjinKwon All green again
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/22399
[SPARK-25408] Move to mode ideomatic Java8
While working on another PR, I noticed that there is quite a bit of legacy Java
in there that can be beautified. For example the use of features from Java8
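As a rough illustration of the kind of cleanup meant here, the sketch below contrasts a pre-Java 8 style (anonymous inner class plus explicit loop) with a Java 8 stream pipeline. The class `Java8Idioms` and its methods are hypothetical and are not taken from the Spark code base or from this PR; they only show the general pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of Spark.
public class Java8Idioms {

    // Pre-Java 8 style: anonymous inner class as comparator, explicit loop.
    public static List<String> sortedUpperLegacy(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        List<String> result = new ArrayList<String>();
        for (String name : copy) {
            result.add(name.toUpperCase());
        }
        return result;
    }

    // Java 8 style: the same behaviour as a stream pipeline with a method reference.
    public static List<String> sortedUpperModern(List<String> names) {
        return names.stream()
            .sorted()
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative input only.
        List<String> names = Arrays.asList("shuffle", "kvstore", "launcher");
        // Both calls print the same sorted, upper-cased list.
        System.out.println(sortedUpperLegacy(names));
        System.out.println(sortedUpperModern(names));
    }
}
```

Both methods return the same result; the second is shorter and leaves the input list untouched without needing an explicit copy.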
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I've rebased onto master and fixed the test. Let's wait for the CI's opinion
of the fix.
The problem was the following, introduced in
https://github.com/apache/spark/pull/21221
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Some of the code has been introduced here:
https://github.com/apache/spark/commit/9241e1e7e66574cfafa68791771959dfc39c9684
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Some of the JSON endpoints are invalid, for example:
```
{
"info": {
"id": "driver",
"hostPort": "local
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Thanks @robert3005. But API calls are also failing, so it is not isolated to
the `RDDOperationScope`. I'm compiling Spark locally to check what the API
calls are returning
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@HyukjinKwon I've rebased onto master since the Spark 2.4 branch has been
cut.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I've rebased onto master
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22007
Nice! What was the issue with Travis? Feels like some caching to me :)
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22007
I don't really understand the error. Can a Spark expert elaborate on what's
going on here?
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/22007
Good point @kiszk. I've just updated the files.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I've rebased, just to check if all the tests are still ok against latest
master.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@jerryshao I believe the freeze is still pending because of some Scala 2.12
compatibility issues:
http://apache-spark-developers-list.1001551.n3.nabble.com/code-freeze-and-branch-cut-for-Apache-Spark-2
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/22007
[SPARK-25033] Bump Apache commons.{httpclient, httpcore}
## What changes were proposed in this pull request?
Bump the versions of Apache commons.{httpclient, httpcore} to make it
congruent
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Feels like a flaky test in the `KafkaContinuousSinkSuite`. Let's rerun :-)
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@gatorsmile Sure, just checking if it still works against recent master :)
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Rebased onto master
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Rebased onto master
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Now it is the case that we explicitly need to downgrade fasterxml:
```
[info] org.apache.spark.sql.streaming.StreamingQueryException: Query [id
= 99e7c44d-6f74-49f7-8d81-6e1059d13e89, runId
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@srowen I understand the situation, looking at the fasterxml dependency:
Not used by Hadoop 2.6:
https://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/dependency-analysis.html
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r200102648
--- Diff: pom.xml ---
@@ -158,8 +158,8 @@
2.11.12
2.11
1.9.13
-2.6.7
-
2.6.7.1
+2.9.6
+
2.9.6
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I can do 5 runs before and after the patch, and come up with some
statistics such as average and stddev, to get rid of the uncertainty a bit
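A minimal sketch of how such run-to-run statistics could be computed, assuming each benchmark run yields a single wall-clock time in milliseconds. The class `RunStats` and the timings in `main` are hypothetical placeholders, not measurements from this PR.

```java
import java.util.Arrays;

// Hypothetical helper; not part of Spark's benchmark utilities.
public class RunStats {

    // Arithmetic mean of the run times; NaN for an empty array.
    public static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Sample standard deviation (n - 1 denominator), which suits a handful of runs.
    public static double sampleStdDev(double[] xs) {
        if (xs.length < 2) {
            return Double.NaN;
        }
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Placeholder timings in milliseconds, made up for illustration.
        double[] before = {1000.0, 1040.0, 980.0, 1010.0, 995.0};
        double[] after = {1100.0, 1060.0, 1150.0, 1080.0, 1120.0};
        System.out.printf("before: mean=%.1f ms, stddev=%.1f ms%n",
            mean(before), sampleStdDev(before));
        System.out.printf("after:  mean=%.1f ms, stddev=%.1f ms%n",
            mean(after), sampleStdDev(after));
    }
}
```

If the difference between the two means is small compared to the standard deviations, five runs are probably not enough to tell the versions apart.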
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Personally I'm not sure how representative the tests are. I have the
feeling there is a lot of variation between the runs. Also it is not just
parsing the plain JSON, but everything around
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@MaxGekk I've rerun the benchmarks as you asked. The first run to set the
baseline based on master:
https://github.com/apache/spark/pull/21596/commits/32ec12bdfe1fd9377c0f4037942da9e2a49e8f9a
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r199141020
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala
---
@@ -25,8 +25,14 @@ import org.apache.spark.util
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
It took some time, but here it is, current master:
```
MacBook-Pro-van-Fokko:dist fokkodriesprong$ cat /tmp/spark-master.txt
Preparing data for benchmarking ...
Running benchmark: JSON
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r198112472
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala
---
@@ -25,8 +25,13 @@ import org.apache.spark.util
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I would prefer to get this into Spark 2.4 :-)
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r197715689
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala
---
@@ -25,8 +25,13 @@ import org.apache.spark.util
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r197714976
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala
---
@@ -112,12 +118,13 @@ object JSONBenchmarks
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r197707147
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala
---
@@ -25,8 +25,13 @@ import org.apache.spark.util
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Thanks @MaxGekk for pointing that out. I've also added the command to the
comments in the test for future reference.
I've run the benchmark. The inferring became a bit slower (`0.85x`
relative
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
@MaxGekk Any pointers on how to build this test-jar containing spark sql
test-classes?
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
So now it trims too much, we get:
```
[ {
"id" : "local-1422981759269",
"name" : "Spark shell",
"attempts" : [ {
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21596#discussion_r197604577
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
---
@@ -244,6 +244,13 @@ class
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I will first fix the tests, and then assess the performance, if you agree.
Changing the annotations might impact the performance.
Regarding the difference between `NON_NULL`, `NON_ABSENT
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
When looking at the history server, we have a similar issue. From the
list command
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92231/testReport
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Ok, I've found the issue:
https://github.com/FasterXML/jackson-module-scala/issues/325
Changing this to `NON_ABSENT` should fix
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Oops, a dangling file. Sorry for triggering so many builds guys.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Hmm, I see a relevant failing test:
```
org.scalatest.exceptions.TestFailedException:
"...430","name":"scope1"[,"parent":null]}
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I've pushed the changes, the tests are still running locally.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
Yes, let me add it.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
I could not get the tests working locally. :-) Let me give it another try.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21596
This was more than a year ago, we should eventually upgrade.
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/21597
@HyukjinKwon Ok, some Apache projects are really strict with the Jira
tickets.
@maropu I've updated the commit
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/21597
[SPARK-24603] Fix findTightestCommonType reference in comments
findTightestCommonTypeOfTwo has been renamed to findTightestCommonType
## What changes were proposed in this pull request
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/21596
[SPARK-24601] Bump Jackson version
Hi all,
Jackson is incompatible with upstream versions, therefore bump the Jackson
version to a more recent one. I bumped into some issues with Azure
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/21528
[SPARK-24520] Double braces in documentations
There are double braces in the markdown, which break the link.
## What changes were proposed in this pull request?
(Please fill
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20966
I think we should start updating Guava in small steps. In some of the
applications that I'm using, we're up to date at `24.1-jre`. The 14.0.1 release
dates from March 2013. I took the liberty to check
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/20966
[SPARK-23854] Update Guava to 16.0.1
Currently Spark is still on Guava 14.0.1, and therefore I would like to
bump the version to 16.0.1.
Baby steps are important here, because we don't want
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20057
Gentle ping @gatorsmile @dongjoon-hyun
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20057
Hi @gatorsmile, thanks for putting it to the test. The main reason why I
personally dislike Sqoop is:
- **Legacy.** The old map-reduce should be buried in the upcoming years. As
a data
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20057
We're in the process of integrating Spark into Airflow, and support for
`cascadeTruncate` is required to make this succeed. First steps are here:
https://github.com/apache/incubator-airflow/pull
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20057
Any idea when this will be merged into master? We could use this since we
are ditching Sqoop
Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20103
Thanks @srowen, I've run the command and amended the commit. Cheers
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/20103
[SPARK-22919] Bump httpclient versions
Hi all,
I would like to bump the PATCH versions of both Apache httpclient and
Apache httpcore. I use the SparkTC Stocator library for connecting
Github user Fokko commented on the pull request:
https://github.com/apache/spark/pull/10839#issuecomment-201822435
@mengxr did you have a chance to look at the updated version? I also
extended the test to check the conversion to dense/sparse vectors.
---
If your project is set up
Github user Fokko commented on the pull request:
https://github.com/apache/spark/pull/10839#issuecomment-198545136
@mengxr I've updated the code according to your PR :)
Github user Fokko commented on the pull request:
https://github.com/apache/spark/pull/10839#issuecomment-197222971
Nice work, as soon as the PR is merged I will update the code
accordingly.
Github user Fokko commented on the pull request:
https://github.com/apache/spark/pull/10839#issuecomment-197063381
ok to test
Github user Fokko commented on the pull request:
https://github.com/apache/spark/pull/10839#issuecomment-197057413
I've improved the PR based on the feedback. Besides that, I've also updated
the benchmark:
https://github.com/Fokko/BlockMatrixToIndexedRowMatrix
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/spark/pull/10839#discussion_r56143177
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
---
@@ -268,8 +268,26 @@ class BlockMatrix @Since("
GitHub user Fokko opened a pull request:
https://github.com/apache/spark/pull/10839
[SPARK-12869] Implemented an improved version of the toIndexedRowMatrix
Hi guys,
I've implemented an improved version of the `toIndexedRowMatrix` function
on the `BlockMatrix`. I needed