+1 (non-binding), looks good
Tested on RHEL 7.2, 7.3, CentOS 7.2, Ubuntu 14.04 and 16.04, SUSE 12, x86,
IBM Linux on Power and IBM Linux on Z (big-endian)
No problems with the latest IBM Java, Hadoop 2.7.3 and Scala 2.11.8, and no
performance concerns to report either (spark-sql-perf and HiBench)
Buil
Hi, I suggest HiBench and SparkSqlPerf. HiBench features many benchmarks
that exercise several components of Spark (great for stressing core, SQL
and MLlib capabilities), while SparkSqlPerf features the 99 TPC-DS queries
(stressing the DataFrame API and therefore the Catalyst optimiser); both
work
+1 (non-binding)
Functional: looks good, tested with OpenJDK 8 (1.8.0_111) and IBM's latest
SDK for Java (8 SR3 FP21).
Tests run clean on Ubuntu 16.04, 14.04, SUSE 12, CentOS 7.2 on x86 and IBM
specific platforms including big-endian. On slower machines I see these
failing but nothing to be co
I've never seen the ReplSuite test OOMing with IBM's latest SDK for Java,
but I have always noticed this particular test failing with the following
instead:
java.lang.AssertionError: assertion failed: deviation too large:
0.8506807397223823, first size: 180392, second size: 333848
This particular
+1 (non-binding)
Build: mvn -T 1C -Psparkr -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver
-DskipTests clean package
Test: mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver
-Dtest.exclude.tags=org.apache.spark.tags.DockerTest -fn test
Test options: -Xss2048k -Dspark.buffer.pageSize=1048576 -Xmx4
I'm seeing the same failure, but manifesting itself as a stack overflow, on
various operating systems and architectures (RHEL 7.1, CentOS 7.2, SUSE 12,
Ubuntu 14.04 and 16.04 LTS)
Build and test options:
mvn -T 1C -Psparkr -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver
-DskipTests clean package
mvn -
for 1.6.2 vs 2.0
with HiBench (large profile, 25g executor memory, 4g driver), again we
will be carefully checking how these benchmarks are being run and what
difference the options and configurations can make
Cheers,
From: Ted Yu
To: Adam Roberts/UK/IBM@IBMGB
Cc: Michael Al
Agreed, this is something we do regularly when producing our own
Spark distributions at IBM, so it will be beneficial to share updates
with the wider community. So far it looks like Spark 1.6.2 is the best out
of the box on spark-perf and HiBench (of course this may vary for real
worklo
WholeStageCodegen: on, I think; we turned it off when fixing a bug
offHeap.enabled: false
offHeap.size: 0
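For reference, a minimal sketch of how those settings map onto a session in
code (assuming the standard Spark 2.0 configuration keys; the application
name is made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("WholeStageCodegenConfigSketch")        // made-up app name
  .config("spark.sql.codegen.wholeStage", "true")  // whole-stage codegen left on
  .config("spark.memory.offHeap.enabled", "false") // matches the values above
  .config("spark.memory.offHeap.size", "0")
  .getOrCreate()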
Cheers,
From: Michael Allman
To: Adam Roberts/UK/IBM@IBMGB
Cc: dev
Date: 08/07/2016 17:05
Subject: Re: Spark 2.0.0 performance; potential large Spark core
regression
er 8g, executor memory 16g, Kryo, 0.66
memory fraction, 100 trials
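Roughly, the equivalent programmatic configuration looks like the sketch
below (spark.storage.memoryFraction is the legacy 1.x key; on 2.0 the
relevant knob is spark.memory.fraction, and driver memory would normally be
passed to spark-submit rather than set in code; the app name is made up):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("BenchmarkRunSketch")  // made-up application name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.executor.memory", "16g")
  .set("spark.storage.memoryFraction", "0.66")
val sc = new SparkContext(conf)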
We can post the 1.6.2 comparison early next week, running lots of
iterations over the weekend once we get the dedicated time again
Cheers,
From: Michael Allman
To: Adam Roberts/UK/IBM@IBMGB
Cc: dev
Date: 08/07
Hi, we've been testing the performance of Spark 2.0 compared to previous
releases; unfortunately there are no Spark 2.0-compatible versions of
HiBench and SparkPerf apart from those I'm working on (see
https://github.com/databricks/spark-perf/issues/108)
With the Spark 2.0 version of SparkPerf
Hi, sharing what I discovered with PySpark too; it corroborates what
Amit noticed, and I'm also interested in the pipe question:
https://mail-archives.apache.org/mod_mbox/spark-dev/201603.mbox/%3c201603291521.u2tflbfo024...@d06av05.portsmouth.uk.ibm.com%3E
// Start a thread to feed the process inp
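The snippet above is cut off; the general pattern it refers to (a daemon
thread feeding the external worker's stdin while the parent thread reads
its stdout) is roughly the sketch below. "cat" stands in for the Python
worker process, and the row strings are placeholders:

import java.io.{BufferedOutputStream, PrintWriter}

// Stand-in child process; PySpark actually launches Python workers.
val proc = new ProcessBuilder("cat").start()

// Writer thread: feeds the child's stdin so the parent can read stdout concurrently.
val writer = new Thread("stdin-feeder") {
  override def run(): Unit = {
    val out = new PrintWriter(new BufferedOutputStream(proc.getOutputStream))
    try Seq("row1", "row2", "row3").foreach(line => out.println(line))
    finally out.close()  // closing stdin lets the child terminate
  }
}
writer.setDaemon(true)
writer.start()

// The parent thread consumes the child's stdout (cat simply echoes the input back).
scala.io.Source.fromInputStream(proc.getInputStream).getLines().foreach(println)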
s file is super important), so the
emails here will at least point people there.
Cheers,
From: Adam Roberts/UK/IBM@IBMGB
To: dev
Date: 14/06/2016 12:18
Subject: Databricks SparkPerf with Spark 2.0
Hi, I'm working on having "SparkPerf" (
https://github.com/d
Hi, I'm working on having "SparkPerf"
(https://github.com/databricks/spark-perf) run with Spark 2.0. I noticed a
few pull requests not yet accepted, so I'm concerned this project has been
abandoned - it's proven very useful in the past for quality assurance as
we can easily exercise lots of Spark functi
Hi,
Given a very simple test that uses a bigger version of the pom.xml file in
our Spark home directory (catted into itself with a bash for loop until it
becomes 100 MB), I've noticed that with larger heap sizes more RDDs are
reported as being cached; is this intended behaviour? What
e
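For context, a sketch of the kind of test being described (bigpom.xml is a
placeholder for the ~100 MB file; getRDDStorageInfo reports which RDD
partitions the block manager actually holds in memory):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("CacheSizeSketch").setMaster("local[*]"))

// Read the inflated pom.xml copy and cache it in memory.
val lines = sc.textFile("bigpom.xml").persist(StorageLevel.MEMORY_ONLY)
lines.count()  // materialise the RDD so blocks are actually cached

// Report what the block manager thinks is cached; with a smaller heap some
// partitions may be evicted or never stored, so fewer show up as cached here.
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions cached, ${info.memSize} bytes in memory")
}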
initely not supported for shorts/ints/longs.
If these tests continue to pass then I think the Spark tests don't
exercise unaligned memory access, cheers
From: Ted Yu
To: Adam Roberts/UK/IBM@IBMGB
Cc: "dev@spark.apache.org"
Date: 15/04/2016 17:35
Subject:
Ted, yeah with the forced true value the tests in that suite all pass and
I know they're being executed thanks to prints I've added
Cheers,
From: Ted Yu
To: Adam Roberts/UK/IBM@IBMGB
Cc: "dev@spark.apache.org"
Date: 15/04/2016 16:43
Subject:
anyway for experimenting", and the tests pass.
No other problems on the platform (pending a different pull request).
Cheers,
From: Ted Yu
To: Adam Roberts/UK/IBM@IBMGB
Cc: "dev@spark.apache.org"
Date: 15/04/2016 15:32
Subject: Re: BytesToBytes and un
Hi, I'm testing Spark 2.0.0 on various architectures and have a question:
are we sure that
core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java
really is attempting to use unaligned memory access (for the
BytesToBytesMapOffHeapSuite tests specifically)?
Our JDKs on z
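As a quick diagnostic for what the JVM itself reports for unaligned access
on a given platform, a sketch along these lines can be run in the
spark-shell (it mirrors, in spirit, the reflective check Spark's Platform
class performs; it relies on the internal java.nio.Bits API, so treat it as
a diagnostic only, not the actual suite code):

// Ask the JDK whether it believes unaligned accesses are safe here.
val bits = Class.forName("java.nio.Bits", false, getClass.getClassLoader)
val unalignedMethod = bits.getDeclaredMethod("unaligned")
unalignedMethod.setAccessible(true)
val unaligned = unalignedMethod.invoke(null).asInstanceOf[Boolean]
println(s"os.arch=${System.getProperty("os.arch")}, java.nio.Bits.unaligned()=$unaligned")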
Hi, I'm interested in figuring out how the Python API for Spark works.
I've come to the following conclusion and want to share it with the
community; it could be of use in the PySpark docs here, specifically the
"Execution and pipelining" part.
Any sanity checking would be much appreciated, here
Hi all, I've been experimenting with DataFrame operations in a mixed-endian
environment - a big-endian master with little-endian workers. With
Tungsten enabled I'm encountering data corruption issues.
For example, with this simple test code:
import org.apache.spark.SparkContext
import org.apach
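The test code in that message is cut off above; a minimal sketch of the
sort of round trip that exposes the problem (hypothetical column names and
values, not the original snippet) would be:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("EndiannessSketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Simple numeric DataFrame: the values are easy to spot if bytes come back swapped.
val df = sc.parallelize(1 to 10).map(i => (i, i * 100L)).toDF("id", "value")

// On a mixed-endian cluster with Tungsten's binary row format, the collected
// longs can come back byte-swapped (e.g. 100 appearing as a huge number).
df.collect().foreach(println)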
with plenty of errors. Must be an
easier way...
From: Josh Rosen
To: Adam Roberts/UK/IBM@IBMGB
Cc: dev
Date: 21/09/2015 19:19
Subject: Re: Test workflow - blacklist entire suites and run any
independently
For quickly running individual suites:
https://cwiki.apache.or
Hi, is there an existing way to blacklist any test suite?
Ideally we'd have a text file with a series of names (let's say
comma-separated), and if a name matches the fully qualified class name for
a suite, that suite will be skipped.
Perhaps we can achieve this via ScalaTest or Maven?
Curr
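One pattern that gets close to this today, as a sketch, is ScalaTest's tag
support: tag the suites or tests you want to skip and exclude that tag at
run time (scalatest-maven-plugin exposes this via its tagsToExclude
configuration, or -l on the ScalaTest runner; SlowOnThisPlatform is a
made-up tag name):

import org.scalatest.{FunSuite, Tag}

// Hypothetical tag used to mark tests we want to be able to blacklist.
object SlowOnThisPlatform extends Tag("org.example.tags.SlowOnThisPlatform")

class ExampleSuite extends FunSuite {
  test("quick sanity check") {
    assert(1 + 1 === 2)
  }

  // Skipped when the runner excludes org.example.tags.SlowOnThisPlatform.
  test("expensive end-to-end check", SlowOnThisPlatform) {
    assert(Seq.fill(1000)(1).sum === 1000)
  }
}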