Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Adam Roberts
+1 (non-binding), looks good. Tested on RHEL 7.2, 7.3, CentOS 7.2, Ubuntu 14.04 and 16.04, SUSE 12, x86, IBM Linux on Power and IBM Linux on Z (big-endian). No problems with latest IBM Java, Hadoop 2.7.3 and Scala 2.11.8, no performance concerns to report either (spark-sql-perf and HiBench) Buil

Re: Spark performance tests

2017-01-10 Thread Adam Roberts
Hi, I suggest HiBench and SparkSqlPerf. HiBench features many benchmarks that exercise several components of Spark (great for stressing core, sql, MLlib capabilities); SparkSqlPerf features 99 TPC-DS queries (stressing the DataFrame API and therefore the Catalyst optimiser), both work

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-18 Thread Adam Roberts
+1 (non-binding) Functional: looks good, tested with OpenJDK 8 (1.8.0_111) and IBM's latest SDK for Java (8 SR3 FP21). Tests run clean on Ubuntu 16.04, 14.04, SUSE 12, CentOS 7.2 on x86 and IBM specific platforms including big-endian. On slower machines I see these failing but nothing to be co

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-13 Thread Adam Roberts
I've never seen the ReplSuite test OOMing with IBM's latest SDK for Java, but have always noticed this particular test failing with the following instead: java.lang.AssertionError: assertion failed: deviation too large: 0.8506807397223823, first size: 180392, second size: 333848 This particular
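The deviation the assertion quotes can be reconstructed from the numbers in the message: |second − first| / first = (333848 − 180392) / 180392 ≈ 0.85068, which matches the reported value. A minimal sketch of that check (the exact formula in the suite may differ; this is inferred from the quoted numbers):

```scala
object SizeDeviation {
  // Relative deviation between two measured sizes; with the values from the
  // failure above (180392 and 333848) this yields ~0.85068, matching the log.
  def deviation(first: Long, second: Long): Double =
    math.abs(second - first).toDouble / first
}
```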

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-11 Thread Adam Roberts
+1 (non-binding) Build: mvn -T 1C -Psparkr -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package Test: mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Dtest.exclude.tags=org.apache.spark.tags.DockerTest -fn test Test options: -Xss2048k -Dspark.buffer.pageSize=1048576 -Xmx4

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-02 Thread Adam Roberts
I'm seeing the same failure but manifesting itself as a stackoverflow, various operating systems and architectures (RHEL 7.1, CentOS 7.2, SUSE 12, Ubuntu 14.04 and 16.04 LTS) Build and test options: mvn -T 1C -Psparkr -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package mvn -

Re: Spark 2.0.0 performance; potential large Spark core regression

2016-07-11 Thread Adam Roberts
for 1.6.2 vs 2.0 with HiBench (large profile, 25g executor memory, 4g driver), again we will be carefully checking how these benchmarks are being run and what difference the options and configurations can make Cheers, From: Ted Yu To: Adam Roberts/UK/IBM@IBMGB Cc: Michael Al

Re: Spark performance regression test suite

2016-07-11 Thread Adam Roberts
Agreed, this is something that we do regularly when producing our own Spark distributions in IBM, so it will be beneficial to share updates with the wider community. So far it looks like Spark 1.6.2 is the best out of the box on spark-perf and HiBench (of course this may vary for real worklo

Re: Spark 2.0.0 performance; potential large Spark core regression

2016-07-08 Thread Adam Roberts
WholeStageCodegen: on I think; we turned it off when fixing a bug. offHeap.enabled: false, offHeap.size: 0. Cheers, From: Michael Allman To: Adam Roberts/UK/IBM@IBMGB Cc: dev Date: 08/07/2016 17:05 Subject: Re: Spark 2.0.0 performance; potential large Spark core regression

Re: Spark 2.0.0 performance; potential large Spark core regression

2016-07-08 Thread Adam Roberts
er 8g, executor memory 16g, Kryo, 0.66 memory fraction, 100 trials We can post the 1.6.2 comparison early next week, running lots of iterations over the weekend once we get the dedicated time again Cheers, From: Michael Allman To: Adam Roberts/UK/IBM@IBMGB Cc: dev Date: 08/07

Spark 2.0.0 performance; potential large Spark core regression

2016-07-08 Thread Adam Roberts
Hi, we've been testing the performance of Spark 2.0 compared to previous releases. Unfortunately there are no Spark 2.0 compatible versions of HiBench and SparkPerf apart from those I'm working on (see https://github.com/databricks/spark-perf/issues/108) With the Spark 2.0 version of SparkPerf

Re: Understanding pyspark data flow on worker nodes

2016-07-08 Thread Adam Roberts
Hi, sharing what I discovered with PySpark too, corroborates with what Amit notices and also interested in the pipe question: https://mail-archives.apache.org/mod_mbox/spark-dev/201603.mbox/%3c201603291521.u2tflbfo024...@d06av05.portsmouth.uk.ibm.com%3E // Start a thread to feed the process inp
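The quoted comment ("Start a thread to feed the process inp…") refers to the pattern PythonRDD uses: a background thread writes the input to the Python worker's stdin while the JVM side reads its stdout, so neither pipe can fill up and deadlock. A simplified, self-contained sketch of that pattern (names and signatures here are illustrative, not Spark's):

```scala
import java.io.PrintWriter
import scala.io.Source

object PipeFeeder {
  // Feed `input` to a child process's stdin from a daemon thread while the
  // caller reads the child's stdout; writing and reading on one thread could
  // deadlock once either OS pipe buffer fills.
  def pipeThrough(cmd: Seq[String], input: Seq[String]): List[String] = {
    val proc = new ProcessBuilder(cmd: _*).start()
    val feeder = new Thread(() => {
      val out = new PrintWriter(proc.getOutputStream)
      input.foreach(out.println)
      out.close() // closing stdin lets the child see EOF and terminate
    })
    feeder.setDaemon(true)
    feeder.start()
    val lines = Source.fromInputStream(proc.getInputStream).getLines().toList
    proc.waitFor()
    lines
  }
}
```

For example, `PipeFeeder.pipeThrough(Seq("cat"), Seq("a", "b"))` round-trips the two lines through an external process.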

Re: Databricks SparkPerf with Spark 2.0

2016-06-14 Thread Adam Roberts
s file is super important), so the emails here will at least point people there. Cheers, From: Adam Roberts/UK/IBM@IBMGB To: dev Date: 14/06/2016 12:18 Subject:Databricks SparkPerf with Spark 2.0 Hi, I'm working on having "SparkPerf" ( https://github.com/d

Databricks SparkPerf with Spark 2.0

2016-06-14 Thread Adam Roberts
Hi, I'm working on having "SparkPerf" ( https://github.com/databricks/spark-perf) run with Spark 2.0. I noticed a few pull requests not yet accepted, so I'm concerned this project's been abandoned - it's proven very useful in the past for quality assurance as we can easily exercise lots of Spark functi

Caching behaviour and deserialized size

2016-05-04 Thread Adam Roberts
Hi, Given a very simple test that uses a bigger version of the pom.xml file in our Spark home directory (cat with a bash for loop into itself so it becomes 100 MB), I've noticed with larger heap sizes it looks like we have more RDDs reported as being cached. Is this intended behaviour? What e
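The 100 MB input described above (a file repeatedly catted into itself) can be reproduced without bash, which may make the test easier to script alongside the Spark code. A minimal sketch; the path and target size are whatever your test uses:

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}

object GrowFile {
  // Double a file's contents until it reaches at least targetBytes, mirroring
  // the "cat pom.xml into itself with a bash loop" step described above.
  def grow(path: String, targetBytes: Long): Long = {
    val p = Paths.get(path)
    require(Files.size(p) > 0, "seed file must be non-empty")
    while (Files.size(p) < targetBytes) {
      Files.write(p, Files.readAllBytes(p), StandardOpenOption.APPEND)
    }
    Files.size(p)
  }
}
```

The grown file can then be loaded and cached in the usual way (e.g. `sc.textFile(path).cache()`) to observe the reported storage sizes under different heap settings.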

Re: BytesToBytes and unaligned memory

2016-04-18 Thread Adam Roberts
initely not supported for shorts/ints/longs. if these tests continue to pass then I think the Spark tests don't exercise unaligned memory access, cheers From: Ted Yu To: Adam Roberts/UK/IBM@IBMGB Cc: "dev@spark.apache.org" Date: 15/04/2016 17:35 Subject:

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
Ted, yeah with the forced true value the tests in that suite all pass and I know they're being executed thanks to prints I've added Cheers, From: Ted Yu To: Adam Roberts/UK/IBM@IBMGB Cc: "dev@spark.apache.org" Date: 15/04/2016 16:43 Subject:

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
anyway for experimenting", and the tests pass. No other problems on the platform (pending a different pull request). Cheers, From: Ted Yu To: Adam Roberts/UK/IBM@IBMGB Cc: "dev@spark.apache.org" Date: 15/04/2016 15:32 Subject:Re: BytesToBytes and un

BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
Hi, I'm testing Spark 2.0.0 on various architectures and have a question, are we sure if core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java really is attempting to use unaligned memory access (for the BytesToBytesMapOffHeapSuite tests specifically)? Our JDKs on z
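For context on how an "unaligned memory supported" flag typically gets computed on the JVM: the usual approach is to read java.nio.Bits.unaligned() reflectively and, failing that, fall back to an allow-list of architectures known to tolerate unaligned loads and stores. A self-contained sketch of that fallback (this arch list is an assumption for illustration, not necessarily the one Spark ships):

```scala
object UnalignedCheck {
  // Architectures commonly assumed to support unaligned access; illustrative
  // only -- a real implementation should prefer java.nio.Bits.unaligned().
  private val knownUnalignedArchs =
    Set("i386", "i486", "i586", "i686", "x86", "amd64", "x86_64", "ppc64", "ppc64le")

  def archSupportsUnaligned(arch: String): Boolean =
    knownUnalignedArchs.contains(arch.toLowerCase)

  def main(args: Array[String]): Unit =
    println(archSupportsUnaligned(System.getProperty("os.arch")))
}
```

If such a fallback list doesn't include the platform under test (e.g. an `os.arch` of `s390x`), the off-heap suites would report unaligned access as unsupported regardless of what the hardware can actually do, which may explain platform-specific differences in which tests are exercised.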

Understanding PySpark Internals

2016-03-29 Thread Adam Roberts
Hi, I'm interested in figuring out how the Python API for Spark works. I've come to the following conclusion and want to share this with the community; it could be of use in the PySpark docs here, specifically the "Execution and pipelining" part. Any sanity checking would be much appreciated, here

Tungsten in a mixed endian environment

2016-01-12 Thread Adam Roberts
Hi all, I've been experimenting with DataFrame operations in a mixed endian environment - a big endian master with little endian workers. With tungsten enabled I'm encountering data corruption issues. For example, with this simple test code: import org.apache.spark.SparkContext import org.apach
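The corruption pattern described above is consistent with raw Tungsten buffer bytes being written under one byte order and interpreted under another. A self-contained sketch of what a big-endian writer / little-endian reader mismatch does to a single long (illustrative only; Tungsten's actual encoding is more involved):

```scala
import java.nio.{ByteBuffer, ByteOrder}

object EndianDemo {
  // Encode a long as 8 raw bytes under the given byte order
  def encode(v: Long, order: ByteOrder): Array[Byte] = {
    val buf = ByteBuffer.allocate(java.lang.Long.BYTES).order(order)
    buf.putLong(v)
    buf.array()
  }

  // Decode the same bytes under a (possibly different) byte order
  def decode(bytes: Array[Byte], order: ByteOrder): Long =
    ByteBuffer.wrap(bytes).order(order).getLong
}
```

Decoding big-endian bytes for the value 1L under little-endian order yields a completely different number, which is exactly the kind of silent corruption a big-endian master with little-endian workers would see if buffers cross the wire without byte-order normalisation.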

Re: Test workflow - blacklist entire suites and run any independently

2015-09-21 Thread Adam Roberts
with plenty of errors. Must be an easier way... From: Josh Rosen To: Adam Roberts/UK/IBM@IBMGB Cc: dev Date: 21/09/2015 19:19 Subject:Re: Test workflow - blacklist entire suites and run any independently For quickly running individual suites: https://cwiki.apache.or

Test workflow - blacklist entire suites and run any independently

2015-09-21 Thread Adam Roberts
Hi, is there an existing way to blacklist any test suite? Ideally we'd have a text file with a series of names (let's say comma separated) and if a name matches with the fully qualified class name for a suite, this suite will be skipped. Perhaps we can achieve this via ScalaTest or Maven? Curr
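The comma-separated text file proposed above is straightforward to parse and match against fully qualified suite names. A minimal sketch of that matching logic (the file format is the proposal's own, hypothetical; nothing like this ships with Spark's build):

```scala
import scala.io.Source

object SuiteBlacklist {
  // Parse a comma-separated list of fully qualified suite names
  def parse(text: String): Set[String] =
    text.split(",").map(_.trim).filter(_.nonEmpty).toSet

  // A suite is skipped when its fully qualified class name is blacklisted
  def shouldSkip(blacklist: Set[String], fqcn: String): Boolean =
    blacklist.contains(fqcn)

  def fromFile(path: String): Set[String] = {
    val src = Source.fromFile(path)
    try parse(src.mkString) finally src.close()
  }
}
```

Hooking this into the build would still need a ScalaTest or Surefire extension point (e.g. ScalaTest tag exclusion or a suite filter), but the name-matching side is this simple.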