I've filed a ticket for this issue here:
https://issues.apache.org/jira/browse/SPARK-5209. (It reproduces the
problem with a smaller cluster.)
-Sven
On Wed, Jan 7, 2015 at 11:13 AM, Sven Krasser wrote:
Could you try it on AWS using EMR? That'd give you an exact replica of the
environment that causes the error.
Sent from my iPhone
> On Jan 7, 2015, at 10:53 AM, Davies Liu wrote:
Hey Sven,
I tried all of your configurations, 2 nodes with 2 executors each, but
in standalone mode, and it worked fine.
Could you try to narrow down which configuration change makes the difference?
Davies
On Tue, Jan 6, 2015 at 8:03 PM, Sven Krasser wrote:
Hey Davies,
Here are some more details on a configuration that causes this error for
me. Launch an AWS Spark EMR cluster as follows:
aws emr create-cluster --region us-west-1 --no-auto-terminate \
  --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
  --bootstrap-actions Path=...
I still cannot reproduce it with 2 nodes (4 CPUs).
Your repro.py can be made faster (10 min instead of 22 min):

    inpdata.map(lambda (pc, x): (x, pc == 'p' and 2 or 1)) \
           .reduceByKey(lambda x, y: x | y) \
           .filter(lambda (x, pc): pc == 3) \
           .collect()

(also, no cache needed anymore)
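For anyone on Python 3 (where tuple-unpacking lambdas were removed), an
equivalent sketch of the same pipeline, assuming inpdata is an RDD of
(pc, x) pairs as above:

    # Flag 'p' records as 2 and all others as 1, OR the flags per key,
    # then keep keys that appear with both record types (flag == 3).
    inpdata.map(lambda kv: (kv[1], 2 if kv[0] == 'p' else 1)) \
           .reduceByKey(lambda a, b: a | b) \
           .filter(lambda kv: kv[1] == 3) \
           .collect()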
Davies
On Tue, Jan 6, 2015, Sven Krasser wrote:
The issue has been sensitive to the number of executors and input data
size. I'm using 2 executors with 4 cores each, 25GB of memory, 3800MB of
memory overhead for YARN. This will fit onto Amazon r3 instance types.
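As spark-submit flags, that setup would look roughly like this (a
sketch, assuming yarn-client mode and Spark 1.2 option names):

    # note: spark.yarn.executor.memoryOverhead is specified in MB
    spark-submit --master yarn-client \
        --num-executors 2 \
        --executor-cores 4 \
        --executor-memory 25g \
        --conf spark.yarn.executor.memoryOverhead=3800 \
        repro.py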
-Sven
On Tue, Jan 6, 2015 at 12:46 AM, Davies Liu wrote:
I ran your scripts on a 5-node (2 CPUs, 8G mem each) cluster and cannot
reproduce your failure. Should I test it with a big-memory node?
On Mon, Jan 5, 2015 at 4:00 PM, Sven Krasser wrote:
Thanks for the input! I've managed to come up with a repro of the error
with test data only (and without any of the custom code in the original
script); please see here:
https://gist.github.com/skrasser/4bd7b41550988c8f6071#file-gistfile1-md
The Gist contains a data generator and the script reproducing the error.
It doesn't seem like there are a whole lot of clues to go on here without
seeing the job code. The original "org.apache.spark.SparkException:
PairwiseRDD: unexpected value: List([B@130dc7ad)" error suggests that maybe
there's an issue with PySpark's serialization / tracking of types, but it's
hard to say more than that.
Hey Josh,
I am still trying to prune this to a minimal example, but it has been
tricky since scale seems to be a factor. The job runs over ~720GB of data
(the cluster's total RAM is around ~900GB, split across 32 executors). I've
managed to run it over a vastly smaller data set without issues. …
Hi Sven,
Do you have a small example program that you can share which will allow me
to reproduce this issue? If you have a workload that runs into this, you
should be able to keep iteratively simplifying the job and reducing the
data set size until you hit a fairly minimal reproduction (assuming …)
Hey all,
Since upgrading to 1.2.0, a PySpark job that worked fine in 1.1.1 fails
during the shuffle. I've tried reverting from the sort-based shuffle back
to the hash-based one, and that fails as well. Does anyone see similar
problems or have an idea on where to look next?
For the sort-based shuffle I get a …
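For reference, the shuffle implementation in Spark 1.2 is selected with a
single config key, so the revert described above is a one-line change (a
sketch, assuming spark-submit; your_job.py is a placeholder):

    # Spark 1.2 changed the default from "hash" to "sort"; this reverts it.
    spark-submit --conf spark.shuffle.manager=hash your_job.py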