Patrick,
My team is using shuffle consolidation but not speculation. We are also
using persist(DISK_ONLY) for caching.
Here are some config changes from our work in progress.
We've been trying for 2 weeks to get our production flow (maybe around
50-70 stages, a few forks and joins with
Out of curiosity - are you guys using speculation, shuffle
consolidation, or any other non-default option? If so that would help
narrow down what's causing this corruption.
On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
Matt/Ryan,
Did you make any headway
Just ran into this today myself. I'm on branch-1.0 using a CDH3
cluster (no modifications to Spark or its dependencies). The error
appeared while running GraphX's .connectedComponents() on a ~200GB
edge list (GraphX worked beautifully on smaller data).
Here's the stacktrace (it's quite similar to
Hi
I'm trying to run some Spark code on a cluster but I keep running into a
java.io.StreamCorruptedException: invalid type code: AC error. My task
involves analyzing ~50GB of data (some operations involve sorting) then
writing them out to a JSON file. I'm running the analysis on each of the
data's ~10
On Wed, Jun 4, 2014 at 3:33 PM, Matt Kielo mki...@oculusinfo.com wrote:
I'm trying to run some Spark code on a cluster but I keep running into a
java.io.StreamCorruptedException: invalid type code: AC error. My task
involves analyzing ~50GB of data (some operations involve sorting) then
writing