Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Surendranauth Hiraman
Patrick, my team is using shuffle consolidation but not speculation. We are also using persist(DISK_ONLY) for caching. Here are some config changes that are in our work-in-progress. We've been trying for 2 weeks to get our production flow (maybe around 50-70 stages, a few forks and joins with

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-17 Patrick Wendell
Out of curiosity, are you guys using speculation, shuffle consolidation, or any other non-default option? If so, that would help narrow down what's causing this corruption. On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman suren.hira...@velos.io wrote: Matt/Ryan, did you make any headway

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-06 Ryan Compton
Just ran into this today myself. I'm on branch-1.0 using a CDH3 cluster (no modifications to Spark or its dependencies). The error appeared when I tried to run GraphX's .connectedComponents() on a ~200GB edge list (GraphX worked beautifully on smaller data). Here's the stack trace (it's quite similar to

Java IO Stream Corrupted - Invalid Type AC?

2014-06-04 Matt Kielo
Hi, I'm trying to run some Spark code on a cluster, but I keep running into a java.io.StreamCorruptedException: invalid type code: AC error. My task involves analyzing ~50GB of data (some operations involve sorting), then writing them out to a JSON file. I'm running the analysis on each of the data's ~10
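For context on the error text itself, independent of Spark: Java serialization starts every ObjectOutputStream with the magic header 0xACED 0x0005, and an ObjectInputStream consumes that header only once, in its constructor. If a second ObjectOutputStream writes into the same byte stream that one ObjectInputStream later reads, the reader hits 0xAC where it expects a type code and throws "invalid type code: AC". The sketch below reproduces that generic mechanism with plain java.io only; the class name InvalidTypeCodeAC is hypothetical, and this illustrates what the message means rather than diagnosing the Spark shuffle/persist problem discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;
import java.io.UncheckedIOException;

// Hypothetical demo class: not part of Spark or of anyone's code in this thread.
public class InvalidTypeCodeAC {

    /** Writes two objects through two separate ObjectOutputStreams that share
     *  one byte buffer, then reads them back with a single ObjectInputStream.
     *  Returns the StreamCorruptedException message, or null if none occurs. */
    static String reproduce() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();

            // First writer: its constructor emits the stream header 0xACED 0x0005.
            ObjectOutputStream first = new ObjectOutputStream(buffer);
            first.writeObject("record-1");
            first.flush();

            // Second writer on the SAME buffer: emits a second header mid-stream.
            ObjectOutputStream second = new ObjectOutputStream(buffer);
            second.writeObject("record-2");
            second.flush();

            // A single reader consumes only the first header.
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            in.readObject(); // "record-1" deserializes normally
            try {
                in.readObject(); // next byte is 0xAC, which is not a valid type code
                return null;
            } catch (StreamCorruptedException e) {
                return e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce()); // prints "invalid type code: AC"
    }
}
```

In plain Java code the usual remedy is to pair each ObjectOutputStream header with its own ObjectInputStream (or write everything through one ObjectOutputStream). Whether an equivalent mismatch occurs inside Spark's shuffle or DISK_ONLY storage path is not established here.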

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-04 Sean Owen
On Wed, Jun 4, 2014 at 3:33 PM, Matt Kielo mki...@oculusinfo.com wrote: I'm trying to run some Spark code on a cluster, but I keep running into a java.io.StreamCorruptedException: invalid type code: AC error. My task involves analyzing ~50GB of data (some operations involve sorting), then writing