Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-22 Thread Jean-Pascal Billaud
> > On Tue, Apr 21, 2015 at 4:23 PM, Jean-Pascal Billaud > wrote: > >> Sure. But in general, I am assuming this "Graph is unexpectedly null >> when DStream is being serialized" must mean something. Under which >> circumstances, such an exception woul

Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-21 Thread Jean-Pascal Billaud
n. The only way to figure it out is to take a > look at the disassembled bytecodes using javap. > > TD > > On Tue, Apr 21, 2015 at 1:53 PM, Jean-Pascal Billaud > wrote: > >> At this point I am assuming that nobody has an idea... I am still going >> to give it a

Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-21 Thread Jean-Pascal Billaud
At this point I am assuming that nobody has an idea... I am still going to give it a last shot just in case it was missed by some people :) Thanks, On Mon, Apr 20, 2015 at 2:20 PM, Jean-Pascal Billaud wrote: > Hey, so I start the context at the very end when all the piping is done. >

Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-20 Thread Jean-Pascal Billaud
:33 PM, Tathagata Das wrote: > When are you getting this exception? After starting the context? > > TD > > On Mon, Apr 20, 2015 at 10:44 AM, Jean-Pascal Billaud > wrote: > >> Hi, >> >> I am getting this serialization exception and I am not too sure what >

Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-20 Thread Jean-Pascal Billaud
Hi, I am getting this serialization exception and I am not too sure what "Graph is unexpectedly null when DStream is being serialized" means? 15/04/20 06:12:38 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Task not serializable) Exceptio
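The usual root cause behind a "Task not serializable" error like the one in this thread is a transformation closure that accidentally captures a non-serializable outer object (in Spark Streaming, typically the enclosing class holding the DStream or StreamingContext). The following is a plain-JVM Scala sketch of that failure mode, with hypothetical names (`Pipeline`, `serializes`); it does not reproduce the exact DStream-graph error, only the closure-capture mechanics that javap inspection (suggested later in the thread) would reveal.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Not Serializable -- plays the role of the driver-side object (e.g. the class
// wiring up the DStreams) that a closure accidentally drags along.
class Pipeline {
  val factor = 3

  // BAD: `factor` is a field, so the closure captures `this` (the whole
  // non-serializable Pipeline); serializing the function then fails.
  def badFn: Int => Int = x => x * factor

  // GOOD: copy the field into a local val first; the closure now captures
  // only an Int and serializes cleanly.
  def goodFn: Int => Int = { val f = factor; x => x * f }
}

// Mimics what Spark's ClosureCleaner/serializer checks before shipping a task.
def serializes(obj: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj); true }
  catch { case _: NotSerializableException => false }
```

In a real job, the equivalent fix is to copy any needed fields into local vals before referencing them inside `map`/`transform`/`foreachRDD` closures.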

Re: Spark streaming and executor object reusage

2015-03-07 Thread Jean-Pascal Billaud
Thanks a lot. Sent from my iPad > On Mar 7, 2015, at 8:26 AM, Sean Owen wrote: > >> On Sat, Mar 7, 2015 at 4:17 PM, Jean-Pascal Billaud >> wrote: >> So given this let's go a bit further. Imagine my static factory provides a >> stats collector that my var

Re: Spark streaming and executor object reusage

2015-03-07 Thread Jean-Pascal Billaud
per app though so would not be shared with another > streaming job, no. Given what you said earlier, that totally makes sense. In general, is there any Spark architecture documentation other than the code that gives a good overview of the things we talked about? Thanks again for your help, >

Spark streaming and executor object reusage

2015-03-06 Thread Jean-Pascal Billaud
Hi, Reading through the Spark Streaming Programming Guide, I read in the "Design Patterns for using foreachRDD": "Finally, this can be further optimized by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects that can be reused as RDDs of m
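The "static pool" the guide refers to is a per-JVM singleton: on Spark, each executor process initializes it lazily once and then reuses it across batches, instead of opening a connection per record or per RDD. Below is a minimal sketch of that pattern; `Connection` and the pool API are hypothetical stand-ins for a real client (DB, Kafka producer, etc.), and the `foreachRDD` usage is shown in comments since it needs a live StreamingContext.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical connection type standing in for a real client.
class Connection {
  def send(record: String): Unit = () // placeholder for real I/O
}

// A Scala `object` is one instance per JVM, so on Spark each executor
// builds this pool once and reuses it across micro-batches.
object ConnectionPool {
  private val pool = new ConcurrentLinkedQueue[Connection]()
  def borrow(): Connection = Option(pool.poll()).getOrElse(new Connection)
  def giveBack(c: Connection): Unit = pool.offer(c)
}

// Streaming-side usage sketch, per the guide's design pattern:
// dstream.foreachRDD { rdd =>
//   rdd.foreachPartition { records =>
//     val conn = ConnectionPool.borrow()   // runs on the executor
//     records.foreach(conn.send)
//     ConnectionPool.giveBack(conn)        // return, don't close
//   }
// }
```

The key point for the executor-reuse question in this thread: the pool lives in the executor JVM, so it survives across batches of the same application but is not shared with other applications.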

Re: Spark and Spark Streaming code sharing best practice.

2015-02-18 Thread Jean-Pascal Billaud
A list of such practices could be really useful. > > On Thu, Feb 19, 2015 at 12:26 AM, Jean-Pascal Billaud > wrote: > >> Hey, >> >> It seems pretty clear that one of the strengths of Spark is to be able to >> share your code between your batch and streamin

Spark and Spark Streaming code sharing best practice.

2015-02-18 Thread Jean-Pascal Billaud
Hey, It seems pretty clear that one of the strengths of Spark is to be able to share your code between your batch and streaming layers. Though, given that Spark Streaming uses DStreams (a series of RDDs) while batch Spark uses a single RDD, there might be some complexity associated with it. Of course since DSt
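One common answer to the question in this thread is to keep the business logic as plain functions over `Iterator` (or as `RDD => RDD` functions), so the same code runs under both APIs. A small sketch, with a hypothetical `countWords` as the shared logic; note the per-partition results are partial counts that a `reduceByKey` would merge in a real job.

```scala
import scala.collection.mutable

// Shared logic: knows nothing about RDD vs DStream, so both layers can use it.
// Produces per-partition partial counts (a reduceByKey would follow in Spark).
def countWords(lines: Iterator[String]): Iterator[(String, Int)] = {
  val counts = mutable.Map.empty[String, Int]
  lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
    .foreach(w => counts(w) = counts.getOrElse(w, 0) + 1)
  counts.iterator
}

// Batch:      rdd.mapPartitions(countWords)
// Streaming:  dstream.mapPartitions(countWords)
// Whole RDD-to-RDD pipelines can likewise be reused via
// dstream.transform(sharedRddPipeline) where sharedRddPipeline: RDD[A] => RDD[B].
```

`DStream.transform` is the general escape hatch: any function written against RDDs in the batch layer can be applied per micro-batch in the streaming layer.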

Re: DStream demultiplexer based on a key

2014-12-14 Thread Jean-Pascal Billaud
ter(elem=> key(elem) == key).saveAsObjectFile(...) > } > rdd.unpersist() > } > > -kr, Gerard. > > > > > On Sun, Dec 14, 2014 at 7:50 PM, Jean-Pascal Billaud > wrote: >> >> Hey, >> >> I am doing an experiment with Spark Streaming consisting of

DStream demultiplexer based on a key

2014-12-14 Thread Jean-Pascal Billaud
Hey, I am doing an experiment with Spark Streaming consisting of moving data from Kafka to S3 locations while partitioning by date. I have already looked into LinkedIn Camus and Pinterest Secor and while both are workable solutions, it just feels that Spark Streaming should be able to be on par with
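The reply in this thread sketches the standard approach: per micro-batch, cache the RDD, write one filtered output per distinct key, then unpersist. Below, the pure demultiplexing step is shown as a testable Scala function, with the Spark-side version (matching the filter-per-key reply) in comments; the record shape, `key` function, and the S3 path are hypothetical stand-ins.

```scala
// Toy record: (dateKey, payload). In the thread the key is a date that
// selects the S3 prefix.
type Record = (String, String)
def key(r: Record): String = r._1

// Per micro-batch: split one batch into one group per distinct key.
def demux(batch: Seq[Record]): Map[String, Seq[Record]] = batch.groupBy(key)

// Spark-side sketch of the same idea, per the reply in this thread:
// stream.foreachRDD { rdd =>
//   rdd.cache()                                   // reread once per key below
//   val keys = rdd.map(key).distinct().collect()
//   keys.foreach { k =>
//     rdd.filter(r => key(r) == k).saveAsObjectFile(s"s3://bucket/dt=$k/")
//   }
//   rdd.unpersist()
// }
```

Caching before the per-key filters matters: without it, each `filter(...).save...` would recompute the whole batch from Kafka once per key.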

Re: SparkSQL + Hive Cached Table Exception

2014-11-01 Thread Jean-Pascal Billaud
sed to collect column statistics, which causes this > issue. Filed SPARK-4182 to track this issue, will fix this ASAP. > > Cheng > >> On Fri, Oct 31, 2014 at 7:04 AM, Jean-Pascal Billaud >> wrote: >> Hi, >> >> While t

SparkSQL + Hive Cached Table Exception

2014-10-30 Thread Jean-Pascal Billaud
Hi, While testing SparkSQL on top of our Hive metastore, I am getting some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD table. Basically, I have a table "mtable" partitioned by some "date" field in hive and below is the scala code I am running in spark-shell: val sqlContex
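For context, a minimal reproduction of the workflow described here, against the Spark 1.x API in use at the time (HiveContext, `SQLContext.cacheTable`). The table name and partition column come from the thread; the partition value and query are illustrative, and this needs a live Hive metastore to run.

```scala
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // sc: the spark-shell's SparkContext

// First read of the partitioned Hive table works fine.
sqlContext.sql("SELECT * FROM mtable WHERE date = '2014-10-01'").count()

// Cache the table in Spark SQL's in-memory columnar store.
sqlContext.cacheTable("mtable")

// Rereading now hits the cached columnar data; on affected 1.1.x builds this
// second read triggered the ArrayIndexOutOfBoundsException while collecting
// column statistics (SPARK-4182, per the reply in this thread).
sqlContext.sql("SELECT * FROM mtable WHERE date = '2014-10-01'").count()
```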