Get only updated RDDs from or after updateStateByKey

2015-09-23 Thread Bin Wang
I've read the source code and it seems to be impossible, but I'd like to confirm it. It would be a very useful feature. For example, I need to store the state of a DStream into my database in order to recover it after the next redeploy, but I only need to save the updated keys. Saving all keys into the database
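A minimal sketch of the pattern being asked about, assuming Spark Streaming's `updateStateByKey`. Since Spark returns the full state stream (all keys), one workaround is to carry an "updated this batch" flag inside the state itself and filter on it before writing out. The `State` case class, the flag, and `saveToDatabase` are illustrative assumptions, not Spark APIs:

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical state type: a running count plus a flag marking whether
// this key received new values in the current batch.
case class State(count: Long, updatedInLastBatch: Boolean)

def updateFn(newValues: Seq[Long], old: Option[State]): Option[State] =
  if (newValues.isEmpty)
    old.map(_.copy(updatedInLastBatch = false)) // key untouched this batch
  else
    Some(State(old.map(_.count).getOrElse(0L) + newValues.sum,
               updatedInLastBatch = true))

// Given some kvStream: DStream[(String, Long)] defined elsewhere:
// val stateStream = kvStream.updateStateByKey(updateFn)
// stateStream
//   .filter { case (_, s) => s.updatedInLastBatch } // keep only changed keys
//   .foreachRDD(rdd => rdd.foreachPartition(saveToDatabase)) // hypothetical sink
```

This does not avoid Spark materializing the full state internally; it only avoids writing unchanged keys to the external store.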

Re: Checkpoint directory structure

2015-09-23 Thread Bin Wang
I've attached the full log. The error is like this: 15/09/23 17:47:39 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist: hdfs://szq2.appadhoc.com:8020/user/root/checkpoint/d3714249-e03a-45c7-a0d5-1

RE: SparkR package path

2015-09-23 Thread Sun, Rui
The SparkR package is not a standalone R package: it is actually the R API of Spark and needs to cooperate with a matching version of Spark. So exposing it on CRAN does not make things easier for R users, as they would still need to download a matching Spark distribution, unless we publish a bundled SparkR package to CRAN (pa

Re: Checkpoint directory structure

2015-09-23 Thread Tathagata Das
Could you provide the logs on when and how you are seeing this error? On Wed, Sep 23, 2015 at 6:32 PM, Bin Wang wrote: > BTW, I just kill the application and restart it. Then the application > cannot recover from checkpoint because of some lost of RDD. So I'm wonder, > if there are some failure

Re: Checkpoint directory structure

2015-09-23 Thread Bin Wang
BTW, I just killed the application and restarted it. Then the application could not recover from the checkpoint because some RDDs were lost. So I wonder: if there is a failure in the application, might it not be able to recover from the checkpoint? Bin Wang wrote on Wed, Sep 23, 2015 at 6:58 PM: > I find the c
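For context on the recovery path under discussion, this is a sketch of the standard Spark Streaming driver-restart pattern: the whole DStream graph must be built inside a factory function so it can be replayed from the checkpoint. The checkpoint path and app name here are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///user/root/checkpoint" // assumed path

// All DStream setup must live inside this factory; on a cold start it runs,
// on restart the graph is instead deserialized from the checkpoint.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpoint-recovery-sketch")
  val ssc  = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... define input DStreams and transformations here ...
  ssc
}

// Recovers from checkpointDir if it exists, otherwise builds a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

Note that recovery can still fail if the checkpointed metadata references RDD blocks or files that no longer exist, which appears to be the failure mode described in this thread.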

Re: RFC: packaging Spark without assemblies

2015-09-23 Thread Patrick Wendell
I think it would be a big improvement to get rid of it. It's not how jars are supposed to be packaged and it has caused problems in many different contexts over the years. For me, a key step in moving away would be to fully audit/understand all compatibility implications of removing it. If other peo

RFC: packaging Spark without assemblies

2015-09-23 Thread Marcelo Vanzin
Hey all, This is something that we've discussed several times internally, but never really had much time to look into; but as time passes by, it's increasingly becoming an issue for us and I'd like to throw some ideas around about how to fix it. So, without further ado: https://github.com/vanzin/

Re: SparkR package path

2015-09-23 Thread Hossein
Yes, I think exposing SparkR on CRAN can significantly expand the reach of both SparkR and Spark itself to a larger community of data scientists (and statisticians). I have been getting questions on how to use SparkR in RStudio. Most of these folks have a Spark cluster and wish to talk to it from

Checkpoint directory structure

2015-09-23 Thread Bin Wang
I find the checkpoint directory structure is like this:
-rw-r--r-- 1 root root 134820 2015-09-23 16:55 /user/root/checkpoint/checkpoint-144299850
-rw-r--r-- 1 root root 134768 2015-09-23 17:00 /user/root/checkpoint/checkpoint-144299880
-rw-r--r-- 1 root root 134895 2015-0

using Codahale counters in source

2015-09-23 Thread Steve Loughran
Quick question: is it OK to use Codahale Metric classes (e.g. Counter) in source as generic thread-safe counters, with the option of hooking them to a Codahale metrics registry if there is one in the spark context? The Counter class does extend LongAdder, which is by Doug Lea and promises to
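The usage being proposed can be sketched as follows with the Dropwizard (Codahale) Metrics API. The registry and metric names are illustrative; the `Counter` operations (`inc`, `getCount`) are the real API and are thread-safe:

```scala
import com.codahale.metrics.{Counter, MetricRegistry}

// A Counter can be used standalone as a thread-safe long counter...
val standalone = new Counter()
standalone.inc()

// ...or registered with a MetricRegistry so it is reported alongside
// other metrics if a registry/reporter is wired up (names are assumed).
val registry = new MetricRegistry()
val recordsRead: Counter = registry.counter("spark.source.records.read")

recordsRead.inc()     // increment by 1
recordsRead.inc(41L)  // increment by an arbitrary amount
// recordsRead.getCount is now 42
```

This keeps the instrumentation decoupled: code increments the `Counter` unconditionally, and hooking it to a registry (and hence to Spark's metrics system) is an optional wiring step.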