Re: Combining Many RDDs

2015-03-26 Thread Yang Chen
> Are you looking for SparkContext.union() [1]?
>
> This is not performing well with spark cassandra connector. I am not
> sure whether this will help you.
>
> Thanks and Regards
> Noorul
>
> [1] http://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.SparkContext

--
Yang Chen
Dept. of CISE, University of Florida
Mail: y...@yang-cs.com
Web: www.cise.ufl.edu/~yang
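For context, a minimal sketch of the suggested SparkContext.union() call. Everything here is illustrative: `sc` is assumed to be an active SparkContext, and `parts` is a hypothetical stand-in for the many small RDDs the poster is producing.

    import org.apache.spark.rdd.RDD

    // Hypothetical stand-in for the many small RDDs built elsewhere.
    val parts: Seq[RDD[Int]] = (1 to 1000).map(i => sc.parallelize(Seq(i)))

    // SparkContext.union builds a single UnionRDD over all inputs at once.
    val combined: RDD[Int] = sc.union(parts)
    combined.count()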

Re: Combining Many RDDs

2015-03-26 Thread Yang Chen
Hi Mark,

That's true, but in neither way can I combine the RDDs, so I have to
avoid unions.

Thanks,
Yang

On Thu, Mar 26, 2015 at 5:31 PM, Mark Hamstra wrote:
> RDD#union is not the same thing as SparkContext#union
>
> On Thu, Mar 26, 2015 at 2:27 PM, Yang Chen wrote:
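To make Mark's distinction concrete, a sketch under the same assumptions as above (`sc` is a live SparkContext, `parts: Seq[RDD[Int]]` holds the small RDDs):

    import org.apache.spark.rdd.RDD

    // RDD#union combines two RDDs at a time; folding it over many inputs
    // nests one UnionRDD per step, so the lineage grows as deep as the
    // number of RDDs.
    val chained: RDD[Int] = parts.reduce(_ union _)

    // SparkContext#union takes the whole sequence and builds one flat
    // UnionRDD over all of the inputs' partitions.
    val flat: RDD[Int] = sc.union(parts)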

Re: Combining Many RDDs

2015-03-27 Thread Yang Chen
> ...collection instead of RDD.
>
>     val result = sc.parallelize(data)  // Create and partition the 0.5M items in a single RDD.
>       .flatMap(compute(_))             // You still have only one RDD, with each item joined with external data already.
>
> Hope this helps.
>
> Kelvin
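Filled out, Kelvin's pattern might look like the sketch below. `Item`, `loadExternal`, and `compute` are hypothetical stand-ins for the poster's actual record type and join logic; only the parallelize-then-flatMap shape comes from the quoted message.

    import org.apache.spark.rdd.RDD

    case class Item(id: Long)                      // hypothetical record type

    // Hypothetical lookup of the external data for one item.
    def loadExternal(item: Item): Seq[String] = Seq(s"data-for-${item.id}")

    // One flatMap per item replaces building one RDD per item.
    def compute(item: Item): Seq[(Item, String)] =
      loadExternal(item).map(d => (item, d))

    val data: Seq[Item] = (1L to 500000L).map(Item(_))  // the ~0.5M items
    val result: RDD[(Item, String)] =
      sc.parallelize(data)     // one partitioned RDD holding every item
        .flatMap(compute(_))   // still a single RDD after the join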

Fwd: the life cycle of a shuffle Dependency

2023-12-27 Thread yang chen
Hi, I'm learning Spark and wonder when shuffle data gets deleted. I found the ContextCleaner class, which cleans the shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of a shuffle dependency?
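The question concerns driver-side behavior that can be probed with a small local experiment. The sketch below assumes local mode and only hints at the mechanism: ContextCleaner keeps weak references to registered ShuffleDependency objects, so shuffle files become candidates for cleanup once the driver drops its strong references and a GC runs; the exact timing is up to the JVM and the cleaner thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object ShuffleCleanupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("shuffle-cleanup").setMaster("local[2]"))

        // reduceByKey introduces a ShuffleDependency; running a job
        // materializes its map outputs.
        var counts = sc.parallelize(1 to 100000)
          .map(i => (i % 100, 1))
          .reduceByKey(_ + _)
        counts.count()

        // Drop the strong reference. ContextCleaner holds only a weak
        // reference to the ShuffleDependency, so after a GC the cleaner
        // can asynchronously remove the shuffle files and map-output state.
        counts = null
        System.gc()          // a hint only; GC timing is not guaranteed
        Thread.sleep(5000)   // give the asynchronous cleaner a moment (sketch)

        sc.stop()
      }
    }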