Re: RDDs

2015-03-03 Thread Kartheek.R
Hi TD, "You can always run two jobs on the same cached RDD, and they can run in parallel (assuming you launch the 2 jobs from two different threads)" Is this a correct way to launch jobs from two different threads? val threadA = new Thread(new Runnable { def run() { for(i<- 0 until e

Re: RDDs

2015-03-03 Thread Manas Kar
The above is a great example using a thread. Does anyone have an example using a Scala/Akka Future to do the same? I am looking for an example that uses an Akka Future and does something if the Future times out. On Tue, Mar 3, 2015 at 7:00 AM, Kartheek.R wrote: > Hi TD, > "You can always
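A hedged sketch of the Future-with-timeout version (this uses a plain scala.concurrent.Future; since Akka 2.1 an Akka Future is the same type, obtained by using an ActorSystem's dispatcher as the ExecutionContext; rdd and the timeout are illustrative):

    import scala.concurrent.{Await, Future, TimeoutException}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    // Kick the Spark action off asynchronously.
    val pending: Future[Long] = Future { rdd.count() }

    try {
      val n = Await.result(pending, 30.seconds)
      println(s"count finished: $n")
    } catch {
      case _: TimeoutException =>
        println("count timed out")  // e.g. cancel via sc.cancelJobGroup and fall back
    }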

Re: RDDs

2014-09-03 Thread Tobias Pfeiffer
Hello, On Wed, Sep 3, 2014 at 6:02 PM, rapelly kartheek wrote: > > Can someone tell me what kind of operations can be performed on a > replicated RDD? What are the use-cases of a replicated RDD? > I suggest you read https://spark.apache.org/docs/latest/programming-guide.html#resilient-distrib

RE: RDDs

2014-09-03 Thread Liu, Raymond
Not sure what you are referring to when you say replicated RDD. If you actually mean RDD, then yes, read the API doc and paper as Tobias mentioned. If you actually focus on the word "replicated", then that is for fault tolerance, and probably mostly used in the streaming case for receiver-created RD
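For the fault-tolerance sense of "replicated", the replication factor is chosen through the storage level when persisting; a minimal sketch (the input path is illustrative):

    import org.apache.spark.storage.StorageLevel

    // The _2 storage levels keep two copies of each cached partition on
    // different executors, so losing one does not force recomputation.
    val rdd = sc.textFile("hdfs:///data/input")
    rdd.persist(StorageLevel.MEMORY_ONLY_2)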

RE: RDDs

2014-09-03 Thread Kartheek.R
Thank you Raymond and Tobias. Yeah, I am very clear about what I was asking. I was talking about the "replicated" RDD only. Now that I've got my understanding of job and application validated, I wanted to know if we can replicate an RDD and run two jobs (that need the same RDD) of an application in par

RE: RDDs

2014-09-03 Thread Liu, Raymond
Subject: RE: RDDs Thank you Raymond and Tobias. Yeah, I am very clear about what I was asking. I was talking about the "replicated" RDD only. Now that I've got my understanding of job and application validated, I wanted to know if we can replicate an RDD and run two jobs (that

Re: RDDs

2014-09-03 Thread Tathagata Das
From: Kartheek.R [mailto:kartheek.m...@gmail.com] > Sent: Thursday, September 04, 2014 1:24 PM > To: u...@spark.incubator.apache.org > Subject: RE: RDDs > > Thank you Raymond and Tobias. > Yeah, I am very clear about what I was asking. I was talking about the > "replicated" RDD only. Now

Re: RDDs

2014-09-04 Thread Kartheek.R
Thank you yuanbosoft.

Re: RDDs and Immutability

2014-09-13 Thread Nicholas Chammas
Have you tried using RDD.map() to transform some of the RDD elements from 0 to 1? Why doesn’t that work? That’s how you change data in Spark, by defining a new RDD that’s a transformation of an old one. On Sat, Sep 13, 2014 at 5:39 AM, Deep Pradhan wrote: > Hi, > We all know that RDDs are immu
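As a concrete sketch of that point (the values are illustrative):

    // "Changing" data in Spark means deriving a new RDD; the old one is untouched.
    val rdd = sc.parallelize(Seq(0, 1, 0, 2))
    val updated = rdd.map(x => if (x == 0) 1 else x)
    updated.collect()  // Array(1, 1, 1, 2)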

Re: RDDs join problem: incorrect result

2015-07-28 Thread ponkin
Hi, Alice. Did you find a solution? I have exactly the same problem.

Re: RDDs join problem: incorrect result

2015-07-28 Thread ๏̯͡๏
What is the size of each RDD? What is the size of your cluster, and which Spark configurations did you try? On Tue, Jul 28, 2015 at 9:54 PM, ponkin wrote: > Hi, Alice > > Did you find a solution? > I have exactly the same problem.

Re: RDDs being cleaned too fast

2014-12-16 Thread Harihar Nahak
RDD.persist() can be useful here. On 11 December 2014 at 14:34, ankits [via Apache Spark User List] < ml-node+s1001560n20613...@n3.nabble.com> wrote: > > I'm using spark 1.1.0 and am seeing persisted RDDs being cleaned up too > fast. How can I inspect the size of RDD in memory and get more informa
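A hedged sketch of persisting and then inspecting cache usage from the driver (assumes an existing rdd; the same numbers appear in the web UI's Storage tab):

    import org.apache.spark.storage.StorageLevel

    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count()  // an action materializes the cache

    // Driver-side view of what is cached and how large it is.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id}: ${info.memSize} bytes in memory, ${info.diskSize} on disk")
    }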

Re: RDDs join problem: incorrect result

2014-11-30 Thread Harihar Nahak
What do you mean by incorrect? Could you please share some examples from both the input RDDs and the resultant RDD? If you get any exception, paste that too; it helps to debug where the issue is. On 27 November 2014 at 17:07, liuboya [via Apache Spark User List] < ml-node+s1001560n19928...@n3.nabble.com> w

Re: RDDs being cleaned too fast

2014-12-10 Thread Aaron Davidson
The ContextCleaner uncaches RDDs that have gone out of scope on the driver. So it's possible that the given RDD is no longer reachable in your program's control flow, or else it'd be a bug in the ContextCleaner. On Wed, Dec 10, 2014 at 5:34 PM, ankits wrote: > I'm using spark 1.1.0 and am seeing
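A small illustration of the scoping issue (the names are mine, not from the thread):

    // If the only reference to a cached RDD goes out of scope on the driver,
    // the ContextCleaner may eventually uncache it:
    def build(): Unit = {
      val tmp = sc.parallelize(1 to 1000).cache()
      tmp.count()
    }  // tmp is unreachable once this returns

    // A long-lived driver-side reference keeps the RDD reachable:
    val kept = sc.parallelize(1 to 1000).cache()

    // What the driver still tracks as persistent:
    println(sc.getPersistentRDDs.keys)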

Re: RDDs being cleaned too fast

2014-12-11 Thread Ranga
I was having similar issues with my persistent RDDs. After some digging around, I noticed that the partitions were not balanced evenly across the available nodes. After a "repartition", the RDD was spread evenly across all available memory. Not sure if that is something that would help your use-cas
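A sketch of that rebalancing step (the variable names and target partition count are illustrative):

    // Rebalance before persisting so cached blocks spread across executors.
    val balanced = skewed.repartition(sc.defaultParallelism)
    balanced.cache()
    balanced.count()  // materialize the evenly spread cache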

Re: RDDs caching in typical machine learning use cases

2016-04-04 Thread Eugene Morozov
Hi, Yes, I believe people do that. I also believe that Spark ML is able to figure out when to cache some internal RDDs; that's definitely true for the random forest algo. Caching the same RDD twice doesn't hurt, either. But it's not clear what you'd want to know... -- Be well! Jean Morozov On S
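The usual pattern behind the question, as a hedged sketch (the dataset path and choice of algorithm are illustrative):

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.util.MLUtils

    // Iterative algorithms pass over the training set many times,
    // so cache it once up front.
    val training = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt").cache()
    val model = new LogisticRegressionWithLBFGS().run(training)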

Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-04 Thread bryan . jeffrey
RDD operation: rdd.map(word => (word, 1)).reduceByKey(_ + _) On Sat, Mar 4, 2017 at 8:59 AM -0500, "Old-School" wrote: Hi, I want to perform some simple transformations and check the execution time, under various configurations (e.g. number of
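And a sketch of the DataFrame-side equivalent of that expression (the column and variable names are assumed):

    // RDD:       rdd.map(word => (word, 1)).reduceByKey(_ + _)
    // DataFrame: group and count on a single string column named "word".
    val counts = wordsDF.groupBy("word").count()
    counts.show()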

Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-05 Thread khwunchai jaengsawang
Hi Old-School, For the first question, you can specify the number of partitions for any DataFrame by using repartition(numPartitions: Int, partitionExprs: Column*). Example: val partitioned = data.repartition(numPartitions = 10).cache() For your second question, you can transform your RDD in

Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-05 Thread ayan guha
Just as a best practice, DataFrames and Datasets are the preferred way, so try not to resort to RDDs unless you absolutely have to... On Sun, 5 Mar 2017 at 7:10 pm, khwunchai jaengsawang wrote: > Hi Old-School, > > > For the first question, you can specify the number of partitions for any > DataFrame by us