Find the two storage Locations of each partition of a replicated rdd.

2015-01-23 Thread Rapelly Kartheek
Hi, I want to find the storage locations (BlockManagerIds) of each partition when the RDD is replicated twice. I mean, if a twice-replicated RDD has 5 partitions, I would like to know the first and second storage locations of each partition. Basically, I am trying to modify the list of nodes sel

Fwd: How the sequence of blockManagerId's are constructed in spark/*/storage/blockManagerMasterActor.getPeers()?

2014-11-26 Thread rapelly kartheek
-- Forwarded message --
From: rapelly kartheek
Date: Thu, Nov 27, 2014 at 11:47 AM
Subject: How the sequence of blockManagerId's are constructed in spark/*/storage/blockManagerMasterActor.getPeers()?
To: u...@spark.apache.org
Hi, I've been fiddling with spark

How spark/*/Storage/BlockManagerMaster.askDriverWithReply() responds to various query messages

2014-11-07 Thread rapelly kartheek
Hi, I am trying to understand how /spark/*/Storage/BlockManagerMaster.askDriverWithReply() works.
    def getPeers(blockManagerId: BlockManagerId, numPeers: Int): Seq[BlockManagerId] = {
      val result = askDriverWithReply[Seq[BlockManagerId]](GetPeers(blockManagerId, numPeers))
      if (result.length

resources allocated for an application

2014-09-23 Thread rapelly kartheek
Hi, I am trying to find out where exactly in the Spark code resources get allocated for a newly submitted Spark application. I have a standalone Spark cluster. Can someone please direct me to the right part of the code? Regards

Re: how does replicate() method in BlockManager.scala aquires resources for rdd replication

2014-09-16 Thread rapelly kartheek
one BlockManagerId in each node. Can someone please point me to where I can obtain all the nodes chosen for replication? Thank you.
On Tue, Sep 16, 2014 at 12:04 PM, rapelly kartheek wrote:
> Hi,
>
> I was tracing the flow of the replicate method in BlockManager.scala. I am
> trying

how does replicate() method in BlockManager.scala aquires resources for rdd replication

2014-09-15 Thread rapelly kartheek
Hi, I was tracing the flow of the replicate method in BlockManager.scala. I am trying to find out where exactly in the code the resources are acquired for RDD replication. I find that the BlockManagerMaster.getPeers() method returns only one BlockManagerId for all the RDD partitions. But, the

Resource allocation

2014-09-02 Thread rapelly kartheek
Hi, I want to incorporate some intelligence when choosing the resources for RDD replication. I thought that if we replicate an RDD on nodes specially chosen for their capabilities, the next application that requires this RDD can be executed more efficiently. But I found that an RDD created by an appp

RDD replication in Spark

2014-08-25 Thread rapelly kartheek
Hi, I've exercised multiple options available for persist(), including RDD replication. I have gone through the classes involved in caching/storing RDDs at different levels. The StorageLevel class plays a pivotal role by recording whether to use memory or disk or to replicate the RDD on multiple

Resource allocations

2014-07-16 Thread rapelly kartheek
Hi, I am trying to understand how resource allocation happens in Spark. I understand the resourceOffer method in taskScheduler. This method takes care of the locality factor while allocating resources. This resourceOffer method gets invoked by the corresponding cluster manager. I am working o