Hi,
I want to find the storage locations (BlockManagerIds) of each partition when
the RDD is replicated twice. I mean, if a twice-replicated RDD has 5
partitions, I would like to know the first and second storage locations of
each partition. Basically, I am trying to modify the list of nodes selected
for replication.
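To make the goal concrete, here is a self-contained toy model of the bookkeeping being asked for. The names (BlockManagerId, RDDBlockId) mirror Spark's, but the fields and the placement scheme are made up purely for illustration; this is not Spark's internal API:

```scala
// Toy model: for a replication-2 RDD with 5 partitions, map each
// partition's block to the ordered pair of locations holding its replicas.
// All names and the round-robin placement are hypothetical.
case class BlockManagerId(executorId: String, host: String, port: Int)
case class RDDBlockId(rddId: Int, partition: Int)

// locations(block) = Seq(firstReplicaLocation, secondReplicaLocation)
val locations: Map[RDDBlockId, Seq[BlockManagerId]] = (0 until 5).map { p =>
  RDDBlockId(rddId = 7, partition = p) -> Seq(
    BlockManagerId(s"exec-${p % 3}", s"host-${p % 3}", 7077),
    BlockManagerId(s"exec-${(p + 1) % 3}", s"host-${(p + 1) % 3}", 7077))
}.toMap

// First and second storage location of partition 0:
val Seq(first, second) = locations(RDDBlockId(7, 0))
```

Answering the question then amounts to recovering a map like `locations` from the block manager master's state.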
-- Forwarded message --
From: rapelly kartheek
Date: Thu, Nov 27, 2014 at 11:47 AM
Subject: How is the sequence of BlockManagerIds constructed in
spark/*/storage/BlockManagerMasterActor.getPeers()?
To: u...@spark.apache.org
Hi,
I've been fiddling with spark
Hi,
I am trying to understand how
spark/*/storage/BlockManagerMaster.askDriverWithReply() works.
def getPeers(blockManagerId: BlockManagerId, numPeers: Int):
    Seq[BlockManagerId] = {
  val result =
    askDriverWithReply[Seq[BlockManagerId]](GetPeers(blockManagerId, numPeers))
  if (result.length != numPeers) {
    throw new SparkException(
      "Error getting peers, only got " + result.size + " instead of " + numPeers)
  }
  result
}
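For reference, the driver side of this call (getPeers in BlockManagerMasterActor in the 1.x code) appears to build the peer sequence by walking the registered block managers in a ring, starting just after the requester and wrapping around. A simplified, self-contained sketch of that selection logic (BlockManagerId stubbed down to one field; not the actual Spark source):

```scala
// Simplified sketch of ring-based peer selection: pick `size` peers
// starting just after the requesting block manager in registration
// order, wrapping around the end of the list.
case class BlockManagerId(executorId: String)

def getPeers(registered: IndexedSeq[BlockManagerId],
             self: BlockManagerId,
             size: Int): Seq[BlockManagerId] = {
  val selfIndex = registered.indexOf(self)
  require(selfIndex != -1, s"Self $self not registered")
  // Wraps around, so the same peer can repeat if there are too few nodes.
  Seq.tabulate(size)(i => registered((selfIndex + i + 1) % registered.length))
}

val nodes = IndexedSeq("a", "b", "c", "d").map(BlockManagerId(_))
val peers = getPeers(nodes, BlockManagerId("b"), 2)
```

So the "sequence" is deterministic given the registration order, which is why the same peers keep showing up for a given requester.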
Hi,
I am trying to find out where exactly in the Spark code the resources
get allocated for a newly submitted Spark application.
I have a standalone Spark cluster. Can someone please direct me to the
right part of the code.
Regards
one BlockManagerId in
each node.
Can someone please tell me where to look to obtain all the nodes chosen
for replication.
Thank you
On Tue, Sep 16, 2014 at 12:04 PM, rapelly kartheek
wrote:
> Hi,
>
> I was tracing the flow of replicate method in BlockManager.scala. I am
> trying
Hi,
I was tracing the flow of the replicate method in BlockManager.scala. I am
trying to find out where exactly in the code the resources are
acquired for RDD replication.
I find that the BlockManagerMaster.getPeers() method returns only one
BlockManagerId for all the RDD partitions.
But, the
Hi,
I want to incorporate some intelligence while choosing the resources for
RDD replication. I thought that if we replicate an RDD on specially chosen
nodes based on their capabilities, the next application that requires this
RDD can be executed more efficiently. But I found that an RDD created by an
app
Hi,
I've exercised the multiple options available for persist(), including RDD
replication. I have gone through the classes involved in caching/storing
the RDDs at different levels. The StorageLevel class plays a pivotal role by
recording whether to use memory or disk, and whether to replicate the RDD on
multiple nodes.
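StorageLevel is essentially a small record of those flags. A minimal self-contained model of the idea (the field names follow the real class, but this is a sketch for illustration, not Spark's implementation):

```scala
// Minimal model of the flags a storage level records: where to keep the
// data, whether it stays deserialized, and how many replicas to make.
// Sketch only; field names mirror Spark's StorageLevel.
case class StorageLevel(useDisk: Boolean,
                        useMemory: Boolean,
                        deserialized: Boolean,
                        replication: Int = 1)

// Counterparts of a few of the predefined levels:
val MEMORY_ONLY     = StorageLevel(useDisk = false, useMemory = true, deserialized = true)
val MEMORY_ONLY_2   = MEMORY_ONLY.copy(replication = 2) // replicate on two nodes
val MEMORY_AND_DISK = StorageLevel(useDisk = true, useMemory = true, deserialized = true)
```

The `_2` variants of the predefined levels are just the base level with `replication = 2`, which is what triggers the replicate path discussed above.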
Hi,
I am trying to understand how resource allocation happens in Spark. I
understand the resourceOffer method in the TaskScheduler. This method takes
care of the locality factor while allocating resources. This resourceOffer
method gets invoked by the corresponding cluster manager.
I am working o