Re: Reading from a centralized stored

Franc Carter Tue, 06 Jan 2015 11:45:04 -0800

One more question, to clarify: will every node pull in all the data?

thanks


On Tue, Jan 6, 2015 at 12:56 PM, Cody Koeninger <[email protected]> wrote:

> Unless you are co-locating Spark executor processes on the same machines
> where the data is stored, and using an RDD that knows which node to prefer
> when scheduling a task, yes, the data will be pulled over the network.
>
> Of the options you listed, S3 and DynamoDB are hosted services, so Spark
> cannot run on the same machines. Cassandra can be run on the same nodes as
> Spark, and recent versions of the Spark Cassandra connector implement
> preferred locations. You can run an RDBMS on the same nodes as Spark, but
> JdbcRDD doesn't implement preferred locations.
>
> On Mon, Jan 5, 2015 at 6:25 PM, Franc Carter <[email protected]> wrote:
>
>>
>> Hi,
>>
>> I'm trying to understand how a Spark Cluster behaves when the data it is
>> processing resides on a centralized/remote store (S3, Cassandra, DynamoDB,
>> RDBMS etc).
>>
>> Does every node in the cluster retrieve all the data from the central
>> store ?
>>
>> thanks
>>
>> --
>>
>> *Franc Carter* | Systems Architect | Rozetta Technology
>>
>> [email protected]  <[email protected]>|
>> www.rozettatechnology.com
>>
>> Tel: +61 2 8355 2515
>>
>> Level 4, 55 Harrington St, The Rocks NSW 2000
>>
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>> AUSTRALIA
>>
>>
>
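
The point about preferred locations can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark's actual scheduler: the function name `assign_partitions`, the host names, and the least-loaded tie-breaking rule are all assumptions made for illustration. It shows that each partition is handed to exactly one executor, so in general no node pulls in the whole data set, and that a read is local only when the source reports a preferred host that is also running an executor.

```python
from collections import defaultdict

def assign_partitions(preferred, executors):
    """Toy placement: give each partition to exactly one executor host.

    preferred: dict of partition id -> list of hosts holding that data.
               An empty list means the source reports no locality
               (the situation described for S3, DynamoDB, and JdbcRDD).
    executors: list of hosts running executor processes.
    Returns a dict of host -> list of (partition id, is_local_read).
    """
    load = {host: 0 for host in executors}
    plan = defaultdict(list)
    for part in sorted(preferred):
        # Hosts that both store the partition and run an executor.
        local_hosts = [h for h in preferred[part] if h in load]
        # Fall back to any executor when no local host is available.
        candidates = local_hosts or executors
        # Hypothetical rule: pick the least-loaded candidate.
        host = min(candidates, key=lambda h: load[h])
        load[host] += 1
        plan[host].append((part, bool(local_hosts)))
    return dict(plan)

executors = ["node1", "node2", "node3"]

# Co-located store that reports where each partition lives.
colocated = assign_partitions(
    {0: ["node1"], 1: ["node2"], 2: ["node3"], 3: ["node1"]}, executors)

# Remote store with no locality information.
remote = assign_partitions({0: [], 1: [], 2: [], 3: []}, executors)
```

In the co-located case every read is local; in the remote case every partition still goes to a single executor, but each one is fetched over the network.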

