Hi Sun,

The issue with Ceph as the underlying file system for Spark is that you
lose data locality. Ceph is not designed to have Spark run directly on top
of the OSDs. I know that CephFS provides data location information via a
Hadoop-compatible API, but the last time I researched this, the
integration was experimental (just google it and you will find a lot of
discussions, e.g.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/002837.html).

However, this might not be the biggest issue as long as you have GREAT
network bandwidth, like InfiniBand or 10+ Gigabit Ethernet. My guess is that
the architecture and the performance will be similar to S3+Spark at best
(with 10GE instances) if you guys take the network setup seriously.
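
To put rough numbers on the bandwidth point: without data locality, every
input byte has to cross the network, so the per-worker link speed sets a
floor on scan time. Here is a back-of-envelope sketch in Python. The function
name, the 70% link efficiency, the 1 TB dataset, and the 10-worker cluster
are all illustrative assumptions, not measurements, and it assumes the Ceph
side can serve data at least as fast as the links.

```python
def network_bound_scan_seconds(dataset_gb, link_gbit_per_s, num_workers,
                               efficiency=0.7):
    """Lower bound on the time to read a dataset when no read is node-local.

    All parameters are hypothetical inputs for a rough estimate:
    dataset_gb       -- input size in gigabytes
    link_gbit_per_s  -- per-worker link speed in gigabits per second
    num_workers      -- number of Spark workers reading in parallel
    efficiency       -- assumed fraction of the link usable for payload
    """
    # Convert gigabits to gigabytes and discount protocol overhead.
    usable_gb_per_s = link_gbit_per_s / 8.0 * efficiency
    # Assume reads are spread evenly across all workers.
    aggregate_gb_per_s = usable_gb_per_s * num_workers
    return dataset_gb / aggregate_gb_per_s


# Compare 1, 10, and 40 Gbit/s links for a 1 TB scan on 10 workers.
for link in (1, 10, 40):
    secs = network_bound_scan_seconds(dataset_gb=1000,
                                      link_gbit_per_s=link,
                                      num_workers=10)
    print(f"{link:>2} Gbit/s per worker: ~{secs:,.0f} s to scan 1 TB")
```

The estimate scales linearly with link speed, which is why the jump from
1GE to 10GE matters so much once locality is gone.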

HTH,

Jerry

On Tue, Sep 22, 2015 at 9:59 PM, fightf...@163.com <fightf...@163.com>
wrote:

> Hi Jerry
>
> Yeah, we have already managed to run and use Ceph in a few of our
> production environments, especially with OpenStack.
>
> The reason we want to use Ceph is that we are looking for a way to get a
> unified storage layer, and the design concepts of Ceph are quite appealing.
> I am interested in work like the Hadoop CephFS plugin, and we are going to
> run some benchmark tests between HDFS and CephFS soon.
>
> So it would be beneficial if any related work between Apache Spark and
> Ceph could offer some thoughtful insights.
>
> BTW, for the Ceph Object Gateway S3 REST API, agreed on the inconvenience
> and the incompatibilities. However, we have not yet researched or tested
> radosgw much, though we do have a few small requirements for the gateway
> in some use cases.
>
> Hoping for more thoughts and discussion.
>
> Best,
> Sun.
>
> ------------------------------
> fightf...@163.com
>
>
> *From:* Jerry Lam <chiling...@gmail.com>
> *Date:* 2015-09-23 09:37
> *To:* fightf...@163.com
> *CC:* user <user@spark.apache.org>
> *Subject:* Re: Spark standalone/Mesos on top of Ceph
> Do you have specific reasons to use Ceph? I used Ceph before, and I'm not
> too in love with it, especially when I was using the Ceph Object Gateway
> S3 API. There are some incompatibilities with the AWS S3 API. You really,
> really need to try it before making the commitment. Did you manage to
> install it?
>
> On Tue, Sep 22, 2015 at 9:28 PM, fightf...@163.com <fightf...@163.com>
> wrote:
>
>> Hi guys,
>>
>> Here is the info for Ceph: http://ceph.com/
>>
>> We are investigating and using Ceph for distributed storage and
>> monitoring, and are specifically interested in using Ceph as the
>> underlying file system storage for Spark. However, we have no experience
>> achieving that. Has anybody seen such progress?
>>
>> Best,
>> Sun.
>>
>> ------------------------------
>> fightf...@163.com
>>
>
