I haven't got as far as deploying a filesystem yet - still weighing up the options.
Our Mesos cluster is just a PaaS at the moment, but I think the option to
use capacity for ad hoc distributed computing alongside the web workloads
is a killer feature. We're soon to Dockerize as well, so an option that
can be reached from containers is pretty important too.

Ceph is a strong candidate because of the S3 compatibility: I know that
will be usable from within Docker without any trouble when we need non-DB
persistence, and its resilience seems a good match for Mesos' own. I need
some real-world war-story research before I can really say it's a good
alternative, though.

As I'm a Spark newbie I don't want to run before I can walk, so I'll
probably start with an HDFS deployment on the test systems to get a feel
for it first.
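
For what it's worth, the sort of thing I have in mind on the Spark side is
below. It's only a sketch I haven't actually run, assuming a Ceph RADOS
Gateway speaking S3 and a Hadoop build with the s3a connector on the
classpath; the endpoint, bucket and credentials are all placeholders:

  import org.apache.spark.{SparkConf, SparkContext}

  // Untested sketch: point Spark's S3A client at a Ceph RADOS Gateway
  // instead of AWS. Endpoint, bucket and credentials are made up.
  val sc = new SparkContext(new SparkConf().setAppName("ceph-s3-smoke-test"))

  val hc = sc.hadoopConfiguration
  hc.set("fs.s3a.endpoint", "http://ceph-rgw.internal:7480") // hypothetical RGW address
  hc.set("fs.s3a.access.key", "ACCESS_KEY_HERE")
  hc.set("fs.s3a.secret.key", "SECRET_KEY_HERE")
  hc.set("fs.s3a.path.style.access", "true") // RGW generally wants path-style URLs

  // Read and write buckets with the same API we'd use for hdfs:// paths.
  val lines = sc.textFile("s3a://test-bucket/input/sample.txt")
  println(s"line count: ${lines.count()}")
  lines.saveAsTextFile("s3a://test-bucket/output/sample-copy")

If something like that works from inside a container, the non-DB
persistence question is pretty much answered for us.
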
On 22 October 2014 17:40, CCAAT <cc...@tampabay.rr.com> wrote:
> Ok so,
>
> I'd be curious to know your final architecture (D. Davies)?
>
> I was looking to put Ceph on top of the (3) btrfs nodes in case we need
> a DFS at some later point. We're not really sure what software will be
> in our final mix. Certainly installing Ceph does not hurt anything (?);
> and I'm not sure we want to use Ceph from userspace only. We have had
> excellent success using btrfs, so that is firm for us, short of some
> gaping problem emerging. Growing the cluster size will happen once we
> establish the basic functionality of the cluster.
>
> Right now there is a focus on subsurface fluid simulations for carbon
> sequestration, but using the cluster for general (cron/Chronos) batch
> jobs is a secondary appeal to us. So I guess my question is, knowing
> that we want to avoid the hdfs/hadoop setup entirely: will a local
> FS/DFS with btrfs/ceph be sufficiently robust to test not only
> mesos+spark but many other related pieces of software, such as but not
> limited to R, Scala, SparkR, databases (SQL) and so on? We're just
> trying to avoid some common mistakes as we move forward with mesos.
>
> James
>
> On 10/22/14 02:29, Dick Davies wrote:
>>
>> Be interested to know what that is, if you don't mind sharing.
>>
>> We're thinking of deploying a Ceph cluster for another project anyway;
>> it seems to remove some of the chokepoints/points of failure HDFS
>> suffers from, but I've no idea how well it can interoperate with the
>> usual HDFS clients (Spark in my particular case, but I'm trying to
>> keep this general).
>>
>> On 21 October 2014 13:16, David Greenberg <dsg123456...@gmail.com> wrote:
>>>
>>> We use Spark without HDFS: in our case, we just use Ansible to copy
>>> the Spark executors onto all hosts at the same path. We also load and
>>> store our Spark data from non-HDFS sources.
>>>
>>> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <d...@hellooperator.net>
>>> wrote:
>>>>
>>>> I think Spark needs a way to send jobs to/from the workers - the
>>>> Spark distro itself will pull down the executor OK, but in my (very
>>>> basic) tests I got stuck without HDFS.
>>>>
>>>> So basically it depends on the framework. I think in Spark's case
>>>> they assume most users are migrating from an existing Hadoop
>>>> deployment, so HDFS is sort of assumed.
>>>>
>>>> On 20 October 2014 23:18, CCAAT <cc...@tampabay.rr.com> wrote:
>>>>>
>>>>> On 10/20/14 11:46, Steven Schlansker wrote:
>>>>>
>>>>>> We are running Mesos entirely without HDFS with no problems. We
>>>>>> use Docker to distribute our application to slave nodes, and keep
>>>>>> no state on individual nodes.
>>>>>
>>>>> Background: I'm building up a 3-node cluster to run Mesos and Spark.
>>>>> No legacy Hadoop needed or wanted. I am using btrfs for the local
>>>>> file system, with (2) drives set up for RAID1 on each system.
>>>>>
>>>>> So you are suggesting that I can install mesos + spark + docker and
>>>>> not a DFS on these (3) machines?
>>>>>
>>>>> Will I need any other software? My application is a geophysical
>>>>> fluid simulator, so Scala, R, and all sorts of advanced math will be
>>>>> required on the cluster for the Finite Element Methods.
>>>>>
>>>>> James