Ankur - To answer your specific question:

Q: Is an s3 path considered non-hdfs?
A: At this time, no - it uses the HDFS layer to resolve the path (for better or worse).
---------------------------------------------------------------------
// Grab the resource using the hadoop client if it's one of the known schemes
// TODO(tarnfeld): This isn't very scalable with hadoop's pluggable
// filesystem implementations.
// TODO(matei): Enforce some size limits on files we get from HDFS
if (strings::startsWith(uri, "hdfs://") ||
    strings::startsWith(uri, "hftp://") ||
    strings::startsWith(uri, "s3://") ||
    strings::startsWith(uri, "s3n://")) {
  Try<string> base = os::basename(uri);
  if (base.isError()) {
    LOG(ERROR) << "Invalid basename for URI: " << base.error();
    return Error("Invalid basename for URI");
  }
  string path = path::join(directory, base.get());

  HDFS hdfs;

  LOG(INFO) << "Downloading resource from '" << uri
            << "' to '" << path << "'";
  Try<Nothing> result = hdfs.copyToLocal(uri, path);
  if (result.isError()) {
    LOG(ERROR) << "HDFS copyToLocal failed: " << result.error();
    return Error(result.error());
  }
---------------------------------------------------------------------

----- Original Message -----
> From: "Ankur Chauhan" <an...@malloc64.com>
> To: user@mesos.apache.org
> Sent: Tuesday, October 21, 2014 10:28:50 AM
> Subject: Re: Do i really need HDFS?
>
> This is what I also intend to do. Is a s3 path considered non-hdfs? If so,
> how does it know the credentials to use to fetch the file.
>
> Sent from my iPhone
>
> On Oct 21, 2014, at 5:16 AM, David Greenberg <dsg123456...@gmail.com> wrote:
>
> > We use spark without HDFS--in our case, we just use ansible to copy the
> > spark executors onto all hosts at the same path. We also load and store
> > our spark data from non-HDFS sources.
> >
> > On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <d...@hellooperator.net>
> > wrote:
> >
> > > I think Spark needs a way to send jobs to/from the workers - the Spark
> > > distro itself will pull down the executor ok, but in my (very basic)
> > > tests I got stuck without HDFS.
> > >
> > > So basically it depends on the framework.
> > > I think in Spark's case they assume most users are migrating from an
> > > existing Hadoop deployment, so HDFS is sort of assumed.
> > >
> > > On 20 October 2014 23:18, CCAAT <cc...@tampabay.rr.com> wrote:
> > >
> > > > On 10/20/14 11:46, Steven Schlansker wrote:
> > > >
> > > > > We are running Mesos entirely without HDFS with no problems. We use
> > > > > Docker to distribute our application to slave nodes, and keep no
> > > > > state on individual nodes.
> > > >
> > > > Background: I'm building up a 3 node cluster to run mesos and spark.
> > > > No legacy Hadoop needed or wanted. I am using btrfs for the local
> > > > file system, with (2) drives set up for raid1 on each system.
> > > >
> > > > So you are suggesting that I can install mesos + spark + docker
> > > > and not a DFS on these (3) machines?
> > > >
> > > > Will I need any other software? My application is a geophysical
> > > > fluid simulator, so scala, R, and all sorts of advanced math will
> > > > be required on the cluster for the Finite Element Methods.
> > > >
> > > > James

--
Cheers,
Timothy St. Clair
Red Hat Inc.