So that means even if I don't use the DFS itself, I would still need the HDFS NameNode and DataNode pieces and related config just to fetch s3:// and s3n:// URIs?
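In other words, would a client-only setup be enough -- just the hadoop binary on each slave plus a core-site.xml carrying the s3n credentials, with no NameNode or DataNode daemons running anywhere? Something like this sketch (the fs.s3n.* property names are the stock Hadoop settings for the s3n filesystem; the paths, bucket, and keys are placeholders):

---------------------------------------------------------------------
# Hypothetical client-only setup on a slave: no HDFS daemons, just the
# hadoop client plus credentials for the s3n:// filesystem.
# (Hadoop 1.x keeps this file under conf/; 2.x under etc/hadoop/.)
cat > "$HADOOP_HOME/conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>PLACEHOLDER_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>PLACEHOLDER_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
EOF

# Sanity check that the bare client can reach the bucket:
hadoop fs -ls s3n://my-bucket/some/path/
---------------------------------------------------------------------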
Sent from my iPhone

> On Oct 21, 2014, at 8:40 AM, Tim St Clair <tstcl...@redhat.com> wrote:
>
> Ankur -
>
> To answer your specific question re:
> Q: Is a s3 path considered non-hdfs?
> A: At this time, no; it uses the HDFS layer to resolve it (for better or worse).
>
> ---------------------------------------------------------------------
> // Grab the resource using the hadoop client if it's one of the known schemes
> // TODO(tarnfeld): This isn't very scalable with hadoop's pluggable
> // filesystem implementations.
> // TODO(matei): Enforce some size limits on files we get from HDFS
> if (strings::startsWith(uri, "hdfs://") ||
>     strings::startsWith(uri, "hftp://") ||
>     strings::startsWith(uri, "s3://") ||
>     strings::startsWith(uri, "s3n://")) {
>   Try<string> base = os::basename(uri);
>   if (base.isError()) {
>     LOG(ERROR) << "Invalid basename for URI: " << base.error();
>     return Error("Invalid basename for URI");
>   }
>   string path = path::join(directory, base.get());
>
>   HDFS hdfs;
>
>   LOG(INFO) << "Downloading resource from '" << uri
>             << "' to '" << path << "'";
>   Try<Nothing> result = hdfs.copyToLocal(uri, path);
>   if (result.isError()) {
>     LOG(ERROR) << "HDFS copyToLocal failed: " << result.error();
>     return Error(result.error());
>   }
> ---------------------------------------------------------------------
>
> ----- Original Message -----
>
>> From: "Ankur Chauhan" <an...@malloc64.com>
>> To: user@mesos.apache.org
>> Sent: Tuesday, October 21, 2014 10:28:50 AM
>> Subject: Re: Do i really need HDFS?
>>
>> This is what I also intend to do. Is a s3 path considered non-hdfs? If so,
>> how does it know which credentials to use to fetch the file?
>>
>> Sent from my iPhone
>>
>> On Oct 21, 2014, at 5:16 AM, David Greenberg <dsg123456...@gmail.com> wrote:
>>
>>> We use Spark without HDFS--in our case, we just use Ansible to copy the
>>> Spark executors onto all hosts at the same path. We also load and store
>>> our Spark data from non-HDFS sources.
>>>
>>> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <d...@hellooperator.net> wrote:
>>>
>>>> I think Spark needs a way to send jobs to/from the workers - the Spark
>>>> distro itself will pull down the executor ok, but in my (very basic)
>>>> tests I got stuck without HDFS.
>>>>
>>>> So basically it depends on the framework. I think in Spark's case they
>>>> assume most users are migrating from an existing Hadoop deployment, so
>>>> HDFS is sort of assumed.
>>>>
>>>> On 20 October 2014 23:18, CCAAT <cc...@tampabay.rr.com> wrote:
>>>>> On 10/20/14 11:46, Steven Schlansker wrote:
>>>>>
>>>>>> We are running Mesos entirely without HDFS with no problems. We use
>>>>>> Docker to distribute our application to slave nodes, and keep no
>>>>>> state on individual nodes.
>>>>>
>>>>> Background: I'm building up a 3 node cluster to run Mesos and Spark. No
>>>>> legacy Hadoop needed or wanted. I am using btrfs for the local file
>>>>> system, with (2) drives set up for raid1 on each system.
>>>>>
>>>>> So you are suggesting that I can install mesos + spark + docker
>>>>> and not a DFS on these (3) machines?
>>>>>
>>>>> Will I need any other software? My application is a geophysical
>>>>> fluid simulator, so Scala, R, and all sorts of advanced math will
>>>>> be required on the cluster for the Finite Element Methods.
>>>>>
>>>>> James
>
> --
> Cheers,
> Timothy St. Clair
> Red Hat Inc.
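A note on the HDFS helper in the snippet Tim quoted: as far as I can tell it is a thin wrapper that locates the hadoop binary (via the slave's --hadoop_home flag, HADOOP_HOME, or the PATH) and shells out to it, so the fetch boils down to roughly the following. This is a sketch, not the exact invocation; the install path, URI, and sandbox directory are placeholders:

---------------------------------------------------------------------
# Rough shell equivalent of hdfs.copyToLocal(uri, path) above.
# Assumes the hadoop client is installed at /opt/hadoop (placeholder)
# and core-site.xml already carries the s3n credentials.
export HADOOP_HOME=/opt/hadoop
"$HADOOP_HOME/bin/hadoop" fs -copyToLocal \
  "s3n://my-bucket/my-executor.tar.gz" \
  "/tmp/mesos/slaves/S0/frameworks/F0/executors/E0/runs/latest"
---------------------------------------------------------------------

If that's right, each slave needs Hadoop's client binaries and config, but no running HDFS daemons.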