Multiple simultaneous appends aren't allowed for any given file.
Thanks,
+Vinod Kumar Vavilapalli
On May 17, 2013, at 6:40 AM, John Lilley wrote:
Thanks! Does this also imply that multiple clients may open the same HDFS file
for append simultaneously, and expect append requests to be interleaved?
john
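For reference, HDFS enforces a single-writer lease per file, so concurrent appends from multiple clients are rejected rather than interleaved. A minimal single-writer sketch, with a made-up namenode address and path:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendOnce {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            // Only one client may hold the append lease on a path at a time;
            // a second concurrent append() on the same file fails.
            try (FSDataOutputStream out = fs.append(new Path("/logs/events.log"))) {
                out.writeBytes("one more record\n");
            } // closing the stream releases the lease
        }
    }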
From: Arpit Agarwal
Have you looked at HDP for Windows?
http://hortonworks.com/download/
It is a 1.1-based distro and is designed for easier Windows install. I haven't
used it myself.
john
From: Cheng, Yi [mailto:yi.ch...@hp.com]
Sent: Friday, May 17, 2013 5:41 PM
To: user@hadoop.apache.org
Subject: which hadoop
We will be programming to the YARN resource manager and scheduler in an
upcoming project, but I am unclear regarding its level of integration in each
version.
Searching for, say, ApplicationSubmissionContext in versions of Hadoop, I see
it in 0.23 and 2.0, but not in 1.0 or 1.1. Does this
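That matches the release lines: YARN, and with it classes like ApplicationSubmissionContext, exists only in 0.23 and 2.x; the 1.x line still uses the JobTracker/TaskTracker model. A rough submission sketch against the 2.x client API follows; package locations shifted between 0.23 and 2.x, and the queue name and AM command are placeholders:

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class SubmitSketch {
        public static void main(String[] args) throws Exception {
            YarnClient yarn = YarnClient.createYarnClient();
            yarn.init(new YarnConfiguration());
            yarn.start();

            YarnClientApplication app = yarn.createApplication();
            ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
            ctx.setApplicationName("yarn-api-probe");
            ctx.setQueue("default");                        // placeholder queue

            // Command that launches the ApplicationMaster ("my-am.sh" is made up)
            ContainerLaunchContext am = Records.newRecord(ContainerLaunchContext.class);
            am.setCommands(Collections.singletonList("./my-am.sh"));
            ctx.setAMContainerSpec(am);
            ctx.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM

            yarn.submitApplication(ctx);
        }
    }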
I am attempting to distribute the execution of a C-based program onto a Hadoop
cluster, without using MapReduce. I read that YARN can be used to schedule
non-MapReduce applications by programming to the ASM/RM interfaces. As I
understand it, eventually I get down to specifying each sub-task
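To give a flavor of what that looks like, here is a rough sketch of the container-request loop inside a custom ApplicationMaster, using the 2.x AMRMClient/NMClient libraries. The C binary name is a placeholder, and real code would poll allocate() in a heartbeat loop and ship the binary as a LocalResource; conf (a YarnConfiguration) and numTasks are assumed defined:

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.records.*;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.util.Records;

    AMRMClient<AMRMClient.ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(conf);
    rm.start();
    rm.registerApplicationMaster("", 0, "");

    Resource perTask = Resource.newInstance(512, 1);   // 512 MB, 1 vcore per sub-task
    Priority pri = Priority.newInstance(0);
    for (int i = 0; i < numTasks; i++) {
        rm.addContainerRequest(new AMRMClient.ContainerRequest(perTask, null, null, pri));
    }

    NMClient nm = NMClient.createNMClient();
    nm.init(conf);
    nm.start();
    // A single allocate() call shown for brevity; real AMs heartbeat repeatedly.
    for (Container c : rm.allocate(0f).getAllocatedContainers()) {
        ContainerLaunchContext launch = Records.newRecord(ContainerLaunchContext.class);
        launch.setCommands(Collections.singletonList("./mytool --task " + c.getId()));
        nm.startContainer(c, launch);                  // one sub-task per container
    }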
I seem to recall reading that when a MapReduce task writes a file, the blocks
of the file are always written to the local disk first and then replicated to other nodes.
If this is true, is this also true for non-MR applications writing to HDFS from
Hadoop worker nodes? What about clients outside of the
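For what it's worth, that placement rule lives in the HDFS client write path, not in MapReduce: if the writer happens to run on a datanode, the first replica goes to that node's local disk; a client outside the cluster gets its first replica on an arbitrary (rack-aware) datanode. Any writer goes through the same pipeline, e.g. (namenode address and path made up; assumes the usual org.apache.hadoop.fs imports and a Configuration conf):

    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
    // create(path, overwrite, bufferSize, replication, blockSize)
    FSDataOutputStream out = fs.create(new Path("/data/out.bin"),
            true, 4096, (short) 3, 128L * 1024 * 1024);
    out.writeBytes("example\n");   // replicas placed by the namenode, local-first if possible
    out.close();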
From: Ted Yu [mailto:yuzhih...@gmail.com]
Subject: Re: Is FileSystem thread-safe?
FileSystem is an abstract class; what concrete class are you using
(DistributedFileSystem, etc.)?
Good point. I am calling FileSystem.get(URI uri, Configuration conf) with a
URI like hdfs://server:port/... on a
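One detail worth knowing here: FileSystem.get() serves instances from a JVM-wide cache keyed by scheme, authority, and user, so threads typically end up sharing one DistributedFileSystem object. A quick sketch (server name made up):

    Configuration conf = new Configuration();
    URI uri = URI.create("hdfs://server:8020/");
    FileSystem a = FileSystem.get(uri, conf);
    FileSystem b = FileSystem.get(uri, conf);
    System.out.println(a == b);   // true: same cached instance per scheme/authority/user
    // FileSystem.newInstance(uri, conf) hands back an uncached, private instance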
From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com]
Subject: Why big block size for HDFS.
In many places it is written that we store big blocks in HDFS to avoid a huge
number of disk seeks, so that once we seek to the location, there is only the
data transfer rate, which would be
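The arithmetic behind that claim is easy to sketch. Assuming a 10 ms seek and a 100 MB/s transfer rate (illustrative numbers only):

    double seekMs = 10.0, mbPerSec = 100.0;           // assumed disk characteristics
    for (double blockMb : new double[] {0.004, 1, 64, 128}) {
        double transferMs = blockMb / mbPerSec * 1000;
        System.out.printf("%8.3f MB block: %5.1f%% of time spent seeking%n",
                blockMb, 100 * seekMs / (seekMs + transferMs));
    }
    // 4 KB block: ~99.6% seeking; 64 MB: ~1.5%; 128 MB: ~0.8%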
Are there standard approaches for setting up a Hadoop cluster quickly by
cloning most of the data nodes?
How does one schedule mappers to read MongoDB or HBase in a data-locality-aware
fashion?
-john
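On the HBase half of that question: TableInputFormat creates one input split per region and reports the hosting region server as the split's location, which the MR scheduler then uses for node-local placement. A sketch, with a made-up table name and a hypothetical TableMapper subclass (Hadoop 2 / HBase API names, worth double-checking against your version):

    Scan scan = new Scan();
    scan.setCaching(500);                 // fewer RPCs per mapper
    Job job = Job.getInstance(conf, "hbase-scan");
    // MyMapper is a hypothetical subclass of TableMapper
    TableMapReduceUtil.initTableMapperJob(
            "mytable", scan, MyMapper.class,
            ImmutableBytesWritable.class, Result.class, job);
    // Splits carry region-server hostnames, so mappers land near their data.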
From: Mohammad Tariq [mailto:donta...@gmail.com]
Sent: Wednesday, January 16, 2013 3:29 AM
To: user@hadoop.apache.org
Subject: Re: Query mongodb
Yes. You can use the MongoDB-Hadoop adapter to achieve this. Hadoop runs
the jobs on the nodes where the data is located; that is its fundamental
nature. You don't have to do anything extra.
*I am sorry if I misunderstood the question.
Warm Regards,
Tariq
https://mtariq.jux.com/
http://cloudfront.blogspot.com
On Wed, Jan 16, 2013 at 8:10 PM, John
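(For the archives: the adapter's job wiring looks roughly like the following; the property and class names follow the mongo-hadoop connector and should be checked against your connector version, and the URI is a placeholder.)

    Configuration conf = new Configuration();
    conf.set("mongo.input.uri", "mongodb://mongohost:27017/mydb.mycollection");
    Job job = Job.getInstance(conf, "mongo-query");
    job.setInputFormatClass(com.mongodb.hadoop.MongoInputFormat.class);
    // The connector computes input splits from the collection
    // (e.g. per chunk when sharded), which is where the locality comes from.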
I think it will help for Ouch to clarify what is meant by "in order". If one
JSON file must be completely processed before the next file starts, there is
not much point in using MapReduce at all, since your problem cannot be
partitioned. On the other hand, there may be ways around this, for
I think there are quite a few people like me here asking basic questions on the
user@ group.
From: Monkey2Code [mailto:monkey2c...@gmail.com]
Sent: Sunday, January 13, 2013 2:23 PM
To: gene...@hadoop.apache.org; user@hadoop.apache.org
Subject: request on behalf of newbies
Hi all,
I am a newbie in
I am trying to understand how one can make a side process cooperate with the
Hadoop MapReduce task scheduler. Suppose that I have an application that is
not directly integrated with MapReduce (i.e., it is not a MapReduce job at all;
there are no mappers or reducers). This application could
on JNI for that?
John
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Saturday, January 12, 2013 9:41 AM
To: user@hadoop.apache.org
Subject: Re: Scheduling non-MR processes
Hi,
Inline.
On Sat, Jan 12, 2013 at 9:39 PM, John Lilley john.lil...@redpoint.net wrote:
I am
We are somewhat new to Hadoop and are looking to run some experiments with
HDFS, Pig, and HBase.
With that in mind, I have a few questions:
What is the easiest (preferably free) Hadoop distro to get started with?
Cloudera?
What host OS distro/release is recommended?
What is the easiest
It depends. What data is going into the table, and what keys will drive the
lookup?
Let's suppose that you have a single JSON file that has some reasonable number
of key/value tuples. You could easily load a Hashtable to associate the
integer keys with the values (which appear to be lists of
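A minimal sketch of that lookup, assuming integer keys mapping to lists of strings (keys and values invented for illustration):

    import java.util.*;

    Map<Integer, List<String>> lookup = new HashMap<>();
    // One-time load: parse the JSON file and populate the table, e.g.:
    lookup.put(42, Arrays.asList("alpha", "beta"));
    List<String> hit = lookup.get(42);   // O(1) per probe after the load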
If you like Red Hat, consider CentOS also; it is a nearly complete clone of the
RHEL distro.
John
From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Friday, January 04, 2013 10:46 AM
To: user@hadoop.apache.org
Subject: Re: Hello and request some advice.
- Is Ubuntu a good O.S. for running
perspective of Hadoop?
--- On Fri, 4/1/13, John Lilley
john.lil...@redpoint.net wrote:
From: John Lilley john.lil...@redpoint.net
Subject: RE: Hello and request some advice.
To: user@hadoop.apache.org
throughput question
Hadoop is using OneFS, not HDFS, in our configuration. The Isilon NAS and the
Hadoop nodes are in the same datacenter, but as for rack locations, I cannot tell.
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Thursday, January 03, 2013 5:15 PM
To: user
Perhaps if Artem posted the presumably-simple code we could get other users to
benchmark other 4-node systems and compare.
--John Lilley
Artem Ervits are9...@nyp.org wrote:
Setting the property to 64 KB made the throughput jump to 36 MB/sec, and to 39 MB/sec for 128 KB.
Thank you for the tip.
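(The thread doesn't name the property in this excerpt; io.file.buffer.size is the usual knob for this kind of client-side tuning, so the sketch below assumes that's the one, set per job rather than cluster-wide.)

    Configuration conf = new Configuration();
    conf.setInt("io.file.buffer.size", 64 * 1024);   // 64 KB; assumed property
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);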
From: Michael