Vinod,
There should be some stderr information in the task attempts' userlogs
that helps point out why your task launch is failing. It is
probably caused by something related to the JVM launch parameters (as
defined by mapred.child.java.opts).
If it's not there, look into the TaskTracker logs.
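In case it helps, here is a minimal sketch of how the child JVM options end up in the job configuration; the -Xmx value is purely illustrative, the point is just that whatever lands in mapred.child.java.opts is what the TaskTracker uses to launch each task's JVM:

import org.apache.hadoop.conf.Configuration;

// Sketch only: whatever ends up in mapred.child.java.opts is what the
// TaskTracker uses to launch each task's JVM, so a malformed value there
// makes the child fail at launch. The -Xmx value is purely illustrative.
public class ChildOpts {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("mapred.child.java.opts", "-Xmx512m");
    System.out.println("child opts: " + conf.get("mapred.child.java.opts"));
  }
}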
This is definitely a map-increase job.
I could try a combiner, but I don't think that would help. My keys are small
compared to my values, and values must be kept separate when they are
accumulated in the reducer--they can't be combined into some smaller form,
i.e. they are more like bitmaps than
When in doubt, go straight to the owner of a fact. The operating system is
what really knows disk I/O.
"my mapper job--which may write multiple pairs for each one it
receives--is writing too many" - ah, a map-increase job :) This is what
Combiners are for -- to keep explosions of data from hitting t
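For reference, a minimal word-count-style sketch where a combiner does apply, because partial counts can simply be summed on the map side; all class names here are illustrative, not from the original job:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch only: the combiner reuses the reducer because summing partial
// counts is associative, so map output shrinks before it hits the shuffle.
public class WordCountWithCombiner {

  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // one pair per token: a "map-increase" step
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCountWithCombiner.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // merges partial counts on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}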
I have a problem where certain Hadoop jobs take prohibitively long to run.
My hypothesis is that I am generating more I/O than my cluster can handle
and I need to substantiate this. I am looking closely at the Map Reduce
framework counters because I think they contain the information I need, but
I
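For what it's worth, the framework counters can also be read programmatically once the job finishes; a rough sketch that simply dumps every counter group and value (call it after waitForCompletion returns):

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

// Sketch only: prints every counter the framework collected for the job,
// including the filesystem byte counters that show how much I/O it did.
public class CounterDump {
  public static void dump(Job job) throws Exception {
    Counters counters = job.getCounters();
    for (CounterGroup group : counters) {
      System.out.println(group.getDisplayName());
      for (Counter counter : group) {
        System.out.println("  " + counter.getDisplayName() + " = " + counter.getValue());
      }
    }
  }
}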
I just set up a pseudo-distributed Hadoop installation, but when I run the example
task, I get a failed child error. I see that this was posted earlier as well,
but I didn't see the resolution.
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201108.mbox/%3cc30bf131a023ea4d976727cd4fc563fe0afbe...
Thanks to all, especially SD. That's exactly what I am looking for.
P
On Wed, Sep 28, 2011 at 11:20 PM, Simon Dong wrote:
> Or http://jobtracker:50030/conf
>
> -SD
>
> On Wed, Sep 28, 2011 at 2:39 PM, Raj V wrote:
> > The xml configuration file is also available under hadoop logs on the
hi,
Here is some useful info:
A small file is one which is significantly smaller than the HDFS block size
(default 64MB). If you’re storing small files, then you probably have lots of
them (otherwise you wouldn’t turn to Hadoop), and the problem is that HDFS
can’t handle lots of files.
Every
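One common workaround is to pack the small files into a single SequenceFile, using each file name as the key and its contents as the value; a rough sketch, with illustrative paths and the assumption that the input directory contains only small files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch only: the namenode then tracks one large file instead of thousands
// of tiny ones. Paths below are illustrative.
public class PackSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path input = new Path("/user/me/small-files");   // directory of small files
    Path packed = new Path("/user/me/packed.seq");   // single output SequenceFile

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, packed, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(input)) {
        if (status.isDir()) {
          continue;                                   // only pack plain files
        }
        byte[] buf = new byte[(int) status.getLen()]; // fine for small files
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(buf);
        } finally {
          IOUtils.closeStream(in);
        }
        // key = original file name, value = raw bytes of that file
        writer.append(new Text(status.getPath().getName()), new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}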
If you are running on EC2, you can use Elastic MapReduce. It has a startup
option where you specify the driver class in your jar, and it will run the
driver, I believe, on the namenode, which won't really add any overhead
because when the namenode is under stress, the driver will be sitting
quietly.
On 29 September 2011 18:39, lessonz wrote:
> I'm new to Hadoop, and I'm trying to understand the implications of a 64M
> block size in the HDFS. Is there a good reference that enumerates the
> implications of this decision and its effects on files stored in the system
> as well as map-reduce jobs?
Yea, we don't want it to sit there waiting for the Job to complete, even if
it's just a few minutes.
--Aaron
-Original Message-
From: turboc...@gmail.com [mailto:turboc...@gmail.com] On Behalf Of John Conwell
Sent: Thursday, September 29, 2011 10:50 AM
To: common-user@hadoop.apache.org
Su
After you kick off a job, say JobA, your client doesn't need to sit and ping
Hadoop to see if it finished before it starts JobB. You can have the client
block until the job is complete with "Job.waitForCompletion(boolean
verbose)". Using this you can create a "job driver" that chains jobs
together.
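A rough sketch of such a driver, where createJobA and createJobB are hypothetical helpers that configure each Job, and JobB is only submitted after JobA succeeds:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch only: waitForCompletion(true) blocks (printing progress) until the
// job finishes, so JobB is only submitted once JobA has succeeded.
// createJobA/createJobB are hypothetical helpers that configure each Job.
public class JobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job jobA = createJobA(conf);
    if (!jobA.waitForCompletion(true)) {   // blocks until JobA is done
      System.exit(1);                      // stop the chain if JobA failed
    }

    Job jobB = createJobB(conf);           // typically reads JobA's output
    System.exit(jobB.waitForCompletion(true) ? 0 : 1);
  }

  private static Job createJobA(Configuration conf) throws Exception {
    Job job = new Job(conf, "JobA");
    // set mapper/reducer/input/output for the first stage here
    return job;
  }

  private static Job createJobB(Configuration conf) throws Exception {
    Job job = new Job(conf, "JobB");
    // set mapper/reducer/input/output for the second stage here
    return job;
  }
}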
I'm new to Hadoop, and I'm trying to understand the implications of a 64M
block size in the HDFS. Is there a good reference that enumerates the
implications of this decision and its effects on files stored in the system
as well as map-reduce jobs?
Thanks.
FileSystem objects are cached in the JVM.
When you get an FS object using FileSystem.get(..) (SequenceFile will use it
internally), it returns the same FS object if the scheme and authority of the
URI are the same.
The FS cache key's equals implementation is below:
static boolean isEqual(Object a, Object b) { return a == b || (a != null && a.equals(b)); }
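A quick way to see the caching in action, as a minimal sketch (the filesystem URI comes from whatever is in the default configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch only: both get() calls return the same cached object for the same
// scheme and authority, so closing one closes it for every caller in the JVM.
public class FsCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs1 = FileSystem.get(conf);
    FileSystem fs2 = FileSystem.get(conf);
    System.out.println("same instance: " + (fs1 == fs2));   // prints true
    fs1.close();   // fs2 now points at a closed filesystem too
  }
}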
I would definitely checkout Oozie for this use case.
-Joey
On Thu, Sep 29, 2011 at 12:51 PM, Aaron Baff wrote:
> I saw this, but wasn't sure if it was something that ran on the client and
> just submitted the Job's in sequence, or if that gave it all to the
> JobTracker, and the JobTracker too
Do you close your FileSystem instances at all? IIRC, the FileSystem
instance you use is a singleton and if you close it once, it's closed
for everybody. My guess is you close it in your cleanup method and you
have JVM reuse turned on.
-Joey
On Thu, Sep 29, 2011 at 12:49 PM, Mark question wrote:
I saw this, but wasn't sure if it was something that ran on the client and just
submitted the Job's in sequence, or if that gave it all to the JobTracker, and
the JobTracker took care of submitting the Jobs in sequence appropriately.
Basically, I'm looking for a completely stateless client, that
On 29/09/11 13:28, Brian Bockelman wrote:
On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:
Hi,
I want to know whether we can use SAN storage for a Hadoop cluster setup?
If yes, what should be the best practices?
Is it a good way to go, considering the fact that "the underlying power of Hadoop
is co-lo
On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:
> Hi,
>
> I want to know whether we can use SAN storage for a Hadoop cluster setup?
> If yes, what should be the best practices?
>
> Is it a good way to go, considering the fact that "the underlying power of Hadoop
> is co-locating the processing power (CP
On 28/09/11 22:45, Sameer Farooqui wrote:
Hi everyone,
I'm looking for some recommendations for how to get our Hadoop cluster to do
faster I/O.
Currently, our lab cluster is 8 worker nodes and 1 master node (with
NameNode and JobTracker).
Each worker node has:
- 48 GB RAM
- 16 processors (Inte
Our Hadoop journey included a brief stint running on our own virtualised
infrastructure. Our pre-Hadoop application was already running on the VM
infrastructure so we set up a small cluster as virtual machines on the SAN.
It worked ok for a while but as our usage grew we ditched it for a couple
Are you using hadoop-0.21.0 (which may be an unstable release)? Using 0.20.204 or 0.22
would be better.
>> "WARN rumen.TraceBuilder: File skipped: Invalid file name:
job_201109221644_0001_"
This means Rumen is assuming the jobhistory file name format to be
something else --- maybe without "_us
Hi All,
I am using Hadoop in distributed mode.
I want to know if there is any option to move the fsimage and edit log files from
the namenode to another machine, other than NFS?
If we reduce the checkpoint time, what drawbacks could occur in the future?
Thanks & Regards
R.Shanmuganatha