Hi Alex,
Thanks for the reply. I have already created a logger (from
log4j.Logger) and configured it to log to a file, and it is logging
all the log statements that I have in my client code.
However, the error/info logs of DFSClient are going to stdout. The
DFSClient code is using
Hi,
I have a 12 node cluster where instead of running a DN on each compute
node, I'm running just one DN backed by a large RAID (with a
dfs.replication of 1). The compute node storage is limited, so the
idea behind this was to free up more space for intermediate job data.
So the cluster has that o
M/R performance is known to be better when using just a bunch of disks (JBOD)
instead of RAID.
From your setup it looks like your single datanode must be running hot on I/O
activity.
The parameter dfs.datanode.handler.count only controls the number of datanode
threads serving IPC requests.
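For reference, that parameter normally lives in conf/hdfs-site.xml. A minimal
snippet, with a purely illustrative value, would look like this:

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
    <description>Number of server threads for the datanode (example value,
    not a recommendation).</description>
  </property>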
Edson Ramiro wrote:
I'm not involved with Debian community :(
I think you are now...
Marcos Medrado Rubinelli wrote:
jps gets its information from the files stored under /tmp/hsperfdata_*,
so when a cron job clears your /tmp directory, it also erases these
files. You can submit jobs as long as your jobtracker and namenode are
responding to requests over TCP, though.
I never k
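Side note: a rough way to check that the daemons are still answering over TCP,
independent of jps, is simply to hit them with the normal client commands,
for example:

  hadoop fs -ls /      # does the namenode respond?
  hadoop job -list     # does the jobtracker respond?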
I set dfs.datanode.max.xcievers to 4096, but this didn't seem to have
any effect on performance.
Here are some benchmarks (not sure what typical values are):
----- TestDFSIO ----- : write
Date & time: Tue Mar 30 04:53:18 EDT 2010
Number of files: 10
Total MBytes processed: 1
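For anyone wanting to reproduce these numbers, a plain TestDFSIO write run is
normally kicked off roughly like this (jar name depends on your release, and
the file size below is only an example, not necessarily what I used):

  hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000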
Ed Mazur wrote:
Hi,
I have a 12 node cluster where instead of running a DN on each compute
node, I'm running just one DN backed by a large RAID (with a
dfs.replication of 1). The compute node storage is limited, so the
idea behind this was to free up more space for intermediate job data.
So the
Hi,
Could someone kindly let me know if the DFSClient takes care of
datanode failures and attempts to write to another datanode if the primary
datanode (and replica datanodes) fail. I looked into the source code
of DFSClient and figured out that it attempts to write to one of the
datanodes in p
Hej
I've been checking the API and the internet but I have not found any method for
listing the subdirectories of a given directory in HDFS.
Can anybody show me how to get the list of subdirectories, or even how to
implement the method? (I guess that it should be possible and not very
hard).
Than
Hi all,
Is automatic restart and failover of the NameNode software to
another machine available in Hadoop 0.20.2?
Does this get what you want?
hadoop dfs -ls | grep drwx
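If you need it from Java code rather than the shell, something along these
lines should work (untested sketch; "conf" is an existing Configuration and
the path is just an example):

  FileSystem fs = FileSystem.get(conf);
  FileStatus[] entries = fs.listStatus(new Path("/some/dir")); // example path
  for (FileStatus s : entries) {
      if (s.isDir()) {                  // keep only the subdirectories
          System.out.println(s.getPath());
      }
  }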
On Tue, Mar 30, 2010 at 8:24 AM, Santiago Pérez wrote:
>
> Hej
>
> I've checking the API and on internet but I have not found any method for
> listing the subdirectories of a given directory in the HDFS.
>
> Can anybody show me how to get
I'm confused as to how to run a C++ pipes program on a full HDFS system. First
off, I have everything working in pseudo-distributed mode so that's a good
start...but full HDFS has no concept of an executable file (to the best of my
understanding, O'Reilly/White, p.47). I haven't even been succ
Please refer to the high-availability contrib of 0.20.2:
HDFS-976
http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
On Tue, Mar 30, 2010 at 8:51 AM, 毛宏 wrote:
> Hi all,
> Does automatic restart and failover of the NameNode software to
> another machine available in
Hi Pallavi,
DFSClient uses log4j.properties for configuration. What is your classpath?
I need to know how exactly you invoke your program (java, hadoop script,
etc.). The log level and appender are driven by the hadoop.root.logger
config variable.
I would also recommend using one logging syste
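For example, when the client is launched with plain java (not the hadoop
script), a minimal log4j.properties on the classpath along these lines sends
the DFSClient output to a file instead of stdout (file path and appender name
are just examples):

  log4j.rootLogger=INFO, RFA
  log4j.appender.RFA=org.apache.log4j.RollingFileAppender
  log4j.appender.RFA.File=/tmp/dfsclient.log
  log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
  log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n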
I'm confused as to how to run a C++ pipes program on a full HDFS system. I
have everything working in pseudo-distributed mode so that's a good start...but
I can't figure out the full cluster mode.
As I see it, there are two basic approaches: upload the executable directly to
HDFS or specify it
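In case it helps to be explicit, my understanding of the first approach (ship
the binary via HDFS) is roughly the following; all paths are placeholders:

  hadoop fs -put ./mypipes /user/me/bin/mypipes
  hadoop pipes \
    -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -input /user/me/input -output /user/me/output \
    -program /user/me/bin/mypipes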
Please disregard this thread. I started another thread which is more specific
and pertinent to my problem...but if you have any helpful information, please
respond to the other thread. I need to get this figured out.
Thank you.
Hi all,
I've noticed swapping for a single terasort job on a small 8-node
cluster using hadoop-0.20.1. The swapping doesn't happen repeatably; I
can have back to back runs of the same job from the same hdfs input
data and get swapping only on 1 out of 4 identical runs. I've noticed
this swapping
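For reference, the memory-related knobs that bound how much RAM the tasks can
claim are in mapred-site.xml; the values below are only examples, not our
settings:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>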
If you were talking about looking at directories within a Java
program, here is what has worked for me.
FileSystem fs;
FileStatus[] fileStat;
Path[] fileList;
SequenceFile.Reader reader = null;
try {
    // connect to the file system
    fs = FileSystem.get(conf);
    // get the stat on all files/directories under the given path (example path)
    fileStat = fs.listStatus(new Path("/user/example/dir"));
} catch (IOException e) {
    e.printStackTrace();
}
Apologies if you received multiple copies of this message.
=
CALL FOR PAPERS
5th Workshop on
Virtualization in High-Performance Cloud Computing
VHPC'10
as part of Euro-Par 2010, Island of Ischia-Naples, Italy
==
No responses yet, although I admit it's only been a few hours.
As a follow-up, permit me to pose the following question:
Is it, in fact, impossible to run C++ pipes on a fully-distributed system (as
opposed to a pseudo-distributed system)? I haven't found any definitive
clarification on this t
Hello.
Did you try following the tutorial in
http://wiki.apache.org/hadoop/C++WordCount ?
We use C++ pipes in production on a large cluster, and it works.
--gianluigi
On Tue, 2010-03-30 at 13:28 -0700, Keith Wiley wrote:
> No responses yet, although I admit it's only been a few hours.
>
> As
Yep, tried and tried and tried it. Works perfectly on a pseudo-distributed
cluster which is why I didn't think the example or the code was the problem,
but rather that the cluster was the problem.
I have only just (in the last two minutes) heard back from the administrator of
our cluster and h
My cluster admin noticed that there is some additional pipes package he could
add to the cluster configuration, but he admits to knowing very little about
how the C++ pipes component of Hadoop works.
Can you offer any insight into this cluster configuration package? What
exactly does it do tha
What are the symptoms?
Pipes should run out of the box in a standard installation.
BTW what version of bash are you using? Is it bash 4.0 by any chance?
See https://issues.apache.org/jira/browse/HADOOP-6388
--gianluigi
On Tue, 2010-03-30 at 14:13 -0700, Keith Wiley wrote:
> My cluster admin not
The closest I've gotten so far is for the job to basically try to start up but
to get an error complaining about the permissions on the executable
binary...which makes perfect sense since the permissions are not "executable".
Problem is, the hdfs chmod command ignores the executable flag. For
$ hadoop fs -rmr HDFSPATH/output ; hadoop pipes -D
hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true
-input HDFSPATH/input -output HDFSPATH/output -program HDFSPATH/EXECUTABLE
Deleted hdfs://mainclusternn.hipods.ihost.com/HDFSPATH/output
10/03/30 14:56:55 WARN mapred.JobC
Hi All,
I am trying to measure DFS I/O performance.
I used TestDFSIO from the Hadoop jars.
The results were about 100 Mbps read and write.
I think it should be more than this.
Please share some stats to compare.
Either I am missing something like config params, or something else.
-Sagar
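P.S. The config params that usually get checked first for raw DFS I/O are the
block size and the client I/O buffer size; example values only:

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>    <!-- 128 MB, example -->
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>        <!-- 64 KB, example -->
  </property>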
Hi all,
Thanks for the help, Todd and Steve.
I configured Hadoop (0.20.2) again and I'm getting the same error (Function
not implemented).
Do you think it's a Hadoop bug?
This is the situation:
I have 28 nodes where just four are running the datanode.
On all other nodes the tasktracker is running ok
Hi Edson,
I noticed that only the h01 nodes are running 2.6.32.9, the other broken DNs
are 2.6.32.10.
Is there some reason you are running a kernel that is literally 2 weeks old?
I wouldn't be at all surprised if there were a bug here, or some issue with
your Debian "unstable" distribution...
-T
Maybe it's a bug.
I'm not the admin. : (
so I'll talk to him and maybe he can install 2.6.32.9 on another node to
test : )
Thanks
Edson Ramiro
On 30 March 2010 20:00, Todd Lipcon wrote:
> Hi Edson,
>
> I noticed that only the h01 nodes are running 2.6.32.9, the other broken
> DNs
> are 2.
Hi Sagar,
What hardware did you run it on ?
Edson Ramiro
On 30 March 2010 19:41, sagar naik wrote:
> Hi All,
>
> I am trying to get DFS IO performance.
> I used TestDFSIO from hadoop jars.
> The results were abt 100Mbps read and write .
> I think it should be more than this
>
> Pl share some
Hi,
Will all key-value pairs of the map output which have the same key
be sent to the same reducer task node?
On Tue, Mar 30, 2010 at 9:56 PM, Cui tony wrote:
> Did all key-value pairs of the map output, which have the same key, will
> be sent to the same reducer tasknode?
Yes, this is at the core of the MapReduce model. There is one call to
the user reduce function per unique map output key. This groupi
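Concretely, the default partitioner just hashes the key, so equal keys always
go to the same reduce task. The stock HashPartitioner does roughly this
(sketch, not something you need to write yourself):

  // default partitioning: hash of the key modulo the number of reducers
  public int getPartition(K key, V value, int numReduceTasks) {
      return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }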
Yes, indeed.
On Wed, 2010-03-31 at 09:56 +0800, Cui tony wrote:
> Hi,
> Did all key-value pairs of the map output, which have the same key, will
> be sent to the same reducer tasknode?
Something to keep in mind though: sorting depends on the key type. Text
will be sorted lexicographically.
Nick Jones
- Original Message -
From: Ed Mazur
To: common-user@hadoop.apache.org
Sent: Tue Mar 30 21:07:29 2010
Subject: Re: question on shuffle and sort
On Tue, Mar 30, 2
Consider this extreme situation:
The input data is very large, and so is the map output. 90% of the map output
records have the same key, so all of them will be sent to one reducer task node.
So 90% of the work of the reduce phase has to be done on a single node, not
across the cluster. That is very inefficient and less s
I ran into an issue where lots of data was passing from mappers to a single
reducer. Enabling a combiner saved quite a bit of processing time by reducing
mapper disk writes and data movements to the reducer.
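With the old 0.20 API it is a one-liner on the JobConf, assuming your reduce
logic is associative and commutative so the reducer class can double as the
combiner (class names below are placeholders):

  JobConf job = new JobConf(MyJob.class);
  job.setMapperClass(MyMapper.class);
  job.setReducerClass(MyReducer.class);
  job.setCombinerClass(MyReducer.class);   // run the reducer as a combiner too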
Nick Jones
- Original Message -
From: Cui tony
To: common-user@hadoop.apache.
Hi, Jones
As you have met the situation I am worried about, I have my answer now.
Maybe redesigning the map function or adding a combiner is the only way to deal
with this kind of input data.
2010/3/31 Jones, Nick
> I ran into an issue where lots of data was passing from mappers to a single
> reduce
hi, guys,
we have some machines with 1TB disks, and some with 100GB disks.
Is there any way we can limit the disk usage of the datanodes on those
machines with smaller disks?
thanks!
Hello Steven,
You can use the dfs.datanode.du.reserved configuration value in
$HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>182400</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>
Ravi
Hadoop @ Yahoo
Hi all,
I find there is a directory "_log/history/..." under the output directory of a
mapreduce job. Is the file in that directory a log file? Is the information
there sufficient to allow me to figure out which nodes the job ran on? Besides,
not every job has such a directory. Is there such a set