Hi all,
I would like to control data locality. In other words, I want to place
certain data blocks on one machine. In some problems, subsets of the
entire dataset are needed together to compute an answer; most graph
problems are good examples.
Is this possible? If not, can you advise me on an alternative?
Hi.
Thanks for the advice; just to clarify:
Is the upgrade you speak of, which cleans up the pipes/epolls more often,
related to the issue already discussed (HADOOP-4346, fixed in my distribution), or
is it some other issue?
If it is another issue, does it have a ticket I can see, or should one be filed in Jira?
Thanks!
When I try to execute the command bin/start-dfs.sh, I get the following
error. I have checked the hadoop-site.xml file on all the nodes, and they
are fine.
Can someone help me out?
10.2.24.21: Exception in thread main java.net.UnknownHostException:
unknown host: 10.2.24.21.
10.2.24.21:
fs.default.name in your hadoop-site.xml needs to be set to a fully-
qualified domain name (instead of an IP address)
-Matt
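To illustrate, a hadoop-site.xml entry along these lines should do it (the
hostname here is hypothetical and 54310 is only an example port; substitute
your namenode's actual fully qualified name and port):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example.com:54310/</value>
  </property>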
On Jun 23, 2009, at 6:42 AM, bharath vissapragada wrote:
When I try to execute the command bin/start-dfs.sh, I get the following
error. I have checked the
This is at the RPC client level and there is a requirement for a fully qualified
hostname. Maybe the '.' at the end of '10.2.24.21.' is causing the problem?
btw, in 0.21 even fs.default.name does not need to be a fully qualified
name.. anything that resolves to an IP address is fine (at least for
common/FS and
Hello,
I'm running a 90-node c1.xlarge cluster. No reducers, mapred.max.map.tasks=6
per machine.
The AMI is our own and uses Hadoop 0.19.1.
The dataset has 145K keys, and the processing time is huge.
Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
tasks across the machines.
Hello,
I should also point out that I'm using a SequenceFileInputFormat.
Regards
Saptarshi Guha
On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha
saptarshi.g...@gmail.com wrote:
Hello,
I'm running a 90-node c1.xlarge cluster. No reducers,
mapred.max.map.tasks=6 per machine.
The AMI is our own
I encountered this problem before. If you can ping the machine using its
name but cannot ping it using its IP address,
then what you have to do is add the mapping to /etc/hosts.
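For example (the hostname below is hypothetical; use the master's actual
fully qualified name), the /etc/hosts entry on each node would look like:

  10.2.24.21   master.example.com   master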
-Original Message-
From: bharathvissapragada1...@gmail.com
[mailto:bharathvissapragada1...@gmail.com]
Raghu Angadi wrote:
This is at the RPC client level and there is a requirement for a fully qualified
I meant to say there is NO requirement ...
hostname. Maybe the '.' at the end of '10.2.24.21.' is causing the problem?
btw, in 0.21 even fs.default.name does not need to be a fully qualified
that fix is
Stas Oskin wrote:
Hi.
Any idea if calling System.gc() periodically will help reduce the number
of pipes / epolls?
Since you have HADOOP-4346, you should not have excessive epoll/pipe fds
open. First of all, do you still have the problem? If yes, how many
Hadoop streams do you have at a
Hi Hyunsik,
Unfortunately, you can't control which servers blocks go on. Hadoop does
block allocation for you, and it tries its best to distribute data evenly
across the cluster while ensuring that replicated blocks reside on different
machines and on different racks (assuming you've made Hadoop
It worked fine when I updated the /etc/hosts file (on all the slaves) and
wrote the fully qualified domain name in hadoop-site.xml.
It worked fine for some time .. then started giving a new error:
09/06/23 22:21:49 INFO ipc.Client: Retrying connect to server: master/
10.2.24.21:54310. Already tried 0
The namenode is stopping automatically!
On Tue, Jun 23, 2009 at 10:29 PM, bharath vissapragada
bharathvissapragada1...@gmail.com wrote:
It worked fine when I updated the /etc/hosts file (on all the slaves) and
wrote the fully qualified domain name in hadoop-site.xml.
It worked fine for some time
Do you use block compression in the sequence file? How large is your total
dataset?
On Jun 23, 2009, at 7:50 AM, Saptarshi Guha wrote:
Hello,
I should also point out that I'm using a SequenceFileInputFormat.
Regards
Saptarshi Guha
On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha
Jason, do you know offhand when this feature was introduced? .18.x?
Thanks,
Bo
On Mon, Jun 22, 2009 at 10:58 PM, jason hadoop jason.had...@gmail.com wrote:
Check the process environment for your streaming tasks; generally, the
configuration variables are exported into the process environment.
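As an illustration (a sketch, not anyone's exact setup: it assumes the usual
streaming behavior where non-alphanumeric characters in JobConf key names
are replaced with underscores, so mapred.task.id appears as mapred_task_id),
a streaming mapper can read such a variable directly from its environment:

  #!/bin/sh
  # Print the transformed JobConf variable to stderr so it lands in the
  # task's log, then pass the input records through unchanged.
  echo "task id: $mapred_task_id" 1>&2
  cat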
When I run a map reduce task over a har file as the input, I see that the
input splits refer to 64MB byte boundaries inside the part file.
My mappers only know how to process the contents of each logical file inside
the har file. Is there some way by which I can take the offset range
specified by
To be more accurate, once you have HADOOP-4346,
fds for epoll and pipes = 3 * threads blocked on Hadoop I/O
Unless you have hundreds of threads at a time, you should not see
hundreds of these. These fds stay around for up to 10 seconds even after the
threads exit.
I am a bit confused about your exact
Greetings,
I've gotten a few replies on this, but I'd really like to know who
else is coming. Just send me a quick note :)
Cheers,
Bradford
On Mon, Jun 22, 2009 at 5:40 PM, Bradford
Stephens bradfordsteph...@gmail.com wrote:
Hey all, just a friendly reminder that this is Wednesday! I hope to
Hi.
In my tests, I typically opened between 20 and 40 concurrent streams.
Regards.
2009/6/23 Raghu Angadi rang...@yahoo-inc.com
Stas Oskin wrote:
Hi.
Any idea if calling System.gc() periodically will help reduce the number
of pipes / epolls?
since you have HADOOP-4346, you should
How many threads do you have? The number of active threads is very
important. Normally,
#fds = (3 * #threads_blocked_on_io) + #streams
12 per stream is certainly way off.
Raghu.
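(As a rough worked example of the formula above, with an assumed thread
count purely for illustration: with 30 open streams and 5 threads blocked
on I/O, one would expect about 3 * 5 + 30 = 45 fds in total, rather than
12 fds per stream.)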
Stas Oskin wrote:
Hi.
In my case it was actually ~12 fds per stream, which included pipes and
epolls.
Could it
Hi.
So if I open one stream, it should be 4?
2009/6/23 Raghu Angadi rang...@yahoo-inc.com
How many threads do you have? The number of active threads is very important.
Normally,
#fds = (3 * #threads_blocked_on_io) + #streams
12 per stream is certainly way off.
Raghu.
Stas Oskin wrote:
Is there a way to access stderr when using Hadoop Streaming? I see how
stdout is written to the log files, but I'm more concerned about what happens
when errors occur. Access to stderr would help debug when a run doesn't
complete successfully, but I haven't been able to figure out how to retrieve
S D wrote:
Is there a way to access stderr when using Hadoop Streaming? I see how
stdout is written to the log files, but I'm more concerned about what happens
when errors occur. Access to stderr would help debug when a run doesn't
complete successfully, but I haven't been able to figure out how
In my Hadoop cluster, I've had several drives fail lately (and they've
been replaced). Each time a new empty drive is placed in the cluster,
I run the balancer.
I understand that the balancer will redistribute the load of file
blocks across the nodes.
My question is: will the balancer also look at
Hi all,
Is there any way to tell, from the logs or by reading/setting a counter, whether a
particular mapper was data-local, i.e., whether it ran on the same node as its input
data?
Thanks,
Suratna
(Correct me if I'm wrong), but I think you can tell through the Hadoop
Web UI -- it'll show a count of how many map tasks are data-local. You
can then click on that to see a list of all the tasks there, and drill
down to see which nodes those tasks ran on.
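If you want this programmatically, the same information shows up as a
job-level counter. As a rough sketch (the counter group and name below are
assumptions from the 0.19/0.20-era counters, and this relies on your
version's "hadoop job" command supporting -counter; check your job's
counters page for the exact strings), something like

  hadoop job -counter <job-id> 'org.apache.hadoop.mapred.JobInProgress$Counter' DATA_LOCAL_MAPS

should print the number of data-local map tasks for a given job.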
On Tue, Jun 23, 2009 at 6:37 PM, Suratna
I can't make it this time, I'm out of town.
On 6/23/09 12:53 PM, Bradford Stephens bradfordsteph...@gmail.com wrote:
Greetings,
I've gotten a few replies on this, but I'd really like to know who
else is coming. Just send me a quick note :)
Cheers,
Bradford
On Mon, Jun 22, 2009 at
Thanks Jason!
I gave your suggestion to my cluster administrator and now it is working.
The following was his reply to me:
But /hadoop/tmp is not /scratch, and the only thing that I clean is
/scratch. It looks like the disks in the job tracker machine died. I
swapped the disks from another node
The namenode is constantly receiving reports about which datanode has which
blocks, and it performs replication when a block becomes under-replicated.
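For the rebalancing part after a drive swap, the balancer can also be run
by hand with an explicit threshold; a sketch (10 is the default utilization
threshold, in percent):

  bin/hadoop balancer -threshold 10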
On Tue, Jun 23, 2009 at 6:18 PM, Stuart White stuart.whi...@gmail.com wrote:
In my Hadoop cluster, I've had several drives fail lately (and they've
I happened to have a copy of 18.1 lying about, and the JobConf is added to
the per-process runtime environment in 18.1.
The entire configuration from the JobConf object is added to the
environment, with the JobConf key names being transformed slightly. Any
character in the key name that is not