Hi Praveenesh,
The NN will send a list of DNs to the client in sorted order (nodes nearer
to the client come first in the list).
If one DN takes more time, Hadoop has a mechanism to detect that:
speculative execution.
Speculative execution: One problem with the Hadoop system is that by
dividing the tasks
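For reference, a minimal sketch of how a client can ask the NN for those sorted block locations through the public FileSystem API; the file path below is just an example, not from this thread:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; point this at any file that exists in your HDFS.
        Path file = new Path("/user/hadoop/input/data.txt");
        FileStatus status = fs.getFileStatus(file);

        // For each block, the NN returns the DNs holding a replica,
        // ordered by network distance from the client.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                + " -> " + Arrays.toString(block.getHosts()));
        }
    }
}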
Okay, so I have one question in mind.
Suppose I have a replication factor of 3 on my cluster of some N
nodes, where N > 3, and there is a data block B1 that exists on some 3
DataNodes --> DD1, DD2, DD3.
I want to run some Mapper function on this block. My JT will
communicate with the NN to know where
Hi guys!
I have set up a 5-node cluster with each node in a different rack.
I have hadoop-0.20.2 set up in my Eclipse Helios, so I ran TraceBuilder
using
Main Class: org.apache.hadoop.tools.rumen.TraceBuilder
I ran some jobs on the cluster and used a copy of the /usr/local/hadoop/logs/history
folder o
You could run the flume collectors on other machines and write a source which
connects to the sockets on the data generators.
-Joey
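Not the Flume source API itself, but as a rough sketch of the "connect to the sockets on the data generators" part, a plain Java TCP client could look something like this (the host and port are made up; only source addresses and port numbers are known in this thread):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class GeneratorSocketReader {
    public static void main(String[] args) throws Exception {
        // Hypothetical generator address and port.
        Socket socket = new Socket("generator-host-01", 9999);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(socket.getInputStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Hand each record to the collector/sink of your choice.
                System.out.println(line);
            }
        } finally {
            in.close();
            socket.close();
        }
    }
}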
On Dec 15, 2011, at 21:27, "Periya.Data" wrote:
> Sorry...misworded my statement. What I meant was that the sources are meant
> to be untouched and admins do
Sorry...misworded my statement. What I meant was that the sources are meant
to be untouched and admins do not want to mess with them or add more tools
there. All I've got are source addresses and port numbers. Once I know what
technique(s) I will be using, I will accordingly be given access via
fire
Just curious - what is the situation you're in where no collectors are
possible? Sounds interesting.
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com
On Dec 15, 2011, at 5:01 PM, "Periya.Data" wrote:
> Hi all,
> I would like to know what options I have to ingest
Hi all,
I would like to know what options I have to ingest terabytes of data
that are being generated very fast from a small set of sources. I have
thought about:
1. Flume
2. An intermediate staging server(s) where you can offload data and
from there use dfs -put to load into H
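For option 2, a minimal sketch of the dfs -put step done programmatically through the FileSystem API, under assumed staging and HDFS paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths; equivalent to
        // "hadoop dfs -put /staging/batch-0001 /ingest/batch-0001".
        Path local = new Path("/staging/batch-0001");
        Path remote = new Path("/ingest/batch-0001");
        fs.copyFromLocalFile(local, remote);
        fs.close();
    }
}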
Joey,
What is it you really want to do? Increase the number of map slots available
in the task tracker, or increase the number of map tasks for a
job?
If you want to increase the number of map slots available, what you did will
work - as long as you restarted the task tracker
(moving to mapreduce-user@, bcc'ing common-user@)
Hi Joey -
You'll want to change the value on all of your servers running tasktrackers
and then restart each tasktracker to reread the configuration.
cheers,
-James
On Thu, Dec 15, 2011 at 3:30 PM, Joey Krabacher wrote:
> I have looked up how to
I have looked up on the web how to raise this value and have tried all the
suggestions, to no avail.
Any help would be great.
Here is some background:
Version: 0.20.2, r911707
Compiled: Fri Feb 19 08:07:34 UTC 2010 by chrisdo
Nodes: 5
Current Map Task Capacity : 10 <--- this is what I want to increase
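Once the value has been raised on every node and the tasktrackers restarted, one way to confirm the new capacity from code is via JobClient's cluster status; a small sketch (nothing here is specific to this cluster):

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapCapacityCheck {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        ClusterStatus status = client.getClusterStatus();
        // With 5 nodes this should read 5 * mapred.tasktracker.map.tasks.maximum
        // once every tasktracker has re-read its configuration.
        System.out.println("task trackers:     " + status.getTaskTrackers());
        System.out.println("map task capacity: " + status.getMaxMapTasks());
    }
}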
Hi,
I am using the 0.20.x branch. I need to use the new API because it
has the cleanup(context) method in Mapper. However, I am confused about
how to load the cached files in the mapper. I could load the
DistributedCache files using the old API (JobConf), but with the new API it
always returns nu
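For what it's worth, a minimal sketch of one common way to read the cached files from a new-API Mapper; the class and the way the files are used are illustrative only:

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Local paths of the files shipped via DistributedCache (or -files).
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached == null) {
            // This is the "always returns null" symptom: check that the files were
            // added to the same Configuration the Job was built from.
            return;
        }
        for (Path p : cached) {
            System.out.println("cached file available at " + p);
        }
    }
}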
mapred.map.tasks is a suggestion to the engine, and there is really no reason to
define it, as it will be driven by the block-level partitioning of your files
(e.g. if you have a file that is 30 blocks, it will by default spawn 30 map
tasks). As for mapred.reduce.tasks, just set it to whatever
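As a hedged illustration of the per-job knobs being discussed (the class name and the numbers are made up):

import org.apache.hadoop.mapred.JobConf;

public class TaskCountExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(TaskCountExample.class);
        // mapred.map.tasks is only a hint; the split/block count of the input
        // decides the real number of map tasks.
        conf.setNumMapTasks(100);
        // mapred.reduce.tasks is honored as configured.
        conf.setNumReduceTasks(8);
        System.out.println("requested reduces: " + conf.getNumReduceTasks());
    }
}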
Thanks Matt,
Assuming, therefore, that I run a single tasktracker and have 48 cores
available, then based on your recommendation of 2:1 mapper-to-reducer
threads I will be assigning:
mapred.tasktracker.map.tasks.maximum=30
mapred.tasktracker.reduce.tasks.maximum=15
This brings me onto my question:
"Ca
Dale,
Talking solely about Hadoop core, you will only need to run four daemons on that
machine: NameNode, JobTracker, DataNode, and TaskTracker. There is no reason to
run multiples of any of them, as the TaskTracker will spawn multiple child JVMs,
which is where you will get your task parallelism. When
Tom,
Look,
I've said this before and I'm going to say it again.
Your knowledge of Hadoop is purely academic. It may be OK to talk to C-level
execs who visit the San Jose IM Lab or in Markham, but when you give answers on
issues where you don't have first-hand practical experience, you end up doing
Hi all,
New to the community and to using Hadoop, and I was looking for some advice on
optimal configurations for very large servers. I have a single server
with 48 cores and 512 GB of RAM and am looking to perform an LDA analysis
using Mahout across approx. 180 million documents. I have configured
Hello,
I am trying to package some config data and ship it out to my mappers. I was
just testing while running locally, and I can't get anything to work for me.
~/hadoop/hadoop-0.20.204.0/bin/hadoop jar the_jar.jar com.bar.ApplyMappings
-files data/config_stuff.txt#config_stuff.txt input_dir output_dir
confi
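One hedged guess worth checking: -files is handled by GenericOptionsParser, so the driver has to go through ToolRunner (or parse the generic options itself). A sketch of such a driver, with a hypothetical class standing in for com.bar.ApplyMappings, which is not shown in this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical stand-in for the real driver class.
public class ApplyMappingsDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // getConf() already carries whatever -files/-D options were passed.
        Job job = new Job(getConf(), "apply-mappings");
        job.setJarByClass(ApplyMappingsDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner invokes GenericOptionsParser, which is what understands -files.
        System.exit(ToolRunner.run(new Configuration(), new ApplyMappingsDriver(), args));
    }
}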
Hi,
Are there any standard procedures for replacing failed nodes in a cluster?
Regards, Hans-Peter
Hi Todd,
you are right, I should be more specific:
1. from the namenode log:
2011-12-11 08:57:23,245 WARN
org.apache.hadoop.hdfs.server.common.Storage: rollEdidLog: removing
storage /srv/hadoop/hdfs/edit
2011-12-11 08:57:23,311 WARN
org.apache.hadoop.hdfs.server.common.Storage: incrementCheckpo
Hi Guy,
Several questions come to mind here:
- What was the exact WARN level message you saw?
- Did you have multiple dfs.name.dirs configured as recommended by
most setup guides?
- Did you try entering safemode and then running saveNamespace to
persist the image before shutting down the NN? This
Hi guys,
We recently had the following problem on our production cluster:
The filesystem containing the editlog and fsimage had no free inodes.
As a result, the namenode wasn't able to obtain an inode for the
fsimage and editlog after a checkpoint had been reached, while the
previous files we