On Jul 18, 2011, at 12:53 PM, Ben Clay wrote:
> I'd like to spread Hadoop across two physical clusters, one which is
> publicly accessible and the other which is behind a NAT. The NAT'd machines
> will only run TaskTrackers, not HDFS, and not Reducers either (configured
> with 0 Reduce slots). T
On Jul 12, 2011, at 4:34 PM,
wrote:
> I am working on deploying Hadoop on a small cluster. For now, I am interested
> in restarting (restart the node or even reboot the OS) the nodes Hadoop detects
> as crashed.
There are quite a few scenarios where one service may be up but an
On Jul 12, 2011, at 3:02 PM,
wrote:
> I am new to Hadoop, and I apologize if this was answered before, or if this
> is not the right list for my question.
common-user@ would likely have been better, but I'm too lazy to forward
you there today. :)
>
> I am trying to do the followi
On Jul 12, 2011, at 10:27 AM, Virajith Jalaparti wrote:
> I agree that the scheduler has less leeway when the replication factor is
> 1. However, I would still expect the number of data-local tasks to be more
> than 10% even when the replication factor is 1.
How did you load your data?
On Jun 30, 2011, at 12:36 PM, David Ginzburg wrote:
>
> Is it possible though the server runs with vm.swappiness =5
That only controls how aggressively the system swaps. If you eat all the
RAM in user space, the system is going to start paging memory regardless of
swappiness.
On Jun 28, 2011, at 1:43 PM, Peter Wolf wrote:
> Hello all,
>
> I am looking for the right thing to read...
>
> I am writing a MapReduce Speech Recognition application. I want to run many
> Speech Recognizers in parallel.
>
> Speech Recognizers not only use a large amount of processor, they
On Jun 28, 2011, at 6:19 AM, Jeremy Cunningham wrote:
> I have lots of binary files stored in hdfs. I read them using Apache POI and
> can search with no problems. I want to be able to search for keywords (which
> I can do) and then copy the file that has the text out to a different
> locatio
On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:
>
> Hi,
> I am running a certain job which constantly causes dead data nodes (which come
> back later, spontaneously).
Check your memory usage during the job run. Chances are good the
DataNode is getting swapped out.
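One way to check memory usage during the job run is to sample /proc/meminfo on the DataNode host. The sketch below is only an illustration and assumes a Linux box; the function names are made up for this example and nothing here is part of Hadoop.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, computed as SwapTotal minus SwapFree."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling swap_used_kb(read_meminfo()) every few seconds while the job runs, and watching whether the number climbs as MemFree drops, should show whether the box is paging.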
On Jun 22, 2011, at 2:20 PM, Jonathan Zukerman wrote:
>
> That way I can set the maximum number of maps and maximum number of reducers
> in the configuration of the job ("loadmanager.maximum.maps.per.tasktracker"
> will be 1 for these special jobs).
> Am I right? Am I missing something?
On Jun 23, 2011, at 7:09 AM, Virajith Jalaparti wrote:
> Hi,
>
> I am trying to run a sort job (from hadoop-0.20.2-examples.jar) on 50GB of
> data (generated using randomwriter). I am using hadoop-0.20.2 on a cluster
> of 3 machines with one machine serving as the master and the other two as
> s
On Jun 22, 2011, at 10:08 AM, Allen Wittenauer wrote:
>
> On Jun 21, 2011, at 2:02 PM, Harsh J wrote:
>>>>
>>>> If your jar does not contain code changes that need to get transmitted
>>>> every time, you can consider placing them on the JT/TT classpa
On Jun 21, 2011, at 2:02 PM, Harsh J wrote:
>>>
>>> If your jar does not contain code changes that need to get transmitted
>>> every time, you can consider placing them on the JT/TT classpaths
>>
>>... which means you get to bounce your system every time you change
>> code.
>
> Its ugl
On Jun 20, 2011, at 12:24 PM,
wrote:
> Hi there,
> I know client can send "mapred.reduce.tasks" to specify no. of reduce tasks
> and hadoop honours it but "mapred.map.tasks" is not honoured by Hadoop. Is
> there any way to control number of map tasks? What I noticed is that Hadoop
> is choo
On Jun 21, 2011, at 9:52 AM, Jonathan Zukerman wrote:
> Hi,
>
> Is there a way to set the maximum map tasks for all tasktrackers in my
> cluster for a certain job?
> Most of my tasktrackers are configured to handle 4 maps concurrently, and
> most of my jobs don't care where does the map function
On Jun 3, 2011, at 1:11 AM, Felix Sprick wrote:
> Hi,
>
> We are running MapReduce on Hbase tables and are trying to implement a
> scenario with MapReduce where tasks are submitted from a GUI application.
> This means that several users (currently 5-10) may use the system in
> parallel.
On Apr 1, 2011, at 12:05 PM, Travis Crawford wrote:
> On Thu, Mar 31, 2011 at 3:25 PM, Allen Wittenauer wrote:
>>
>> On Mar 31, 2011, at 11:45 AM, Travis Crawford wrote:
>>
>>> Is anyone familiar with how the distributed cache deals when datasets
>>>
On Mar 31, 2011, at 11:45 AM, Travis Crawford wrote:
> Is anyone familiar with how the distributed cache deals when datasets
> larger than the total cache size are referenced? I've disabled the job
> that caused this situation but am wondering if I can configure things
> more defensively.
On Mar 25, 2011, at 10:09 AM, Pedro Costa wrote:
> Hi,
>
> during the setup phase and the cleanup phase of the tasks, the Hadoop
> MR uses map tasks to do it. Do these tasks appear in the counters shown
> at the end of an example?
> For example, the counter below shows that my example ran 9 map ta
On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:
> I am not sure if this is the right listserv, forgive me if it is not.
A better choice would likely be hdfs-user@, since this is really about
watching files in HDFS.
> My goal is this: monitor HDFS until a file is created, and th
On Jan 4, 2011, at 10:30 AM, Hiller, Dean (Contractor) wrote:
> I guess I meant in the setting for number of tasks in child JVM before
> teardown. In that case, it is nice to separate/unload my previous
> classes from the child JVM which OSGi does. I was thinking we may do 10
> tasks / JVM sett
On Jan 3, 2011, at 5:11 AM, Debbie Fu wrote:
> I think it will cause a disk fill-up, too. Is there any mechanism in Hadoop
> that handles this situation?
Not in a way that saves the job.
> If my local disk stores too much chunk data,
> and spare little space for intermediate output, and
On Jan 2, 2011, at 9:51 AM, Hiller, Dean (Contractor) wrote:
> I was looking at distributed cache and how I need to copy local jars to
> hdfs. I was wondering if there were any plans to just deploy an OSGi
> bundle (i.e. introspect and auto deploy jars from bundle to the
> distributed cache and the
This makes sense until you realize:
a) It won't scale.
b) Machines fail.
On Dec 20, 2010, at 5:26 AM, Martin Becker wrote:
> I wrote a little bit much, so I put a summary up front. Sorry about that.
>
> Summary:
> 1) Is there any point in time, where on
On Dec 19, 2010, at 7:39 AM, Martin Becker wrote:
> Hello everybody,
>
> is there a possibility to make sure that certain/all reduce tasks,
> i.e. the reducers to certain keys, are executed in a specified order?
> This is Job internal, so the Job Scheduler is probably the wrong place to
> start
On Dec 19, 2010, at 10:21 AM, Eric wrote:
> I don't know if there is such a thing in Hadoop, I'm guessing not since
> MapReduce is designed to have independent mappers and reducers.
Yup.
> I'm just suggesting something here: you could write a small server yourself.
> Say you start yo
On Dec 19, 2010, at 10:23 AM, Jane Chen wrote:
> Suppose that the output is written to a database, that only runs on certain
> nodes. It will be desirable to schedule the reducer tasks to run on the
> nodes local or close to the database nodes.
a) That's a side-effect--pretty much "a
On Dec 7, 2010, at 6:47 PM, Harsh J wrote:
>> 1 - When we've two JobTrackers running simultaneously, each JobTracker is
>> running in a separate process?
>
> You can't run simultaneous JobTrackers for the same data-cluster
> AFAIK; only one JT process can exist. Did you mean jobs?
Sure y
On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
> When running map reduce tasks in Hadoop I run into classpath issues. Contrary
> to previous posts, my problem is not that I am missing classes on the Task's
> class path (we have a perfect solution for that) but rather find too many
> (e.g. E
On Sep 9, 2010, at 11:42 AM, Elton Pinto wrote:
> Does anyone know the difference between the Hadoop counter
> TOTAL_LAUNCHED_MAPS and the "mapred.map.tasks" parameter available in the
> JobConf?
mapred.map.tasks is what Hadoop thinks you need at a minimum.
TOTAL_LAUNCHED_MAPS will be all ma
On Aug 9, 2010, at 1:27 PM, Pedro Costa wrote:
>
> 2 - If I'm deducing correctly, the reduce will always fetch 10 bytes
> less than the saved map output?
Why do you care?
On Jul 14, 2010, at 11:50 AM, Shaojun Zhao wrote:
> Is there any way to specify that some machines run, say, 8 mapper
> tasks, while some machines run only 2 tasks?
A custom mapred-site.xml per machine.
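As a rough sketch of what a per-machine mapred-site.xml could contain, the Python below renders a minimal file body for a given slot count. The hostnames and the MAP_SLOTS table are hypothetical; the property name is the standard mapred.tasktracker.map.tasks.maximum setting. How the files get pushed to each machine is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical hosts and slot counts, for illustration only.
MAP_SLOTS = {"bignode01": 8, "smallnode01": 2}


def mapred_site_xml(map_slots):
    """Render a minimal mapred-site.xml body that sets the map slot count."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "mapred.tasktracker.map.tasks.maximum"
    ET.SubElement(prop, "value").text = str(map_slots)
    return ET.tostring(root, encoding="unicode")


def render_all(slots_by_host):
    """Return {hostname: file body} so each machine gets its own file."""
    return {host: mapred_site_xml(n) for host, n in slots_by_host.items()}
```

The TaskTracker reads this value when it starts, so each TaskTracker would need a restart after its file changes.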
On Jun 23, 2010, at 3:13 PM, Pedro Costa wrote:
> 1 - Hadoop uses several ports to run. There are ports for HDFS, for the
> MapReduce JvmTasks, etc. I don't know how I can identify all the ports that
> MapReduce and HDFS use. I'm running the wordcount example, and I would like
> to see what
On Jun 22, 2010, at 1:58 PM, Steve Lewis wrote:
> train...@hadoop1:~$ hadoop dfsadmin -safemode get
> Safe mode is OFF
OK, so you are out of safemode.
>
> train...@hadoop1:~$ hadoop dfsadmin -refreshNodes
This just re-reads the list of nodes. hadoop dfsadmin -report might be more
useful.
On Jun 22, 2010, at 12:55 PM, Steve Lewis wrote:
> /user/training/small_yeast/yeast_chrXIV0006.sam.gz could only be
> replicated to 0 nodes, instead of 1
... almost always means the namenode doesn't think it has any viable datanodes
(anymore).
> Anyone seen this and know how to fix it
> I
On Jun 3, 2010, at 1:45 AM, Alex Munteanu wrote:
> I am running several different mapreduce jobs. For some of them it is
> better to have a rather high number of running map tasks per node,
> whereas others do very intensive read operations on our database
> resulting in read timeouts. So for thes
On May 28, 2010, at 11:43 AM, Todd Lipcon wrote:
> Hi Allen,
>
> Recent versions of the fair scheduler have configurations for "delay
> scheduling" - essentially, it will wait for a few seconds when a slot opens
> up to try to find a local task before assigning a non-local one. This is
> spec
I've been thinking (which is always a dangerous thing) about data
locality lately.
If we look at file systems, there is this idea of 'reserved space'.
This space is used for a variety of reasons, including to reduce fragmentation
on busy file systems. This allows the file s
On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
> For example, when I run "wordcount" example, there are HDFS communications and
> MapReduce communications and I am not able to distinguish which packets belong
> to HDFS or to MapReduce.
This shouldn't be too surprising given that the MapReduce j
On Apr 2, 2010, at 11:44 PM, Raja Thiruvathuru wrote:
>
> DistributedCache.addCacheFile(new
> URI("hdfs://localhost:9000/user/guest/lib/userlib.jar"), conf);
> DistributedCache.addArchiveToClassPath(new
> Path("hdfs://localhost:9000/user/guest/lib/userlib.jar"), conf);
localhos
On 3/31/10 8:12 PM, "Zhanlei Ma" wrote:
> But how do I recommission? I would appreciate your help.
Take them out of dfs.exclude and run -refreshNodes again.
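The two steps can be sketched in Python as below. This assumes the exclude file (whatever file dfs.hosts.exclude points at) lists one hostname per line; the function name and the node names in the usage note are made up for illustration.

```python
def remove_from_exclude(exclude_text, hosts):
    """Return the exclude-file text with the given hosts dropped (one host per line assumed)."""
    drop = set(hosts)
    kept = [line for line in exclude_text.splitlines() if line.strip() not in drop]
    return "\n".join(kept) + ("\n" if kept else "")


# After rewriting the file, have the NameNode re-read its host lists.
REFRESH_CMD = ["hadoop", "dfsadmin", "-refreshNodes"]
```

For example, remove_from_exclude("node1\nnode2\n", ["node2"]) leaves only node1 in the file; REFRESH_CMD can then be handed to subprocess.run on a machine with the hadoop client configured.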
On 3/11/10 11:05 AM, "Gregory Lawrence" wrote:
> Is there a way to set the output group for a mapreduce (or hdfs fs operation)
> job? For example -Ddfs.umaskmode=027 successfully sets the permissions. I
> would think the -Dgroup.name=GROUP would do a similar thing for the file's
> group. Howeve
l file.
>
> Cheers,
>
> Teryl
>
>
> On Tue, Jan 19, 2010 at 4:32 PM, Allen Wittenauer
> wrote:
>
>> What is the value of:
>>
>> mapred.tasktracker.map.tasks.maximum
>> mapred.tasktracker.reduce.tasks.maximum
>>
>>
>> On 1
What is the value of:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
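A quick way to answer that from a TaskTracker's mapred-site.xml is sketched below. It only looks at the text it is given, so a property that is not set there comes back as None, which means whatever default the installed version ships with applies. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SLOT_KEYS = (
    "mapred.tasktracker.map.tasks.maximum",
    "mapred.tasktracker.reduce.tasks.maximum",
)


def slot_settings(xml_text):
    """Return {property name: value string, or None if unset} for the two slot settings."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name in SLOT_KEYS:
            found[name] = prop.findtext("value", default="").strip()
    return {key: found.get(key) for key in SLOT_KEYS}
```

Feeding it the contents of conf/mapred-site.xml from each node makes it easy to spot machines whose slot counts differ from the rest.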
On 1/19/10 10:23 AM, "Teryl Taylor" wrote:
> Hi guys,
>
> Thanks for the answers. Michael, yes you are right, that is what I guess,
> I'm looking for...how to reduce the number of mappers runn
On 10/27/09 3:34 AM, "tim robertson" wrote:
> to create file /user/root/delme2/resource-101 for
I wouldn't recommend running your grid/jobs as root. :)