Right, so have you ever seen your non-idempotent DEFINE command produce an
incorrect result? That would essentially point to duplicate attempts
behaving badly.
To your second question -- I think spec exec assumes that not all machines
run at the same speed. If a machine is free (not used for some other task),
the framework can schedule a duplicate attempt of a slow task there.
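If duplicate attempts are what's biting you, the usual knob is to switch
speculative execution off for the job. A minimal sketch against the old
JobConf API (the driver class name is a placeholder, not from this thread):

  import org.apache.hadoop.mapred.JobConf;

  // Hypothetical driver fragment: stop the framework from launching
  // duplicate (speculative) attempts of slow tasks.
  JobConf conf = new JobConf(MyDriver.class);
  conf.setMapSpeculativeExecution(false);
  conf.setReduceSpeculativeExecution(false);

The same switches exist as the mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution configuration properties.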
On 2/11/10 12:40 AM, Winton Davies wrote:
> ahhahhahahahahaha... I thought it was single-pass, and in this case, an
> 'echo'.
>
Yeah, the combiner can be confusing at first. It may run N times where N
is zero or greater. And yes, this means that even if you supply a
combiner, the framework may opt not to run it at all.
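To make that concrete with an example that is not from this thread: a
combiner is only safe when applying it zero, one, or many times leaves the
final result unchanged, which is why the classic word-count sum works as
one (class name is hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  // Summing partial counts is associative and commutative, so the
  // framework may apply this 0..N times without changing the output.
  public class SumCombiner
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      ctx.write(key, new IntWritable(sum));
    }
  }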
ahhahhahahahahaha... I thought it was single-pass, and in this case, an
'echo'.
Thanks !
W
On Wed, Feb 10, 2010 at 8:05 PM, Eric Sammer wrote:
> Winton:
>
> The combiner is always optional. Simply leave it out to not have one. The
> reason you're seeing extra records is because a combiner can run
> multiple times.
Winton:
The combiner is always optional. Simply leave it out to not have one.
The reason you're seeing extra records is because a combiner can run
multiple times. This means you're growing your dataset after the
mapper.
HTH
Eric
On Feb 10, 2010, at 10:30 PM, Winton Davies wrote:
Hello Everyone,
We often find many child processes on the datanodes which finished a long
time ago but are still around. The following is the jstack output:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):
"DestroyJavaVM" prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on
condition
Thanks Eric,
I think I may have found the cause of the problem, but have no idea how to
fix it.
My mapper is STDOUT.puts "key1 tab key2 tab text" -- and the job tracker
shows the total number of records being emitted as, say, 35 million.
It then goes through -combiner /bin/cat (i.e. a NOOP, in theory).
Hadoop Fans, we have scheduled additional developer sessions in both the Bay
Area and NYC. Also, due to popular demand, we'll be offering a public
sysadmin training session immediately following our March developer session
in the Bay Area. If this goes well, we'll make this a regular offering.
Winton:
I don't know the exact streaming options you're looking for, but what
you have looks correct. Generally, to do what you want, all you should
have to do is 1. sort on both fields zero and one in the key and 2.
partition on field zero only. This ensures all keys containing 'AA' go to the
same reducer.
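For the archives, the streaming options I believe this points at look
roughly like the following (jar path, input/output paths, and script names
are placeholders; the flags are the 0.20-era key-field options):

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -D stream.num.map.output.key.fields=2 \
    -D mapred.text.key.partitioner.options=-k1,1 \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
    -input in -output out \
    -mapper map.rb -reducer reduce.rb

The first flag makes the first two tab-separated fields the key, so the
sort covers both; the partitioner option restricts partitioning to field
one, so every 'AA ...' record lands on the same reducer.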
I'm using streaming hadoop, installed via cloudera on ec2.
My job should be straightforward:
1) Map task, emits 2 keys and 1 VALUE, e.g.
AA 0 QUICK BROWN FOX
AA 1 QUICK BROWN FOX
BB 1 QUICK RED DOG
2) Reduce task: assuming all records for a key (and its flag) are in its
standard input, it runs through the stdin.
On 2/10/10 5:19 PM, Nick Klosterman wrote:
@E.Sammer, no I don't *think* that it is part of another cluster. The
tutorial is for a single node cluster, just an initial set up to see if
you can get things up and running. I have reformatted the namenode
several times in my effort to get hadoop to work.
@E.Sammer, no I don't *think* that it is part of another cluster. The
tutorial is for a single node cluster, just an initial set up to see if
you can get things up and running. I have reformatted the namenode
several times in my effort to get hadoop to work.
@Abhishek
I tried the workaround you suggested.
On 2/10/10 3:57 PM, Nick Klosterman wrote:
It appears I have incompatible namespaceIDs. Any thoughts on how to
resolve that?
This is what the full datanodes log is saying:
Was this data node part of another DFS cluster at some point? It looks
like you've reformatted the name node since the datanode was last formatted.
So Michael Noll's tutorial page has the following tips for the error
you are facing.
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)#java.io.IOException:_Incompatible_namespaceIDs
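For the archives, the two workarounds described on that page boil down to
the following sketch (the /app/hadoop/tmp path is the tutorial's
hadoop.tmp.dir; substitute your own dfs.data.dir):

  # Option 1: wipe the datanode's storage (destroys its blocks) and restart
  bin/stop-all.sh
  rm -rf /app/hadoop/tmp/dfs/data/*
  bin/start-all.sh

  # Option 2: edit namespaceID in <dfs.data.dir>/current/VERSION on the
  # datanode so it matches the namenode's <dfs.name.dir>/current/VERSION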
Abhishek
On Wed, Feb 10, 2010 at 12:57 PM, Nick Klosterman wrote:
> It appears I have incompatible namespaceIDs. Any thoughts on how to
> resolve that?
It appears I have incompatible namespaceIDs. Any thoughts on how to
resolve that?
This is what the full datanodes log is saying:
2010-02-10 15:25:09,125 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
On 2/10/10 12:42 PM, "Nick Klosterman" wrote:
> I've been following Michael Noll's Single node cluster tutorial but am
> unable to run the wordcount example successfully.
>
> It appears that I'm having some sort of problem involving the nodes. Using
> copyFromLocal fails to replicate the data across 1 node.
Nick:
It appears that the datanode daemon isn't running.
> /usr/local/hadoop/bin$ jps
> 24440 SecondaryNameNode
> 24626 TaskTracker
> 24527 JobTracker
> 24218 NameNode
> 24725 Jps
There's no process for DataNode. This is the process that is responsible
for storing blocks. In other words, no data can be stored in HDFS until it
is running.
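The usual next step, sketched here assuming the tutorial's
/usr/local/hadoop layout (your paths may differ):

  # restart the daemons and see whether DataNode stays up
  /usr/local/hadoop/bin/stop-all.sh
  /usr/local/hadoop/bin/start-all.sh
  jps   # DataNode should now appear in the list
  # if it does not, the reason will be near the end of its log:
  tail -n 50 /usr/local/hadoop/logs/hadoop-*-datanode-*.log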
I've been following Michael Noll's Single node cluster tutorial but am
unable to run the wordcount example successfully.
It appears that I'm having some sort of problem involving the nodes. Using
copyFromLocal fails to replicate the data across 1 node.
When I try to look at the hadoop web interface, I don't see a live datanode
either.
Thanks, that worked!
On Feb 10, 2010, at 11:44 AM, Alex Kozlov wrote:
David, to parse the -Dkey=value flags you need to implement Tool.
Otherwise, you can just set the values yourself with a conf.set(name,
value) call.
David, to parse the -Dkey=value flags you need to implement Tool.
Otherwise, you can just set the values yourself with a conf.set(name,
value) call.
On Wed, Feb 10, 2010 at 11:25 AM, David Hawthorne wrote:
> For the other method I was using, with otherArgs and public static
> variables for field_name and interval_length, here's the code for that:
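For the archives, a minimal sketch of the Tool pattern Alex describes (the
driver class is hypothetical, not David's actual code; only the property
names from this thread are reused):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class FooBarDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
      // ToolRunner has already parsed any -Dkey=value flags into getConf()
      Configuration conf = getConf();
      String fieldName = conf.get("field_name", "");
      long intervalLength = conf.getLong("interval_length", 60L);
      // ... configure and submit the job using conf ...
      return 0;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new FooBarDriver(), args));
    }
  }

Invoked along the lines of:
hadoop jar foobar.jar FooBarDriver -D field_name=url -D interval_length=300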
For the other method I was using, with otherArgs and public static
variables for field_name and interval_length, here's the code for that:
public class FooBar {
  public static class FooMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
On 2/10/10 12:15 AM, "Marcus Herou" wrote:
> We run hadoop-0.18.3 and it seems that the jobcache does not get cleaned out
> properly.
>
> Would this cron script do any harm to hadoop?
>
> # Clean all files which are two or more days old
> /usr/bin/find ${JOB_CACHE_PATH} -type f -mtime +2 -exec rm {} \;
I've tried what it shows in the examples, but those don't seem to be
working. Aside from that, they also complain about a deprecated
interface when I compile. Any help you guys can give would be greatly
appreciated.
Here's what I need to do in the mapper:
Read through some logs.
Modulo the timestamp by the interval length.
Thomas Koch wrote:
I'm working on a hadoop package for Debian, which also includes init
scripts
using the daemon program (Debian package "daemon") from
http://www.libslack.org/daemon
Can these scripts be used on other distributions, like Red Hat? Or is it a
Debian-only daemon?
I'm not familiar enough with it to say.
Correctness of the results actually depends on my DEFINE command. If
the commands are idempotent (which is not the case for me) then I
believe it won't have any effect on the results; otherwise it will
indeed make the results incorrect. For example, if my command fetches
some data and appends it to a MySQL table, duplicate attempts would append
the same rows more than once.
That cleanup action looks promising in terms of preventing duplication. What
I'd meant was, could you ever find an instance where the results of your
DEFINE statement were made incorrect by multiple attempts?
On Wed, Feb 10, 2010 at 5:05 AM, prasenjit mukherjee
<pmukher...@quattrowireless.com> wrote:
> > I'm working on a hadoop package for Debian, which also includes init
> > scripts
> > using the daemon program (Debian package "daemon") from
> > http://www.libslack.org/daemon
>
> Can these scripts be used on other distributions, like Red Hat? Or is it a
> Debian-only daemon?
I'm not familiar enough with it to say.
Below is the log:
attempt_201002090552_0009_m_01_0  /default-rack/ip-10-242-142-193.ec2.internal  SUCCEEDED  100.00%  9-Feb-2010 07:04:37 to 9-Feb-2010 07:07:00 (2mins, 23sec)
attempt_201002090552_0009_m_01_1  Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
We faced the same issue and we also use cron to delete the older entries.
Be careful that your mtime threshold for deletion is never less than the
runtime of the longest job you can ever have.
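In other words, derive the -mtime threshold from your worst-case job
runtime; e.g. if a job can legitimately run for three days, keep a few
days of slack (the +7 below is illustrative, not a number from this
thread):

  /usr/bin/find ${JOB_CACHE_PATH} -type f -mtime +7 -exec rm {} \;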
On Wed, Feb 10, 2010 at 1:45 PM, Marcus Herou wrote:
> Hi.
>
> We run hadoop-0.18.3 and it seems that the jobcache does not get cleaned
> out properly.
Maybe using MultipleOutputFormat will solve my problem.
Thanks.
On Wed, Feb 10, 2010 at 1:12 PM, Oded Rotem wrote:
> Did you try one of the subclasses of MultipleOutputFormat to override the
> filename in generateFileNameForKeyValue()?
>
Hi.
We run hadoop-0.18.3 and it seems that the jobcache does not get cleaned out
properly.
Would this cron script do any harm to hadoop?
# Clean all files which are two or more days old
/usr/bin/find ${JOB_CACHE_PATH} -type f -mtime +2 -exec rm {} \;
Need to start cleaning today, so hoping for a quick reply.