wmitchell wrote:
Hi All,
I've been working through Michael Noll's multi-node cluster setup example
(Running_Hadoop_On_Ubuntu_Linux) for Hadoop, and I have a working setup. On
my slave machine -- which is currently running a datanode -- I then killed
the process in an effort to try to simulate some sort of
Manually killing a process might create a situation where only a portion of
your data is written to disk, and other data queued to be written is lost.
This is most likely what caused the corruption in your namenode.
Start by reading about bin/hadoop fsck:
I am trying to write an InputFormat and I am having some trouble
understanding how my data is being broken up. My input is the output of a
previous Hadoop job, and I have added code to my record reader to print out
the FileSplit's start and end positions, as well as where the last record I
read was located. My
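(For readers following along: a minimal sketch of this kind of
instrumentation against the old org.apache.hadoop.mapred API. The class
name LoggingRecordReader is hypothetical, and it delegates to
LineRecordReader rather than reproducing the poster's actual reader.)

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;

public class LoggingRecordReader implements RecordReader<LongWritable, Text> {
  private final LineRecordReader delegate;

  public LoggingRecordReader(JobConf conf, FileSplit split) throws IOException {
    // Print the boundaries of the split this reader was handed.
    System.err.println("split start=" + split.getStart()
        + " end=" + (split.getStart() + split.getLength()));
    delegate = new LineRecordReader(conf, split);
  }

  public boolean next(LongWritable key, Text value) throws IOException {
    boolean more = delegate.next(key, value);
    if (more) {
      // getPos() is the reader's current position within the file.
      System.err.println("record read, pos=" + delegate.getPos());
    }
    return more;
  }

  public LongWritable createKey() { return delegate.createKey(); }
  public Text createValue() { return delegate.createValue(); }
  public long getPos() throws IOException { return delegate.getPos(); }
  public float getProgress() throws IOException { return delegate.getProgress(); }
  public void close() throws IOException { delegate.close(); }
}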
Hi,
I'm trying to implement NameNode failover (or at least NameNode local
data backup), but it is hard since there is no official documentation.
Pages on this subject have been created, but are still empty:
http://wiki.apache.org/hadoop/NameNodeFailover
http://wiki.apache.org/hadoop/SecondaryNameNode
I
I am attempting to write a map/reduce that will sort by the key and then
by the values. The output should look like:
0 0
0 1
0 5
0 123
0 89245
1 0
1 234
1 23423
My mapper is Mapper<LongWritable, Text, IntWritable, IntWritable> and my
reducer is the identity. I configure the program using:
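(For reference, the standard way to get values sorted within each key in
the old API is a "secondary sort": pack both integers into a composite key
so the shuffle orders by (key, value), then partition and group on the
first integer only. A minimal sketch with hypothetical class names -- not
the poster's code:)

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SecondarySortSketch {

  // Composite key: the shuffle's sort then orders by first, then second.
  public static class IntPair implements WritableComparable<IntPair> {
    int first, second;
    public void set(int f, int s) { first = f; second = s; }
    public void write(DataOutput out) throws IOException {
      out.writeInt(first);
      out.writeInt(second);
    }
    public void readFields(DataInput in) throws IOException {
      first = in.readInt();
      second = in.readInt();
    }
    public int compareTo(IntPair o) {
      if (first != o.first) return first < o.first ? -1 : 1;
      if (second != o.second) return second < o.second ? -1 : 1;
      return 0;
    }
  }

  // Send all pairs with the same first integer to the same reducer.
  public static class FirstPartitioner implements Partitioner<IntPair, IntWritable> {
    public void configure(JobConf job) {}
    public int getPartition(IntPair key, IntWritable val, int numPartitions) {
      return (key.first & Integer.MAX_VALUE) % numPartitions;
    }
  }

  // Group reducer input by the first integer only, so one reduce() call
  // sees all (first, *) pairs with the seconds already in sorted order.
  public static class FirstGroupingComparator extends WritableComparator {
    protected FirstGroupingComparator() { super(IntPair.class, true); }
    public int compare(WritableComparable a, WritableComparable b) {
      int f1 = ((IntPair) a).first, f2 = ((IntPair) b).first;
      return f1 < f2 ? -1 : (f1 == f2 ? 0 : 1);
    }
  }

  public static void configure(JobConf conf) {
    conf.setMapOutputKeyClass(IntPair.class);
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setPartitionerClass(FirstPartitioner.class);
    conf.setOutputValueGroupingComparator(FirstGroupingComparator.class);
  }
}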
On Oct 28, 2008, at 6:29 AM, Malcolm Matalka wrote:
I am trying to write an InputFormat and I am having some trouble
understanding how my data is being broken up. My input is a previous
hadoop job and I have added code to my record reader to print out the
FileSplit's start and end position,
On Oct 28, 2008, at 7:53 AM, David M. Coe wrote:
My mapper is Mapper<LongWritable, Text, IntWritable, IntWritable> and my
reducer is the identity. I configure the program using:
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(IntWritable.class);
This is hard to diagnose without knowing your InputFormat. Each split
returned by your #getSplits() implementation is passed to your
#getRecordReader() implementation. If your RecordReader is not stopping
when you expect it to, then that's a problem in your RecordReader, no?
Have you written
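(To make that contract concrete, a bare-bones sketch against the old
org.apache.hadoop.mapred API -- hypothetical class name, shown only to
illustrate which method receives the splits:)

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// FileInputFormat supplies getSplits(), carving each file at roughly
// block-sized boundaries; the framework then calls getRecordReader()
// once per split. Stopping at the right byte is the reader's job.
public class MyInputFormat extends FileInputFormat<LongWritable, Text> {
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    return new LineRecordReader(job, (FileSplit) split);
  }
}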
We are seeing some strange lockups on a couple of our machines (in
multiple clusters).
Basically the Hadoop processes (datanode, tasktracker, and
tasktracker$child) will hang on the machine.
And if you happen to tail the log files, the tail will hang; if you do a
find in the dfs data directory
Thanks for the response, Owen.
As for the 'isSplittable' thing: the FAQ calls this function
'isSplittable', but in the API it is actually 'isSplitable'. I am not
sure who to contact to fix the FAQ. I am extending FileInputFormat in
this case, so it was actually returning true.
In this case the
Thanks, Doug.
I have written my RecordReader from scratch, using LineRecordReader as
a template. In my response to Owen I showed that if I set isSplitable
to false I get splits that represent my entire input file, but I am only
able to read up to byte 67108800 (which I believe is one block).
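(For anyone searching the archives later, the override being discussed
looks like this -- a minimal sketch with a hypothetical class name; note
the single 't' in the method name:)

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// With this override each input file becomes exactly one split, so a
// single RecordReader must keep reading past block boundaries instead
// of stopping at the first block.
public class UnsplittableTextInputFormat extends TextInputFormat {
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}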
This is a nice feature for sorting keys and values. Is there more
documentation somewhere that I can find, or is there a MapReduce example
that uses this feature?
Thanks,
Hien
From: Owen O'Malley [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent:
I am getting a similar exception with Hadoop 0.18.1 (see stack trace
below), though it's an EOFException. Does anyone have any idea what
it means and how it can be fixed?
2008-10-27 16:53:07,407 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_200810241922_0844_r_06_0 Merge of the
On Oct 27, 2008, at 7:05 PM, Grant Ingersoll wrote:
Hi,
Over in Mahout (lucene.a.o/mahout), we are seeing an oddity with
some of our clustering code and Hadoop 0.18.1. The thread in
context is at: http://mahout.markmail.org/message/vcyvlz2met7fnthr
The problem seems to occur when
Greetings Hadoop users,
I'm relatively new to MapReduce (I've been working on my own with the
Hadoop code for about a month and a half now), and I'm having difficulty
understanding how the values for a given key are passed to the reducer.
As per the API, the reducer expects a single Key and an
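(The old-API signature in question, as a hedged reminder -- the
Text/IntWritable types here are placeholders, not necessarily the
poster's:)

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// reduce() is called once per key; all values for that key arrive
// behind a single Iterator, in whatever order the shuffle produced.
public class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}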
Hi Alex, I'm sorry, I think you misunderstood my question. Let me
explain some more.
I have a hadoop cluster of dual quad core machines.
I'm using hadoop-0.18.1 with Matei's fairscheduler patch
https://issues.apache.org/jira/browse/HADOOP-3746 running in FIFO mode.
I have about 5 different
I understand your question now, Doug; thanks for clarifying. However, I
don't think I can give you a great answer. I'll give it a shot, though:
It does seem like having a single task configuration in theory would improve
utilization, but it might also make things worse. For example, generally
Hi,
I'm interested in graph algorithms. On a single machine, as far as I
know, a graph can be stored as a linked list or a matrix. Do you know
the relative benefits of linked lists versus matrices? So, I guess
Google's web graph will be stored as a matrix in a BigTable.
Have you seen my 2D block
Tomislav,
contrary to popular belief, the secondary namenode does not provide
failover; it is only used to do what is described here:
http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
So the term 'secondary' does not mean a second one, but is more like a
second part
Hi,
for convenience reasons, I was wondering if there is a simple way to
produce one output file per key in the Reducer?
Thanks,
Florian
I'm a little confused about the implementation of DBInputFormat. In my
view, the getSplits method of DBInputFormat splits the result set into
several splits logically, so the DBRecordReader should process the
DBSplit. But I find that in the real implementation the DBRecordReader
processes the result set
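(If it helps, the way this usually resolves -- a paraphrased,
self-contained sketch of the idea, not the real DBInputFormat source:
getSplits() only records row ranges, and each record reader issues its
own LIMIT/OFFSET query, so the result set it walks is already restricted
to its split.)

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class DbSplitSketch {
  // A split is just a row range; no data is read when splits are created.
  static class RowRange {
    final long start, length;
    RowRange(long start, long length) { this.start = start; this.length = length; }
  }

  static List<RowRange> getSplits(long totalRows, int numSplits) {
    List<RowRange> splits = new ArrayList<RowRange>();
    long chunk = totalRows / numSplits;
    for (int i = 0; i < numSplits; i++) {
      splits.add(new RowRange(i * chunk, chunk));
    }
    return splits;
  }

  // What a per-split record reader effectively executes: the full query
  // restricted to its own row range, not the whole result set.
  static ResultSet openSplit(Connection conn, String baseQuery, RowRange split)
      throws SQLException {
    String q = baseQuery + " LIMIT " + split.length + " OFFSET " + split.start;
    return conn.createStatement().executeQuery(q);
  }
}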
Along these lines, I'm curious what management tools folks are using to
ensure cluster availability (i.e., auto-restart failed datanodes/namenodes).
Are you using a custom cron script, or maybe something more complex
(Ganglia, Nagios, puppet, etc.)?
Thanks,
Norbert
On 10/28/08, Steve Loughran
I think using a cron job would be a good solution: just use a test script
to check for the living processes and restart them when they are down.
Norbert Burger wrote:
Along these lines, I'm curious what management tools folks are using to
ensure cluster availability (i.e., auto-restart failed
Quick question (I haven't looked at your comparator code yet) - is this
reproducible/consistent?
On 10/28/08 11:52 PM, Deepika Khera [EMAIL PROTECTED] wrote:
I am getting a similar exception with Hadoop 0.18.1 (see stack trace
below), though it's an EOFException. Does anyone have any idea
Hi,
How are you passing your classes to the pipes job? If you are passing
them as a jar file, you can use the -libjars option. From branch 0.19
onward, the libjar files are added to the client classpath as well.
Thanks
Amareshwari
Zhengguo 'Mike' SUN wrote:
Hi,
I implemented customized classes for
Thanks Mice,
I tried using that already; however, this doesn't yield the desired
results. Upon output collection (using the OutputCollector), it still
produces only one output file (note, I only have one input file, not
multiple input files, but want a file per key for the output...)
Did you override generateFileNameForKeyValue?
2008/10/29 Florian Leibert [EMAIL PROTECTED]:
Thanks Mice,
I tried using that already; however, this doesn't yield the desired results.
Upon output collection (using the OutputCollector), it still produces only
one output file (note, I only have
On Tue, Oct 28, 2008 at 5:15 PM, Edward J. Yoon [EMAIL PROTECTED]wrote:
...
On a single machine, as far as I
know, a graph can be stored as a linked list or a matrix.
Since the matrix is normally very sparse for large graphs, these two
approaches are pretty similar.
... So, I guess Google's web
Great, thanks for that hint - for some reason I expected that behavior
to be a feature of the MultipleTextOutputFormat class - doing so
solved my problem! Thanks!!
Here is my code (I wanted to specifically omit outputting the key while
still having a file per key), if anyone is interested:
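(A minimal sketch of this kind of override against the old
org.apache.hadoop.mapred API, assuming Text keys and values --
hypothetical class name, not necessarily the exact code posted in this
thread:)

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class KeyPerFileOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  // Route each record to a file named after its key instead of the
  // default part-NNNNN leaf name passed in as 'name'.
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    return key.toString();
  }

  // Returning null omits the key from the written records, leaving
  // only the value in each per-key file.
  protected Text generateActualKey(Text key, Text value) {
    return null;
  }
}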