snowloong wrote:
Hi,
I want to share some data structures among the map tasks on the same node (not
through files). I mean, if one map task has already initialized some data
structures (e.g. an array or a list), can other map tasks share that memory
and access it directly? For I don't want
conf.get("map.input.file") should work. If not, then it is a bug in the new
mapreduce API in 0.20.
- Sharad
Binary support has been added for 0.21. One option is to wait for 0.21 to get
released, or you might try applying the patch from HADOOP-1722.
- Sharad
Jianmin Woo wrote:
Do you have a sample of re-using static variables?
You can define static variables in your Mapper/Reducer class. Static variables
survive as long as the JVM is alive, so multiple tasks of the same job running
in a single JVM are able to share them.
- Sharad
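A minimal sketch of that pattern, assuming JVM reuse is enabled (via the
mapred.job.reuse.jvm.num.tasks property) so that several tasks actually land in
the same JVM; the cache contents here are hypothetical:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SharedCacheMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Static, so it is shared by every task of this job running in this JVM.
  private static Map<String, String> cache;

  // Synchronized so only the first task in the JVM pays the init cost.
  private static synchronized void initCache() {
    if (cache == null) {
      cache = new HashMap<String, String>();
      cache.put("example", "value");  // hypothetical expensive initialization
    }
  }

  @Override
  public void configure(JobConf conf) {
    initCache();
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String hit = cache.get(value.toString());
    if (hit != null) {
      output.collect(value, new Text(hit));
    }
  }
}

Note this only shares within one JVM; tasks running in other JVMs on the same
node each build their own copy, which is why truly cross-process sharing would
need something external.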
setOutputKeyClass lets you specify the output key type.
To output only selected keys, you need to call OutputCollector#collect
for just those keys. If you are using the new map reduce API, you need to call
Context#write.
- Sharad
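A sketch of that with the new (0.20) API; the filter condition is made up for
illustration:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FilterMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Only matching records are written; everything else is silently dropped.
    if (value.toString().contains("ERROR")) {  // hypothetical condition
      context.write(new Text("error"), value);
    }
  }
}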
Asim wrote:
Hi,
I wish to output only selected records to the output files
Just one more question: does Hadoop handle reassigning failed tasks
to different machines in some way?
Yes. If a task fails, it is retried, preferably on a different machine.
I saw that sometimes, usually at the end, when there are more
processing units available than map() tasks to
See the core-user mail thread with the subject "HBase, Hive, Pig and other
Hadoop based technologies".
- Sharad
Ricky Ho wrote:
Are they competing technologies for providing a higher-level language for
Map/Reduce programming?
Or are they complementary?
Any comparison between them?
Rgds,
The split doesn't need to be at a record boundary. If a mapper gets
a partial record at the end of its split, the record reader reads into the
next split to get the full record.
- Sharad
see
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html
- Sharad
Dan Milstein wrote:
If I've got a sequence of streaming jobs, each of which depends on the
output of the previous one, is there a good way to launch that
sequence? Meaning, I
Marshall Schor wrote:
public class Super implements WritableComparable<Super> {
. . .
public int compareTo(Super o) {
// sort on string value
. . .
}
I implemented the 2nd key class (let's call it Sub)
public class Sub extends Super {
. . .
public int compareTo(Sub o) {
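One thing to double-check in a design like this (plain Java semantics, not
something Hadoop-specific): since Super implements WritableComparable<Super>,
a compareTo(Sub o) in Sub is an overload, not an override, so sorting will
still dispatch to compareTo(Super). A sketch of a fix, with a hypothetical
subKey field:

public class Sub extends Super {
    private String subKey;  // hypothetical extra field on the subclass

    @Override
    public int compareTo(Super o) {  // override the method sorting actually calls
        int c = super.compareTo(o);  // primary: Super's ordering
        if (c != 0 || !(o instanceof Sub)) {
            return c;
        }
        return subKey.compareTo(((Sub) o).subKey);  // secondary ordering
    }
}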
See MultipleOutputFormat. You may need to implement your own custom OutputFormat.
- Sharad
But if conf.set(...) is called after instantiating the job, it doesn't take effect.
Is this intended?
Yes, the Configuration must be set up before instantiating the Job object. However,
some job parameters can be changed (before the actual job submission) by
calling the set methods on the Job object.
- Sharad
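A sketch of the ordering, with made-up parameter names (the Job constructor
takes a copy of the Configuration, which is why later conf.set calls are not
seen by the job):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ConfigOrder {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("my.custom.param", "value");  // before new Job(conf): visible to the job

    Job job = new Job(conf, "example");

    conf.set("too.late", "ignored");       // after new Job(conf): NOT seen by the job

    // Changes before submission must go through the Job's own copy:
    job.getConfiguration().set("another.param", "value");
    job.setNumReduceTasks(4);

    // job.waitForCompletion(true);  // then submit as usual
  }
}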
Adam Retter wrote:
So I don't have to use HDFS at all when using Hadoop?
The input URI list has to be stored in HDFS. Each mapper will work on
a sublist of URIs depending on the number of maps set in the job.
- Sharad
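One concrete way to get that per-mapper sublist (my suggestion; Sharad may
have had something else in mind) is NLineInputFormat, which hands each map
task a fixed number of lines of the URI file:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class UriListJob {
  public static JobConf configure() {
    JobConf conf = new JobConf(UriListJob.class);
    conf.setInputFormat(NLineInputFormat.class);
    // each map task gets (up to) 100 lines, i.e. 100 URIs
    conf.setInt("mapred.line.input.format.linespermap", 100);
    // hypothetical path to the URI list in HDFS
    FileInputFormat.setInputPaths(conf, new Path("/user/adam/uri-list.txt"));
    return conf;
  }
}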
Processing each document is independent of the others and can be done in
parallel, so that part could be done in a map reduce job.
Now whether it suits this use case depends on the rate at which new
URIs are discovered for processing and the acceptable delay in processing
a document. The way I see it you can
I noticed that the bin/hadoop jar command doesn't add the jar being
executed to the classpath. Is this deliberate, and what is the reasoning? The
result is that resources in the jar are not accessible from the system class
loader; rather, they are only available from the thread context class loader.
The last map task is forever in the pending queue - is this an issue with my
setup/config, or do others have the same problem?
Do you mean the leftover maps are not scheduled at all? What do you see in the
jobtracker logs?
Pankil Doshi wrote:
Hey
Did you find any class or way to store the results of job1's map/reduce in
memory and use them as input to job2's map/reduce? I am facing a situation
where I need to do a similar thing. If anyone can help me out..
Normally you would write the job output to a file and
warning: [unchecked] unchecked call to collect(K,V) as a member of the
raw type org.apache.hadoop.mapred.OutputCollector
Yes, I can live with this warning, but it really makes me uneasy. Any
suggestions to remove this warning?
You can suppress the warning using an annotation in your code:
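Presumably that annotation is the standard @SuppressWarnings("unchecked") on
the method or class. A cleaner fix, though, is to parameterize the Mapper so
the OutputCollector is typed and there is nothing to suppress; a sketch (the
key/value types are just an example):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TypedMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // collect(K, V) is now checked at compile time: no raw-type warning
    output.collect(new Text(value.toString()), new IntWritable(1));
  }
}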
MultipleInputs.addInputPath(JobConf conf, Path path, Class<? extends
InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass)
to add the mappers and my input format.
Right, and then you can use DelegatingInputFormat and DelegatingMapper.
And use the MultipleOutputs class to configure the
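Usage might look like the following; the paths are made up, and the library's
IdentityMapper/InverseMapper stand in for whatever per-input mappers you need:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.InverseMapper;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class MultiInputSetup {
  public static void configure(JobConf conf) {
    // Each path gets its own input format and mapper; MultipleInputs wires
    // up DelegatingInputFormat and DelegatingMapper behind the scenes.
    MultipleInputs.addInputPath(conf, new Path("/data/text"),      // made-up path
        TextInputFormat.class, IdentityMapper.class);
    MultipleInputs.addInputPath(conf, new Path("/data/seq"),       // made-up path
        SequenceFileInputFormat.class, InverseMapper.class);
  }
}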
Also available on the jobtracker web UI.
Farhan Husain wrote:
The logs might help.
On Tue, Apr 7, 2009 at 7:28 PM, Mithila Nagendra mnage...@asu.edu wrote:
Hey all!
Is there a way to print out the execution time of a map reduce task? An
inbuilt function or option to be used with bin/hadoop
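For whole-job wall-clock time, a simple approach (not a built-in option, just
stopwatch code in the driver) is to time the blocking runJob call; per-task
times are on the jobtracker web UI, as mentioned in this thread:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TimedDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TimedDriver.class);
    // ... normal job setup goes here ...
    long start = System.currentTimeMillis();
    JobClient.runJob(conf);  // blocks until the job completes
    long millis = System.currentTimeMillis() - start;
    System.out.println("Job took " + (millis / 1000.0) + " s");
  }
}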
Suppose a batch of input splits arrives at the beginning for every map, and
reduce gives the (word, frequency) pairs for this batch of input splits.
Now after this another batch of input splits arrives, and the results from the
subsequent reduce are aggregated with the previous results (if the word that
has
I am confused about how to start the next job after finishing the previous
one; could you make it clear with a rough example?
See the JobControl class to chain the jobs. You can specify dependencies as well.
You can check out the TestJobControl class for example code.
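A rough sketch of that pattern (the two JobConfs are placeholders for your
real job setup; Job here is the jobcontrol wrapper around a JobConf, not the
new-API Job):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    JobConf conf1 = new JobConf();  // placeholder: configure job 1
    JobConf conf2 = new JobConf();  // placeholder: configure job 2,
                                    // reading job 1's output path

    Job job1 = new Job(conf1);
    Job job2 = new Job(conf2);
    job2.addDependingJob(job1);     // job2 runs only after job1 succeeds

    JobControl jc = new JobControl("chain");
    jc.addJob(job1);
    jc.addJob(job2);

    new Thread(jc).start();         // JobControl is a Runnable
    while (!jc.allFinished()) {
      Thread.sleep(1000);
    }
    jc.stop();
  }
}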
Also do I need to use
Just glanced at it; haven't read it in detail. One correction about the
secondary namenode - it is not a hot standby.
see http://wiki.apache.org/hadoop/FAQ#7
Ricky Ho wrote:
I put together an article describing the internal architecture of Hadoop (HDFS,
MapRed). I'd love to get some feedback if
Is there an easy way to get Reduce Output Bytes?
Reduce output bytes are not available directly, but can perhaps be inferred
from the file system read/write bytes counters.
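If you want the counters programmatically, something like the following might
work; the group/counter names ("FileSystemCounters", "HDFS_BYTES_WRITTEN") are
my guess for this Hadoop vintage, so check what your jobtracker UI actually
shows:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class OutputBytes {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();  // placeholder job setup
    RunningJob job = JobClient.runJob(conf);
    Counters counters = job.getCounters();
    // ASSUMPTION: group/counter names as reported by ~0.19/0.20 era Hadoop
    long written = counters.getGroup("FileSystemCounters")
        .getCounter("HDFS_BYTES_WRITTEN");
    System.out.println("Bytes written to HDFS: " + written);
  }
}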
Wasim Bari wrote:
Hello,
Does anyone know when the Hadoop team plans to implement the
FileSystem.append(Path) functionality and something seekable with
FSDataOutputStream (meaning seek capability)?
FileSystem.append(Path) is already implemented and slated to be released
in 0.19
see
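For reference, the call shape is below; whether append actually works depends
on the release (it was disabled by default in some versions behind a
dfs.support.append style flag, if I remember right), so treat this as a
sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // append to an existing file (hypothetical path)
    FSDataOutputStream out = fs.append(new Path("/logs/events.txt"));
    out.write("one more line\n".getBytes());
    out.close();
  }
}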
Sunil Jagadish wrote:
Hi,
I have a mapper which needs to write output into two different kinds of
files (output.collect()).
Check MultipleOutputFormat. That may help.
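A related option from the same era is the MultipleOutputs class; a sketch with
hypothetical output names and a made-up routing rule:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class TwoFileMapper extends MapReduceBase
    implements Mapper<Text, IntWritable, Text, IntWritable> {

  // In the driver, declare the named outputs first, e.g.:
  //   MultipleOutputs.addNamedOutput(conf, "good",
  //       TextOutputFormat.class, Text.class, IntWritable.class);
  //   MultipleOutputs.addNamedOutput(conf, "bad",
  //       TextOutputFormat.class, Text.class, IntWritable.class);

  private MultipleOutputs mos;

  @Override
  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @SuppressWarnings("unchecked")
  public void map(Text key, IntWritable value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // route each record to one of the two files (made-up rule)
    String name = value.get() >= 0 ? "good" : "bad";
    mos.getCollector(name, reporter).collect(key, value);
  }

  @Override
  public void close() throws IOException {
    mos.close();
  }
}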
Goel, Ankur wrote:
Hi Folks,
I am looking for some advice on some of the ways/techniques
that people are using to get around namenode failures (both disk and
host).
We have a small cluster with several jobs scheduled for periodic
execution on the same host where the name server runs.
But when I run it, it throws an exception in the DbRecordReader.next()
method. Although I have added logging in it, I still can't see anything and
don't know where I should check. Can anyone tell me where I can get the real
execution status, so I can see where the error is? Thanks!
Check the logs