Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
HistoryViewer is used in JobClient to view the history files in the directory provided on the command line. The command is $ bin/hadoop job -history #by default history is stored in output dir. outputDir in the constructor of HistoryViewer is the directory passed on the command-line. You can

Re: JobTracker History data+analysis

2008-07-27 Thread Paco NATHAN
Thank you, Amareshwari - That helps. Hadn't noticed HistoryViewer before. It has no JavaDoc. What is a typical usage? In other words, what would be the "outputDir" value in the context of ToolRunner, JobClient, etc. ? Paco On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu <[EMAIL PRO

Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if it makes sense? Thanks Amareshwari Paco NATHAN wrote: We have a need to access data found in the JobTracker History link. Specifically in the "Analyse This Job" analysis. Must be run in Java, between jobs, in the same code

Help: specifying different input/output class for combiner and reducer

2008-07-27 Thread Qin Gao
Hi all, I am trying to specify different key/value classes for the combiner and reducer in my task. For example, I want the mapper to output integer==>(integer,float) pairs, and then the combiner outputs integer==>some structure. Finally the reducer takes in integer==>some structure and outputs null==>i

JobTracker History data+analysis

2008-07-27 Thread Paco NATHAN
We have a need to access data found in the JobTracker History link. Specifically in the "Analyse This Job" analysis. Must be run in Java, between jobs, in the same code which calls ToolRunner and JobClient. In essence, we need to collect descriptive statistics about task counts and times for map, s

Re: Name node heap space problem

2008-07-27 Thread Gert Pfeifer
There I have: export HADOOP_HEAPSIZE=8000, which should be enough (actually in this case I don't know). Running fsck on the directory, it turned out that there are 1785959 files in this dir... I have no clue how I can get the data out of there. Can I somehow calculate how much heap a nam

Re: Bean Scripting Framework?

2008-07-27 Thread Andreas Kostyrka
On Saturday 26 July 2008 00:53:48 Joydeep Sen Sarma wrote: > Just as an aside - there is probably a general perception that streaming > is really slow (at least I had it). > > The last I did some profiling (in 0.15) - the primary overheads from > streaming came from the scripting language (python i

Re: partitioning the inputs to the mapper

2008-07-27 Thread lohit
>How do I partition the inputs to the mapper, such that a mapper >processes an entire file or files? What is happening now is that each >mapper receives only portions of a file and I want them to receive an >entire file. Is there a way to do that within the scope of the >framework? http:
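
For readers following this thread, here is a rough sketch of why a mapper may see only part of a file, and what changes when a file is treated as non-splittable. It is a toy model in plain Java, not Hadoop code: the class SplitSketch, the Split type, and computeSplits are hypothetical names made up for illustration, and the model ignores details of the real split computation. In the old org.apache.hadoop.mapred API, the usual way to get this behavior is believed to be an input format subclass whose isSplitable method returns false; please check that against the Hadoop version you run.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not part of Hadoop): one Split is handed to one mapper.
public class SplitSketch {

    /** A byte range of a single file. */
    static final class Split {
        final String file;
        final long offset;
        final long length;

        Split(String file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Cuts a file into splits of at most splitSize bytes.
     * When splittable is false, the whole file becomes a single split,
     * so a single mapper would process the entire file.
     */
    static List<Split> computeSplits(String file, long fileLen, long splitSize, boolean splittable) {
        List<Split> splits = new ArrayList<>();
        if (!splittable || fileLen <= splitSize) {
            splits.add(new Split(file, 0, fileLen));
            return splits;
        }
        for (long off = 0; off < fileLen; off += splitSize) {
            splits.add(new Split(file, off, Math.min(splitSize, fileLen - off)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 150 MB file with a 64 MB split size: three mappers, each sees a portion.
        System.out.println(computeSplits("input-a", 150 * mb, 64 * mb, true).size());  // prints 3
        // The same file marked non-splittable: one mapper sees the whole file.
        System.out.println(computeSplits("input-a", 150 * mb, 64 * mb, false).size()); // prints 1
    }
}
```

The trade-off in this model is parallelism: a large non-splittable file is read by one mapper only, so the approach suits many moderately sized files better than a few very large ones.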

partitioning the inputs to the mapper

2008-07-27 Thread Shirley Cohen
How do I partition the inputs to the mapper, such that a mapper processes an entire file or files? What is happening now is that each mapper receives only portions of a file and I want them to receive an entire file. Is there a way to do that within the scope of the framework? Thanks, Sh