SequenceFile split question

2012-03-14 Thread Mohit Anchlia
I have a client program that creates a SequenceFile, which essentially merges small files into one big file. I was wondering how the SequenceFile splits the data across nodes. When I start, the SequenceFile is empty. Does it get split when it reaches dfs.block.size? If so then does it mean that
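
As far as I understand it, HDFS stores the growing file as a series of blocks of dfs.block.size, but MapReduce input splits are only computed when a job is submitted, from the file's length at that moment. The plain-Java sketch below mirrors the split-size arithmetic of the new-API FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 10% "slop" allowed on the last split). The class and method names are hypothetical, and it does not model how a SequenceFile reader that starts mid-file skips ahead to the next sync marker so that records are not cut in half.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of FileInputFormat-style split computation (JDK only, no Hadoop classes).
class SplitSketch {
    // The last split may be up to 10% larger than splitSize instead of producing a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} pairs covering a file of the given length.
    static List<long[]> splits(long fileLength, long blockSize, long minSize, long maxSize) {
        List<long[]> result = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 200 MB file, 64 MB blocks, default-like min/max split sizes.
        for (long[] s : splits(200 * mb, 64 * mb, 1, Long.MAX_VALUE)) {
            System.out.println("offset=" + s[0] / mb + "MB length=" + s[1] / mb + "MB");
        }
    }
}
```

With these numbers the file yields three 64 MB splits and a final 8 MB split, while a 70 MB file stays a single split because of the slop factor.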

dynamic mapper?

2012-03-14 Thread robert
Suppose I want to generate a mapper class at run time and use that class in my MapReduce job. What is the best way to do this? Would I just have an extra scripted step to pre-compile it and distribute it with -libjars, or, if I felt like compiling it dynamically with, for example, JavaCompiler, is there
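
The JavaCompiler route can be sketched with the JDK alone: write the generated source to a temporary directory, compile it with javax.tools.ToolProvider.getSystemJavaCompiler(), and load the result through a URLClassLoader. To keep the sketch runnable without Hadoop on the classpath, the generated class implements java.util.function.Function instead of extending Mapper; GeneratedUpper and compileAndLoad are hypothetical names. For a real job the compiled class would presumably still have to reach the task JVMs (for example packaged into a jar and shipped with -libjars), since a class loader on the client alone is not visible to the tasks.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch: compile a class from a source string at run time and instantiate it.
class DynamicCompileSketch {
    static final String SOURCE =
        "public class GeneratedUpper implements java.util.function.Function<String, String> {\n"
        + "    public String apply(String s) { return s.toUpperCase(); }\n"
        + "}\n";

    @SuppressWarnings("unchecked")
    static Function<String, String> compileAndLoad(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (running on a JRE instead of a JDK?)");
        }
        Path dir = Files.createTempDirectory("generated");
        Path src = dir.resolve(className + ".java");
        Files.write(src, source.getBytes(StandardCharsets.UTF_8));
        // Tool.run returns 0 on success; diagnostics go to stderr.
        int status = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed with status " + status);
        }
        // Load the compiled .class file from the temporary directory.
        URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()},
                DynamicCompileSketch.class.getClassLoader());
        Class<?> cls = loader.loadClass(className);
        return (Function<String, String>) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Function<String, String> f = compileAndLoad("GeneratedUpper", SOURCE);
        System.out.println(f.apply("hello mapper"));
    }
}
```

The pre-compile-and-ship option avoids needing a JDK at run time, which is one reason the scripted step may be simpler in practice.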

Re: does hadoop always respect setNumReduceTasks?

2012-03-14 Thread Jane Wayne
Thanks Lance. On Thu, Mar 8, 2012 at 9:38 PM, Lance Norskog wrote: > Instead of String.hashCode() you can use the MD5 hashcode generator. > This has not "in the wild" created a duplicate. (It has been hacked, > but that's not relevant here.) > > http://snippets.dzone.com/posts/show/3686 > > I th
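
Applied to partitioning, Lance's suggestion could look like the plain-Java helper below (the name partitionFor is hypothetical). It hashes the key with MD5 via java.security.MessageDigest, folds the first four digest bytes into an int, and then applies the same non-negative mask that HashPartitioner applies to hashCode(). Folding down to 32 bits gives up most of MD5's collision resistance, so this only changes how keys are spread across reducers; in a real job the logic would sit inside a Partitioner's getPartition method.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive a partition index from an MD5 digest instead of String.hashCode().
class Md5PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Fold the first four digest bytes into an int (big-endian).
            int h = ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
                  | ((digest[2] & 0xff) << 8) | (digest[3] & 0xff);
            // Clear the sign bit, as HashPartitioner does, then reduce modulo the partition count.
            return (h & Integer.MAX_VALUE) % numPartitions;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> " + partitionFor(key, 4));
        }
    }
}
```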

Re: Partition classes, how to pass in background information

2012-03-14 Thread Jane Wayne
Thanks Chris! That worked! On Wed, Mar 14, 2012 at 6:06 AM, Chris White wrote: > If your class implements the Configurable interface, Hadoop will call the > setConf method after creating the instance. Look in the source code for > ReflectionUtils.newInstance for more info > On Mar 14, 2012 2:31 A

Re: Using a combiner

2012-03-14 Thread Prashant Kommireddi
It is a function of the "number of spills" on the map side, and I believe the default is 3: the combiner runs on every spill, and when there are at least 3 spill files it also runs while they are merged. This number is configurable (min.num.spills.for.combine). On Mar 14, 2012, at 3:26 PM, Gayatri Rao wrote: > Hi all, > > I have a quick query on using a combine
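
A rough model of the map-side behaviour in Hadoop 1.x, as I read MapTask: a configured combiner is applied to each spill as it is written, and applied again while the spill files are merged only when there are at least min.num.spills.for.combine (default 3) of them. The JDK-only simulation below uses a hypothetical word-count combiner (sum per key) and ignores partitions, buffer sizes and serialization; it is meant only to show why a combiner may run zero, one or several times and must therefore be safe to reapply.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of when a map-side combiner runs (not the Hadoop API).
class CombinerSpillSketch {
    static final int MIN_SPILLS_FOR_COMBINE = 3; // assumed Hadoop 1.x default

    // Word-count style combiner: sums the values of each key; TreeMap keeps keys sorted like a spill.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Returns the records the map task would emit after its final merge.
    static List<Map.Entry<String, Integer>> mapOutput(List<List<String>> spills) {
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        for (List<String> spill : spills) {
            List<Map.Entry<String, Integer>> records = new ArrayList<>();
            for (String word : spill) {
                records.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
            merged.addAll(combine(records)); // the combiner runs on every spill
        }
        if (spills.size() >= MIN_SPILLS_FOR_COMBINE) {
            return combine(merged); // enough spills: combine again during the merge
        }
        merged.sort(Map.Entry.comparingByKey()); // otherwise just merge-sort, duplicates remain
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> spills = Arrays.asList(
            Arrays.asList("a", "b", "a"), Arrays.asList("a"), Arrays.asList("b"));
        System.out.println(mapOutput(spills));
    }
}
```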

Capacity Scheduler APIs

2012-03-14 Thread hdev ml
Hi all, are there any Capacity Scheduler APIs that I can use, e.g. for adding and removing queues, tuning properties on the fly, and so on? Any help is appreciated. Thanks, Harshad

Re: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Joey Echeverria
Yes, you have to deal with the compression. Usually, you'll load the compression codec in your RecordReader. You can see an example of how TextInputFormat's LineRecordReader does it: https://github.com/apache/hadoop-common/blob/release-1.0.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/LineReco
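
In Hadoop this pattern is CompressionCodecFactory.getCodec(path), which picks a codec from the file extension (returning null for uncompressed files), followed by codec.createInputStream(...) on the raw stream. The JDK has no bzip2 implementation, so the runnable sketch below swaps in gzip (java.util.zip) to show the same shape: choose a decompressor from the file name, wrap the raw stream, then read lines. All class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of extension-based codec selection inside a record reader (gzip stands in for bzip2).
class CodecByExtensionSketch {
    // Wraps the raw stream in a decompressor chosen from the file name, or returns it unchanged.
    static InputStream maybeDecompress(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw; // no codec matched: treat as uncompressed
    }

    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // raw is listed first so it is closed even if the decompressor fails to initialize.
        try (InputStream raw = new FileInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                 maybeDecompress(file.getName(), raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Helper that writes a small gzip-compressed text file for the demo.
    static File writeGzip(List<String> lines) throws IOException {
        File f = File.createTempFile("sample", ".gz");
        f.deleteOnExit();
        try (Writer w = new OutputStreamWriter(
                 new GZIPOutputStream(new FileOutputStream(f)), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line + "\n");
            }
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File gz = writeGzip(Arrays.asList("first line", "second line"));
        System.out.println(readLines(gz));
    }
}
```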

Accessing Job information via the Web UI in Hadoop 2.0

2012-03-14 Thread stevens35
Hi all, I'm a little baffled by the new web UI in Hadoop 2.0. In particular, I don't see an obvious way of viewing the task-specific counters or the task statuses that my jobs are setting. Previously, this was all presented cleanly in the TaskTracker UI, but now the ApplicationManager/Histor

RE: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Tony Burton
Hi - sorry to bump this, but I'm having trouble resolving this. Essentially the question is: If I create my own InputFormat by subclassing TextInputFormat, does the subclass have to handle its own streaming of compressed data? If so, can anyone point me at an example where this is done? Thanks

Re: questions regarding hadoop version 1.0

2012-03-14 Thread Joey Echeverria
JobTracker and TaskTracker. YARN is only in 0.23 and later releases. 1.0.x is from the 0.20.x line of releases. -Joey On Mar 14, 2012, at 7:00, arindam choudhury wrote: > Hi, > > Hadoop 1.0.1 uses hadoop YARN or the tasktracker, jobtracker model? > > Regards, > Arindam

Re: Partition classes, how to pass in background information

2012-03-14 Thread Chris White
If your class implements the Configurable interface, Hadoop will call the setConf method after creating the instance. Look in the source code for ReflectionUtils.newInstance for more info. On Mar 14, 2012 2:31 AM, "Jane Wayne" wrote: > i am using the new org.apache.hadoop.mapreduce.Partitioner cla
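
The mechanism Chris describes can be modelled without Hadoop on the classpath: a factory creates the instance reflectively and, if it implements the configurable interface, hands it the configuration before anything else uses it. In the sketch below ConfigurableStandIn and a Map<String, String> stand in for org.apache.hadoop.conf.Configurable and Configuration, ReflectionSketch.newInstance is a simplified stand-in for ReflectionUtils.newInstance, and the property name example.priority.prefix is made up for illustration.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for ReflectionUtils.newInstance: construct, then call setConf if supported.
class ReflectionSketch {
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) throws ReflectiveOperationException {
        Constructor<T> ctor = cls.getDeclaredConstructor();
        ctor.setAccessible(true);
        T instance = ctor.newInstance();
        if (instance instanceof ConfigurableStandIn) {
            ((ConfigurableStandIn) instance).setConf(conf); // background information arrives here
        }
        return instance;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.priority.prefix", "vip-");
        BackgroundPartitioner p = newInstance(BackgroundPartitioner.class, conf);
        System.out.println(p.getPartition("vip-alice", 4) + " " + p.getPartition("bob", 4));
    }
}

// Hypothetical stand-in for org.apache.hadoop.conf.Configurable.
interface ConfigurableStandIn {
    void setConf(Map<String, String> conf);
}

// Hypothetical partitioner: keys with a configured prefix go to partition 0, the rest are hashed.
class BackgroundPartitioner implements ConfigurableStandIn {
    private String prefix = "";

    @Override
    public void setConf(Map<String, String> conf) {
        prefix = conf.getOrDefault("example.priority.prefix", "");
    }

    int getPartition(String key, int numPartitions) {
        if (!prefix.isEmpty() && key.startsWith(prefix)) {
            return 0;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

In a real job the value would be set on the job's Configuration before submission and read back inside setConf.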

HBase & Python recommended interface

2012-03-14 Thread Håvard Wahl Kongsgård
Hi, does anyone have recommendations for a Python interface to HBase? Thrift is one possibility, but is there a library like https://github.com/pycassa/pycassa ? -- Håvard Wahl Kongsgård NTNU http://havard.security-review.net/