I have a client program that creates a SequenceFile, essentially merging
small files into one big file. I was wondering how the sequence file data
is split across nodes. When I start, the sequence file is empty. Does it
get split when it reaches dfs.block.size? If so, then does it mean that
Suppose I want to generate a mapper class at run time and use that
class in my MapReduce job.
What is the best way to do this? Would I just have an extra scripted
step to pre-compile it and distribute it with -libjars, or, if I felt like
compiling it dynamically with, for example, JavaCompiler, is there
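Compiling at run time with javax.tools.JavaCompiler is workable whenever a full
JDK (not just a JRE) is on the task nodes. A minimal stdlib-only sketch of the
idea follows; the class name and source string are made up for illustration,
and a real generated Mapper would also need the Hadoop jars on the compile
classpath and the resulting class shipped to the cluster (e.g. via -libjars or
the distributed cache):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.File;
import java.io.FileWriter;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;

public class DynamicCompile {
    // Write the source under a temp dir, compile it, and load the class.
    public static Class<?> compileAndLoad(String className, String source)
            throws Exception {
        File dir = Files.createTempDirectory("dyn").toFile();
        File src = new File(dir, className + ".java");
        try (FileWriter w = new FileWriter(src)) {
            w.write(source);
        }
        // Returns null when running on a bare JRE without the compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int rc = compiler.run(null, null, null, src.getPath());
        if (rc != 0) throw new IllegalStateException("compilation failed");
        // The .class file lands next to the source; load it from there.
        URLClassLoader cl =
                URLClassLoader.newInstance(new URL[]{dir.toURI().toURL()});
        return cl.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = compileAndLoad("HelloStub",
            "public class HelloStub { public String id() { return \"hello\"; } }");
        Object o = c.getDeclaredConstructor().newInstance();
        System.out.println(c.getMethod("id").invoke(o));
    }
}
```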
Thanks Lance.
On Thu, Mar 8, 2012 at 9:38 PM, Lance Norskog wrote:
> Instead of String.hashCode() you can use an MD5 hash as the key.
> MD5 has never been observed to produce a duplicate "in the wild". (It has
> been broken cryptographically, but that's not relevant here.)
>
> http://snippets.dzone.com/posts/show/3686
>
> I th
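The suggestion above boils down to hashing the key bytes with MD5 from
java.security.MessageDigest instead of relying on String.hashCode(), which has
only 2^32 possible values. A minimal sketch:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5Key {
    // Hex MD5 digest of a string: 128 bits, so collisions are not a
    // practical concern, unlike the 32-bit String.hashCode().
    public static String md5Hex(String s) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("hello"));
    }
}
```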
Thanks Chris! That worked!
On Wed, Mar 14, 2012 at 6:06 AM, Chris White wrote:
> If your class implements the Configurable interface, Hadoop will call the
> setConf method after creating the instance. Look in the source code of
> ReflectionUtils.newInstance for more info.
> On Mar 14, 2012 2:31 A
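Roughly, ReflectionUtils.newInstance instantiates the class reflectively and
then injects the job configuration if the class asks for it. A stdlib-only
sketch of that pattern, with simplified local stand-ins for Hadoop's
Configuration and Configurable (not the real Hadoop classes):

```java
import java.util.HashMap;

public class NewInstanceSketch {
    // Simplified stand-ins for Hadoop's Configuration / Configurable.
    static class Configuration extends HashMap<String, String> {}
    interface Configurable { void setConf(Configuration conf); }

    // What ReflectionUtils.newInstance does, in essence: construct the
    // object, then hand it the configuration before anyone uses it.
    static <T> T newInstance(Class<T> cls, Configuration conf) throws Exception {
        T obj = cls.getDeclaredConstructor().newInstance();
        if (obj instanceof Configurable) {
            ((Configurable) obj).setConf(conf);
        }
        return obj;
    }

    // Example class that needs a config value at construction time.
    public static class MyPartitioner implements Configurable {
        String separator = "?";
        public void setConf(Configuration conf) {
            separator = conf.getOrDefault("my.sep", ",");
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.put("my.sep", "|");
        MyPartitioner p = newInstance(MyPartitioner.class, conf);
        System.out.println(p.separator);
    }
}
```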
It is a function of the number of spills on the map side, and I believe
the default is 3: the combiner is run once for every 3 spills. This
number is configurable.
Sent from my iPhone
On Mar 14, 2012, at 3:26 PM, Gayatri Rao wrote:
> Hi all,
>
> I have a quick query on using a combine
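In Hadoop 1.x the spill threshold mentioned above is controlled by the
min.num.spills.for.combine property, e.g. in mapred-site.xml:

```xml
<!-- Run the combiner during the merge phase only when at least this
     many map-side spill files exist (default 3). -->
<property>
  <name>min.num.spills.for.combine</name>
  <value>3</value>
</property>
```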
Hi all,
are there any capacity scheduler apis that I can use?
e.g. adding, removing queues, tuning properties on the fly and so on.
Any help is appreciated.
Thanks
Harshad
Yes you have to deal with the compression. Usually, you'll load the
compression codec in your RecordReader. You can see an example of how
TextInputFormat's LineRecordReader does it:
https://github.com/apache/hadoop-common/blob/release-1.0.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/LineReco
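In LineRecordReader the codec is looked up via CompressionCodecFactory for the
split's path and, when one is found, the raw input stream is wrapped with the
codec's decompressing stream before lines are read. The shape of that pattern
in a runnable, stdlib-only sketch (GZIPInputStream standing in for
codec.createInputStream(), an in-memory buffer standing in for the HDFS file):

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedReadSketch {
    public static void main(String[] args) throws Exception {
        // Build a small gzipped "file" in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(buf), "UTF-8")) {
            w.write("line one\nline two\n");
        }

        // The RecordReader pattern: wrap the raw stream with the
        // decompressor, then read records as usual.
        InputStream raw = new ByteArrayInputStream(buf.toByteArray());
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(raw), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
    }
}
```

Note that gzip is not splittable, so a gzipped input file is read by a single
mapper rather than being split at block boundaries.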
Hi all,
I'm a little baffled by the new web UI in Hadoop 2.0. In particular, I
don't see an obvious way of viewing the task-specific counters or the
task statuses that my jobs are setting. Previously, this was all
presented cleanly in the TaskTracker UI, but now the
ApplicationManager/Histor
Hi - sorry to bump this, but I'm having trouble resolving this.
Essentially the question is: If I create my own InputFormat by subclassing
TextInputFormat, does the subclass have to handle its own streaming of
compressed data? If so, can anyone point me at an example where this is done?
Thanks
JobTracker and TaskTracker. YARN is only in 0.23 and later releases. 1.0.x is
from the 0.20x line of releases.
-Joey
On Mar 14, 2012, at 7:00, arindam choudhury wrote:
> Hi,
>
> Does Hadoop 1.0.1 use YARN or the TaskTracker/JobTracker model?
>
> Regards,
> Arindam
If your class implements the Configurable interface, Hadoop will call the
setConf method after creating the instance. Look in the source code of
ReflectionUtils.newInstance for more info.
On Mar 14, 2012 2:31 AM, "Jane Wayne" wrote:
> i am using the new org.apache.hadoop.mapreduce.Partitioner cla
Hi, does anyone have recommendations for a Python interface to HBase?
Thrift is one possibility, but is there a library along the lines of
https://github.com/pycassa/pycassa ?
--
Håvard Wahl Kongsgård
NTNU
http://havard.security-review.net/