date:20120314

SequenceFile split question

2012-03-14 Thread Mohit Anchlia

I have a client program that creates a SequenceFile, which essentially merges small files into one big file. I was wondering how the sequence file's data is split across nodes. When I start, the sequence file is empty. Does it get split when it reaches the dfs.block.size? If so then does it mean that

dynamic mapper?

2012-03-14 Thread robert

Suppose I want to generate a mapper class at run time and use that class in my MapReduce job. What is the best way to do this? Would I just have an extra scripted step to pre-compile it and distribute it with -libjars, or, if I felt like compiling it dynamically with, for example, JavaCompiler, is there

Re: does hadoop always respect setNumReduceTasks?

2012-03-14 Thread Jane Wayne

Thanks Lance. On Thu, Mar 8, 2012 at 9:38 PM, Lance Norskog wrote: > Instead of String.hashCode() you can use the MD5 hashcode generator. > This has not "in the wild" created a duplicate. (It has been hacked, > but that's not relevant here.) > > http://snippets.dzone.com/posts/show/3686 > > I th
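A minimal sketch of Lance's suggestion, assuming the hash is used to spread string keys over a fixed number of buckets (the truncated excerpt does not confirm the exact use). It uses only the JDK's `java.security.MessageDigest`; the class and method names `Md5Partition`, `md5Hash` and `partitionFor` are hypothetical and not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a bucket number from the MD5 digest of a key
// instead of from String.hashCode().
final class Md5Partition {

    // Folds the first four bytes of the 16-byte MD5 digest into an int.
    static int md5Hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 24)
                 | ((d[1] & 0xff) << 16)
                 | ((d[2] & 0xff) << 8)
                 |  (d[3] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Clears the sign bit before the modulo so the result is never negative.
    static int partitionFor(String key, int numPartitions) {
        return (md5Hash(key) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(key + " -> bucket " + partitionFor(key, 4));
        }
    }
}
```

Truncating the 128-bit digest to 32 bits gives up most of MD5's collision resistance; the sketch only aims at a more even spread than `String.hashCode()`. If the keys must stay unique, the full digest would have to be kept.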

Re: Partition classes, how to pass in background information

2012-03-14 Thread Jane Wayne

Thanks Chris! That worked! On Wed, Mar 14, 2012 at 6:06 AM, Chris White wrote: > If your class implements the Configurable interface, Hadoop will call the > setConf method after creating the instance. Look in the source code for > ReflectionUtils.newInstance for more info. > On Mar 14, 2012 2:31 A

Re: Using a combiner

2012-03-14 Thread Prashant Kommireddi

It is a function of the "number of spills" on map side and I believe the default is 3. So for every 3 times data is spilled the combiner is run. This number is configurable. Sent from my iPhone On Mar 14, 2012, at 3:26 PM, Gayatri Rao wrote: > Hi all, > > I have a quick query on using a combine

Capacity Scheduler APIs

2012-03-14 Thread hdev ml

Hi all, are there any Capacity Scheduler APIs that I can use, e.g. for adding or removing queues, tuning properties on the fly, and so on? Any help is appreciated. Thanks, Harshad

Re: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Joey Echeverria

Yes, you have to deal with the compression. Usually, you'll load the compression codec in your RecordReader. You can see an example of how TextInputFormat's LineRecordReader does it: https://github.com/apache/hadoop-common/blob/release-1.0.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/LineReco

Accessing Job information via the Web UI in Hadoop 2.0

2012-03-14 Thread stevens35

Hi all, I'm a little baffled by the new web UI in Hadoop 2.0. In particular, I don't see an obvious way of viewing the task-specific counters or the task statuses that my jobs are setting. Previously, this was all presented cleanly in the TaskTracker UI but now the ApplicationManager/Histor

RE: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Tony Burton

Hi - sorry to bump this, but I'm having trouble resolving this. Essentially the question is: If I create my own InputFormat by subclassing TextInputFormat, does the subclass have to handle its own streaming of compressed data? If so, can anyone point me at an example where this is done? Thanks

Re: questions regarding hadoop version 1.0

2012-03-14 Thread Joey Echeverria

JobTracker and TaskTracker. YARN is only in 0.23 and later releases. 1.0.x is from the 0.20.x line of releases. -Joey On Mar 14, 2012, at 7:00, arindam choudhury wrote: > Hi, > > Hadoop 1.0.1 uses hadoop YARN or the tasktracker, jobtracker model? > > Regards, > Arindam

Re: Partition classes, how to pass in background information

2012-03-14 Thread Chris White

If your class implements the Configurable interface, Hadoop will call the setConf method after creating the instance. Look in the source code for ReflectionUtils.newInstance for more info. On Mar 14, 2012 2:31 AM, "Jane Wayne" wrote: > i am using the new org.apache.hadoop.mapreduce.Partitioner cla
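The pattern Chris describes can be sketched without Hadoop on the classpath. Everything below is a simplified stand-in: `ConfigurableLike`, `InstanceFactory`, `RangePartitioner` and the key `example.partition.boundary` are hypothetical names, and a plain `Map` takes the place of Hadoop's `Configuration`. In a real job the class would implement `org.apache.hadoop.conf.Configurable`, and the framework would create it through `ReflectionUtils.newInstance`.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configurable.
interface ConfigurableLike {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

// A partitioner-like class that reads its background information in setConf.
final class RangePartitioner implements ConfigurableLike {
    private Map<String, String> conf;
    private int boundary;

    @Override
    public void setConf(Map<String, String> conf) {
        this.conf = conf;
        // "example.partition.boundary" is an illustrative key, not a Hadoop property.
        this.boundary = Integer.parseInt(
                conf.getOrDefault("example.partition.boundary", "100"));
    }

    @Override
    public Map<String, String> getConf() {
        return conf;
    }

    // Keys below the boundary go to partition 0, all others to partition 1.
    int getPartition(int key, int numPartitions) {
        if (numPartitions < 2) {
            return 0;
        }
        return key < boundary ? 0 : 1;
    }
}

// Hypothetical stand-in for the behaviour attributed to ReflectionUtils.newInstance:
// create the object reflectively, then hand it the configuration if it asks for one.
final class InstanceFactory {
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            if (instance instanceof ConfigurableLike) {
                ((ConfigurableLike) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.partition.boundary", "10");
        RangePartitioner p = newInstance(RangePartitioner.class, conf);
        System.out.println("key 5  -> partition " + p.getPartition(5, 2));
        System.out.println("key 15 -> partition " + p.getPartition(15, 2));
    }
}
```

The point of the design is that the class never needs a constructor argument: whatever the driver stores in the job configuration is available by the time the first `getPartition` call arrives.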

Hbase & python recommended interface

2012-03-14 Thread Håvard Wahl Kongsgård

Hi, does anyone have recommendations for a Python interface to HBase? Thrift is one possibility, but is there a library like https://github.com/pycassa/pycassa ? -- Håvard Wahl Kongsgård NTNU http://havard.security-review.net/

12 matches
