Thanks. io.sort.mb is set to 200M here; I'll do more tests on it.
On Fri, Aug 12, 2011 at 8:47 PM, Florin P wrote:
> Hello!
> We've encountered some slow map process when sending large amounts of data
> from the mapper process to the reducer process. The output of the map process
> will be first w
This is fairly common. Just write your key as a Java class, implement
WritableComparable, and do the right thing with your compareTo() and
hashCode()/equals() methods.
SecondarySort.IntPair in the examples may be inspirational.
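
A minimal sketch of such a key, in case it helps. To keep it runnable without Hadoop on the classpath it uses plain String fields and declares only Comparable; the write()/readFields() methods have the same signatures as Hadoop's Writable, so in a real job the class would declare "implements WritableComparable<TextPair>" and could hold Text fields instead. The names TextPair and TextPairDemo are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical composite key made of two strings.
// In a Hadoop job: "implements WritableComparable<TextPair>" (assumption, see above).
final class TextPair implements Comparable<TextPair> {
    private String first = "";
    private String second = "";

    // Hadoop creates key objects reflectively, so a no-arg constructor is needed.
    TextPair() {}

    TextPair(String first, String second) {
        this.first = Objects.requireNonNull(first);
        this.second = Objects.requireNonNull(second);
    }

    // Same signature as Writable.write: serialize both fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // Same signature as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    // Sort by the first field, then break ties with the second.
    @Override
    public int compareTo(TextPair other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }

    // equals() and hashCode() must agree with each other and with compareTo(),
    // since the default partitioner picks a reducer from hashCode().
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TextPair)) return false;
        TextPair p = (TextPair) o;
        return first.equals(p.first) && second.equals(p.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }
}

final class TextPairDemo {
    // Serialize a key to bytes and read it back, as the framework would between map and reduce.
    static TextPair roundTrip(TextPair key) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buf));
            TextPair copy = new TextPair();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TextPair key = new TextPair("chr1", "geneA");
        System.out.println(roundTrip(key).equals(key));
    }
}
```

With a key like this, the mapper would call context.write(new TextPair(a, b), new LongWritable(n)) and the job would set the class as its map output key class.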
On Fri, Aug 12, 2011 at 3:49 PM, Roger Chen wrote:
> Hi all,
> Is an
Hi all,
Is anybody familiar with how to define a custom composite key in a format
such as (Text, Text), so that the key and value written to the context could be
((Text, Text), LongWritable)?
Thanks,
--
Roger Chen
UC Davis Genome Center
Using 0.20.2...
Is it possible to override the context.write() method in ReduceContext? I have
an entire set of Reducers that I would like to all use a specific function just
before every context.write() but I don't want them to worry about that logic,
just to have it handled transparently.
Fo
Hello!
We've encountered some slow map process when sending large amounts of data
from the mapper process to the reducer process. The output of the map process
will be first written into a buffer whose size is given by the
io.sort.mb property defined in core-site.xml. Its default value is 100.
Yes, there is a reducer. In fact, I'm using Hive to run MapReduce
jobs, and the reducer is a Perl script.
The data sent to the reducer is about 1/3 or 1/4 of the map input data.
On Fri, Aug 12, 2011 at 5:26 PM, Florin P wrote:
> Hello!
> Did you have a reducer class involved? If yes, what is the amount
Hello!
Did you have a reducer class involved? If yes, what is the amount of data that
you are sending from the mapper to the reducer?
Regards,
Florin
--- On Fri, 8/12/11, wd wrote:
> From: wd
> Subject: some map run really slow
> To: mapreduce-user@hadoop.apache.org
> Date: Friday, August 12,
I am using hadoop 0.20.2 on CDH.
I am trying to get the filename of the file currently being
processed. I will extract some information from the filename which
will determine the data processing to be performed.
I want to do this because I need to process a lar
Hi,
Mostafa, you were right: the port number has changed to 50070.
But I don't know how to read the userlog.
I did not find any helpful information in it.
Meta VERSION="1" .
Job JOBID="job_201108111938_0002" JOBNAME="Retrieval" USER="root"
SUBMIT_TIME="1313135011123"
JOBCONF="hdfs://cx1:9000/home/cx/softw
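
For reading lines like the ones above, here is a rough sketch that splits one record into its KEY="VALUE" fields and turns SUBMIT_TIME into a readable timestamp. It assumes that values contain no escaped quotes and that SUBMIT_TIME is milliseconds since the Unix epoch, which is what the 13-digit value suggests. The class name HistoryLine is made up, and the sample line uses only the complete fields from the excerpt.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for one line of the job history excerpt.
final class HistoryLine {
    // Matches KEY="VALUE"; assumes upper-case keys and no escaped quotes inside values.
    private static final Pattern PAIR = Pattern.compile("([A-Z_]+)=\"([^\"]*)\"");

    // Collect every KEY="VALUE" pair on the line, keeping the original order.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    // Assumption: SUBMIT_TIME holds milliseconds since the Unix epoch.
    static Instant submitTime(Map<String, String> fields) {
        return Instant.ofEpochMilli(Long.parseLong(fields.get("SUBMIT_TIME")));
    }

    public static void main(String[] args) {
        String line = "Job JOBID=\"job_201108111938_0002\" JOBNAME=\"Retrieval\" "
                + "USER=\"root\" SUBMIT_TIME=\"1313135011123\"";
        Map<String, String> fields = parse(line);
        System.out.println(fields);
        System.out.println(submitTime(fields)); // printed in UTC
    }
}
```

The leading record type (Meta, Job) is not captured here; it is the first token on the line if you need it.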
hi,
Here is the log for the map tasks in one map/reduce job.
map1 ran for 1 min 2 sec and processed 48572 rows
2011-08-12 01:34:28,313 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 10 MAP
2011-08-12 01:34:28,313 INFO
org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self 0
T