Re: Is there any way I could use to reduce the cost of Mapper and Reducer setup and cleanup in a iterative MapReduce chain?

2011-05-05 Thread Stanley Xu
/association-for-computing-machinery/flumejava-easy-efficient-data-parallel-pipelines-xtUvap2t1I
>
> https://github.com/tdunning/Plume/commit/a5a10feaa068b33b1d929c332e4614aba50dd39a
>
> On Thu, May 5, 2011 at 2:16 AM, Stanley Xu wrote:
>
>> Dear All,
>>
>> Our t

Is there any way I could use to reduce the cost of Mapper and Reducer setup and cleanup in a iterative MapReduce chain?

2011-05-05 Thread Stanley Xu
dataset. Is there any approach I could try (including changing part of Hadoop's source code)? Best wishes, Stanley Xu

Re: Is there any way I could keep both the Mapper and Reducer output in hdfs?

2011-05-04 Thread Stanley Xu
Thanks Jason, I will take a look and try MultipleOutputs for the mapper. Best wishes, Stanley Xu
On Tue, May 3, 2011 at 11:25 PM, Jason wrote:
> It is actually trivial to do using MultipleOutputs. You just need to emit
> your key-values to both MO and standard output context/collector i
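
Jason's suggestion (emit each key-value pair to both MultipleOutputs and the standard output) can be modelled without a cluster. The following is only a rough sketch in plain Java, not Hadoop code: `Sink`, `ListSink`, and `TeeMapperSketch` are hypothetical names invented for illustration, and the word-count style `map` is an assumed workload. In a real job the side sink would presumably be a MultipleOutputs named output and the other sink the usual context/collector.

```java
import java.util.ArrayList;
import java.util.List;

public class TeeMapperSketch {

    /** Hypothetical stand-in for an output channel (a named output or the shuffle). */
    interface Sink {
        void write(String key, int value);
    }

    /** Collects records in memory so the sketch can run without a cluster. */
    static final class ListSink implements Sink {
        final List<String> records = new ArrayList<>();

        @Override
        public void write(String key, int value) {
            records.add(key + "\t" + value);
        }
    }

    /** Word-count style map step that emits each pair to both sinks in one pass. */
    static void map(String line, Sink sideOutput, Sink shuffle) {
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by a blank line
            }
            sideOutput.write(word, 1); // copy to keep (the MultipleOutputs role)
            shuffle.write(word, 1);    // normal output that goes on to the reducer
        }
    }

    public static void main(String[] args) {
        ListSink side = new ListSink();
        ListSink shuffle = new ListSink();
        map("a b a", side, shuffle);
        System.out.println("side output: " + side.records);
        System.out.println("to reducer:  " + shuffle.records);
    }
}
```

The point of the pattern is that the input is read once and every pair is written twice, so keeping the mapper output does not require a second pass over the data.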

Re: Is there any way I could keep both the Mapper and Reducer output in hdfs?

2011-05-03 Thread Stanley Xu
But that would make us read the same data twice, which wastes I/O on large data. Thanks. Best wishes, Stanley Xu
On Tue, May 3, 2011 at 4:09 PM, Bai, Gang wrote:
> IMHO

Is there any way I could keep both the Mapper and Reducer output in hdfs?

2011-05-02 Thread Stanley Xu
reducer, it will delete the intermediate result generated by the mapper. Thanks. Stanley Xu

Re: How to have multiple mapper and reducer for a MapReduce job on a hbase table with hbase 0.20.6?

2011-02-27 Thread Stanley Xu
Sorry, I think the reason I have only 1 mapper is that I have only 1 region in the test data. I would get more mappers if the data were distributed across multiple regions. And I could use the HRegionPartitioner and setNumReduceTasks to increase the number of reducers. Thanks. Best wishes, Stanley Xu On Mon

How to have multiple mapper and reducer for a MapReduce job on a hbase table with hbase 0.20.6?

2011-02-27 Thread Stanley Xu
it will take about 300 minutes for 150 million entries? I found a SimpleTotalOrderPartitioner in the 0.90.0 API, but it doesn't exist in 0.20.6. Is there anything I could use in 0.20.6? Thanks. Best wishes, Stanley Xu

Re: Could we use different output Format for the Mapper and Combiner?

2011-02-16 Thread Stanley Xu
20.3. Thanks to both of you. Best wishes, Stanley Xu
On Wed, Feb 16, 2011 at 9:38 PM, MONTMORY Alain <alain.montm...@thalesgroup.com> wrote:
> Hi,
>
> I think you could use different type for mapper and combiner, they are not
> linked together but suppose :
>

Could we use different output Format for the Mapper and Combiner?

2011-02-16 Thread Stanley Xu
Dear all, I am writing a map-reduce job today, and I hope to use different output formats for the Mapper and the Combiner. I am using Text as the output format of the Mapper and MapWritable as the output format of the Combiner. But it looks like Hadoop doesn't support that yet? I have some code like the following: