/association-for-computing-machinery/flumejava-easy-efficient-data-parallel-pipelines-xtUvap2t1I
>
>
> https://github.com/tdunning/Plume/commit/a5a10feaa068b33b1d929c332e4614aba50dd39a
>
>
> On Thu, May 5, 2011 at 2:16 AM, Stanley Xu wrote:
>
>> Dear All,
>>
>> Our t
dataset.
Is there any approach I could try (including changing part of Hadoop's source
code)?
Best wishes,
Stanley Xu
Thanks Jason, I will take a look and try MultipleOutputs for the mapper.
Best wishes,
Stanley Xu
On Tue, May 3, 2011 at 11:25 PM, Jason wrote:
> It is actually trivial to do using MultipleOutputs. You just need to emit
> your key-values to both MO and standard output context/collector i
But that would make us read the same data twice, which would waste I/O
on a large dataset.
Thanks.
Best wishes,
Stanley Xu
On Tue, May 3, 2011 at 4:09 PM, Bai, Gang wrote:
> IMHO
reducer, it will delete the intermediate result generated by the
mapper.
Thanks.
Stanley Xu
Sorry, I think the reason I have only 1 mapper is that I only have 1 region
in the test data. I would get more mappers if the data were distributed across
multiple regions. And I could use the HRegionPartitioner and setNumReduceTasks
to increase the number of reducers.
Thanks.
Best wishes,
Stanley Xu
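
As a rough illustration of the region-based partitioning idea mentioned above, here is a minimal plain-Java sketch. It is not HBase's HRegionPartitioner: the class name RegionRangePartitioner, its methods, and the use of String row keys are hypothetical and chosen only for illustration. It assumes a sorted list of region start keys, finds the region that owns a row key, and wraps around with a modulo when there are fewer reducers than regions.

```java
import java.util.Arrays;

// Hypothetical stand-in for region-aware partitioning.
// This is an illustrative sketch, not HBase's HRegionPartitioner.
class RegionRangePartitioner {
    // Sorted region start keys; the first entry is "" so it covers the smallest keys.
    private final String[] regionStartKeys;

    RegionRangePartitioner(String[] sortedRegionStartKeys) {
        this.regionStartKeys = sortedRegionStartKeys.clone();
    }

    // Index of the region whose key range contains rowKey.
    int regionIndex(String rowKey) {
        int pos = Arrays.binarySearch(regionStartKeys, rowKey);
        // Exact hit: rowKey is itself a region start key.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // owning region is the one just before the insertion point.
        return pos >= 0 ? pos : Math.max(0, -pos - 2);
    }

    // Map the region index onto the available reducers.
    // With fewer reducers than regions, several regions share a reducer.
    int getPartition(String rowKey, int numReduceTasks) {
        return regionIndex(rowKey) % numReduceTasks;
    }
}
```

With a single region every key maps to partition 0, so adding reducers alone would not spread the load; the keys only fan out once the table has several regions.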
On Mon
it will take
about 300 minutes for 150 million entries?
I found a SimpleTotalOrderPartitioner in the 0.90.0 API, but it doesn't exist
in 0.20.6. Is there anything I could use in 0.20.6?
Thanks.
Best wishes,
Stanley Xu
20.3.
Thanks to both of you.
Best wishes,
Stanley Xu
On Wed, Feb 16, 2011 at 9:38 PM, MONTMORY Alain <
alain.montm...@thalesgroup.com> wrote:
> Hi,
>
>
>
> I think you could use different type for mapper and combiner, they are not
> linked together but suppose :
>
>
Dear all,
I am writing a MapReduce job today, in which I hope to use different
formats for the Mapper and the Combiner. I am using Text as the format of the
Mapper and MapWritable as the format of the Combiner.
But it looks like Hadoop doesn't support that yet?
I have some code like the following: