Hello all. 

 

In general, almost all mappers in Hadoop applications have a simple
structure.

They read one record at a time from the input split and then write
intermediate output to the local disk.

However, some mappers, such as CanopyMapper, Step0Mapper, and
NaiveBayesThetaMapper, first read all of the records in the input split,
compute something over them, and only write intermediate output to the
local disk at the "cleanup" stage.
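
To make the difference concrete, here is a minimal sketch of the two patterns in plain Java. It is only an illustration under stated assumptions: it does not use Hadoop's actual Mapper API, it is not the code of the mappers named above, and the names SketchMapper, StreamingMapper, AccumulatingMapper and run are hypothetical stand-ins for the framework's map/cleanup lifecycle.

```java
import java.util.ArrayList;
import java.util.List;

public class MapperPatterns {
    /** Hypothetical stand-in for the mapper lifecycle: map() per record, cleanup() once at the end. */
    interface SketchMapper {
        void map(String record, List<String> out);
        default void cleanup(List<String> out) {}
    }

    /** Simple pattern: every input record immediately produces intermediate output. */
    static class StreamingMapper implements SketchMapper {
        public void map(String record, List<String> out) {
            out.add(record.toUpperCase());
        }
    }

    /** Complex pattern: map() only updates in-memory state; output appears in cleanup(). */
    static class AccumulatingMapper implements SketchMapper {
        private long sum = 0;
        private long count = 0;

        public void map(String record, List<String> out) {
            // Nothing is emitted here, so no output exists until the whole split is consumed.
            sum += Long.parseLong(record.trim());
            count++;
        }

        public void cleanup(List<String> out) {
            out.add("count=" + count + ",sum=" + sum);
        }
    }

    /** Drives a mapper over one input split, roughly as a framework would. */
    static List<String> run(SketchMapper mapper, List<String> split) {
        List<String> out = new ArrayList<>();
        for (String record : split) {
            mapper.map(record, out);
        }
        mapper.cleanup(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> split = List.of("1", "2", "3");
        System.out.println(run(new StreamingMapper(), split));
        System.out.println(run(new AccumulatingMapper(), split));
    }
}
```

In the second pattern the amount of output written so far says nothing about how far the mapper has progressed, which is presumably what matters for scheduling.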

 

I am currently looking for cases like these in order to improve Hadoop scheduling.

I believe there are many such complex cases, but it is very hard for
me to find real examples that are in practical use today.

Could you let me know if you know of any applications of this
kind?

Even if your application, or some application you know of, is not complete,
that is not a problem, because I would just like to convince myself that such
cases exist in the real world.

 

Thanks and regards

Jongse Park

 
