Hi,

I am trying to generate random data using Hadoop Streaming and Python. It's a
map-only job, and I need to run a chosen number of map tasks. The mappers take
no input, since each one is just going to generate random data.
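
To make the setup concrete, here is a minimal sketch of the kind of mapper I
mean. The function name generate_records, the record count, and the
tab-separated key/value format are placeholders for illustration only, not my
actual code.

```python
#!/usr/bin/env python
"""Hypothetical streaming mapper that emits random records."""
import random

# Placeholder: how many records each map task should emit.
NUM_RECORDS = 1000


def generate_records(num_records, seed=None):
    """Yield tab-separated "index<TAB>random integer" lines."""
    rng = random.Random(seed)
    for i in range(num_records):
        yield "%d\t%d" % (i, rng.randint(0, 10**9))


def main():
    # Standard input is ignored; every record is generated here.
    for line in generate_records(NUM_RECORDS):
        print(line)


if __name__ == "__main__":
    main()
```

Each map task would run this script once, so the total output would be the
number of map tasks times NUM_RECORDS lines.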

How do I specify the number of maps to run? (I am confused here because,
if I am not wrong, the number of maps spawned depends on the input data
size.)
Any ideas as to how this can be done?

Warm regards,
Austin
