Re: Modify number of mappers for a mahout process?

2013-08-01 Thread Ryan Josal
Galit, yes, this does sound related, and as Matt said, you can test it by setting the max split size on the CLI. I didn't personally find this to be a reliable and efficient method, so I added a -m parameter to my job to set it right every time. It seems that this would be usef

Re: Modify number of mappers for a mahout process?

2013-08-01 Thread Matt Molek
Oops, I'm sorry. I had one too many zeros there; it should be '-Dmapred.max.split.size=10'. Just use (input size)/(desired number of mappers).
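
A minimal sketch of the (input size)/(desired number of mappers) rule, in case it helps to see the arithmetic spelled out. The function names are hypothetical helpers and not part of Mahout or Hadoop; only the '-Dmapred.max.split.size' flag comes from the thread. Taking the rule literally for a 10,000,000-byte input and 100 mappers gives 100,000 bytes per split. The number of mappers actually launched may still differ, since it depends on the input format, the block size, and how many input files there are.

```python
def max_split_size(input_bytes: int, desired_mappers: int) -> int:
    """Bytes per split so the input divides into about desired_mappers splits."""
    if input_bytes <= 0 or desired_mappers <= 0:
        raise ValueError("input_bytes and desired_mappers must be positive")
    # Ceiling division, so the resulting split count does not exceed the target.
    return -(-input_bytes // desired_mappers)


def split_size_flag(input_bytes: int, desired_mappers: int) -> str:
    """Format the value as the CLI argument mentioned in the thread."""
    return f"-Dmapred.max.split.size={max_split_size(input_bytes, desired_mappers)}"


if __name__ == "__main__":
    # Hypothetical 10,000,000-byte input, aiming for about 100 mappers.
    print(split_size_flag(10_000_000, 100))  # -Dmapred.max.split.size=100000
```

The input size in bytes can be read from the job's input directory (for example with 'hadoop fs -du') before computing the value.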

Re: Modify number of mappers for a mahout process?

2013-08-01 Thread Matt Molek
One trick to getting more mappers on a job when running from the command line is to pass a '-Dmapred.max.split.size=' argument. The value is a size in bytes. So if you have some hypothetical 10MB input set, but you want to force ~100 mappers, use '-Dmapred.max.split.size=100' On Wed, Jul 3

Modify number of mappers for a mahout process?

2013-07-31 Thread Fuhrmann Alpert, Galit
Hi, It sounds to me like this could be related to one of the Qs I've posted several days ago (is it?): My mahout clustering processes seem to be running very slow (several good hours on just ~1M items), and I'm wondering if there's anything that needs to be changed in setting/configuration. (a