Hey Sean,

Just curious about your AWS comment. I am only in the very early testing phases
with AWS EMR. So, would you say that you generally recommend manually setting
up an EC2 cluster to run Mahout, rather than using EMR? I guess the question is:
for those of us without the resources to set up an in-house Hadoop cluster, what
is the best setup we can hope to achieve?


On Jun 24, 2011, at 11:26 PM, Sean Owen wrote:

> There's that. There's also the fact that a 32-way machine almost certainly
> doesn't have 32 times the I/O bandwidth, let alone 32 times faster seek
> latency. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs
> you could end up with an I/O bottleneck.
> 
> Speaking of AWS and EMR, I find that the I/O bottleneck is by far the biggest
> issue there. I spread my jobs there as far across instances and racks as
> possible just to try to steal more little machines' I/O seeks!
> 
> On Sat, Jun 25, 2011 at 3:17 AM, edwin <edwintc...@gmail.com> wrote:
> 
>> Hi Ted,
>> When you say "isn't going to work well", are you referring to the inevitable,
>> unnecessary Hadoop overhead of running on a single machine, or are there other
>> implications of running big jobs on a single machine?
>> 
>> - edwin
>> 
>> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>> 
>>> I have done this with VM's but I would not generally recommend it.
>> Without
>>> VM's you will have a pretty ugly configuration issue because Hadoop
>> usually
>>> assumes it owns the machine.
>>> 
>>> Besides, this is a seriously square peg into a round hole kind of problem
>>> here.  Hadoop (map-reduce) was designed so that you could use several
>> little
>>> machines instead of one big one.  It just isn't going to work well on a
>>> single computer.
>>> 
>>> On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <guxiaobo1...@gmail.com>
>> wrote:
>>> 
>>>> Do you have any experience in running multiple data nodes and task
>>>> trackers on a single SMP server?
>>>> 
>>>>> -----Original Message-----
>>>>> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
>>>>> Sent: Saturday, June 25, 2011 9:26 AM
>>>>> To: user@mahout.apache.org
>>>>> Cc: d...@mahout.apache.org
>>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a
>>>> Hadoop cluster.
>>>>> 
>>>>> Pretty big.  Should scream for local classifier learning.
>>>>> 
>>>>> Local Hadoop should run pretty fast as well.
>>>>> 
>>>>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <guxiaobo1...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> 32Core, 256G RAM
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
>>>>>>> Sent: Saturday, June 25, 2011 1:37 AM
>>>>>>> To: user@mahout.apache.org
>>>>>>> Cc: d...@mahout.apache.org
>>>>>>> Subject: Re: Can all the algorithms in Mahout be run locally without
>>>> a
>>>>>> Hadoop cluster.
>>>>>>> 
>>>>>>> Big iron is fine for some of the classifier stuff, but throughput per
>>>> $
>>>>>> can
>>>>>>> be higher for other algorithms with a cluster of smaller machines.
>>>>>>> 
>>>>>>> How big a machine are you talking about?  Even relatively small
>>>> machines
>>>>>> are
>>>>>>> pretty massive any more.  8 core = 16 hyper-thread machines with 48GB
>>>>>> seem
>>>>>>> to be not even very impressive any more.
>>>>>>> 
>>>>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <guxiaobo1...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> We will use a big SMP server to deploy Mahout.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> 
>>>>>>>> Xiaobo Gu
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
