Re: Question about Mahout/Hadoop

Mahmood Naderan Fri, 28 Mar 2014 23:16:24 -0700

From the script I see that mahout finally runs bin/hadoop and hadoop runs the 
java command. Basically I want to know more about data structures (tree, list, 
vector, array, ...) or data flow.

Regards,
Mahmood

On Saturday, March 29, 2014 12:54 AM, Chandler Burgess 
<cburg...@icontrolesi.com> wrote:

Mahmood,

What are you trying to get at with your question? Does the answer affect 
something in your environment? 

If you have the environment variable MAHOUT_LOCAL set, trainnb still runs 
MapReduce jobs, but it runs them in what is basically a local, in-memory Hadoop 
rather than on a cluster (as far as I can tell, anyway). If you don't have that 
variable set, the job is submitted to your Hadoop environment (provided the 
Hadoop environment variables are properly configured).

If you are just getting started and playing around, I would recommend setting 
MAHOUT_LOCAL, e.g. export MAHOUT_LOCAL=1. I'm a beginner myself but have done a 
lot of playing around with Naive Bayes locally, testing on datasets of up to 
400k documents with training sets of up to 30k documents, and it runs very fast.
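
A minimal sketch of the MAHOUT_LOCAL check described above. The function name 
mahout_mode is hypothetical and is not part of Mahout. It assumes that any 
non-empty value of MAHOUT_LOCAL selects local execution, and it ignores any 
other conditions the real bin/mahout script may check.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; NOT part of Mahout.
# Assumption: any non-empty MAHOUT_LOCAL means "run locally"; otherwise
# the job is handed to the configured Hadoop environment.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"
  else
    echo "cluster"
  fi
}

# Report what the current shell environment would select.
echo "Mahout would run in mode: $(mahout_mode)"
```

Running `export MAHOUT_LOCAL=1` before this snippet switches the reported mode 
to local; `unset MAHOUT_LOCAL` switches it back.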

-----Original Message-----
From: Andrew Musselman [mailto:andrew.mussel...@gmail.com] 
Sent: Friday, March 28, 2014 2:57 PM
To: user@mahout.apache.org
Subject: Re: Question about Mahout/Hadoop

You're running a bash script that lives at $MAHOUT_HOME/bin/mahout.

If you read through that script you can start to follow what goes on when you 
run the command starting with `mahout`.  See at the bottom of the script where 
the `exec` commands are; that's where things start to be executed.
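
One way to locate those lines without reading the whole script is sketched 
below. The function name list_exec_lines is made up for this example, and the 
snippet assumes MAHOUT_HOME points at your Mahout install; it only searches the 
file and does not run Mahout.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line of a script
# that begins (after optional indentation) with the `exec` builtin.
list_exec_lines() {
  grep -n '^[[:space:]]*exec ' "$1"
}

# Only search if the launcher script is actually present.
if [ -f "${MAHOUT_HOME:-}/bin/mahout" ]; then
  list_exec_lines "$MAHOUT_HOME/bin/mahout"
fi
```

Each line it prints is a point where the bash script replaces itself with 
another process, so the command named after `exec` is what actually runs.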

On Fri, Mar 28, 2014 at 12:34 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:

> Hi
> I want to know: when I run a command like
>     mahout trainnb -i .... -o ...
>
> am I running Mahout code or Hadoop code?
> In other words, which one is dominant?
>
>
> Regards,
> Mahmood
