From the script I see that mahout finally runs bin/hadoop, and hadoop runs the java command. Basically, I want to know more about the data structures (tree, list, vector, array, ...) or the data flow.
Regards,
Mahmood

On Saturday, March 29, 2014 12:54 AM, Chandler Burgess <cburg...@icontrolesi.com> wrote:

Mahmood,

What are you trying to get at with your question? Does the answer affect something in your environment?

If you have the environment variable MAHOUT_LOCAL set, trainnb still runs MapReduce jobs, but it basically runs Hadoop in memory (from what I can tell, anyway). If you don't have that variable set, then the job gets submitted to your Hadoop environment (if the Hadoop environment variables are properly configured).

If you are just getting started and playing around, I would recommend setting MAHOUT_LOCAL, e.g. export MAHOUT_LOCAL=1. I'm a beginner myself but have done a lot of playing around with naïve Bayes locally, testing on datasets of up to 400k documents with training sets of up to 30k documents, and it runs very fast.

-----Original Message-----
From: Andrew Musselman [mailto:andrew.mussel...@gmail.com]
Sent: Friday, March 28, 2014 2:57 PM
To: user@mahout.apache.org
Subject: Re: Question about Mahout/Hadoop

You're running a bash script that lives at $MAHOUT_HOME/bin/mahout. If you read through that script you can start to follow what goes on when you run a command starting with `mahout`. Look at the bottom of the script, where the `exec` commands are; that's where things start to be executed.

On Fri, Mar 28, 2014 at 12:34 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:

> Hi
> I want to know, when I run a command like
> mahout trainnb -i .... -o ...
>
> , am I running mahout code or hadoop?
> In other words, which one is dominant?
>
>
> Regards,
> Mahmood
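
The choice between local and cluster execution described in this thread can be sketched as a small shell function. This is a simplified illustration, not the actual contents of $MAHOUT_HOME/bin/mahout: the function name mahout_mode is hypothetical, and treating an unset HADOOP_HOME as "run locally" is an assumption about what "properly configured" Hadoop environment variables means here.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of Mahout: reports which path a launcher
# behaving as described in this thread would take.
mahout_mode() {
  # Assumption: a non-empty MAHOUT_LOCAL, or a missing HADOOP_HOME,
  # means the job runs in a single local JVM.
  if [ -n "${MAHOUT_LOCAL:-}" ] || [ -z "${HADOOP_HOME:-}" ]; then
    echo "local"
  else
    # Otherwise the job would be handed to $HADOOP_HOME/bin/hadoop.
    echo "cluster"
  fi
}

echo "mode: $(mahout_mode)"
```

After `export MAHOUT_LOCAL=1`, the script prints "mode: local" regardless of HADOOP_HOME. To see what the real launcher does, read the `exec` lines at the bottom of $MAHOUT_HOME/bin/mahout.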