Re: deprecated command

2016-04-11 Thread Mahmood Naderan
Hi, Any feedback is appreciated. Regards, Mahmood On Sat, Apr 9, 2016 at 3:36 PM, Mahmood Naderan <mahmood...@gmail.com> wrote: > Hi, > As I run the command > > mahout trainclassifier -i traininginput -o wikipediamodel -mf 4 -ms 4 > > I get this error: > >

deprecated command

2016-04-09 Thread Mahmood Naderan
Hi, As I run the command mahout trainclassifier -i traininginput -o wikipediamodel -mf 4 -ms 4 I get this error: Running on hadoop, using /opt/new_analytic/hadoop-2.7.1/bin/hadoop and HADOOP_CONF_DIR= MAHOUT-JOB:

Re: Code execution path of mahout

2016-02-03 Thread Mahmood Naderan
Thanks a lot for that. I am getting closer to what I was searching for... Is there any high-level document about the procedure of the classifier (using map reduce) after the training phase? For example: 1- Reading chunks 2- Sorting each chunk 3-... I didn't find such an example on the web.

NegativeArraySizeException in seqdirectory

2014-04-01 Thread Mahmood Naderan
Hello, Running seqdirectory (from Mahout 0.9) on a large input file gives the exception shown below. Any idea? MAHOUT_LOCAL is set, running locally 14/04/01 12:15:17 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647],

Re: Profiling with visualvm

2014-03-31 Thread Mahmood Naderan
Whoever has tried mahout/hadoop profiling, please let us know. Regards, Mahmood On Sunday, March 30, 2014 2:30 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Profiled what exactly, a Hadoop job? As soon as I run /mahout testclassifier -m wikipediamodel -d wikipediainput I see

Using split without partitioning the data to train/test

2014-03-31 Thread Mahmood Naderan
Hi, In an old Mahout, I used wikipediaDataSetCreator on an input to create the training data         mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt and then fed the tr-input to the trainclassifier using     mahout trainclassifier -i tr-input -o wikimodel Now, in

Re: Using split without partitioning the data to train/test

2014-03-31 Thread Mahmood Naderan
Yeah you are right. I have to ignore that command   Regards, Mahmood On Monday, March 31, 2014 6:56 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Sent from my iPhone On Mar 31, 2014, at 4:20 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, In an old Mahout, I used

Profiling with visualvm

2014-03-30 Thread Mahmood Naderan
Hi, I profiled the Mahout command with visualvm and saw many threads. Some of them are related to the profiler and some others are communication threads. Interestingly, the main thread is always in the sleep state! From the thread dump (which has been attached), the owner is Mahout.

Re: Profiling with visualvm

2014-03-30 Thread Mahmood Naderan
Profiled what exactly, a Hadoop job? As soon as I run /mahout testclassifier -m wikipediamodel -d wikipediainput I see an org.apache.mahout.driver.MahoutDriver in visualvm and then I open it. Regards, Mahmood

Re: Question about Mahout/Hadoop

2014-03-29 Thread Mahmood Naderan
` commands are; that's where things start to be executed. On Fri, Mar 28, 2014 at 12:34 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, I want to know: when I run a command like mahout trainnb -i -o ... , am I running Mahout code or Hadoop? In other words, which one is dominant

Question about Mahout/Hadoop

2014-03-28 Thread Mahmood Naderan
Hi, I want to know: when I run a command like mahout trainnb -i -o ... , am I running Mahout code or Hadoop? In other words, which one is dominant? Regards, Mahmood
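One rough way to picture the question: the mahout launcher script decides whether to hand the job to Hadoop or to run it in a local JVM, which matches the log lines "Running on hadoop, using ..." and "MAHOUT_LOCAL is set, running locally" quoted in other threads of this list. The sketch below is a simplified illustration under stated assumptions, not the actual bin/mahout source: the function name mahout_mode is hypothetical, and it assumes the choice depends only on the MAHOUT_LOCAL and HADOOP_HOME environment variables.

```shell
#!/bin/sh
# Simplified sketch, NOT the real bin/mahout script.
# mahout_mode is a hypothetical helper used only for illustration.
# Assumption: the launcher runs locally when MAHOUT_LOCAL is set or when
# no Hadoop installation is configured; otherwise it delegates to Hadoop.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ] || [ -z "${HADOOP_HOME:-}" ]; then
    echo "local"
  else
    echo "hadoop"
  fi
}

# Report how a command such as `mahout trainnb ...` would be dispatched
# in the current environment.
echo "mahout would run in mode: $(mahout_mode)"
```

Either way the algorithm code comes from Mahout; in the "hadoop" case Hadoop only schedules and runs the MapReduce jobs that Mahout submits.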

Re: debug mode

2014-03-26 Thread Mahmood Naderan
at 2:47 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hello, 1- Can we debug Hadoop/Mahout with the available bin files? Or do we have to rebuild them? Probably they are compiled with optimizations. 2- I cannot find a procedure on how to build Hadoop/Mahout in debug mode. If anyone knows

Re: debug mode

2014-03-26 Thread Mahmood Naderan
Excuse me, I forgot to ask: should I use mvnDebug for debugging purposes? Regards, Mahmood On , Mahmood Naderan nt_mahm...@yahoo.com wrote: Let me put it this way. Using GNU Make, we use -g -ggdb to insert debug symbols in the object file. On the other hand, if we use -O3

trainclassifier/trainnb

2014-03-25 Thread Mahmood Naderan
Hi, What is the correct syntax for this old command?    mahout trainclassifier -i traininginput -o wikipediamodel -mf 4 -ms 4 It seems that trainclassifier is replaced by trainnb but this one has no -mf option.   Regards, Mahmood

Re: trainclassifier/trainnb

2014-03-25 Thread Mahmood Naderan
, 2014 at 3:17 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, What is the correct syntax for this old command? mahout trainclassifier -i traininginput -o wikipediamodel -mf 4 -ms 4 It seems that trainclassifier is replaced by trainnb but this one has no -mf option. Regards

Multiple errors and messages

2014-03-18 Thread Mahmood Naderan
Hello, When I run the following command on Mahout-0.9 and Hadoop-1.2.1, I get multiple errors and I cannot figure out what the problem is. Sorry for the long post. [hadoop@solaris ~]$ mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c ~/categories.txt Running on hadoop,

debug mode

2014-03-14 Thread Mahmood Naderan
Hello, 1- Can we debug Hadoop/Mahout with the available bin files? Or do we have to rebuild them? Probably they are compiled with optimizations. 2- I cannot find a procedure on how to build Hadoop/Mahout in debug mode. If anyone knows, please let me know. Since I am a gcc/make guy, maybe these

verbose output

2014-03-13 Thread Mahmood Naderan
Hi, Is there any verbosity flag for hadoop and mahout commands? I cannot find such a thing in the command line. Regards, Mahmood

Re: verbose output

2014-03-13 Thread Mahmood Naderan
. On 03/13/2014 10:21 AM, Mahmood Naderan wrote: Hi, Is there any verbosity flag for hadoop and mahout commands? I cannot find such a thing in the command line. Regards, Mahmood

Re: Solving heap size error

2014-03-13 Thread Mahmood Naderan
On Tuesday, March 11, 2014 11:57 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: As I posted earlier, here is the result of a successful test: a 5.4GB XML file (which is larger than enwiki-latest-pages-articles10.xml) with 4GB of RAM and -Xmx128m took 5 minutes to complete. I didn't find a larger

bug report

2014-03-13 Thread Mahmood Naderan
Hi Where can I submit a mahout bug? I am not familiar with JIRA and I see issues and agile.   Regards, Mahmood

Re: Solving heap size error

2014-03-13 Thread Mahmood Naderan
. Regards, Mahmood On Thursday, March 13, 2014 2:31 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: The strange thing is that whether I use -Xmx128m or -Xmx16384m, the process stops at chunk #571 (571*64=36.5GB). I still haven't figured out whether this is a problem with the JVM, Hadoop, or Mahout. I have
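The 36.5GB figure in the message above can be checked with a few lines of shell arithmetic. This is only a worked calculation under two assumptions: that the -c 64 option means 64 MB per chunk (which is how the poster uses it) and that 1 GB is counted as 1000 MB. The helper name chunks_to_mb is hypothetical.

```shell
#!/bin/sh
# Worked arithmetic for the "571*64=36.5GB" remark.
# Assumptions: chunk size is given in MB, and 1 GB = 1000 MB (decimal).
# chunks_to_mb is a hypothetical helper: number of chunks times chunk size.
chunks_to_mb() {
  echo $(( $1 * $2 ))
}

total_mb=$(chunks_to_mb 571 64)
echo "${total_mb} MB written before the process stopped"

# Integer part and first decimal digit of the size in GB (truncated, not rounded).
echo "$(( total_mb / 1000 )).$(( (total_mb % 1000) / 100 )) GB"
```

The same helper gives 32000 MB for 500 chunks of 64 MB, so the size already processed can be estimated directly from the chunk count.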

Re: Solving heap size error

2014-03-13 Thread Mahmood Naderan
splitter class does but if you're interested in isolating the issue you could find out what is happening data-wise and see if there is some very large grouping on a pathologically frequent key for instance. On Thu, Mar 13, 2014 at 11:31 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: I am

Re: bug report

2014-03-13 Thread Mahmood Naderan
. On Thu, Mar 13, 2014 at 11:29 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi Where can I submit a mahout bug? I am not familiar with JIRA and I see issues and agile. Regards, Mahmood

Re: Heap space

2014-03-11 Thread Mahmood Naderan
needs 2GB. If my system supports 10GB of heap, then I will feed 5 threads at one time. When the first 5 threads are done (the chunks) then I will feed the next 5 threads and so on.   Regards, Mahmood On Monday, March 10, 2014 9:42 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: UPDATE: I
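The batching idea in the message above (2GB per task, 10GB of heap available, so five tasks at a time) can be sketched in plain shell. This only illustrates the poster's proposed scheduling; it is not how Mahout or Hadoop actually schedules work. The function process_chunk and the chunk names are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of "feed 5 tasks, wait until they finish, feed the next 5".
# Assumptions: each task needs 2 GB and 10 GB of heap is available in total.
heap_gb=10
per_task_gb=2
batch=$(( heap_gb / per_task_gb ))   # number of tasks that fit at once

# Hypothetical stand-in for the real per-chunk work.
process_chunk() {
  echo "processing chunk $1"
}

count=0
for n in 1 2 3 4 5 6 7 8 9 10 11 12; do
  process_chunk "$n" &               # start the task in the background
  count=$(( count + 1 ))
  # Once a full batch has been started, block until all of it has finished.
  if [ $(( count % batch )) -eq 0 ]; then
    wait
  fi
done
wait                                 # collect the final, possibly partial, batch
```

Within a batch the tasks run concurrently, so the order of the printed lines can vary from run to run.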

Re: Heap space

2014-03-11 Thread Mahmood Naderan
Suneel, One more thing Right now it has created 500 chunks. So 32GB out of 48GB (the original size of XML file) has been processed. Is it possible to resume that?   Regards, Mahmood On Tuesday, March 11, 2014 9:47 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Suneel, Is it possible

Re: Heap space

2014-03-11 Thread Mahmood Naderan
be a question for a Hadoop list, unless I'm misunderstanding. When you say resume what do you mean? On Tue, Mar 11, 2014 at 11:46 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Suneel, One more thing Right now it has created 500 chunks. So 32GB out of 48GB (the original size of XML file) has

Solving heap size error

2014-03-11 Thread Mahmood Naderan
Hi, Recently I have faced a heap size error when I run $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64 Here are the specs: 1- XML file size = 44GB 2- System memory = 54GB (on virtualbox) 3- Heap size = 51GB

Re: Solving heap size error

2014-03-11 Thread Mahmood Naderan
to running on the entire english wikipedia. On Tue, Mar 11, 2014 at 12:56 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, Recently I have faced a heap size error when I run $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o

Re: Heap space

2014-03-10 Thread Mahmood Naderan
. On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi Suneel, Do you have any idea? Searching the web shows many questions regarding the heap size for wikipediaXMLSplitter. I have increased the memory size to 16GB and still get that error. I have to say

Re: Heap space

2014-03-10 Thread Mahmood Naderan
on the entire english wikipedia. On Monday, March 10, 2014 2:59 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Thanks for the update. The thing is, when that command is running, I run the 'top' command in another terminal and I see that the java process takes less than 1GB of memory. As another test

Re: Heap space

2014-03-10 Thread Mahmood Naderan
UPDATE: I split another 5.4GB XML file with 4GB of RAM and -Xmx128m and it took 5 minutes   Regards, Mahmood On Monday, March 10, 2014 7:16 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: The extracted size is about 960MB (enwiki-latest-pages-articles10.xml). With 4GB of RAM set for the OS

Heap space

2014-03-09 Thread Mahmood Naderan
Hello, I ran this command ./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64 but got this error Exception in thread "main" java.lang.OutOfMemoryError: Java heap space There are many web pages regarding this and the solution is

Re: Heap space

2014-03-09 Thread Mahmood Naderan
OK, I found that I have to add this property to mapred-site.xml: <property> <name>mapred.child.java.opts</name> <value>-Xmx2048m</value> </property> Regards, Mahmood On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hello, I ran this command ./bin/mahout

Re: Heap space

2014-03-09 Thread Mahmood Naderan
Excuse me, I added the -Xmx option and restarted the hadoop services using sbin/stop-all.sh sbin/start-all.sh; however, I still get the heap size error. How can I find the correct and needed heap size? Regards, Mahmood On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote

Re: mahout command

2014-03-08 Thread Mahmood Naderan
: wikipedia splitter hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64 On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: No success Suneel... Please see the attachment

Re: mahout command

2014-03-08 Thread Mahmood Naderan
What a fast reply... Thanks a lot Suneel,   Regards, Mahmood On Saturday, March 8, 2014 11:29 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: You can ignore the warnings. On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Oh yes... Thanks Andrew you

mahout command

2014-03-07 Thread Mahmood Naderan
Hi When I run     mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64 I get this error 14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter java.lang.ClassNotFoundException: wikipediaXMLSplitter     at

Re: mahout command

2014-03-07 Thread Mahmood Naderan
to invoke via: mahout wikipediaXmlSplitter -dpath -opath -c64 please give that a try. On Friday, March 7, 2014 8:11 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi When I run     mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64 I get

Re: mahout command

2014-03-07 Thread Mahmood Naderan
FYI, I am trying to complete the wikipedia example from Apache's document https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example   Regards, Mahmood On Friday, March 7, 2014 5:23 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: In fact,  see this file     src/conf

Re: mahout command

2014-03-07 Thread Mahmood Naderan
for Hadoop 2.x. On Friday, March 7, 2014 11:16 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: FYI, I am trying to complete the wikipedia example from Apache's document https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example   Regards, Mahmood On Friday, March 7, 2014 5:23

Installation question

2014-02-23 Thread Mahmood Naderan
Hi, I have followed the steps stated in https://cwiki.apache.org/confluence/display/MAHOUT/BuildingMahout to install Mahout. However I get an error at mvn install hadoop@solaris:~/mahout-distribution-0.9$ mvn install [INFO] Scanning for projects... [INFO]

Re: Installation question

2014-02-23 Thread Mahmood Naderan
Lequeux On Sunday, 23 February 2014 at 20:08, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, I have followed the steps stated in https://cwiki.apache.org/confluence/display/MAHOUT/BuildingMahout to install Mahout. However I get an error at mvn install hadoop@solaris:~/mahout-distribution-0.9

Re: Installation question

2014-02-23 Thread Mahmood Naderan
you have contains all the jars needed to run jobs but not the source. On Sun, Feb 23, 2014 at 9:47 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: I have downloaded mahout-distribution-0.9.tar.gz and here is the content. hadoop@solaris:~/mahout-distribution-0.9$ ls bin