Code execution path of mahout

2016-02-03 Thread Mahmood N
Hi, this is a question about Mahout 0.6, which I know is pretty old. Consider this command (I don't know whether it is still valid in newer versions): `./bin/mahout testclassifier -m $CLASSIFICATION_MODEL -d $CLASSIFICATION_INPUT --method mapreduce` I want to know which parts of th

Re: Code execution path of mahout

2016-02-03 Thread Andrew Musselman
Hi Mahmood, it would be possible to trace the path out in an IDE like IntelliJ, but there's no automated method to print that out, if that's what you're asking. I'd definitely recommend upgrading if at all possible, as that's five major releases old. Best, Andrew On Wed, Feb 3, 2016 at 10:35 AM, Mahmood N

Re: Code execution path of mahout

2016-02-03 Thread Mahmood N
Dear Andrew, thanks for your reply. In fact, I need this information as part of my study of some data analytics workloads. The benchmark was set up about 3 years ago by someone. What I really want to know is how the software model (here, the code execution path) differs from regular de

Re: Code execution path of mahout

2016-02-03 Thread Andrew Musselman
The new code still uses sparse and dense vectors and matrices, with local and distributed iterators over rows and blocking into chunks of matrices as appropriate. You would be better off checking out the newest version from source (https://github.com/apache/mahout) and taking a look since I won't

Re: Code execution path of mahout

2016-02-03 Thread Mahmood N
> The new code still uses sparse and dense vectors and matrices, with local and distributed iterators over rows and blocking into chunks of matrices as appropriate.

That is a good thing to know... Regardless of the comparison, do you know where the most important data structures are defined?

Re: Code execution path of mahout

2016-02-03 Thread Andrew Musselman
Here are a bunch: https://github.com/apache/mahout/tree/master/math/src/main/java/org/apache/mahout/math Large matrices are typical, often on the order of hundreds of thousands to millions of rows and hundreds of columns. On Wed, Feb 3, 2016 at 11:21 AM, Mahmood N wrote: > >The new code still us

Re: Code execution path of mahout

2016-02-03 Thread Mahmood Naderan
Really thanks for that. I am getting closer to what I was searching for... Is there any high-level document about the procedure the classifier follows (using MapReduce) after the training phase? For example: 1- Reading chunks 2- Sorting each chunk 3-... I didn't find such an example on the web. Maybe

Re: Code execution path of mahout

2016-02-03 Thread Andrew Musselman
I'd say work your way through that class and follow along with what it does; I don't know of any documents like that beyond the code and what's on the Mahout web site at http://mahout.apache.org. On Wednesday, February 3, 2016, Mahmood Naderan wrote: > Really thanks for that. I am getting closer

Re: Mahout error : seq2sparse

2016-02-03 Thread Andrew Musselman
Is it possible you have any empty lines or extra whitespace at the end or in the middle of any of your input files? I don't know for sure but that's where I'd start looking. Are you on the most recent release? On Wed, Feb 3, 2016 at 7:33 PM, Alok Tanna wrote: > Mahout in local mode > > I am ab

Re: Mahout error : seq2sparse

2016-02-03 Thread Alok Tanna
Thank you, Andrew, for the quick response. I have around 300 input files, so it would take a while for me to go through each file. I will look into that, but I had successfully generated the sequence file using `mahout seqdirectory` for the same dataset. How can I find which Mahout release I am

Re: Mahout error : seq2sparse

2016-02-03 Thread Andrew Musselman
Ah; looks like that config can be set in Hadoop's core-site.xml but if you're running Mahout in local mode that shouldn't help. Can you try this with local mode off, in other words on a running Hadoop/Spark cluster? Looking for empty lines could be run via a command like `grep -r "^$" input-file-
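The search for empty lines suggested above can be sketched as a pair of small shell functions. This is only a minimal sketch: the function names are made up for illustration, the directory argument stands in for whatever input location is actually used, and the pattern also counts whitespace-only lines as blank, which goes slightly beyond a plain `^$` match.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Mahout).

# Print the name of every file under a directory that contains at least
# one blank line (empty or whitespace-only).
list_files_with_blank_lines() {
    # -r: recurse, -l: print matching file names only, -E: extended regex
    grep -rlE '^[[:space:]]*$' "$1"
}

# Print the number of blank lines in a single file.
count_blank_lines() {
    # -c: print a count of matching lines instead of the lines themselves
    grep -cE '^[[:space:]]*$' "$1"
}

# Example (placeholder directory name):
#   list_files_with_blank_lines ./my-input-dir
```

Listing file names first keeps the output short when there are hundreds of input files; the per-file count can then be run only on the files that were reported.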

Re: Mahout error : seq2sparse

2016-02-03 Thread Alok Tanna
This command works, thank you. Yes, I am seeing a lot of empty lines in my input files. Is there any magic command to remove these lines? That would save a lot of time. I will rerun this once I have removed the empty lines. It would be great if I could get this working in local mode, or else I will have to send few d

Re: Mahout error : seq2sparse

2016-02-03 Thread Andrew Musselman
$ for i in `ls input-directory`; do sed -i '/^$/d' input-directory/$i; done

On Wed, Feb 3, 2016 at 9:08 PM, Alok Tanna wrote: > This command works, thank you. Yes, I am seeing a lot of empty lines in my input files. Is there any magic command to remove these lines? That would save a lot of time. > I would
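A slightly more defensive variant of the one-liner above might look like the following. It is a sketch under a few assumptions: the function name is hypothetical, file names may contain spaces (which a loop over `ls` output would split), and whitespace-only lines should be removed along with truly empty ones. It writes to a scratch file and renames it instead of using `sed -i`, since the `-i` flag takes different arguments in GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of Mahout).
# Remove blank lines (empty or whitespace-only) from every regular file
# directly inside the given directory, rewriting each file in place.
remove_blank_lines() {
    dir="$1"
    # A glob keeps file names with spaces intact, unlike parsing `ls`.
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and non-matches
        scratch="$f.tmp.$$"              # scratch file next to the original
        # Delete blank lines, then replace the original only if sed succeeded.
        sed '/^[[:space:]]*$/d' "$f" > "$scratch" && mv "$scratch" "$f"
    done
}

# Example (placeholder directory name):
#   remove_blank_lines ./my-input-dir
```

Because the files are rewritten, it is worth running this on a copy of the input directory first.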

Re: Mahout error : seq2sparse

2016-02-03 Thread Andrew Musselman
For the Mahout version you could run `mahout` and look for lines that include the version-jar name, such as: "MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.11.1-job.jar" We don't have a -version flag that I can see but I just opened https://issues.apache.org/jira/browse/MAHOUT-1798 which you're f
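Picking the version out of that jar name can be sketched with a small `sed` filter. The function name is hypothetical, and the pattern assumes the jar follows the `mahout-examples-<version>-job.jar` naming shown above; whether the `MAHOUT-JOB` line goes to standard output or standard error is not confirmed here, so the usage comment merges both streams.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of Mahout).
# Read lines on stdin and print the <version> part of any
# "mahout-examples-<version>-job.jar" name found in them.
mahout_version_from_job_line() {
    # -n with the p flag prints only lines where the substitution matched
    sed -n 's/.*mahout-examples-\(.*\)-job\.jar.*/\1/p'
}

# Example (assumes `mahout` is on the PATH):
#   mahout 2>&1 | mahout_version_from_job_line
```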

Re: Mahout error : seq2sparse

2016-02-03 Thread Alok Tanna
Thank you, Andrew. I was able to remove the empty lines with your help and rerun the process, but I am still getting the same error. When I just run `mahout`, it shows me this jar: /mahout-examples-1.0-SNAPSHOT-job.jar! I think the only option I have now is to set up the cluster and run it on

Re: Mahout error : seq2sparse

2016-02-03 Thread Andrew Musselman
Would recommend updating to the latest version if you can; you're probably working with two-releases-old code. On Wednesday, February 3, 2016, Alok Tanna wrote: > Thank you, Andrew. I was able to remove the empty lines with your help and rerun the process, but I am still getting the

Re: Mahout error : seq2sparse

2016-02-03 Thread Alok Tanna
Will try to update to the latest version tonight and then give it a try. Thanks, Alok Tanna On Thu, Feb 4, 2016 at 1:48 AM, Andrew Musselman wrote: > Would recommend updating to the latest version if you can; you're probably > working with two-releases-old code. > > > On Wednesday, Februar