Configuring Mahout Maven Project to use hadoop2

2014-11-14 Thread mw
Hi, I am working on a REST API for Mahout called kornakapi. I heard that it is possible to compile the Mahout trunk so that it is compatible with Hadoop 2. How can I do that? Is it possible to do that via the pom.xml? Best, Max

Re: Configuring Mahout Maven Project to use hadoop2

2014-11-14 Thread mw
On 11/14/2014 03:13 PM, Gokhan Capan wrote: Max, Check this out: https://mahout.apache.org/developers/buildingmahout.html Gokhan On Fri, Nov 14, 2014 at 4:11 PM, mw wrote: Hi, I am working on a REST API for Mahout called kornakapi. I heard that it is possible to compile the Mahout trunk

Re: Configuring Mahout Maven Project to use hadoop2

2014-11-14 Thread mw
this one renamed somehow? Best, Max On 11/14/2014 04:51 PM, Gokhan Capan wrote: Hi Max, If it is installed correctly, just adding the module you require as a dependency should work. ... <dependency> <groupId>org.apache.mahout</groupId> <artifactId>mahout-***</artifactId> <version>1.0-SNAPSHOT</version> </dependency> ... Best Gokhan On Fri, Nov 14, 2014 at 4:42 PM, mw wrote: Hi

Using Mahout 1.0-SNAPSHOT with yarn cluster

2015-01-07 Thread mw
Hello, I am working on a web application that should execute LDA on an external YARN cluster. I am uploading all the relevant sequence files onto the YARN cluster. This is how I try to remotely execute LDA on the cluster. try { ugi.doAs(new PrivilegedExceptionAction() {

Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-07 Thread mw
Hello, the first error was due to a missing property in yarn.xml. However, now I have a different problem. I am working on a web application that should execute LDA on an external YARN cluster. I am uploading all the relevant sequence files onto the YARN cluster. This is how I try to remotely

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-08 Thread mw
. What is the best practice? Best, Max On 01/07/2015 06:13 PM, mw wrote: Hello, the first error was due to a missing property in yarn.xml. However, now I have a different problem. I am working on a web application that should execute LDA on an external YARN cluster. I am uploading all the

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-09 Thread mw
/target with all dependencies packaged. This should have everything needed for LDA. On Jan 8, 2015, at 5:50 AM, mw wrote: Hello again, maybe my question was misleading. I am asking whether the intended usage is to provide the job with the required libraries and send those together with the job to

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-09 Thread mw
I looked into the submitted job.jar and I found that the missing class (org.apache.mahout.math.Vector) is not contained. On 01/09/2015 12:57 PM, mw wrote: I wrote a message to the Hadoop list about it. Also, I found this ticket: https://issues.apache.org/jira/browse/MAHOUT-1498. Could it be a

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-09 Thread mw
I found a solution! I had to upload the missing jars to HDFS on the YARN cluster and add the following to the Hadoop Configuration: hadoopConf.set("tmpjars", "/lib/mahout-math-1.0-20150108.230237-316.jar,/lib/commons-cli-2.0-mahout.jar"); Best, Max On 01/09/2015 02:13 PM, mw wrot
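
The "tmpjars" value in the message above is a single comma-separated string of jar paths. A minimal sketch of building that string in plain Java follows; the class name TmpJarsValue and the helper join are hypothetical and not part of Hadoop or Mahout, and the two paths are the ones quoted in the message. The hadoopConf.set call itself appears only as a comment, since it needs Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.List;

public class TmpJarsValue {
    // Hypothetical helper: joins jar paths into one comma-separated string,
    // skipping blank entries and trimming surrounding whitespace.
    static String join(List<String> jarPaths) {
        StringBuilder sb = new StringBuilder();
        for (String path : jarPaths) {
            String trimmed = path.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(trimmed);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Paths taken from the message above; they must already exist on the cluster's HDFS.
        List<String> jars = Arrays.asList(
                "/lib/mahout-math-1.0-20150108.230237-316.jar",
                "/lib/commons-cli-2.0-mahout.jar");
        String value = join(jars);
        // With Hadoop on the classpath one would then call, as in the message:
        // hadoopConf.set("tmpjars", value);
        System.out.println(value);
    }
}
```

Keeping the list of jars separate from the joined string makes it easier to add a further jar if another class turns out to be missing from job.jar.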

LDA p(Topic|Document)

2015-02-18 Thread mw
Hello, I am using LDA to build a topic model over 30k articles. However, I have a problem getting p(Topic|Document) for topics that have a relatively low prior. For example, for one article there are basically two relevant topics with P(10|articles)=0.09802209698050128 and p(111|articles)=0

Importing tfidf from training set

2015-03-17 Thread mw
Hello, I am running LDA on a training set to create a topic model. For calculating p(topic|document) on unseen data I need to import the inverse document frequency from the training set. Is there a way to do that in Mahout? Best, Max
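
The idea in the question above, weighting unseen documents with document frequencies counted on the training set, can be sketched in plain Java. This is not Mahout's API: the class FrozenIdf, its methods, and the textbook weighting tf * ln(N / df) are assumptions made for illustration, and Mahout's own tf-idf weighting may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class FrozenIdf {
    // Document frequencies and corpus size, both counted on the training set only.
    private final Map<String, Integer> trainingDocFreq;
    private final int numTrainingDocs;

    FrozenIdf(Map<String, Integer> trainingDocFreq, int numTrainingDocs) {
        this.trainingDocFreq = trainingDocFreq;
        this.numTrainingDocs = numTrainingDocs;
    }

    // Textbook idf, ln(N / df); terms never seen in training get weight 0.
    double idf(String term) {
        Integer df = trainingDocFreq.get(term);
        if (df == null || df <= 0) {
            return 0.0;
        }
        return Math.log((double) numTrainingDocs / df);
    }

    // Weights the raw term counts of an unseen document with the training idf values.
    Map<String, Double> weigh(Map<String, Integer> termCounts) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            double w = e.getValue() * idf(e.getKey());
            if (w > 0.0) {
                weights.put(e.getKey(), w);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new HashMap<>();
        df.put("topic", 10); // hypothetical counts
        Map<String, Integer> doc = new HashMap<>();
        doc.put("topic", 2);
        doc.put("unseen", 5);
        System.out.println(new FrozenIdf(df, 100).weigh(doc));
    }
}
```

Terms that never occurred in the training set are dropped here, because the trained topic model has no entry for them either.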

using trainDocTopicModel to approximate p(topic|document)

2015-03-18 Thread mw
Hello, I am trying to use a TopicModel to approximate p(topic|document) like this: TopicModel model = new TopicModel(hadoopConf, conf.getEta(), conf.getAlpha(), dict, trainingThreads, modelWeight, models); Vector docTopics = new DenseVector(new double[model.getNumTopics()]).

Re: using trainDocTopicModel to approximate p(topic|document)

2015-03-18 Thread mw
On 03/18/2015 05:54 PM, mw wrote: Hello, I am trying to use a TopicModel to approximate p(topic|document) like this: TopicModel model = new TopicModel(hadoopConf, conf.getEta(), conf.getAlpha(), dict, trainingThreads, modelWeight, models); Vector docTopics = new

SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread mw
Hello, I am trying to get tfidf vectors from a corpus of 100k documents. I noticed that the tfidf sequence file is empty, while the tf vectors are not. Here is the log from SparseVectorsFromSequenceFiles: INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread mw
Mahout 0.10.0 On 04/21/2015 02:05 PM, Suneel Marthi wrote: What's the Mahout Version# u r running with? On Tue, Apr 21, 2015 at 6:37 AM, mw wrote: Hello, I am trying to get tfidf vectors from a corpus of 100k documents. I noticed that the tfidf sequence file is empty, while the tf vector

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-22 Thread mw
tomcat7 155 Apr 22 11:02 part-r-0 -rw-r--r-- 1 tomcat7 tomcat7 12 Apr 22 11:02 .part-r-0.crc -rw-r--r-- 1 tomcat7 tomcat7 0 Apr 22 11:02 _SUCCESS -rw-r--r-- 1 tomcat7 tomcat7 8 Apr 22 11:02 ._SUCCESS.crc On 04/21/2015 02:14 PM, mw wrote: Mahout 0.10.0 On 04/21/2015 02:05 PM, Suneel

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-22 Thread mw
Increasing maxDFSigma solved it. Does anybody know why that is? On 04/22/2015 11:12 AM, mw wrote: Also, I noticed that there must be something wrong when calculating the variance, since the file in stdcalc seems to be empty: root@test:[/opt/sparse/stdcalc] # ll total 20K drwxr-xr-x 2 tomcat7

Re: Speed up LDA in Mahit 0.9

2015-05-07 Thread mw
As far as I understand, the runtime complexity is O(N*T*D), where N is the number of words, T the number of topics and D the number of documents. So you can try, e.g., to reduce the number of words. On 05/05/2015 10:36 AM, Donni Khan wrote: Hello Mahout Users, I'm running an LDA job (Mahout 0.9) b
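
To make the O(N*T*D) estimate above concrete, here is a small sketch that compares the relative cost of two settings. The class name LdaCostEstimate and all the numbers are hypothetical; the product is only the rough estimate given in the message, not a measured runtime.

```java
public class LdaCostEstimate {
    // Relative cost proportional to N * T * D, following the estimate in the message above.
    static double relativeCost(long numWords, long numTopics, long numDocs) {
        return (double) numWords * numTopics * numDocs;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 100k words, 200 topics, 30k documents.
        double full = relativeCost(100_000, 200, 30_000);
        // Same topics and documents, but with the vocabulary pruned to half.
        double pruned = relativeCost(50_000, 200, 30_000);
        // Ratio of estimated costs after pruning.
        System.out.println(pruned / full);
    }
}
```

Because the estimate is a plain product, halving any one of the three factors halves the estimated cost, which is why pruning the vocabulary is suggested as a first step.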

Mahout 0.10.0 with yarn 2.6

2015-05-19 Thread mw
Hi Mahout users, does anybody know if Mahout 0.10.0 runs on YARN 2.6? Best, Max