date:20130813

Clustering of heterogeneous input data

2013-08-13 Thread Christian Sengstock

Hey, I want to cluster a set of documents using a bag-of-words approach (e.g. using K-means). However, my documents (since they are automatically generated by aggregating text snippets) vary hugely in size. This means some document vectors have 50 words with a
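
One common way to keep document length from dominating a bag-of-words distance is to scale each term-count vector to unit L2 length before running K-means, so that Euclidean distance behaves like cosine distance. This mitigation is not from the truncated post and is only one option. The sketch below uses plain Java maps rather than Mahout's vector classes, and termCounts, l2Normalize and euclidean are hypothetical helpers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LengthNormalizationDemo {

    // Builds a raw term-count vector from whitespace-separated text (hypothetical helper).
    static Map<String, Double> termCounts(String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    // Scales a vector to unit Euclidean (L2) length, so long and short documents
    // with the same word proportions map to (almost) the same point.
    static Map<String, Double> l2Normalize(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double v : vector.values()) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        Map<String, Double> normalized = new HashMap<>();
        if (norm == 0.0) {
            return normalized; // empty document stays empty
        }
        for (Map.Entry<String, Double> e : vector.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / norm);
        }
        return normalized;
    }

    // Euclidean distance between two sparse vectors stored as maps.
    static double euclidean(Map<String, Double> a, Map<String, Double> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        double sum = 0.0;
        for (String k : keys) {
            double d = a.getOrDefault(k, 0.0) - b.getOrDefault(k, 0.0);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> shortDoc = termCounts("hadoop mahout");
        Map<String, Double> longDoc = termCounts("hadoop mahout hadoop mahout hadoop mahout");
        // Raw counts: the documents look far apart only because one is longer.
        System.out.println(euclidean(shortDoc, longDoc));
        // After normalization the distance is essentially zero (up to rounding).
        System.out.println(euclidean(l2Normalize(shortDoc), l2Normalize(longDoc)));
    }
}
```

A short and a long document with identical word proportions are about 2.83 apart on raw counts but essentially coincide after normalization.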

Install mahout 0.8 with hadoop 2.0

2013-08-13 Thread Sergey Svinarchuk

Hi all, has anybody compiled and installed Mahout with Hadoop 2.0? If so, what changes did you make to Mahout so that it passes 100% of the unit tests and works successfully with Hadoop 2.0? Thanks

Re: Using CVB; LdaTopics confusion

2013-08-13 Thread Liz Merkhofer

Christopher - I had the same confusion with vectordump output on a hadoop cluster. The solution is that it's not trying to write a file to your hdfs: -o will go locally. So when I just named a file (it did not want to create a local directory), it wound up in the /bin I was working out of. Best,

Re: Install mahout 0.8 with hadoop 2.0

2013-08-13 Thread Ted Dunning

No. There is very little demand for Mahout on Hadoop 2.0 so far, and the forward/backward incompatibility of 2.0 has made it difficult to motivate moving to 2.0. The Bigtop guys built a Maven profile for 0.23 some time ago. I don't know the status of that. I don't think that the differences are

Re: Install mahout 0.8 with hadoop 2.0

2013-08-13 Thread Sean Owen

I think it all minimally works on Hadoop 2.0.x, though I haven't tried it recently -- it does require a recompile. This is different from it working on MRv2 versus MRv1. I'm almost certain it does not work on MRv2 and doubt it will. The effort is not large, but it's subtle. A few hacks may fail

Re: Setting up a recommender

2013-08-13 Thread Pat Ferrel

When I started looking at this I was a bit skeptical. As a search engine Solr may be peerless, but as yet another NoSQL DB? However, getting further into this, I see one very large benefit. It has one feature that sets it completely apart from the typical NoSQL DB. The type of queries you do

Re: Setting up a recommender

2013-08-13 Thread Pat Ferrel

I finally got some time to work on this and have a first cut at output to Solr working on the github repo. It only works on 2-action input but I'll have that cleaned up soon so it will work with one action. Solr indexing has not been tested yet and the field names and/or types may need

Re: Setting up a recommender

2013-08-13 Thread Pat Ferrel

Corrections inline On Aug 13, 2013, at 10:49 AM, Pat Ferrel pat.fer...@gmail.com wrote: I finally got some time to work on this and have a first cut at output to Solr working on the github repo. It only works on 2-action input but I'll have that cleaned up soon so it will work with one

RowSimilarityJob, sampleDown method problem

2013-08-13 Thread sam wu

Mahout 0.9 snapshot, RowSimilarityJob.java, sampleDown method, line 291 or 300: double rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow; returns either 0.0 or 1.0, not a fraction. It needs a (double) cast. BR, Sam
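
A minimal stand-alone illustration of the reported problem, assuming both counts are int (which the 0.0/1.0 results imply). buggyRate reproduces the expression from the report; fixedRate adds the (double) cast Sam suggests. The class and method names are invented for this sketch and are not Mahout code.

```java
public class SampleRateDemo {

    // Reported form: both operands are int, so Java performs integer division
    // and truncates the result to 0 or 1 before widening it to double.
    static double buggyRate(int maxObservationsPerRow, int observationsPerRow) {
        return Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    // Fixed form: casting the numerator to double forces floating-point division.
    static double fixedRate(int maxObservationsPerRow, int observationsPerRow) {
        return (double) Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    public static void main(String[] args) {
        System.out.println(buggyRate(700, 1000)); // 0.0
        System.out.println(fixedRate(700, 1000)); // 0.7
    }
}
```

This is the pattern FindBugs flags as ICAST_IDIV_CAST_TO_DOUBLE: an int quotient that is only converted to double after the fraction has already been lost.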

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread Ted Dunning

Why do you think this? On Tue, Aug 13, 2013 at 11:56 AM, sam wu swu5...@gmail.com wrote: Mahout 0.9 snapshot RowSimilarityJob.java , sampleDown method line 291 or 300 double rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow; return either 0.0

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread sam wu

Say column a has 1000 entries and maxPref=700. With rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow; we get rowSampleRate = 0.0 (not 0.7). Do we skip this column entirely, or sample the column's entries with probability 0.7 (getting roughly 700 entries)? On Tue, Aug 13, 2013
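
The second option Sam describes (keep each of the 1000 entries with probability 0.7) can be sketched as below. This is a hypothetical stand-alone helper, not Mahout's actual sampleDown; it assumes the column's entries fit in a List and uses java.util.Random for the coin flips.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DownSampleDemo {

    // Hypothetical helper (not Mahout's code): keeps each entry independently with
    // probability min(maxObservations, n) / n, so about maxObservations entries survive on average.
    static List<Integer> sampleDown(List<Integer> entries, int maxObservations, Random random) {
        List<Integer> kept = new ArrayList<>();
        if (entries.isEmpty()) {
            return kept;
        }
        // The (double) cast is what makes 700 / 1000 come out as 0.7 rather than 0.
        double rate = (double) Math.min(maxObservations, entries.size()) / entries.size();
        for (Integer entry : entries) {
            // nextDouble() is in [0, 1), so a rate of 1.0 keeps every entry.
            if (random.nextDouble() < rate) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> column = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            column.add(i);
        }
        List<Integer> sampled = sampleDown(column, 700, new Random(42));
        // Roughly 700; the exact count depends on the random seed.
        System.out.println(sampled.size());
    }
}
```

Independent per-entry sampling only hits the cap on average; a column with fewer entries than the cap gets a rate of 1.0 and is kept whole.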

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread Ted Dunning

Ouch. Sorry... your original posting made it sound like you *wanted* it to be 0.0 or 1.0. This is a bug. Can you file a JIRA? On Tue, Aug 13, 2013 at 12:04 PM, sam wu swu5...@gmail.com wrote: say column a has 1000 entries, maxPref=700 rowSampleRate = Math.min(maxObservationsPerRow,

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread sam wu

Sorry for the phrasing. I'll file a JIRA Sam On Tue, Aug 13, 2013 at 12:10 PM, Ted Dunning ted.dunn...@gmail.com wrote: Ouch. Sorry... your original posting made it sound like you *wanted* it to be 0.0 or 1.0. This is a bug. Can you file a JIRA? On Tue, Aug 13, 2013 at 12:04 PM,

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread Stevo Slavić

FindBugs has been reporting it the whole time (see the Warnings tab on https://builds.apache.org/job/Mahout-Quality/2194/findbugsResult/ and the ICAST_IDIV_CAST_TO_DOUBLE bug). We should get the FindBugs warnings down to 0. On Tue, Aug 13, 2013 at 9:13 PM, sam wu swu5...@gmail.com wrote: Sorry for the phrasing. I'll file a

Re: Setting up a recommender

2013-08-13 Thread Pat Ferrel

OK, single-action recs are working, so the output to Solr has only [B'B] and B. On Aug 13, 2013, at 10:52 AM, Pat Ferrel pat.fer...@gmail.com wrote: Corrections inline On Aug 13, 2013, at 10:49 AM, Pat Ferrel pat.fer...@gmail.com wrote: I finally got some time to work on this and have a first
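
Assuming B is a binary users × items action matrix (the post does not spell this out), [B'B] is the item-item cooccurrence matrix. The sketch below computes it directly and scores items for one user by multiplying that user's history vector by [B'B], a rough stand-in for ranking indexed [B'B] rows against a history query in Solr (Solr's own relevance scoring differs). It skips any cooccurrence filtering, such as a log-likelihood test, that the real pipeline may apply; cooccurrence and score are invented names.

```java
import java.util.Arrays;

public class CooccurrenceDemo {

    // Computes the item-item cooccurrence matrix B'B from a binary
    // user x item action matrix B (rows = users, columns = items).
    static int[][] cooccurrence(int[][] b) {
        int items = b[0].length;
        int[][] btb = new int[items][items];
        for (int[] userRow : b) {
            for (int i = 0; i < items; i++) {
                if (userRow[i] == 0) continue;
                for (int j = 0; j < items; j++) {
                    btb[i][j] += userRow[i] * userRow[j];
                }
            }
        }
        return btb;
    }

    // Scores every item for one user as history * B'B (a row vector times a matrix).
    static int[] score(int[][] btb, int[] history) {
        int items = history.length;
        int[] scores = new int[items];
        for (int i = 0; i < items; i++) {
            if (history[i] == 0) continue;
            for (int j = 0; j < items; j++) {
                scores[j] += history[i] * btb[i][j];
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        int[][] b = {
            {1, 1, 0}, // user 0 acted on items 0 and 1
            {1, 1, 1}, // user 1 acted on all three items
            {0, 1, 1}  // user 2 acted on items 1 and 2
        };
        int[][] btb = cooccurrence(b);
        System.out.println(Arrays.deepToString(btb)); // [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
        System.out.println(Arrays.toString(score(btb, new int[] {1, 0, 0}))); // [2, 2, 1]
    }
}
```

In practice items already in the user's history would be filtered out of the ranked list before recommending.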

Re: Install mahout 0.8 with hadoop 2.0

2013-08-13 Thread Carlos Mundi

I recently asked the same core question on this list. I certainly won't argue with the statistics of small numbers. But I will hazard a prediction: the impetus for Mahout to support Hadoop 2 will appear about the same time the elephant book gets updated for 2.0, provided Twister or something
