from:"Bob Morris"

Printing clusters of NamedVectors in mahout 0.9

2014-07-29 Thread Bob Morris

In a thread beginning at [1], Bikash Gupta asks about what seems to be the same issue I have, namely getting cluster lists from the output of a clustering of NamedVector objects. (In my case clusters come from a CanopyDriver call.) In the thread at [1] I understand Suneel's answer of Feb 24 2014

structure of part-r-00000 and SequenceFile.Reader NullPointerException

2014-07-08 Thread Bob Morris

I'm doing Canopy clustering with CanopyDriver on a sequence file of NamedVectors and seem to get the expected set of map and reduce directories. But when I try to read the part-r- file with a SequenceFile.Reader and iterate over the reader, I immediately get a NullPointerException

Re: simple idea for improving mahout docs over the next month?

2014-04-18 Thread Bob Morris

My first entry on such a page would be a plea for more rigor in the annotation of the java code for the utilities. For example, ClusterDumper.java has essentially no annotation and I found that I had to spend a lot of time to figure out whether (a) it had a call that would do what I wanted and if

Grumble about (lack of) warning of deprecation of Canopy KMeans

2014-04-18 Thread Bob Morris

I was taken aback that the immensely touted and convenient Canopy KMeans package was today deprecated [1] in the incubating mahout 1.0 with no hint that I could find warned in this, at least back through March. And even then I can see only in retrospect that a suggestion lurked in [2] that

text dictionary errors from ClusterDumper

2014-03-30 Thread Bob Morris

After running CanopyDriver.run on some 4 dimensional DenseVectors, I'm using a handcrafted text dictionary passed to ClusterDumper declared as dictionary type text. The dictionary looks like this, with the entry lines having dimension and feature name separated by tab: 4 0 recordedBy 1

newbie asks how to making dictionary files

2014-03-23 Thread Bob Morris

I'm a mahout novice trying to do some semantic data clustering with Canopy clustering on some low-dimensional SequenceFiles that I vectorized with ad-hoc java code. (Some features are strings vectorized by the Levenshtein distance from a constant, some are DateTime objects vectorized as

6 matches