If you don't use hashed encoding, you lose the single-pass nature of the
example. Also, many real applications require huge vocabularies, which make
non-hashed representations infeasible due to memory use in the logistic
regression models.
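
To make the single-pass point concrete, here is a minimal sketch of hashed encoding in Python. It is not Mahout's encoder; the function name `hashed_vector`, the default width, and the use of CRC32 as the hash are assumptions chosen for illustration. No dictionary of terms is built, so each document can be encoded as soon as it is read, and the model width is fixed up front regardless of vocabulary size.

```python
import zlib
from collections import defaultdict


def hashed_vector(tokens, num_features=2 ** 18):
    """Encode tokens as a sparse {index: count} vector of fixed width.

    Hypothetical helper: each token is hashed straight to a column index,
    so no vocabulary pass over the corpus is needed beforehand.
    """
    vec = defaultdict(float)
    for tok in tokens:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[index] += 1.0
    return dict(vec)


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog".split()
    print(hashed_vector(doc, num_features=32))
```

A logistic regression model trained on such vectors needs only `num_features` weights per category, however many distinct terms show up in the data.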
On Feb 12, 2012, at 20:53, Lance No
Ah! Ok. The SGD examples in examples/bin/asf-examples.sh and
examples/bin/classify-twentynewsgroups.sh both use hash vectorization.
Should they use the sparse term vectors instead? The "new" Bayes
examples (nbtrain and nbtest) in asf-examples.sh use sparse.
On Sun, Feb 12, 2012 at 7:00 AM, Ted Dun
Hi everyone,
I'd like to run the Decision Forest classifier on the 20 newsgroups dataset.
According to the documentation, the Mahout implementation accepts only
numerical or categorical attributes, so the only way to do it is
transforming the documents into fixed-length vectors (maybe using tf-idf a
Hash coded vectorization *is* a random projection. It is just one that
preserves some degree of sparsity. It definitely loses information when
you use it to decrease the dimension of the input. It does not "add bogus
information".
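
A rough sketch of that view, in Python: hashing each term to one output column with a pseudo-random sign is the same as multiplying by a projection matrix that has a single +1 or -1 per row. The function name `signed_hash_projection`, the MD5-based hash, and the sign bit are all assumptions for illustration, not a description of Mahout's encoders.

```python
import hashlib


def signed_hash_projection(term_counts, dim):
    """Project a sparse {term: count} vector down to `dim` columns.

    Hypothetical helper: each term lands in exactly one column with a
    pseudo-random sign, so the output has at most as many entries as the
    input has terms (sparsity is preserved).
    """
    out = {}
    for term, count in term_counts.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] & 1 else -1.0
        # Terms that share an index are summed; this is where information
        # is lost when dim is smaller than the number of distinct terms.
        out[index] = out.get(index, 0.0) + sign * count
    return out


if __name__ == "__main__":
    doc = {"term%d" % i: 1.0 for i in range(10)}
    print(signed_hash_projection(doc, dim=4))      # collisions are unavoidable
    print(signed_hash_projection(doc, dim=2 ** 20))  # collisions are unlikely
```

With `dim=4` and ten distinct terms, several terms must share a column, so the original counts cannot be recovered. Nothing is invented in the output, though: every value is a signed sum of input counts.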
SGD doesn't like dense vectors, actually. In fact, one of the nice
On 7 February 2012 14:04, Jeff Eastman wrote:
> +1 Congratulations to Shannon for a job well done. We now have a 0.6 release
> and can begin to concentrate on the plan and issues for a 0.7 release.
Yes, congrats to all concerned, really great seeing this moving along :)
Meanwhile, the homepage s
We have a couple JIRAs that relate here: We want to factor all the (-cl)
classification steps out of all of the driver classes (MAHOUT-930) and
into a separate job to remove duplicated code; MAHOUT-931 is to add a
pluggable outlier removal capability to this job; and MAHOUT-933 is
aimed at fact
+ users@
These are great ideas, and are just the kinds of high level
conversations I was hoping to engender. From my agile background, I'd
hope to define 0.7 by a small number of "epic stories", in a subset of
our overall capabilities, which could focus our attention to a set of
derivative JI
+user@
I'd like our users involved in this discussion too.
Original Message
Subject: Re: Goals for Mahout 0.7
Date: Sat, 11 Feb 2012 22:29:02 +0100
From: Frank Scholten
Reply-To: d...@mahout.apache.org
To: d...@mahout.apache.org
I'd like to add solving