I think we discussed several of these points on the mailing list.
I am not sure I would ever expect there to be a common format across
all jobs. They just don't all operate on the same information. Even
where two jobs ingest vectors, it doesn't mean vectors for one are
meaningful for another.
If
Hi,
I have a test data set containing a number of points, written to a sequence
file using a Clojure script as follows (I am equally bad at both Java and
Clojure, but since I really don't like Java I write my scripts in Clojure
whenever possible).
#!./bin/clj
(ns sensei.sequence.core)
Hi Jeffrey!
I have encountered this problem as well. The workaround is to run one
iteration of k-means to create an initial cluster assignment, and
then run fuzzy k-means using the output from that first k-means iteration.
Hope this helps,
Danny Bickson
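The workaround described above can be sketched outside Mahout as follows.
This is only a minimal pure-Python illustration of the idea (one k-means
pass to seed the centers, then fuzzy k-means from those centers); the
function names, the sample points, and the fuzziness value m=2.0 are
assumptions made for the example, not Mahout's API or defaults.

```python
import math

def kmeans_step(points, centers):
    """One k-means iteration: assign points to the nearest center, then average."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    # Keep the old center if a cluster received no points.
    return [
        tuple(sum(coord) / len(g) for coord in zip(*g)) if g else old
        for g, old in zip(groups, centers)
    ]

def fuzzy_kmeans(points, centers, m=2.0, iterations=20):
    """Fuzzy k-means (fuzzy c-means) starting from the given centers."""
    weights = []
    for _ in range(iterations):
        # Membership of each point in each cluster; every row sums to 1.
        weights = []
        for p in points:
            inv = [max(math.dist(p, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            weights.append([v / total for v in inv])
        # New centers are membership-weighted means of all points.
        new_centers = []
        for j in range(len(centers)):
            wj = [row[j] ** m for row in weights]
            denom = sum(wj)
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(wj, points)) / denom
                for k in range(len(points[0]))
            ))
        centers = new_centers
    return centers, weights

# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# Arbitrary starting guesses, refined by a single k-means pass.
seed_centers = kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)])
# Fuzzy k-means then starts from the k-means output rather than random seeds.
centers, memberships = fuzzy_kmeans(points, seed_centers)
```

The memberships returned are those computed in the last iteration, one row
per input point.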
On Mon, Sep 12, 2011 at 10:15 AM, Jeffrey
Hi Danny,
I have read a small portion of the source code. For variation 1, an initial
cluster is generated using RandomSeedGenerator if none is found in the path,
so I don't have to create the initial cluster myself. For variation 2, I
actually have generated the initial cluster using
Hi all,
This is an announcement of the community site SearchWorkings.org [1]
SearchWorkings.org offers search professionals a point of contact or
comprehensive resource to learn and discuss all the
new developments in the world of open source search and related
subjects like Mahout and Hadoop.
Hi all,
My classification problem is very similar to the 20 newsgroups
example. But I don't have the possibility to use a large quantity of
data for training.
I'd like to know what would be the minimum size of training data for
SGD or SVM algorithms to have reasonable results.
My datas
Hard to say and certainly not without substantial amounts of testing.
The guy who did it seems pretty solid, but it never has been tested by
anybody for production use.
On Mon, Sep 12, 2011 at 12:54 AM, Loic Descotte loic.desco...@kelkoo.com wrote:
Mahout in Action is saying that SVM has been
I haven't played with the one in Mahout. From what I understand they
wrapped either LIBLINEAR or LIBSVM, so you should get results from that
implementation comparable to those from using LIBSVM on the command line
or embedded in RapidMiner or Weka.
On Mon, Sep 12, 2011 at 9:17 AM, Ted Dunning
Mahout's GA is a utility class that allows a genetic algorithm written using
Watchmaker to distribute the fitness computation. The examples are actually
part of Mahout distribution so you can take a look at them. Please note that
a good understanding of Watchmaker is required, but it's actually a
SVM is reasonable.
SGD with hand-tuning of the learning parameters may work.
With so little training data, you will have difficulty assessing whether
your system is working.
Sometimes, you can rephrase your problem so that all of your training data
across many situations can be pooled
Hi,
I am new to Mahout as well as to Maven. I checked out Mahout
from the svn repository and I am trying to install it on my Mac
running the latest Lion OS. I read the instructions at
https://cwiki.apache.org/confluence/display/MAHOUT/BuildingMahout and
followed all the steps until
They actually ported the liblinear algorithm so you should get comparable
results unless there are bugs. Early tests looked good, but those are just
that.
On Mon, Sep 12, 2011 at 2:32 PM, Zach Richardson z...@raveldata.com wrote:
I haven't played with the one in Mahout. From what I understand
I am not sure I would ever expect there to be a common format across all
jobs. They just don't all operate on the same information. Even
where two jobs ingest vectors, it doesn't mean vectors for one are
meaningful for another.
Machine learning has quite a few algorithms where data is