Dear Simon,
thanks for informing us.
I am now evaluating Prediction.io for creating a recommendation system.
However, as far as I can see, the license is an Alfresco Limited one.
So I do not understand what the limitations are.
I mean, if I install Prediction.io and make some changes to the
source
Affero GPL: http://en.wikipedia.org/wiki/Affero_General_Public_License
Alfresco is something else.
It does imply that if you provide someone access to a custom version of the
engine, then you must provide the sources. But it is only about the engine, i.e.
not the clients, not the configuration, not
Dear Bertrand,
Yes, that was what I understood.
But I am missing a step.
Let us take a practical example.
I use the SlopeRecommender engine and I implement its policy for how to
evaluate similarity.
In this case I have made a custom version of the engine, right?
The SDK is not customized. So my
I understand that part. What I'm unclear on is whether there is any ranking
or ordering of the points in each cluster before they are limited. In
other words, are the points in each cluster randomly ordered? Or ordered
alphabetically by the document id or filename? Or ordered by some
calculation as
Hello,
I have Mahout 0.9 and a single-node Hadoop 1.2.1 running on a Mac.
I am trying to create a bunch of vectors for clustering from a
collection of text documents. So I did:
$MAHOUT_HOME/bin/mahout seqdirectory --input
/Users/hadoop/fuzzyjoin-results/NOTES/progress_notes --output
Hi Natalia,
It appears you are referencing files in your local file system instead of
files in HDFS. If you want to run Mahout under Hadoop, you would then
need to access the input files stored in HDFS, and ideally the output should also
be written to an HDFS location. Here's how I would run it:
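A sketch of that kind of invocation, assuming placeholder HDFS paths (the
original command is cut off in this excerpt): copy the local documents into
HDFS first, then point seqdirectory at HDFS input and output locations.

hadoop fs -put /Users/hadoop/fuzzyjoin-results/NOTES/progress_notes /user/hadoop/progress_notes

$MAHOUT_HOME/bin/mahout seqdirectory \
    --input /user/hadoop/progress_notes \
    --output /user/hadoop/progress_notes-seqdir \
    -c UTF-8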
I am not sure if we have direct CSV converters to do that; CSV is not that
expressive anyway. But it is not difficult to write such a converter on
your own, I suppose.
The steps you need to take are these:
(1) prepare the set of data points in the form of (unique vector key, n-vector)
tuples. Vector key
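A minimal sketch of such a converter, assuming one numeric vector per CSV row
and using the row number as the unique vector key (Hadoop 1.x / Mahout 0.9
class names; an illustration only, not anyone's actual code):

// CsvToSequenceFile.java: writes (Text rowKey, VectorWritable vector) records.
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

public class CsvToSequenceFile {
  public static void main(String[] args) throws Exception {
    String csvPath = args[0];          // local CSV, one vector per line
    Path seqPath = new Path(args[1]);  // output sequence file (local or HDFS)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(seqPath.toUri(), conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, seqPath, Text.class, VectorWritable.class);
    BufferedReader reader = new BufferedReader(new FileReader(csvPath));
    try {
      String line;
      int row = 0;
      while ((line = reader.readLine()) != null) {
        String[] fields = line.split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
          values[i] = Double.parseDouble(fields[i].trim());
        }
        // key = unique vector key (here just the row number), value = the n-vector
        writer.append(new Text(String.valueOf(row++)),
                      new VectorWritable(new DenseVector(values)));
      }
    } finally {
      reader.close();
      writer.close();
    }
  }
}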
Hi All,
I have a CSV file on which I have to perform dimensionality reduction. I'm
new to Mahout; from some searching I understood that SSVD can be used for
performing dimensionality reduction. I'm not sure of the steps that have to
be executed before SSVD; please help me.
Thanks,
Vijay
I looked at the docs, and the AGPL for the server is a problem for me, maybe
even a blocker. Since the SDK is useless without the server, this may be a
problem for you.
I like the SDK idea. The alternative is logfiles to store prefs (not a bad
architecture really) and a grow-your-own method for
PS. The dspca method, which is an almost exact replica of SSVD --pca true, is
also available on Spark, running on exactly the same sequence-file DRM (there's
no CLI though; it needs to be wrapped in Scala code) [1]. It may potentially
perform a bit better than the MR version, although it is new. If you
Thanks a lot for the detailed explanation; it was very helpful.
I will write a CSV-to-sequence-file converter; I just needed some clarity on
the key/value pairs in the sequence file.
Suppose my CSV file contains the values below:
11,22,33,44,55
13,23,34,45,56
I assume that the sequence file would look
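For concreteness, one plausible layout, assuming the row number is used as the
key (Text here, though IntWritable or LongWritable keys work as well):

key: Text("0")   value: VectorWritable(DenseVector{11.0, 22.0, 33.0, 44.0, 55.0})
key: Text("1")   value: VectorWritable(DenseVector{13.0, 23.0, 34.0, 45.0, 56.0})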
On Wed, Mar 19, 2014 at 12:13 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Yes. Hashing vector encoders will preserve distances when used with
multiple probes.
So if a token occurs two times in a document the first token will be mapped
to a given location and when the token is hashed the
Dear Piero,
The AGPL is meant to encourage people who develop on PredictionIO to contribute
back to the open community, even though they are using it to offer cloud
services. We are seriously looking into the possibility of making custom
engines/algorithms separate from the main server code, so that
Thanks for the feedback. Happy to discuss how we can resolve the AGPL
limitation for your work.
There are a few Ruby gems for PredictionIO, contributed by developers as
well as supported by the PredictionIO team, that you can choose from. We hope
that the UI can assist developers in managing the data
With text hashing, you have an issue because of collisions. In spite of
this, you get good results and can decrease the dimension of the data
substantially using a single hashed location.
If you use more than one probe, the probability that two words will hash to
exactly the same two locations
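A minimal sketch of what multiple probes look like with Mahout's hashed
encoders, assuming the StaticWordValueEncoder API from
org.apache.mahout.vectorizer.encoders (the exact names are my recollection):

// Hashed feature encoding with multiple probes.
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class HashedEncodingExample {
  public static void main(String[] args) {
    StaticWordValueEncoder encoder = new StaticWordValueEncoder("text");
    // With 2 probes each token is added at two hashed locations, so two
    // different words only collide if both of their locations coincide.
    encoder.setProbes(2);
    Vector v = new RandomAccessSparseVector(1000);  // hashed dimension = 1000
    for (String token : "the quick brown fox jumps over the lazy dog".split(" ")) {
      encoder.addToVector(token, v);
    }
    System.out.println(v);
  }
}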
On Wed, Mar 19, 2014 at 11:34 AM, Frank Scholten fr...@frankscholten.nl wrote:
On Wed, Mar 19, 2014 at 12:13 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
Yes. Hashing vector encoders will preserve distances when used with
multiple probes.
So if a token occurs two times in a document
AGPL is a complete show-stopper for contributions even for dependencies.
Apache software can't critically depend on GPL components of any sort.
As such, it doesn't make any sense to have components of Mahout designed to
run only on a server that is AGPL.
On Wed, Mar 19, 2014 at 11:53 AM,
If you are using a debugger like IntelliJ or Eclipse you just create a project
that uses Mahout. By default it will run any Hadoop job on the native local file
system with all processes on your debug machine. That is as far as I've needed
to go.
Andrew is talking about how to debug while running
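For illustration, a minimal sketch of that all-in-one-JVM setup, assuming the
stock Hadoop 1.x configuration keys and a placeholder Mahout driver and paths:

// Run a Mahout job entirely in-process so breakpoints in mappers/reducers
// are hit in the IDE debugger (Hadoop 1.x property names).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.text.SequenceFilesFromDirectory;

public class DebugLocally {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "file:///");   // use the local file system
    conf.set("mapred.job.tracker", "local");   // run MapReduce in this JVM
    ToolRunner.run(conf, new SequenceFilesFromDirectory(), new String[] {
        "--input", "/path/to/local/text/docs",  // placeholder paths
        "--output", "/tmp/seqdir-output"
    });
  }
}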