(1, 0, 0) and (10, 0, 0) have very large distance in R^3, but 0 when
projected onto a patch near the north pole of S^4, while other pairs of
vectors may have (nearly) unchanged distances.
Am I misunderstanding what the question was?
On Thu, Jul 21, 2011 at 9:43 PM, Ted Dunning wrote:
> Embe
Embed onto a very small part of S^4
On Thu, Jul 21, 2011 at 9:14 PM, Jake Mannix wrote:
> Think about it in 3-dimensions, how can this work?
>
Wait, this is impossible, not underspecified: if you have 4 vectors, x, y of
length N, and z, w of length 1, and six pairwise distances: d_xy, d_yz, d_xz,
d_xw, d_yw, d_zw.
You want (d_xy / d_zw), (d_xz / d_yw), and (d_xy / d_xz) to all remain
fixed after transformation? The first will stay fixed
This is underspecified. Simply adding an additional large valued coordinate
and normalizing back to the sphere will do what you want. This works
because small regions of S^{n+1} are very close to R^n in terms of the
Euclidean metric. This is rarely that useful, however, if your interest is
c
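A minimal sketch of the "append a large coordinate, then normalize" idea from the message above. The function names and the `scale` parameter are illustrative assumptions, not anything from the thread or from Mahout, and the input is assumed to contain at least one non-zero vector. The larger `scale` is, the smaller the patch of the sphere the points land on and the closer the distance ratios stay to the original ones.

```python
import math

def embed_on_sphere(vectors, scale=1e3):
    """Append one large coordinate to every vector, then normalize to unit length.

    `scale` (an assumed knob) sets the extra coordinate relative to the
    longest input vector; bigger values mean less distortion.
    """
    big = scale * max(math.sqrt(sum(c * c for c in v)) for v in vectors)
    embedded = []
    for v in vectors:
        lifted = list(v) + [big]
        norm = math.sqrt(sum(c * c for c in lifted))
        embedded.append([c / norm for c in lifted])
    return embedded

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

points = [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
sphere = embed_on_sphere(points)

# Compare one distance ratio before and after the embedding.
ratio_before = dist(points[0], points[1]) / dist(points[0], points[2])
ratio_after = dist(sphere[0], sphere[1]) / dist(sphere[0], sphere[2])
```

The ratios agree only approximately, and all embedded points end up nearly parallel, so cosine similarities between them are all close to 1.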
I have vectors of different lengths and I would like to normalize them
to a unit (hyper)sphere. However, I would like the pairwise distance
ratios to be maintained. What transform does this?
The use case for this is to make a vector set that uses cosine distances.
--
Lance Norskog
goks...@gmail.
How do 'stacked' recommenders (like the Netflix winners) work?
On Wed, Jul 20, 2011 at 9:22 PM, Jamey Wood wrote:
> Great. Thanks, Ted!
>
> --Jamey
>
> On Wed, Jul 20, 2011 at 9:57 PM, Ted Dunning wrote:
>
>> Oh... you do have to be careful with this a bit because some of these side
>> factors
Doing variable selection using a chi^2 statistic like Wald's or the log
likelihood ratio is a very dangerous thing in high dimensional spaces that
are the target of the SGD framework in Mahout. The problem is that the
variable selection itself can over-fit.
To address this problem, I suggest tha
+dev
+user
r1149369 implements the previous MAHOUT-749 patch that introduces support for
multiple reducers (specified by -Dmapred.reduce.tasks=N) for improved
scalability beyond the default of 1. The heuristic sends the clusters produced
by each mapper to 1 of N reducers in a round-robin fashio
Hello,
I plan to use Mahout's OnlineLogisticRegression for probability
estimation. I have extracted many parameters for my classification
situation and I want to test how each of them affects the target
variable. Can I use Mahout to check this significance (for example using
Wald's test or Logl
Also, the evaluation could be done per user, by manually running it
multiple times, once for each user. Or simply by defining a matrix of
relevant items for each user.
On Jul 21, 2011 4:18 PM, "Marko Ciric" wrote:
> Yes, there should exist an evaluation that allows you to pass which items
> are releva
Hi Jeff,
lol, this is probably my last reply before I fall asleep (GMT+8 here).
First things first, the data file is here: http://coolsilon.com/image-tag.mvc
Q: What is the cardinality of your vector data?
about 1000+ rows (resources) * 14 000+ columns (tags)
Q: Is it sparse or dense?
sparse (assumin
Excellent, so this appears to be localized to fuzzyk. Unfortunately, the Apache
mail server strips off attachments so you'd need another mechanism (a JIRA?) to
upload your data if it is not too large. Some more questions in the interim:
- What is the cardinality of your vector data?
- Is it spar
Hi Jeff,
Q: Did you change your invocation to specify a different -c directory (e.g.
clusters-0)?
A: Yes :)
Q: Did you add the -cl argument?
A: Yes :)
$ bin/mahout fkmeans --input sensei/image-tag.arff.mvc --output sensei/clusters
--clusters sensei/clusters/clusters-0 --clustering --overwrite
Yes, there should exist an evaluation that allows you to pass which items
are relevant. On the other hand, generally speaking, I am also trying to
evaluate with having relevant items all chosen randomly. Maybe both
implementations should exist.
On 21 July 2011 15:59, Sean Owen wrote:
> You mean,
Thanks a lot Sean! I'll try this here.
Regards,
Abmar
On Thu, Jul 14, 2011 at 12:51 PM, Sean Owen wrote:
> yes that would probably be just fine for you too.
>
> On Thu, Jul 14, 2011 at 4:14 PM, Abmar Barros wrote:
>
> > Thanks for the reply Sean,
> >
> > Another doubt: Does the ReloadFromJDBCD
You mean, have the user specify all items that are considered relevant? Yes,
that could be useful. Do you have a patch in mind?
Your analysis is correct, and I would not call it a bug. It's a symptom of
how little information the evaluation has to work with here without ratings.
It has to pick rand
You are correct, the wiki for fkmeans did not mention the -cl argument. I've
added that just now. I think this is what Frank means in his comment but you do
*not* have to write any custom code to get the cluster dumper to do what you
want, just use the -cl argument and specify clusteredPoints as
Hi Jeffrey,
Fuzzy kmeans outputs a [Cluster ID, WeightedVectorWritable] file under
clusters/clusteredPoints and a [Cluster ID, SoftCluster] file under
clusters/clusters-*, you don't need to write code for that.
However if you want to display your clusters in an application, along
with nice labels
Hi guys,
I wonder if Mahout should have a "precision and recall" evaluator that
determines the set of relevant items without relying on the relevance
threshold. This would be suitable for data sets with boolean preferences.
In addition, the relevant items can be removed from the training
d
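For reference, a minimal sketch of the precision and recall computation once the relevant items are given explicitly rather than derived from a rating threshold. This is plain Python with hypothetical names, not Mahout's evaluator API.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) of a recommendation list.

    `recommended` is the list of recommended item IDs and `relevant` is the
    held-out set of items considered relevant for the user; both names are
    illustrative.
    """
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Two of the four recommendations are among the three relevant items.
precision, recall = precision_recall(["a", "b", "c", "d"], {"b", "d", "e"})
```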
Actually, the GenericDataModel class works very well as a superclass of your
desired data model. This way everything is cached in memory, which boosts
performance a lot. Reloading is easy to implement with the refresh
mechanism (Taste objects implement the Refreshable interface). You c
Dear Sean,
thanks a lot, I'll update the jar
Thanks
Marco
On Thu, Jul 21, 2011 at 1:06 PM, Sean Owen wrote:
> Going waaay back to the original question -- I can't reproduce this in the
> latest code.
>
> The result ought to be "{}" since RandomAccessSparseVector will only print
> entries that
Going waaay back to the original question -- I can't reproduce this in the
latest code.
The result ought to be "{}" since RandomAccessSparseVector will only print
entries that have a value set and are not defaulted to 0.0. And it's smart
enough in this case to remove the entry you have set since i
Hi again,
Let me update on what's working and what's not working.
Works:
fkmeans clustering (10 clusters) - thanks Jeff for the --cl tip
fkmeans clustering (5 clusters)
clusterdump (5 clusters) - so points are not included in the clusterdump and I
need to write a program for it?
Not Working:
fk
Hi Jeff,
Thanks for the help :)
Oh, I didn't know there is this --cl argument (because the documentation that I
rely
on https://cwiki.apache.org/confluence/display/MAHOUT/fuzzy-k-means-commandline
doesn't list it). I will try again later.
I was told that the CLI fkmeans utility doesn't attach poi