On Sun, Jul 24, 2011 at 7:52 AM, Dhruv dhru...@gmail.com wrote:
... If you look into the *definition* of HMM, the hidden sequence is drawn
from
only one set. The hidden sequence's transitions can be expressed as a joint
probability p(s0, s1). Similarly the observed sequence has a joint
, emittedState) method to
compute the output probability for a particular hidden state. I believe
this
is not what the user wanted?
Dhruv
On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
On Sun, Jul 24, 2011 at 7:52 AM, Dhruv dhru...@gmail.com wrote
is not that bad an idea? I've read that we can
do it with PCA (Principal Components Analysis). Is there Mahout code for
this somewhere?
Thanks a lot once again,
Svetlomir.
On 24.07.2011 20:46, Ted Dunning wrote:
My impression (and Svetlomir should correct me) is that the intent was to
use two
On 24.07.2011 21:15, Ted Dunning wrote:
I remember this problem.
Is it possible for you to post some sample data?
On Sun, Jul 24, 2011 at 12:08 PM, Svetlomir Kasabov
skasa...@smail.inf.fh-brs.de wrote:
Hello again and thanks for the replies from both of you, I really
appreciate
them
onto
a patch near the north pole of S^4, while other pairs of vectors may
have
(nearly) unchanged distances.
Am I misunderstanding what the question was?
On Thu, Jul 21, 2011 at 9:43 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
Embed onto a very small part of S^4
On Thu
It is a family relationship for the most part.
Mahout came from the Lucene community.
Mahout still uses Lucene. Some Lucene users use Mahout, but Lucene and
Solr themselves do not depend on Mahout.
On Fri, Jul 22, 2011 at 2:57 PM, Joanne Sun joanneh...@gmail.com wrote:
Hi I have a humble
Doing variable selection using a chi^2 statistic like Wald's or the log
likelihood ratio is a very dangerous thing in high dimensional spaces that
are the target of the SGD framework in Mahout. The problem is that the
variable selection itself can over-fit.
To address this problem, I suggest
This is underspecified. Simply adding an additional large-valued coordinate
and normalizing back to the sphere will do what you want. This works
because small regions of S^{n+1} are very close to R^n in terms of the
Euclidean metric. This is rarely that useful, however, if your interest is
Embed onto a very small part of S^4
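A minimal sketch of that embedding (plain Java, illustrative only; the scale constant c is an assumption, not something specified in the thread): append a large constant coordinate and renormalize onto the unit sphere.

// Sketch: embed x in R^n onto a small patch of S^{n+1} by appending a large
// constant coordinate and normalizing back to unit length. For large c the
// patch is nearly flat, so pairwise Euclidean distances are preserved up to
// an overall 1/c scale factor.
class SphereEmbed {
  static double[] embed(double[] x, double c) {
    double[] y = new double[x.length + 1];
    System.arraycopy(x, 0, y, 0, x.length);
    y[x.length] = c;                         // the added large-valued coordinate
    double norm = 0;
    for (double v : y) norm += v * v;
    norm = Math.sqrt(norm);
    for (int i = 0; i < y.length; i++) y[i] /= norm;   // back onto the sphere
    return y;
  }
}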
On Thu, Jul 21, 2011 at 9:14 PM, Jake Mannix jake.man...@gmail.com wrote:
Think about it in 3-dimensions, how can this work?
You constructed the first vector with a dimension of 1. It looks like you
constructed the second one with a larger dimension of 2.
When you offset a sparse vector, all of the zeros become non-zero and the
vector becomes dense. This results in a bunch of cells being created.
On Wed, Jul 20,
of entries in the final vector.
Thanks a lot for your help
Marco
On 20 Jul 2011, at 17:42, Ted Dunning wrote:
You constructed the first vector with a dimension of 1. It looks like you
constructed the second one with a larger dimension of 2.
When you offset a sparse vector, all
Nah... just the kind of blindness that keeps me from seeing the blueberries
on the second shelf.
Happens all the time in my world.
On Wed, Jul 20, 2011 at 10:25 AM, Benson Margulies bimargul...@gmail.com wrote:
My strong expectation is that this is a case of refrigerator blindness.
Small
approach, because it becomes very time
and computationally expensive. Is there any implementation of an approximate
way to compute it in Mahout? I have had a look in the library, but I do not
find it.
thanks for your help
Marco
On 20 Jul 2011, at 19:15, Ted Dunning wrote:
Well
Just use a frequency weighted cosine distance and index words and
anomalously common cooccurrences. That gives you pretty much all you are
asking for.
Also, your progressive increase approach sounds a lot like k-means. You
might take a look to see if that could help.
On Wed, Jul 20, 2011 at
cooccurrences, but
I'll investigate.
Thanks a lot
Marco
On 20 Jul 2011, at 20:36, Ted Dunning wrote:
frequency weighted cosine distance
useful suggestions
Marco
On 20 Jul 2011, at 23:38, Ted Dunning wrote:
Actually, I would suggest weighting words by something like tf-idf
weighting.
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
log or sqrt(tf) is often good instead of linear tf
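As a concrete illustration of that weighting (assumed, not taken from the thread), sublinear tf combined with idf in plain Java:

// Sketch: tf-idf weight with a sublinear (log) term-frequency component.
// numDocs = total documents in the corpus, docFreq = documents containing the term.
class TfIdf {
  static double weight(int tf, int docFreq, int numDocs) {
    if (tf == 0 || docFreq == 0) return 0.0;
    double tfPart = 1.0 + Math.log(tf);              // log(tf); sqrt(tf) also works
    double idfPart = Math.log((double) numDocs / docFreq);
    return tfPart * idfPart;
  }
}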
Yes. This can work. And it can go both ways since you might do something
like combine recommendations for a specific book with more general
recommendations for a specific author or genre. You can also have
recommendations for, say, an author or genre based on demographic quantities
such as
of
prevalence can seriously impact your algorithm run-time (adversely). You
can compensate for this by sampling or just recognizing that such pervasive
features inherently cannot be very useful since too many things would be
recommended.
On Wed, Jul 20, 2011 at 8:51 PM, Ted Dunning ted.dunn
I usually just post process the recommendations using a variety of business
logic rules.
Sent from my iPhone
On Jul 18, 2011, at 14:26, Jamey Wood jamey.w...@gmail.com wrote:
Is there any best practice for including user preferences for certain items
as a Recommender input, but ensuring
Yes...
I always forget about that. You must have mentioned this half a dozen
times.
On Mon, Jul 18, 2011 at 3:10 PM, Sean Owen sro...@gmail.com wrote:
(PS that's exactly Rescorer's role... just a hook for whatever biz
logic you want to filter by)
On Mon, Jul 18, 2011 at 10:52 PM, Ted
You have the source code.
You can make it do anything you like!
On Sun, Jul 17, 2011 at 7:28 AM, Xiaobo Gu guxiaobo1...@gmail.com wrote:
Hi ,
Can we use CSV without header or something else?
regards,
Xiaobo Gu
A typical work-flow for this is to define a disjoint set of demographic
groups and then train a classifier that has access to user actions and
free geo-demographic data such as IP, geo-IP, time of day and email
domain. If you have meta-data from the actions, then you can augment these
variables
You would have to encode the distributions as vectors.
For discrete distributions, I think that this is relatively trivial since
you could interpret each vector entry as the probability for an element i of
the domain of the distribution. I think that would result in the Hellinger
distance [1]
If you need this distance, please go for it!
The procedure for publishing the results (or the first attempts) is to file
a JIRA (see issues.apache.org/jira/browse/MAHOUT ) and attach patches to the
JIRA for review or comment.
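A minimal sketch of the Hellinger distance mentioned above, for two discrete distributions encoded as probability vectors over the same domain (plain Java; the vector representation is assumed):

// Sketch: Hellinger distance between two discrete distributions p and q,
// each encoded as a vector of probabilities over the same domain.
class Hellinger {
  static double distance(double[] p, double[] q) {
    double sum = 0.0;
    for (int i = 0; i < p.length; i++) {
      double d = Math.sqrt(p[i]) - Math.sqrt(q[i]);
      sum += d * d;
    }
    return Math.sqrt(sum) / Math.sqrt(2.0);   // result lies in [0, 1]
  }
}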
On Wed, Jul 13, 2011 at 2:55 PM, Ian Upright ian-pub...@upright.net
I don't believe that Mahout's random forests have been used in production.
I have heard that some people got pretty good results in testing.
On Tue, Jul 12, 2011 at 6:03 AM, Xiaobo Gu guxiaobo1...@gmail.com wrote:
Hi,
When the training data set can be loaded into memory, or each split
can
Downsampling negatives should make little difference to accuracy. It can
substantially affect training time however.
Sent from my iPhone
On Jul 11, 2011, at 6:56, Svetlomir Kasabov skasa...@smail.inf.fh-brs.de
wrote:
Hello,
I plan to use logistic regression for predicting the probability
.
Feature A is about the advertisement itself;
Feature B is about the user's behaviors;
Currently I'm only using features A and B.
Total training data is 250 for each class;
thanks..
From: Ted Dunning [ted.dunn...@gmail.com]
Sent: Monday, July 11, 2011 2:15 PM
an ad,
or
not; so 3 classes.
Feature A is about the advertisement itself;
Feature B is about the user's behaviors;
Currently I'm only using features A and B.
Total training data is 250 for each class;
thanks..
From: Ted Dunning [ted.dunn
Easier to simply index all, say, three-word phrases and use a TF-IDF score.
This will give you a good proxy for sequence similarity. Documents should
either be chopped on paragraph boundaries to have a roughly constant length
or the score should not be normalized by document length.
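A sketch of the kind of three-word-phrase (shingle) extraction being suggested; the tokenization here is deliberately naive and an assumption of mine:

import java.util.ArrayList;
import java.util.List;

// Sketch: turn a document into overlapping three-word phrases ("shingles"),
// which can then be indexed and weighted with TF-IDF like ordinary terms.
class Shingles {
  static List<String> threeWordPhrases(String text) {
    String[] tokens = text.toLowerCase().split("\\s+");   // naive tokenizer
    List<String> phrases = new ArrayList<>();
    for (int i = 0; i + 2 < tokens.length; i++) {
      phrases.add(tokens[i] + " " + tokens[i + 1] + " " + tokens[i + 2]);
    }
    return phrases;
  }
}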
Log
Can you give specific examples? The process should be relatively
straightforward and the implication that rows have row labels that are
defined by the left operand of a product and columns have column labels that
are defined by the right operand should be sufficient. Sums should have the
same
Also, item-item similarity is often (nearly) the result of a matrix product.
If yours is, then you can decompose the user x item matrix and the desired
eigenvalues are the singular values squared and the eigenvectors are the
right singular vectors for the decomposition.
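In symbols (standard SVD identities, not quoted from the thread): if A is the user x item matrix with SVD A = U \Sigma V^T, then the item-item product is

A^{\mathsf T} A = V \Sigma^{\mathsf T} U^{\mathsf T} U \Sigma V^{\mathsf T} = V \Sigma^2 V^{\mathsf T},

so the eigenvalues of the item-item matrix are the squared singular values and its eigenvectors are the right singular vectors V.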
On Sun, Jul 10, 2011 at
? Are there specific ways to translate these numbers
into probabilistic estimates? Is it just way too hairy?
Lance
On Thu, Jul 7, 2011 at 10:15 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
This means that the rank 2 reconstruction of your matrix is close to your
original in the sense that the Frobenius
, with 0.1 percent of
the singular vectors missing. What is my confidence in the output
data?
On Sat, Jul 9, 2011 at 11:46 AM, Ted Dunning ted.dunn...@gmail.com wrote:
I don't understand the question.
A rotation leaves the Frobenius norm unchanged. Period. Any rank-limited
optimal least
http://en.wikipedia.org/wiki/Singular_value_decomposition#Low-rank_matrix_approximation
On Fri, Jul 8, 2011 at 1:53 AM, Lance Norskog goks...@gmail.com wrote:
Thanks! Very illuminating.
On Thu, Jul 7, 2011 at 10:15 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
This means that the rank 2
On Thu, Jul 7, 2011 at 2:20 PM, hakeem t...@indeed.com wrote:
Because I have so few documents, I run the set of documents through train()
in epochs -- up to 1000 times, shuffling the order of the documents on each
epoch.
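A sketch of that shuffled-epochs loop; the Mahout SGD names used here (OnlineLogisticRegression, train(target, vector), the L1 prior) are my assumption of the 0.x API and should be checked against your version:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

// Sketch: with very few documents, make many passes (epochs) over the data,
// shuffling the presentation order each time so SGD does not fit the ordering.
class EpochTrainer {
  static OnlineLogisticRegression train(List<Vector> docs, List<Integer> labels,
                                        int numCategories, int numFeatures) {
    OnlineLogisticRegression learner =
        new OnlineLogisticRegression(numCategories, numFeatures, new L1());
    Random rnd = new Random(42);
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < docs.size(); i++) order.add(i);
    for (int epoch = 0; epoch < 1000; epoch++) {
      Collections.shuffle(order, rnd);                 // new order every epoch
      for (int i : order) learner.train(labels.get(i), docs.get(i));
    }
    return learner;
  }
}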
Fair.
My questions:
1) Are these results surprising to you? Or,
If you keep the probes at 2, you should have better results with sparse
features and a large dimensionality reduction.
On Thu, Jul 7, 2011 at 5:58 PM, hakeem t...@indeed.com wrote:
I increased the vector size substantially and reduced the number of probes
to 1. With the collisions eliminated,
The summary of the reason is that this was a summer project and
parallelizing the random forest algorithm at all was a big enough project.
Writing a single pass on-line algorithm was considered a bit much for the
project size. Figuring out how to make multiple passes through an input
split was
Random Projection, a lame random number generator
(java.lang.Random) will generate a higher standard deviation than a
high-quality one like MurmurHash.
On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
Here is R code that demonstrates what I mean by stunning (aka
Those are both reasonably large, but not commercial in scale.
At Veoh, we had about 10 non-zero elements in our raw data. I think Netflix
has 100 million.
On Thu, Jul 7, 2011 at 8:05 PM, Lance Norskog goks...@gmail.com wrote:
What recommendation datasets, that are available, are considered
Of course, this is only true of the TextInputFormat.
You can write a CsvInputFormat in which every mapper reads the first line as
well as their assigned split. This would cause some delay at the beginning
as all of the first round of mappers whacked against the beginning of the
file, but that
pick a female or male from a
height, weight and shoe size.
Thanks again for taking the time to answer me.
-V
On Tue, Jul 5, 2011 at 4:30 AM, Ted Dunning ted.dunn...@gmail.com wrote:
The wikipedia page recommends binning if you have a large amount of data
and
a supervised variable
the model.
But this isn't working out for me.
Thanks for taking a look.
Cheers,
V
On Tue, Jul 5, 2011 at 6:06 PM, Ted Dunning ted.dunn...@gmail.com wrote:
How many training examples do you have?
Sounds like you have very few. That is definitely not the sweet spot for
on-linear
Glad we could help.
On Tue, Jul 5, 2011 at 7:09 AM, Radek Maciaszek ra...@maciaszek.co.uk wrote:
Hello,
I worked in the past on MSc project which involved quite a lot of Mahout
calculation. I finished it a while ago but only recently got my head around
posting it somewhere online.
It would
Well, PMML is the (complicated) standard solution.
Otherwise, a Naive Bayes model would probably fit as CSV data.
But seriously, it isn't that hard to read a sequence file. Re-implementing
our serialization in C++ would be generally useful as well.
On Tue, Jul 5, 2011 at 7:38 PM, Lance Norskog
The mahout implementation of Naive_Bayes does not use continuous variables
well. The best bet is to discretize these variables either individually or
together using k-means. Then use the discrete version for the classifier.
The random forest implementation and the SGD implementation are both
The wikipedia page recommends binning if you have a large amount of data and
a supervised variable extraction method if not. These are both ways of
preprocessing to discretize continuous variables.
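A minimal sketch of discretizing a single continuous variable with 1-D k-means (plain Java; the number of bins and iteration count are arbitrary assumptions):

import java.util.Arrays;

// Sketch: bin one continuous variable by running 1-D k-means and replacing
// each value with the index of its nearest centroid, giving a discrete
// feature suitable for Naive Bayes.
class KMeansBinner {
  static int[] bin(double[] values, int k, int iterations) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    double[] centers = new double[k];
    for (int j = 0; j < k; j++) {            // spread initial centers over quantiles
      centers[j] = sorted[(int) ((j + 0.5) * sorted.length / k)];
    }
    int[] assign = new int[values.length];
    for (int it = 0; it < iterations; it++) {
      double[] sum = new double[k];
      int[] count = new int[k];
      for (int i = 0; i < values.length; i++) {
        int best = 0;
        for (int j = 1; j < k; j++) {
          if (Math.abs(values[i] - centers[j]) < Math.abs(values[i] - centers[best])) best = j;
        }
        assign[i] = best;
        sum[best] += values[i];
        count[best]++;
      }
      for (int j = 0; j < k; j++) if (count[j] > 0) centers[j] = sum[j] / count[j];
    }
    return assign;                            // bin label per input value
  }
}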
On Mon, Jul 4, 2011 at 11:28 AM, Ted Dunning ted.dunn...@gmail.com wrote:
The mahout
On Sat, Jul 2, 2011 at 11:34 AM, Sean Owen sro...@gmail.com wrote:
Yes that's well put. My only objection is that this sounds like you're
saying that there is a systematic problem with the ordering, so it
will usually help to pick any different ordering than the one you
thought was optimal.
That is the point of the exponential in the example that I gave you. The
top few recommendations are nearly stable. It is the lower ranks that are
really churned up. This has the property that you state.
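A sketch of one common dithering recipe along these lines (the log-rank plus Gaussian-noise form is my assumption; it is not necessarily the exact example from the earlier message): perturb log(rank) and re-sort, so the noise is small near the top of the list and grows roughly exponentially with depth.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Sketch: re-rank a recommendation list by s = log(rank) + N(0, sigma).
// Because log compresses the top of the list, the first few items rarely
// move while the long tail gets churned on every request.
class Dithering {
  static <T> void dither(List<T> ranked, double sigma, Random rnd) {
    int n = ranked.size();
    double[] score = new double[n];
    Integer[] idx = new Integer[n];
    for (int r = 0; r < n; r++) {
      score[r] = Math.log(r + 1.0) + sigma * rnd.nextGaussian();
      idx[r] = r;
    }
    Arrays.sort(idx, Comparator.comparingDouble(i -> score[i]));
    List<T> copy = new ArrayList<>(ranked);
    for (int r = 0; r < n; r++) ranked.set(r, copy.get(idx[r]));
  }
}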
On Sat, Jul 2, 2011 at 12:45 PM, Salil Apte sa...@offlinelabs.com wrote:
I really like
I would be very surprised if java.lang.Random exhibited this behavior. It
isn't *that* bad.
On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog goks...@gmail.com wrote:
For full Random Projection, a lame random number generator
(java.lang.Random) will generate a higher standard deviation than a
into spinning
chains is very educational about entropy.
For full Random Projection, a lame random number generator
(java.lang.Random) will generate a higher standard deviation than a
high-quality one like MurmurHash.
On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning ted.dunn...@gmail.com
wrote
On Sun, Jul 3, 2011 at 1:08 PM, Sean Owen sro...@gmail.com wrote:
I don't see why one would believe that the randomly selected items
farther down the list are more likely to engage a user. If anything,
the recommender says they are less likely to be engaging.
There are two issues with this
Roughly.
But remember, a single recommendation isn't the end of the game. If this is
the last recommendation to ever be made, dithering doesn't help at all.
On Sun, Jul 3, 2011 at 1:02 PM, Konstantin Shmakov kshma...@gmail.com wrote:
It seems that as long as recommenders are dealing with the
,
Radek
On 18 February 2011 18:04, Sebastian Schelter s...@apache.org wrote:
This shouldn't be too difficult and would maybe make a good newcomer or
student project.
--sebastian
On 18.02.2011 18:19, Ted Dunning wrote:
A better way to sample is to find groups with a very large number
It is pretty easy to set up a reservoir sampler as a combiner and as the front
end to a reducer.
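A minimal sketch of the reservoir sampler itself (plain Java, classic Algorithm R; the Hadoop combiner/reducer wiring is left out):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch: keep a uniform random sample of fixed size k from a stream of
// unknown length. Usable as the core of a combiner and of a reducer front end.
class ReservoirSampler<T> {
  private final int k;
  private final List<T> reservoir;
  private final Random rnd = new Random();
  private long seen = 0;

  ReservoirSampler(int k) { this.k = k; this.reservoir = new ArrayList<>(k); }

  void add(T item) {
    seen++;
    if (reservoir.size() < k) {
      reservoir.add(item);
    } else {
      long j = (long) (rnd.nextDouble() * seen);   // uniform in [0, seen)
      if (j < k) reservoir.set((int) j, item);     // keep with probability k/seen
    }
  }

  List<T> sample() { return reservoir; }
}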
Sent from my iPhone
On Jul 2, 2011, at 14:22, Lance Norskog goks...@gmail.com wrote:
How to do this in an efficient way? No idea.
You have to watch out, however, because Hadoop wire format changes pretty
often.
On Fri, Jul 1, 2011 at 7:21 AM, Xiaobo Gu guxiaobo1...@gmail.com wrote:
I mean Mahout built with a higher version of the Hadoop libraries connecting
to a cluster running a lower version of Hadoop.
On Fri, Jul 1, 2011 at 8:42 PM,
Lance,
You would get better results from the random projection if you did the first
part of the stochastic SVD. Basically, you do the random projection:
Y = A \Omega
where A is your original data, \Omega is the random matrix, and Y is the result.
Y will be tall and skinny.
Then, find an
. The standard deviation of the ratios gives a rough-and-ready
measure of the fidelity of the reduction. The standard deviation of
simple RP should be highest, then this RP + orthogonalization, then
MDS.
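A sketch of that first stage of the stochastic SVD (plain Java with dense arrays for clarity; real data would be sparse, and the Gaussian Omega, the column count p, and the modified Gram-Schmidt step are my assumptions rather than the exact Mahout implementation):

import java.util.Random;

// Sketch: Y = A * Omega (random projection), then orthonormalize the columns
// of Y with modified Gram-Schmidt to get Q, an orthonormal basis for the range of A.
class RandomProjectionBasis {
  static double[][] basis(double[][] a, int p, Random rnd) {
    int n = a.length, m = a[0].length;
    double[][] omega = new double[m][p];
    for (int i = 0; i < m; i++)
      for (int j = 0; j < p; j++) omega[i][j] = rnd.nextGaussian();

    double[][] y = new double[n][p];               // Y = A * Omega (tall, skinny)
    for (int i = 0; i < n; i++)
      for (int k = 0; k < m; k++)
        for (int j = 0; j < p; j++) y[i][j] += a[i][k] * omega[k][j];

    for (int j = 0; j < p; j++) {                  // modified Gram-Schmidt
      double norm = 0;
      for (int i = 0; i < n; i++) norm += y[i][j] * y[i][j];
      norm = Math.sqrt(norm);
      for (int i = 0; i < n; i++) y[i][j] /= norm;
      for (int j2 = j + 1; j2 < p; j2++) {
        double dot = 0;
        for (int i = 0; i < n; i++) dot += y[i][j] * y[i][j2];
        for (int i = 0; i < n; i++) y[i][j2] -= dot * y[i][j];
      }
    }
    return y;                                      // Q with orthonormal columns
  }
}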
On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
Lance,
You would get
, that's close. let's try a hundred dot products
dot1 = rep(0,100);dot2 = rep(0,100)
for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,]*
aa[i,])}
# how close to the same are those?
max(abs(dot1-dot2))
# VERY
[1] 3.45608e-11
On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning ted.dunn
-
exponentiate the result. This will not change the function's expected
result
On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
Actually, pdf() should always be a pdf(), not a logPdf(). Many
algorithms
want one or the other. Some don't much care because log is monotonic
not know if it will work OK:
Do all calculations on a logarithmic scale and, just before returning,
exponentiate the result. This will not change the function's expected
result
On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
Actually, pdf() should always be a pdf
There should not be a change to an existing method.
It would be fine to add another method, perhaps called logPdf, that does
what you suggest. This loss of precision is common with the normal
distribution in high dimensions.
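A sketch of what a logPdf alongside pdf could look like for a spherical Gaussian (plain Java; the class and parameterization are assumptions for illustration, not the Mahout Model API): in high dimensions the log form stays finite where exp(logPdf) underflows to zero.

// Sketch: spherical (equal-variance) Gaussian density. pdf() is just
// exp(logPdf()); for high-dimensional x the log form avoids underflow to 0.
class SphericalGaussian {
  private final double[] mean;
  private final double variance;

  SphericalGaussian(double[] mean, double variance) {
    this.mean = mean; this.variance = variance;
  }

  double logPdf(double[] x) {
    int d = mean.length;
    double sq = 0;
    for (int i = 0; i < d; i++) {
      double diff = x[i] - mean[i];
      sq += diff * diff;
    }
    return -0.5 * (d * Math.log(2 * Math.PI * variance) + sq / variance);
  }

  double pdf(double[] x) { return Math.exp(logPdf(x)); }  // may underflow for large d
}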
On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev vavasi...@gmail.com
, 2011 at 12:57 AM, Marko Ciric ciric.ma...@gmail.com wrote:
Thanks Ted. Do you think weights (that depend on mentioned features) can be
learned with simple linear regression once the outputs of Mahout
recommenders are known?
On 10 June 2011 08:02, Ted Dunning ted.dunn...@gmail.com wrote:
When
be to create a new Model and ModelDistribution that
uses log arithmetic of your choosing. The initial models are very simple
minded and are likely not adequate for real applications.
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Monday, June 27, 2011 7:51 AM
To: user
Indeed.
On Mon, Jun 27, 2011 at 5:27 PM, Hector Yee hector@gmail.com wrote:
So I tried Yahoo LDA on 52 M documents with 1000 topics.
Yahoo LDA with a dictionary of 100k terms does 1 iteration every 30 minutes
on a single machine using 4 cores.
Mahout LDA using 20 nodes and a
Regarding speed:
How many non-zero elements?
What is the size of your input matrices?
How long does it take to read the matrices without doing any multiplication?
Your test matrices seem small for big sparse matrices.
This sort of thing could be very useful.
On Sun, Jun 26, 2011 at 1:47 PM,
And going down the columns in a sparse matrix could do this to you.
On Sun, Jun 26, 2011 at 6:40 PM, Jake Mannix jake.man...@gmail.com wrote:
On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue xue@gmail.com wrote:
Hi.
I was wondering how useful an in memory sparse matrix multiplier would
that cover how (and also why) it
works, check out http://hunch.net/~jl/projects/hash_reps/index.html
On Sat, Jun 25, 2011 at 1:51 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
Look at the class FeatureValueEncoder. The test cases show most of the
ways
that is used.
Also the class
overhead running on a single machine, or are there other
implications to running big jobs on a single machine?
- edwin
On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
I have done this with VM's but I would not generally recommend it.
Without
VM's you will have a pretty ugly configuration issue
I have had best results with somewhat beefier machines because you pay less
VM overhead.
Typical Hadoop configuration advice lately is 4GB per core and 1 disk
spindle per two cores. For higher performance systems like MapR, the number
of spindles can go up.
On Sat, Jun 25, 2011 at 2:21 AM, Sean
It is quite possible.
If the new columns represent a relatively small contribution rather than a
wholesale change in the statistics of the corpus (which is almost always
true) then you can just add these columns and compute IDF weights for the
new terms based on the updated corpus statistics.
Shouldn't matter.
On Fri, Jun 24, 2011 at 3:04 AM, XiaoboGu guxiaobo1...@gmail.com wrote:
And should we call setPoolsize first, then call setThreadCount after that?
Regards,
Xiaobo Gu
Big iron is fine for some of the classifier stuff, but throughput per $ can
be higher for other algorithms with a cluster of smaller machines.
How big a machine are you talking about? Even relatively small machines are
pretty massive anymore. 8-core (16 hyper-thread) machines with 48GB seem
to
Look at the class FeatureValueEncoder. The test cases show most of the ways
that is used.
Also the class TrainNewsGroups in examples.
See chapters 14 and 16 of Mahout in Action. The sample server for chapter
16 does encoding like you need.
On Fri, Jun 24, 2011 at 5:04 PM, Mark
Pretty big. Should scream for local classifier learning.
Local Hadoop should run pretty fast as well.
On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu guxiaobo1...@gmail.com wrote:
32Core, 256G RAM
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Saturday
.
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Saturday, June 25, 2011 9:26 AM
To: user@mahout.apache.org
Cc: d...@mahout.apache.org
Subject: Re: Can all the algorithms in Mahout be run locally without a
Hadoop cluster.
Pretty big. Should scream
We changed lots of names as we pulled them over. We also added test cases.
The changes at this point are pretty substantial. At the lower level, we
changed the way things worked and added new kinds of collections. At the
math layer, we pretty massively changed things by adding the ability to
Try the QR trick. It is amazingly effective.
2011/6/23 tr...@cs.drexel.edu
Alright, thanks guys.
The cases where Lanczos or the stochastic projection helps are cases
where
you have *many* columns but where the data are sparse. If you have a
very
tall dense matrix, the QR method is to
The cases where Lanczos or the stochastic projection helps are cases where
you have *many* columns but where the data are sparse. If you have a very
tall dense matrix, the QR method is to be much preferred.
2011/6/23 tr...@cs.drexel.edu
Ok, then what would you think to be the minimum number
This method isn't usually as numerically stable as, for instance, using a QR
decomposition. If your original data matrix is n x 2, then Q is n x 2 and R
is 2 x 2. R is trivial to decompose into U S V' and, since Q has orthonormal
columns, the singular values and right singular vectors of R are your
Btw.. the JIRA involved was https://issues.apache.org/jira/browse/MAHOUT-376
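Spelling out the QR trick above (standard linear algebra, not quoted from the thread): if A = QR with Q of size n x 2 having orthonormal columns and R of size 2 x 2, and R = U_R \Sigma V^T is the small SVD, then

A = Q R = (Q U_R) \, \Sigma \, V^{\mathsf T},

so \Sigma holds the singular values of A, V its right singular vectors, and Q U_R its left singular vectors.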
On Thu, Jun 23, 2011 at 11:44 AM, Ted Dunning ted.dunn...@gmail.com wrote:
If you don't need all 5000 singular values, then you can directly use the
stochastic decomposition algorithms in Mahout.
If you do want all
If you don't need all 5000 singular values, then you can directly use the
stochastic decomposition algorithms in Mahout.
If you do want all 5000 singular values, then you can probably use all but
the first and last few steps of the stochastic decomposition algorithm to get
what you need. If you
I think that you can do the covariance using Jake's old outer product trick.
Of course you need to do something clever to deal with mean subtraction.
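For reference, the standard identity that lets the outer-product accumulation and the mean subtraction be done in a single pass (not quoted from the thread):

\operatorname{Cov}(X) = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{\mathsf T} - \mu \mu^{\mathsf T}, \qquad \mu = \frac{1}{n} \sum_{i=1}^{n} x_i,

so you can accumulate the raw outer products and the running mean together and subtract \mu \mu^T at the end.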
2011/6/23 tr...@cs.drexel.edu
Yes, but an M/R job to create the covariance matrix would be required. With
millions of rows that is, unless I am
For billions of rows, you can do block-wise QR and get the SVD pretty
easily.
Also, the distributed matrix times will get you there with slightly less
numerical stability.
On Thu, Jun 23, 2011 at 12:53 PM, Jake Mannix jake.man...@gmail.com wrote:
Well I'm going to pretend for a second that
Doh.
Of course. I have been worried about sparsity so long that mean subtraction
causes an autonomic twitch.
On Thu, Jun 23, 2011 at 1:25 PM, Jake Mannix jake.man...@gmail.com wrote:
with 2 dense matrices being multiplied will it? And it is conceivable
that
we will have billions of rows
I have used the SGD classifiers for content based recommendation. It works
out reasonably but the interaction variables can get kind of expensive.
Doing it again, I think I would use latent factor log linear models to do
the interaction features. See
Actually, I should mention that I have done user-feature recommendations and
then (mis) used text retrieval to pull back items that have features as
text. This works reasonably well and is pretty easy to do. You will have
to watch out for very common features.
On Wed, Jun 22, 2011 at 12:50 AM,
only have one feature vector per item.
On Jun 21, 2011, at 3:49 PM, Ted Dunning wrote:
I have used the SGD classifiers for content based recommendation. It
works
out reasonably but the interaction variables can get kind of expensive.
Doing it again, I think I would use latent factor log
Also, github mirrors all apache projects (and apache also provides git
mirrors)
I have some mahout stuff on github myself. I like to put work in progress
there.
What project did you see that was deficient?
I see all of the live version at https://github.com/apache/mahout
Sounds like you should invert your loops.
These sparse matrices are probably very reasonable for solving each one on a
single machine in memory.
As such, take a look at the LSMR implementation which is a
good implementation of a conjugate
gradient-like algorithm that plays nice with sparse data.
20, 2011 at 9:21 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
Sounds like you should invert your loops.
These sparse matrices are probably very reasonable for solving each one
on a
single machine in memory.
As such, take a look at the LSMR implementation which is a
good implementation
Two things will help in addition to what Josh suggested:
a) when looking for items that are trending hot, use the difference in the
log rank as a score. For most internetly things, rank is proportional to
1/rate so log rank is -log rate. Refining this slightly to -log (epsilon +
1/rank) makes
I should add that this would be a cool thing to have if it can be made
general enough!
On Sat, Jun 18, 2011 at 7:35 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I don't think that the current LDA could be misused this way, but I
wouldn't be surprised if the current variational code could
randomly permute things.
On Wed, Jun 15, 2011 at 2:50 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
It is already in Mahout, I think.
On Tue, Jun 14, 2011 at 5:48 AM, Lance Norskog goks...@gmail.com
wrote:
Coding a permutation like this in Map/Reduce is a good beginner
exercise
estimations using time series and that's why I would like to know if they are
critical for me.
Many thanks and best regards,
Svetlomir.
On 15.06.2011 20:44, Ted Dunning wrote:
This is why the term Naive is used in the name. The scores for this kind
of algorithm are 0 to 1 or are logarithms
I should add that the regularization will also make the logistic regression
classifier a little bit conservative about estimating probabilities near 0
or near 1.
On Fri, Jun 17, 2011 at 12:19 AM, Ted Dunning ted.dunn...@gmail.com wrote:
The problem is that logistic regression makes some
The normal terminology is to name U and V in SVD as singular vectors as
opposed to eigenvectors. The term eigenvectors is normally reserved for the
symmetric case of U S U' (more generally, the Hermitian case, but we only
support real values).
On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov
something in mahout about this.
Best,
Fernando.
2011/6/15 Ted Dunning ted.dunn...@gmail.com
The normal terminology is to name U and V in SVD as singular
vectors
as
opposed to eigenvectors. The term eigenvectors is normally
reserved
This is why the term Naive is used in the name. The scores for this kind
of algorithm are 0 to 1 or are logarithms of such a number, but are not at
all calibrated probabilities.
And, frankly, it is rare in practice for the output of logistic regression
to be calibrated either. Those outputs
It is already in Mahout, I think.
On Tue, Jun 14, 2011 at 5:48 AM, Lance Norskog goks...@gmail.com wrote:
Coding a permutation like this in Map/Reduce is a good beginner exercise.
On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
But the key is that you have
On Wed, Jun 15, 2011 at 9:27 PM, aaron barnes aa...@stasis.org wrote:
I'm thinking this still most closely resembles a 'boolean' model, because
it's not a matter of the user assigning a rating to every purchase, so we're
not looking primarily for users who have given similar ratings to similar