Also, if your original matrix is A, then it is usually a shorter path to
results to analyze the word (item) cooccurrence matrix A'A. The methods
below work either way.
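As a rough illustration of the A'A suggestion above, here is a minimal sketch in plain Python. It assumes A is a small dense list-of-lists with one row per document (user) and one column per word (item); the function name `cooccurrence` and the toy data are made up for this example and are not anything from Mahout.

```python
def cooccurrence(a):
    """Return A'A for a row-major matrix a (rows = documents, columns = words)."""
    n_cols = len(a[0])
    c = [[0] * n_cols for _ in range(n_cols)]
    for row in a:
        # Each row contributes the outer product of itself to A'A.
        for i, x in enumerate(row):
            if x == 0:
                continue  # skip zeros; real interaction data is mostly sparse
            for j, y in enumerate(row):
                c[i][j] += x * y
    return c


# Hypothetical toy data: 3 documents, 3 words, binary occurrence.
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
C = cooccurrence(A)
for line in C:
    print(line)
```

Entry C[i][j] counts the documents in which words i and j both occur, and the diagonal holds the per-word document counts. A real data set would call for a sparse representation rather than nested lists.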
The cooccurrence definitions I'm finding only use the summation-based one in
wikipedia. Are there any involving inverting the
If you have a document (user) and a word (item), then you
have a joint probability that any given interaction will be between this
document and word. We pretend in this case that each interaction is
independent of every other which is patently not true, but very helpful.
So if you subsample
I've seen a number of people reporting this problem on the list over the past few
months, and while I've run seq2sparse successfully in the past (using 0.4), it
is failing for me now with a ClassNotFoundException. I did a clean
re-install of the 0.5 distribution yesterday since others had
Try HEAD from Subversion. I think it's been addressed, but that change of
course would not have gone back and shown up in 0.5.
On Wed, Aug 31, 2011 at 2:06 PM, Andrea Leistra
andrea.leis...@concur.com wrote:
I've seen a number of people reporting this problem on the list over the past
few months,
I'm on the fence. It's used extensively in Lucene. We also log it, or at
least the dev channel. It's a boon for the full time devs out there. For
the rest of us, I'm undecided.
On the plus side, you get real time interaction. You also get some fun banter
from time to time w/ friends. On
IRC is also used heavily in the hbase community and people do use it there
for install questions pretty commonly.
Whether this works is critically dependent on having guys like Mike Stack
who are able to keep an eye on IRC and answer the questions. My own
preference is to use the mailing list
Mathematically speaking, random sampling is just fine. Stratifying based on
various criteria can help avoid loss of accuracy, so if you had several
clusters then downsampling heavily represented clusters might work, but the
accurate definition of clusters is harder than the cooccurrence analysis
Just a suggestion: if we had logging available, could we mail out a digest of
the day's IRC chat to the mailing list? I know there could be a bunch of noise
in there from banter etc., but it wouldn't take much to scan for anything
important. We could get crazy and run it through one of
I look to IRC for answers to simple questions, and I keep an eye on it to
answer other people's simple questions.
The Weka mailing list seems to have some of the same questions asked every
week, and they have no official IRC channel.
On Wed, Aug 31, 2011 at 11:00 AM, Ted Dunning
Is the problem not just a matter of translating from the original
identifiers to ints, so they can be used as offsets into a vector, and then
back again?
The recommender stuff has a stage for this already which hashes identifiers
to an int, stores the mapping, and un-does it at the end. In
On Aug 31, 2011, at 11:26 AM, Sean Owen wrote:
Is the problem not just a matter of translating from the original
identifiers to ints, so they can be used as offsets into a vector, and then
back again?
Yeah, I was wondering about that when looking at the RecommenderJob.
If I understand you
My own recommendation here would be to run a word-count first and then
create a dense dictionary using a sequential process. This sequential step
should be very fast because the number of items is quite modest.
I would create an additional dictionary at the same time for email
addresses.
Once
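The word-count-then-dictionary idea above could be sketched roughly as follows. This is only an illustration in plain Python, not the RecommenderJob code: the helper name `build_dictionary`, the sample email addresses and the item names are all hypothetical, and ids are handed out in descending count order purely to make the result deterministic.

```python
from collections import Counter


def build_dictionary(tokens):
    """Count tokens, then assign dense ids 0..n-1 in one sequential pass."""
    counts = Counter(tokens)
    # Most frequent first; ties broken alphabetically for a stable order.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    to_id = {tok: i for i, tok in enumerate(ordered)}
    return to_id, ordered  # ordered[i] maps an id back to its token


# Hypothetical (email address, item) interactions.
interactions = [
    ("a@example.com", "widget"),
    ("b@example.com", "gadget"),
    ("a@example.com", "gadget"),
]

# One dictionary for email addresses and a separate one for items.
user_ids, users = build_dictionary(u for u, _ in interactions)
item_ids, items = build_dictionary(i for _, i in interactions)

# Translate to ints, usable as offsets into a vector ...
encoded = [(user_ids[u], item_ids[i]) for u, i in interactions]
# ... and back again at the end.
decoded = [(users[u], items[i]) for u, i in encoded]

print(encoded)
print(decoded == interactions)
```

Unlike hashing identifiers to ints, a dense dictionary of this kind cannot collide, at the cost of needing the counting pass up front.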
No, it still wants user,item[,rating] input. But otherwise yes, it's
translated and un-translated internally as needed.
You could change the mapper to read that input easily though.
It still wants numeric input. It's hashing longs to ints. But this could
easily be changed to record a more general
On Aug 31, 2011, at 11:43 AM, Ted Dunning wrote:
My own recommendation here would be to run a word-count first and then
create a dense dictionary using a sequential process. This sequential step
should be very fast because the number of items is quite modest.
I would create an additional
Yes, I'm suggesting that could at least be 80% of what you need. If you can
generalize that bit further and refactor it, all the better.
I wouldn't bother necessarily extending to support the user: item item
item syntax or else we'd get into supporting a lot of stuff. That
conversion IMHO can be
On 10 June 2011 18:34, Jeff Eastman jeast...@narus.com wrote:
I'm still trying to figure out why reuters-0.5 does not work on either of my
clusters. The scripts themselves have no diff and the environment variables
are set as in trunk except for MAHOUT_HOME. The synthetic control and 20
Hi Dan,
No I never did. I got distracted doing something else and did not debug
further. If you are still seeing this on trunk then we should (re)open a JIRA.
Jeff
-----Original Message-----
From: danbri2...@danbri.org [mailto:danbri2...@danbri.org] On Behalf Of Dan
Brickley
Sent: Wednesday,
In this text-only notation, I thought apostrophe meant inverse. What then is
matrix inversion?
I see a fair amount of stuff here in what I think is MathML, but it displays
raw in gmail.
On Wed, Aug 31, 2011 at 8:04 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Uhh...
A' is the transpose of A.
The shoemaker's children go barefoot. In fact, some IRC - coherent
whole workflow would be really interesting beyond this one problem. It
intersects with the problem of abstracting coherence out of movie subtitle
tracks.
And it would be totally salable to the US Dept of Homeland Fascism.
On Wed,
We usually denote the inverse as A^{-1} or just A^-1.
An apostrophe, superscript star or {top} always means transpose. I never
saw an apostrophe used for inverses.
Perhaps the confusion stems from the fact that the inverse equals the
transpose if matrices are orthogonal (or even orthonormal), so
sometimes
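The point about orthogonal matrices can be checked numerically. Below is a small plain-Python sketch: Q is a 2x2 rotation matrix (orthogonal), S is a scaling matrix (not orthogonal), and the helper names and the angle 0.3 are chosen only for this illustration. If Q'Q is the identity, then Q' is the inverse of Q.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)  # rows of bt are the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def is_identity(m, tol=1e-12):
    return all(abs(v - (1.0 if i == j else 0.0)) <= tol
               for i, row in enumerate(m) for j, v in enumerate(row))


theta = 0.3  # arbitrary rotation angle
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
S = [[2.0, 0.0],
     [0.0, 1.0]]

q_ok = is_identity(matmul(transpose(Q), Q))  # Q' acts as the inverse of Q
s_ok = is_identity(matmul(transpose(S), S))  # S' is not the inverse of S

print(q_ok, s_ok)
```

So for a general matrix such as S, the transpose and the inverse are different things, which is why A' and A^{-1} need separate notation.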
This is a very good point that seems very likely to be the source of the
confusion.
On Wed, Aug 31, 2011 at 6:06 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Perhaps the confusion stems from the fact that the inverse equals the
transpose if matrices are orthogonal (or even orthonormal), so
The basics of latex notation are
^ for superscript
_ for subscript
{} for grouping
\sum for summation
\log for logs
\Omega for upper case Greek letter omega
\alpha for lower case Greek letter alpha
\int for integral.
See http://www.codecogs.com/latex/eqneditor.php for a playground where you
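A few fragments that combine the pieces listed above, written as they might be typed into a math-mode context or an equation editor. The particular expressions are only illustrative; \quad, \, and \frac are standard LaTeX spacing and fraction commands that are not part of the list above.

```latex
% superscript, subscript, and grouping with braces
x^{2}, \quad a_{ij}, \quad A^{-1}

% summation with lower and upper limits, applied to a log
\sum_{i=1}^{n} \log x_i

% upper case omega and lower case alpha
\Omega, \quad \alpha

% integral with limits
\int_{0}^{1} x^{2} \, dx = \frac{1}{3}
```

Braces matter once a superscript or subscript is longer than one character: A^-1 raises only the minus sign, while A^{-1} raises the whole exponent.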
On Aug 31, 2011, at 8:32pm, Ted Dunning wrote:
The basics of latex notation are
^ for superscript
_ for subscript
{} for grouping
\sum for summation
\log for logs
\Omega for upper case Greek letter omega
\alpha for lower case Greek letter alpha
\int for integral.
See
Hmm... I see this:
http://latex.codecogs.com/gif.latex?F(x,y)=0%20~~\mbox{and}~~%20\left|%20\begin{array}{ccc}%20F''_{xx}%20%20F''_{xy}%20%20F'_x%20\\%20F''_{yx}%20%20F''_{yy}%20%20F'_y%20\\%20F'_x%20%20F'_y%20%200%20\end{array}\right|%20=%200
Must be a cut and paste kind of thing.
On Wed, Aug