Johannes,
Your summary is good.
I would add that the precalculated recommendations can be large enough that
the lookup becomes more expensive. Your point about staleness is right on point.
On Mon, May 20, 2013 at 10:15 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
I think Pat is
Are you using Lanczos instead of SSVD for a reason?
On Mon, May 20, 2013 at 4:13 AM, Rajesh Nikam rajeshni...@gmail.com wrote:
Hello,
I have an arff/csv file containing input data that I want to pass to SVD:
Lanczos Singular Value Decomposition.
Which tool should I use to convert it to
Thanks! Could you also add how to learn the weights you talked about, or at
least a hint? Learning weights for search engine query terms always sounds
like learning to rank to me, but that always seemed pretty complicated
and I never managed to try it out.
On Tue, May 21, 2013 at 8:01 AM, Ted
Dear all,
May I please ask about usage a bit here?
Previously I had:
import com.carrotsearch.hppc.IntSet;
import com.carrotsearch.hppc.IntOpenHashSet;
IntSet columnValues = new IntOpenHashSet();
for (int x : values) {  // "values" stands in for whatever the original loop iterates over
  if (columnValues.contains(x)) continue;  // skip values we have already seen
  ...
  columnValues.add(x);
}
It
Dear all,
After a while of debugging I understood that 3 elements were added fine,
but when adding the fourth one it does not crash but says
"com.sun.jdi.InvocationException occurred invoking method". So it seems a
table (hashtable?) of fixed size is created. Why is that happening?
On 21 May 2013 10:18,
Hi Sophie,
What you're describing is odd: while the hash set is allocated with a small
fixed size initially, it is resized automatically as you add more elements.
Can you please post the full stack trace of the exception?
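In the meantime, here is a tiny test (a sketch, assuming a 2013-era hppc jar on the classpath) showing the set growing far past its initial capacity:

import com.carrotsearch.hppc.IntOpenHashSet;

public class ResizeTest {
  public static void main(String[] args) {
    // Deliberately tiny initial capacity; the set rehashes as it grows.
    IntOpenHashSet set = new IntOpenHashSet(4);
    for (int i = 0; i < 1000; i++) {
      set.add(i);
    }
    System.out.println(set.size()); // prints 1000
  }
}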
On Tue, May 21, 2013 at 12:35 PM, Sophie Sperner
sophie.sper...@gmail.com wrote:
Hi all,
I have a query regarding feature vector generation for text documents.
I have read Mahout in Action and understood how to turn a text document into a
feature vector weighted by TF or TF-IDF schemes. My use case is a little
different.
I have a few keywords, maybe say 100, and I
Dear Dan, all,
I do not have the skills to get the stack trace. The code just hangs, and
Eclipse does not print a stack trace because the program never terminates.
So I decided to make a small test.java file that you can easily run.
This code has a main function that simply runs getItemList().
Link to the hppc jar file -
http://labs.carrotsearch.com/hppc-download.html then press the Download
button on the right.
On 21 May 2013 13:23, Sophie Sperner sophie.sper...@gmail.com wrote:
Hello Ted,
Thanks for the reply.
I have started exploring SVD because of the mention that it could help drop
features which are not relevant for clustering.
My objective is to reduce the number of features before passing them to
clustering, keeping just the important ones.
arff/csv ==> ssvd (for dimensionality
I think you forgot to attach the test file
On May 21, 2013 7:30 AM, Sophie Sperner sophie.sper...@gmail.com wrote:
Alright, below is my message. In the next mail I will attach my files.
Dear Dan, all,
I do not have the skills to get the stack trace. The code just hangs, and
Eclipse does not print a stack trace because the program never terminates.
So I decided to make a small test.java file that you can
I am fine with using partially hppc libs and partially Mahout. At the moment
I have converted my code; the APIs are very similar. But you may be interested
in running test.java, a quite simple example, in order to find the possible bug.
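The converted code looks roughly like this (a sketch; I am assuming org.apache.mahout.math.set.OpenIntHashSet, whose contains/add API closely mirrors hppc):

import org.apache.mahout.math.set.OpenIntHashSet;

OpenIntHashSet columnValues = new OpenIntHashSet();
for (int x : values) {  // same "values" placeholder as before
  if (columnValues.contains(x)) continue;  // identical contains/add pattern
  columnValues.add(x);
}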
Best of luck to you.
On 21 May 2013 15:24, Sophie Sperner
Sounds like dimensionality reduction to me. You may want to use ssvd -pca
Apologies for brevity. Sent from my Android phone.
-Dmitriy
On May 21, 2013 6:27 AM, Rajesh Nikam rajeshni...@gmail.com wrote:
In the interest of getting some empirical data out about various architectures:
On Mon, May 20, 2013 at 9:46 AM, Pat Ferrel pat.fer...@gmail.com wrote:
...
You use the user history vector as a query?
The most recent suffix of the history vector. How much of it is used varies
with the purpose.
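A minimal sketch of how such a query might look (SolrJ 4.x-era API; the field name "indicators" and the helper class are my assumptions for illustration, not the actual code):

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HistoryQuery {
  // Turn the most recent item ids of the history vector into one OR query
  // against the indexed indicator field, and take the top hits as recs.
  public static QueryResponse recommend(HttpSolrServer solr, List<String> recentItemIds)
      throws SolrServerException {
    StringBuilder q = new StringBuilder();
    for (String id : recentItemIds) {
      if (q.length() > 0) q.append(" OR ");
      q.append("indicators:").append(id);
    }
    SolrQuery query = new SolrQuery(q.toString());
    query.setRows(20); // top-20 recommendations
    return solr.query(query);
  }
}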
We
Stuti,
Here's how I would do it.
1. Create a collection of the 100 keywords that are of interest:
Collection<String> keywords = new ArrayList<String>();
keywords.addAll(your 100 keywords);
2. For each word in each of the text documents, create a Multiset (which is a
bag of words),
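A minimal sketch of those two steps (assuming Guava's HashMultiset for the bag of words and simple whitespace tokenization; the raw count of each keyword becomes one feature):

import java.util.Arrays;
import java.util.Collection;

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

public class KeywordFeatures {
  public static double[] featureVector(String document, Collection<String> keywords) {
    // Step 2: a bag of words counting every token occurrence in the document.
    Multiset<String> bag = HashMultiset.create();
    bag.addAll(Arrays.asList(document.toLowerCase().split("\\s+")));
    // One feature per keyword: its raw term frequency in this document.
    double[] vector = new double[keywords.size()];
    int i = 0;
    for (String keyword : keywords) {
      vector[i++] = bag.count(keyword);
    }
    return vector;
  }
}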
Sophie, you still haven't attached your test.java file. :)
On Tue, May 21, 2013 at 6:03 PM, Sophie Sperner sophie.sper...@gmail.com wrote:
Filling in for Dmitriy's brief reply:
mahout ssvd -i <input> -o <output> -pca true -us true -U false -V false -k <no of columns>
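For example, to keep 100 dimensions it might look like this (the paths are placeholders, assuming tf-idf vectors as input):
mahout ssvd -i /data/tfidf-vectors -o /data/ssvd-out -pca true -us true -U false -V false -k 100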
From: Dmitriy Lyubimov dlie...@gmail.com
To: user@mahout.apache.org
Sent: Tuesday, May 21, 2013 11:48 AM
Subject: Re: convert input for
It should be easy to convert the pseudocode below to MapReduce to scale to a
large collection of documents.
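One way that conversion could look (a rough sketch, not tested: a map-only job where each input line is "docId<TAB>text" and the output value is the comma-separated keyword counts; the class name, placeholder keywords, and input format are my assumptions):

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

public class KeywordVectorMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Placeholder list; in practice the ~100 keywords would be shipped to the
  // tasks, e.g. via the job configuration or the distributed cache.
  private final List<String> keywords = Arrays.asList("storage", "memory", "cpu");

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] parts = line.toString().split("\t", 2); // docId <TAB> text
    if (parts.length < 2) return;
    // Bag of words for this document.
    Multiset<String> bag = HashMultiset.create();
    bag.addAll(Arrays.asList(parts[1].toLowerCase().split("\\s+")));
    // Emit docId -> comma-separated raw counts, one per keyword.
    StringBuilder counts = new StringBuilder();
    for (String keyword : keywords) {
      if (counts.length() > 0) counts.append(',');
      counts.append(bag.count(keyword));
    }
    context.write(new Text(parts[0]), new Text(counts.toString()));
  }
}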
From: Suneel Marthi suneel_mar...@yahoo.com
To: user@mahout.apache.org
Sent: Tuesday, May 21, 2013 12:20 PM
Subject: Re: Feature
Dan,
I think that she did attach it, and it got filtered away.
Sophie,
One easy thing to do is to file a JIRA report using
https://issues.apache.org/jira/browse/MAHOUT
Then you can attach your program to that bug report.
Alternatively, you can attach the program to some other service.
P.S. As far as the U, V data being close to zero: yes, that's what you'd expect.
Here, "close to zero" still means much bigger than a rounding error, of
course. E.g. 1E-12 is indeed a small number, and 1E-16 to 1E-18 would indeed
be close to zero for the purposes of singularity. 1E-2..1E-5 are
actually
P.P.S. As far as the tool for arff, I am frankly not sure, but it sounds like
you've already solved this.
On Tue, May 21, 2013 at 1:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
I have so far just used the weights that Solr applies natively.
In my experience, what makes a recommendation engine work better is, in
order of importance,
a) dithering so that you gather wider data (see the sketch below)
b) using multiple sources of input
c) returning results quickly and reliably
d) the actual
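To make a) concrete, one common dithering scheme (a sketch under my own assumptions, not necessarily what any production system uses) re-sorts the ranked list by log(rank) plus Gaussian noise, so top items mostly stay on top while deeper items occasionally surface and collect feedback:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class Dither {
  public static <T> List<T> dither(List<T> ranked, double epsilon, Random rnd) {
    // Noisy score per position: log(rank) keeps the broad order,
    // epsilon * gaussian mixes neighbors; larger epsilon = more mixing.
    final double[] score = new double[ranked.size()];
    List<Integer> order = new ArrayList<Integer>();
    for (int rank = 0; rank < ranked.size(); rank++) {
      score[rank] = Math.log(rank + 1) + epsilon * rnd.nextGaussian();
      order.add(rank);
    }
    Collections.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) {
        return Double.compare(score[a], score[b]);
      }
    });
    List<T> result = new ArrayList<T>();
    for (int rank : order) {
      result.add(ranked.get(rank));
    }
    return result;
  }
}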
Inline
On Tue, May 21, 2013 at 8:59 AM, Pat Ferrel p...@occamsmachete.com wrote:
On Tue, May 21, 2013 at 8:47 PM, Pat Ferrel pat.fer...@gmail.com wrote:
For this sample it looks like about 20-40 clusters is best? Looking at
the results for k=40 by eyeball they do seem pretty good.
It is really hard to tell with these numbers. In spite of their heritage,
these scaled
Thanks for the list... As a non-native speaker I had problems understanding
the meaning of dithering here.
I have the feeling that somewhere between a) and d) there is also
diversification of the items in the recommendation list, i.e. increasing the
distance between the list items according to some