Re: Dimensional Reduction via Random Projection: investigations

2011-07-03 Thread Lance Norskog
The singular values on my recommender vectors come out: 90, 10, 1.2, 1.1, 1.0, 0.95. This came from playing with your R code. Based on this, I'm adding the QR stuff to my visualization toolkit. Lance On Sat, Jul 2, 2011 at 10:15 PM, Lance Norskog goks...@gmail.com wrote: All pairwise distances
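A minimal sketch of this kind of spectrum check, assuming mahout-math's DenseMatrix and SingularValueDecomposition are on the classpath (the class names and the toy matrix are assumptions of this sketch, not code from the thread). A sharp drop-off such as 90, 10, 1.2, ... suggests that most of the structure lives in the first couple of dimensions.

import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.SingularValueDecomposition;

public class SingularValueCheck {
  public static void main(String[] args) {
    // Toy item-feature matrix standing in for the recommender vectors.
    double[][] rows = {
        {1.0, 0.9, 0.1},
        {0.8, 1.1, 0.0},
        {0.1, 0.2, 1.0},
        {0.0, 0.1, 0.9}
    };
    Matrix m = new DenseMatrix(rows);
    SingularValueDecomposition svd = new SingularValueDecomposition(m);
    // Singular values come back ordered from largest to smallest.
    for (double sigma : svd.getSingularValues()) {
      System.out.printf("%.3f%n", sigma);
    }
  }
}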

Re: Introducing randomness into my results

2011-07-03 Thread Ted Dunning
On Sat, Jul 2, 2011 at 11:34 AM, Sean Owen sro...@gmail.com wrote: Yes, that's well put. My only objection is that this sounds like you're saying that there is a systematic problem with the ordering, so it will usually help to pick any ordering other than the one you thought was optimal.

Re: Introducing randomness into my results

2011-07-03 Thread Ted Dunning
That is the point of the exponential in the example that I gave you. The top few recommendations are nearly stable. It is the lower ranks that are really churned up. This has the property that you state. On Sat, Jul 2, 2011 at 12:45 PM, Salil Apte sa...@offlinelabs.com wrote: I really like
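One way to get exactly this behavior (a sketch of the general idea, not necessarily the exact scheme from Ted's earlier example) is to re-sort by log(rank) plus Gaussian noise: the gap log(r+1) - log(r) shrinks as r grows, so a fixed noise level rarely swaps the top few items but churns the tail heavily. The class and method names below are illustrative only.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public final class Dithering {

  // Re-rank a recommendation list by log(rank) + epsilon * N(0, 1).
  public static <T> List<T> dither(List<T> ranked, double epsilon, Random rng) {
    final double[] score = new double[ranked.size()];
    List<Integer> order = new ArrayList<Integer>();
    for (int i = 0; i < ranked.size(); i++) {
      score[i] = Math.log(i + 1.0) + epsilon * rng.nextGaussian();
      order.add(i);
    }
    // Sort positions by perturbed score; a small epsilon keeps the head nearly stable.
    Collections.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) {
        return Double.compare(score[a], score[b]);
      }
    });
    List<T> out = new ArrayList<T>();
    for (int i : order) {
      out.add(ranked.get(i));
    }
    return out;
  }
}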

Re: Dimensional Reduction via Random Projection: investigations

2011-07-03 Thread Ted Dunning
I would be very surprised if java.util.Random exhibited this behavior. It isn't *that* bad. On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog goks...@gmail.com wrote: For full Random Projection, a lame random number generator (java.util.Random) will generate a higher standard deviation than a
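For what it's worth, the claim is easy to probe empirically. A throwaway check like the one below (the class name and sample size are assumptions of this sketch) prints the sample mean and standard deviation of java.util.Random's Gaussians, which should come out very close to 0 and 1.

import java.util.Random;

public class GaussianStdDevCheck {
  public static void main(String[] args) {
    Random rng = new Random(42);
    int n = 1000000;
    double sum = 0.0;
    double sumSq = 0.0;
    for (int i = 0; i < n; i++) {
      double g = rng.nextGaussian();
      sum += g;
      sumSq += g * g;
    }
    double mean = sum / n;
    double variance = sumSq / n - mean * mean;
    // Expect roughly mean=0.000, stddev=1.000 for a sane Gaussian source.
    System.out.printf("mean=%.5f stddev=%.5f%n", mean, Math.sqrt(variance));
  }
}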

Re: Dimensional Reduction via Random Projection: investigations

2011-07-03 Thread Ted Dunning
I wasn't thinking when I typed that post. An orthonormal projection always preserves distances since it is just a generalized reflection/rotation. Preserving all dot products (including with self) also implies that distances are preserved, because \|x - y\|_2^2 = x \cdot x - 2\, x \cdot y + y \cdot y. On Sat, Jul
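For reference, the one-line derivation behind this (writing Q for a projection with orthonormal columns, so Q^T Q = I):

(Qx) \cdot (Qy) = x^{\top} Q^{\top} Q\, y = x \cdot y,
\qquad
\|x - y\|_2^2 = (x - y) \cdot (x - y) = x \cdot x - 2\, x \cdot y + y \cdot y,

so a map that preserves all dot products, including each vector's dot product with itself, preserves every term on the right and hence every pairwise distance.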

Re: Introducing randomness into my results

2011-07-03 Thread Sean Owen
On Sun, Jul 3, 2011 at 8:05 AM, Ted Dunning ted.dunn...@gmail.com wrote: For instance, if the recommendation engine recommends B when you have seen A, and there is little other way to discover C, which is ranked rather low (and thus never seen), then there is no way for the engine to even get

Re: Using org.apache.lucene.analysis.Analyzer with seq2sparse

2011-07-03 Thread rmx
Please, what am I doing wrong? trunk$ bin/mahout seq2sparse -i wikipedia-all-cat -o wikipedia-vectors-analysed -seq -a org.apache.lucene.analysis.Analyzer -ow Running on hadoop, using HADOOP_HOME=/usr/local/hadoop No HADOOP_CONF_DIR set, using /usr/local/hadoop/src/conf 11/07/03 19:52:50 INFO

Re: StackOverflowError on running bin/mahout with HADOOP_CONF_DIR specified

2011-07-03 Thread Sergey Bartunov
Oh, some code works and some doesn't (these stack overflows). I'm confused. It seems that I need to run everything in a fresh, isolated environment. On 3 July 2011 21:58, Ted Dunning ted.dunn...@gmail.com wrote: Sean, any ideas how that tiny little commit caused this? On Sun, Jul 3, 2011 at 5:07 AM,

Re: Introducing randomness into my results

2011-07-03 Thread Konstantin Shmakov
Insightful and interesting. But it seems that a quantitative measure of the gain/loss from the different methods would help. The question is: how do you measure the gain? One example: suppose recommendations are ignored by 99% of the users and there is some measurable action (now or later) from 1% of the

Re: Introducing randomness into my results

2011-07-03 Thread Sean Owen
I don't see why one would believe that the randomly selected items farther down the list are more likely to engage a user. If anything, the recommender says they are less likely to be engaging. (Or put another way, by this reasoning, we ought to pick recommendations at random.) I do think that

Re: Dimensional Reduction via Random Projection: investigations

2011-07-03 Thread Lance Norskog
I whipped up a MurmurHash Random and the Gaussian plot is much cleaner. The MH version is exactly as fast as java.util.Random: j.u.R makes a new int in every cycle, while MH makes a new long, so it does half as many cycles. On Sun, Jul 3, 2011 at 12:12 AM, Ted Dunning ted.dunn...@gmail.com wrote:
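A sketch of the general idea (not Lance's actual class; the name, the counter scheme, and the use of the MurmurHash3 fmix64 finalizer are assumptions here): subclass java.util.Random, drive next(bits) from a 64-bit mix of a counter, and hand out the two 32-bit halves on alternate calls, which is where the "half as many cycles" comes from.

import java.util.Random;

public class MurmurishRandom extends Random {

  private long counter;       // state: hashed to produce each 64-bit output
  private long buffered;      // low 32 bits of the last hash, served on the next call
  private boolean haveBuffered;

  public MurmurishRandom(long seed) {
    this.counter = seed;
  }

  // MurmurHash3 fmix64 finalizer: a cheap, well-mixed 64 -> 64 bit function.
  private static long mix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  @Override
  protected int next(int bits) {
    long word;
    if (haveBuffered) {
      word = buffered;
      haveBuffered = false;
    } else {
      long hash = mix64(counter++);
      word = hash >>> 32;             // serve the high half now
      buffered = hash & 0xffffffffL;  // keep the low half for the next call
      haveBuffered = true;
    }
    return (int) (word >>> (32 - bits));
  }
}

Anything that takes a java.util.Random, for example code filling a Gaussian random projection matrix via nextGaussian(), can use it unchanged.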

Re: Introducing randomness into my results

2011-07-03 Thread Ted Dunning
On Sun, Jul 3, 2011 at 1:08 PM, Sean Owen sro...@gmail.com wrote: I don't see why one would believe that the randomly selected items farther down the list are more likely to engage a user. If anything, the recommender says they are less likely to be engaging. There are two issues with this

Re: Introducing randomness into my results

2011-07-03 Thread Ted Dunning
Roughly. But remember, a single recommendation isn't the end of the game. If this is the last recommendation to ever be made, dithering doesn't help at all. On Sun, Jul 3, 2011 at 1:02 PM, Konstantin Shmakov kshma...@gmail.com wrote: It seems that as long as recommenders are dealing with the