Re: Confidence in implicit factorization

2015-07-26 Thread Sean Owen
confidence = 1 + alpha * |rating| here (so, c1 means confidence - 1), so alpha = 1 doesn't specially mean high confidence. The loss function is computed over the whole input matrix, including all missing 0 entries. These have a minimal confidence of 1 according to this formula. alpha controls how
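A minimal sketch of the confidence formula quoted above, in plain Python. The function name `confidence` and the sample values are assumptions for illustration only; this is not Spark's API.

```python
def confidence(rating: float, alpha: float) -> float:
    """Confidence weight from the formula above: 1 + alpha * |rating|."""
    return 1.0 + alpha * abs(rating)


# Hypothetical sample values: a missing entry is treated as rating 0,
# so it gets the minimal confidence of 1 whatever alpha is.
missing = confidence(0.0, alpha=1.0)

# With alpha = 1, an observed count of 10 gets confidence 11.
observed = confidence(10.0, alpha=1.0)

# A larger alpha widens the gap between observed and missing entries.
observed_high_alpha = confidence(10.0, alpha=40.0)
```

Under this reading, alpha scales how much more an observed entry counts than a missing one, which is why alpha = 1 alone does not imply high confidence.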

Re: Confidence in implicit factorization

2015-07-26 Thread Sean Owen
It sounds like you're describing the explicit case, or any matrix decomposition. Are you sure that's best for count-like data? It depends, but my experience is that the implicit formulation is better. In a way, the difference between 10,000 and 1,000 count is less significant than the difference

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
In your experience with using implicit factorization for document clustering, how did you tune alpha? Using perplexity measures, or just something simple like 1 + rating, since the ratings are always positive in this case? On Sun, Jul 26, 2015 at 1:23 AM, Sean Owen so...@cloudera.com wrote:

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
I will think further, but in the current implicit formulation with confidence, it looks like I am factorizing a 0/1 matrix with weights 1 + alpha*rating for observed (1) values and 1 for unobserved (0) values. It's a bit different from the LSA model. On Sun, Jul 26, 2015 at 6:45 AM, Debasish Das
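The weighted 0/1 factorization described above can be written out as a sketch. The symbols are assumptions introduced here, not taken from the thread: r_{ui} is the input rating, p_{ui} the 0/1 preference, c_{ui} the weight, and x_u, y_i the factor vectors. Any regularization term is omitted.

```latex
\min_{X,\,Y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2,
\qquad
p_{ui} =
\begin{cases}
1 & \text{if } (u,i) \text{ is observed} \\
0 & \text{otherwise}
\end{cases},
\qquad
c_{ui} =
\begin{cases}
1 + \alpha\, r_{ui} & \text{if } (u,i) \text{ is observed} \\
1 & \text{otherwise}
\end{cases}
```

The sum runs over every cell of the matrix, so the unobserved zeros contribute to the loss with weight 1.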

Writing streaming data to cassandra creates duplicates

2015-07-26 Thread Priya Ch
Hi All, I have a problem when writing streaming data to Cassandra. Our existing product is on Oracle DB, where locks are maintained while writing data so that duplicates in the DB are avoided. But as Spark has a parallel processing architecture, if more than 1 thread is trying to write same

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
We got good clustering results from Implicit factorization using alpha = 1.0 since I thought to have a confidence of 1 + rating to observed entries and 1 to unobserved entries. I used positivity / sparse coding basically to force sparsity on document / topic matrix...But then I got confused

Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-26 Thread Bharath Ravi Kumar
Thanks Patrick. I'll await resumption of the master tree's nightly builds. -Bharath On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Bharath, There was actually an incompatible change to the build process that broke several of the Jenkins builds. This should be

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Mridul Muralidharan
Simply customize your log4j config instead of modifying code if you don't want messages from that class. Regards Mridul On Sunday, July 26, 2015, Sea 261810...@qq.com wrote: This exception is so ugly!!! The screen is full of this information when the program runs for a long time, and they

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Ted Yu
If I read the code correctly, that error message came from CoarseGrainedSchedulerBackend. There may be existing / future error messages, other than the one cited below, which are useful. Maybe change the log level of this message to DEBUG? Cheers On Sun, Jul 26, 2015 at 3:28 PM, Mridul

Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-26 Thread Josh Rosen
Given that 2.11 may be more stringent with respect to warnings, we might consider building with 2.11 instead of 2.10 in the pull request builder. This would also have some secondary benefits in terms of letting us use tools like Scapegoat or SCoverage highlighting. On Sat, Jul 25, 2015 at 8:52