Re: MLlib NNLS implementation is buggy, returning wrong solutions

2014-07-28 Thread Shuo Xiang
It is possible that the answers (the final solution vector x) given by two different algorithms (such as the one in MLlib and the one in R) differ, as the problem may not be strictly convex and multiple global optima may exist. However, these answers should yield the same objective value. Can you g
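
To illustrate the point about non-unique minimizers, here is a minimal sketch. It assumes the usual NNLS objective, minimizing ||Ax - b||^2 subject to x >= 0. The matrix, the vectors, and the helper name `objective` are made up for illustration and are not taken from MLlib or R.

```scala
object NnlsObjectiveSketch {
  // Squared residual ||Ax - b||^2 for a dense matrix given as rows
  // (hypothetical helper, not an MLlib API).
  def objective(a: Array[Array[Double]], x: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (row, bi) =>
      val residual = row.zip(x).map { case (aij, xj) => aij * xj }.sum - bi
      residual * residual
    }.sum

  // Illustrative data: both columns of A are identical, so the problem
  // is convex but not strictly convex.
  val a: Array[Array[Double]] = Array(Array(1.0, 1.0), Array(1.0, 1.0))
  val b: Array[Double] = Array(2.0, 2.0)

  // Two different non-negative solution vectors.
  val x1: Array[Double] = Array(2.0, 0.0)
  val x2: Array[Double] = Array(0.0, 2.0)

  def main(args: Array[String]): Unit = {
    // Both calls print the same objective value even though x1 != x2.
    println(objective(a, x1, b))
    println(objective(a, x2, b))
  }
}
```

Comparing the objective values, rather than the solution vectors, is therefore the more reliable way to check whether two solvers agree.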

Re: Large scale ranked recommendation

2014-07-17 Thread Shuo Xiang
Hi, are you suggesting that computing simple vector dot products or a sigmoid function on 10K * 1M data takes 5 hours? On Thu, Jul 17, 2014 at 3:59 PM, m3.sharma wrote: > We are using the RegressionModels that come with the *mllib* package in SPARK. > > > > -- > View this message in context: > http://apache

Re: Spark Questions

2014-07-12 Thread Shuo Xiang
For your first question, the partitioning strategy can be tuned by applying a different partitioner. You can use an existing one such as HashPartitioner or write your own. See this link (http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf) for some instr
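
As a rough sketch of the two options mentioned in this reply, the snippet below builds a `HashPartitioner` and a hand-written partitioner. It assumes spark-core is on the classpath. The class name `FirstCharPartitioner` and its routing rule are hypothetical, chosen only to show the shape of a custom `Partitioner`.

```scala
import org.apache.spark.{HashPartitioner, Partitioner}

// Hypothetical custom partitioner: routes string keys by their first character.
class FirstCharPartitioner(override val numPartitions: Int) extends Partitioner {
  require(numPartitions > 0, "numPartitions must be positive")

  // Must return a value in [0, numPartitions).
  override def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty => s.charAt(0).toInt % numPartitions
    case _                       => 0 // empty or non-string keys go to partition 0
  }
}

object PartitionerSketch {
  // Built-in option: partitions by the key's hashCode.
  val builtIn: Partitioner = new HashPartitioner(4)

  // Custom option defined above.
  val custom: Partitioner = new FirstCharPartitioner(4)

  // With a pair RDD named `pairs` (assumed, not defined here), either one
  // could be applied via pairs.partitionBy(builtIn) or pairs.partitionBy(custom).
}
```

A custom partitioner is mainly worth writing when the key distribution is known in advance; otherwise hash partitioning is usually the simpler starting point.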

Re: spark ui on yarn

2014-07-12 Thread Shuo Xiang
Hi Koert, just curious: did you see any message like "CANNOT FIND ADDRESS" after clicking into a stage? I've seen similar problems caused by lost executors. Best, On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers wrote: > I just tested a long lived application (that we normally run in s

Re: MLLib sample data format

2014-06-22 Thread Shuo Xiang
Hi, you might find http://spark.apache.org/docs/latest/mllib-guide.html helpful. On Sun, Jun 22, 2014 at 2:35 PM, Justin Yip wrote: > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/apache/spark/tree/master/data/mllib. But I cannot find > any explanation for th

Re: Set the number/memory of workers under mesos

2014-06-20 Thread Shuo Xiang
gards > Mayur > > Mayur Rustagi > Ph: +1 (760) 203 3257 > http://www.sigmoidanalytics.com > @mayur_rustagi <https://twitter.com/mayur_rustagi> > > > > On Fri, Jun 20, 2014 at 4:30 PM, Shuo Xiang > wrote: > >> Hi, just wondering anybody knows how to

Set the number/memory of workers under mesos

2014-06-20 Thread Shuo Xiang
Hi, just wondering whether anybody knows how to set the number of workers (and the amount of memory) in Mesos when launching spark-shell? I tried editing conf/spark-env.sh, and it looks like the environment variables there are for YARN or standalone mode. Thanks!

Re: MLLib inside Storm : silly or not ?

2014-06-19 Thread Shuo Xiang
If I understand correctly, you want to use MLlib for offline training and then deploy the learned model to Storm? In that case I don't think there is any problem. However, if you are looking for online model update/training, this can be complicated and I guess quite a few algorithms in mllib at

Re: Not fully cached when there is enough memory

2014-06-11 Thread Shuo Xiang
Xiangrui, clicking into the RDD link gives the same message, saying that only 96 of 100 partitions are cached. The disk/memory usage is the same, and it is far below the limit. Is this what you wanted me to check, or is there another issue? On Wed, Jun 11, 2014 at 4:38 PM, Xiangrui Meng wrote: > Could you try to cl

Re: Information on Spark UI

2014-06-11 Thread Shuo Xiang
ication but still seeing this. > > > On Wednesday, June 11, 2014, Shuo Xiang wrote: > >> Daniel, >> Thanks for the explanation. >> >> >> On Wed, Jun 11, 2014 at 8:57 AM, Daniel Darabos < >> daniel.dara...@lynxanalytics.com> wrote: >> >>

Re: Information on Spark UI

2014-06-11 Thread Shuo Xiang
multiple times. > - More commonly, the result of the stage may be used in a later > calculation, and has to be recalculated. This happens if some of the > results were evicted from cache. > > > On Wed, Jun 11, 2014 at 2:23 AM, Shuo Xiang > wrote: > >> Hi, >> Came

Re: groupBy question

2014-06-10 Thread Shuo Xiang
res.map(group => (group._2.size, group._2.map(_._1).max)) On Tue, Jun 10, 2014 at 6:10 PM, SK wrote: > After doing a groupBy operation, I have the following result: > > val res = > ("ID1",ArrayBuffer((145804601,"ID1","japan"))) > ("ID3",ArrayBuffer((145865080,"ID3","canada"), > (145899
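
The one-liner in this reply can be tried on plain Scala collections. The following is a minimal sketch in which the grouped data is hypothetical (the original sample is truncated) and `res` is a local `Seq` rather than an RDD; the same `map` call has the same shape on an RDD of grouped pairs.

```scala
import scala.collection.mutable.ArrayBuffer

object GroupBySketch {
  // Hypothetical grouped result: key -> records of (timestamp, id, country).
  val res: Seq[(String, ArrayBuffer[(Int, String, String)])] = Seq(
    ("A", ArrayBuffer((10, "A", "x"))),
    ("B", ArrayBuffer((20, "B", "y"), (30, "B", "z")))
  )

  // As in the reply: for each group, (number of records, largest first field).
  val summary: Seq[(Int, Int)] =
    res.map(group => (group._2.size, group._2.map(_._1).max))

  // Variant that keeps the key and uses pattern matching for readability.
  val summaryWithKey: Seq[(String, Int, Int)] =
    res.map { case (id, records) => (id, records.size, records.map(_._1).max) }

  def main(args: Array[String]): Unit = {
    println(summary)
    println(summaryWithKey)
  }
}
```

The pattern-matching variant avoids the `_1`/`_2` accessors on the outer tuple, which tends to be easier to read once the records have more than two fields.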