It is possible that the answers (the final solution vector x) given by two
different algorithms (such as the ones in MLlib and in R) differ, as
the problem may not be strictly convex and multiple global optima may
exist. However, these answers should yield the same objective value. Can
you g
Hi,
Are you suggesting that taking simple vector dot products or applying the
sigmoid function on 10K * 1M data takes 5 hours?
On Thu, Jul 17, 2014 at 3:59 PM, m3.sharma wrote:
> We are using RegressionModels that comes with *mllib* package in SPARK.
>
>
>
> --
> View this message in context:
> http://apache
For your first question, the partitioning strategy can be tuned by applying
a different partitioner. You can use an existing one such as HashPartitioner or
write your own. See this link (
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf)
for some instr
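As a rough sketch of the two options mentioned in this reply (an existing partitioner versus one you write yourself), something like the following might work. The class name FirstCharPartitioner, the routing rule, and the RDD name `pairs` are hypothetical and only for illustration; the only Spark API assumed is the abstract class org.apache.spark.Partitioner with its numPartitions and getPartition members, plus partitionBy on pair RDDs.

```scala
import org.apache.spark.{HashPartitioner, Partitioner}

// Hypothetical custom partitioner: routes String keys by their first character.
// Any other key (or an empty string) falls back to partition 0.
class FirstCharPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty => s.charAt(0).toInt % numPartitions
    case _                       => 0
  }
}

// Usage on a pair RDD (assumes an existing RDD[(String, Int)] named `pairs`;
// older Spark versions also need `import org.apache.spark.SparkContext._`):
//   val byHash      = pairs.partitionBy(new HashPartitioner(8))
//   val byFirstChar = pairs.partitionBy(new FirstCharPartitioner(8))
```

Either way the partitioner is passed to partitionBy; the custom class only changes how a key is mapped to a partition index.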
Hi Koert,
Just curious, did you see any message like "CANNOT FIND ADDRESS"
after clicking into some stage? I've seen similar problems due to lost
executors.
Best,
On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers wrote:
> I just tested a long lived application (that we normally run in s
Hi, you might find http://spark.apache.org/docs/latest/mllib-guide.html
helpful.
On Sun, Jun 22, 2014 at 2:35 PM, Justin Yip wrote:
> Hello,
>
> I am looking into a couple of MLLib data files in
> https://github.com/apache/spark/tree/master/data/mllib. But I cannot find
> any explanation for th
gards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Fri, Jun 20, 2014 at 4:30 PM, Shuo Xiang
> wrote:
>
>> Hi, just wondering anybody knows how to
Hi, just wondering if anybody knows how to set the number of workers (and
the amount of memory) in Mesos when launching spark-shell? I was trying to
edit conf/spark-env.sh, and it looks like the environment variables there are
for YARN or standalone. Thanks!
If I'm understanding correctly, you want to use MLlib for offline training
and then deploy the learned model to Storm? In this case I don't think
there is any problem. However if you are looking for online model
update/training, this can be complicated and I guess quite a few algorithms
in mllib at
Xiangrui, clicking into the RDD link gives the same message, saying only
96 of 100 partitions are cached. The disk/memory usage is the same and
is far below the limit.
Is this what you wanted to check, or is it some other issue?
On Wed, Jun 11, 2014 at 4:38 PM, Xiangrui Meng wrote:
> Could you try to cl
ication but still seeing this.
>
>
> On Wednesday, June 11, 2014, Shuo Xiang wrote:
>
>> Daniel,
>> Thanks for the explanation.
>>
>>
>> On Wed, Jun 11, 2014 at 8:57 AM, Daniel Darabos <
>> daniel.dara...@lynxanalytics.com> wrote:
>>
>>
multiple times.
> - More commonly, the result of the stage may be used in a later
> calculation, and has to be recalculated. This happens if some of the
> results were evicted from cache.
>
>
> On Wed, Jun 11, 2014 at 2:23 AM, Shuo Xiang
> wrote:
>
>> Hi,
>> Came
res.map(group => (group._2.size, group._2.map(_._1).max))
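The one-liner above can be tried without a cluster on plain Scala collections. The sketch below assumes the grouped value has the shape shown in the quoted message (id mapped to a sequence of (number, id, country) tuples); the object name GroupSummary and the third sample row are made up for illustration, and unlike the one-liner it keeps the id next to each (size, max) pair.

```scala
object GroupSummary {
  type Row = (Int, String, String)

  // For each group, return (number of rows, largest first field),
  // i.e. the same pair the one-liner computes, keyed by the group id.
  def summarize(res: Map[String, Seq[Row]]): Map[String, (Int, Int)] =
    res.map { case (id, rows) => (id, (rows.size, rows.map(_._1).max)) }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      (145804601, "ID1", "japan"),
      (145865080, "ID3", "canada"),
      (145870000, "ID3", "canada") // hypothetical extra row for illustration
    )
    val res = rows.groupBy(_._2) // group by the id field
    summarize(res).toSeq.sortBy(_._1).foreach(println)
  }
}
```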
On Tue, Jun 10, 2014 at 6:10 PM, SK wrote:
> After doing a groupBy operation, I have the following result:
>
> val res =
> ("ID1",ArrayBuffer((145804601,"ID1","japan")))
> ("ID3",ArrayBuffer((145865080,"ID3","canada"),
> (145899