Hi All,
I am trying out Spark Streaming, reading messages from Kafka topics which
would later be turned into streams as below... I have Kafka set up on a VM
and the topics created; however, when I run the program below from my
Spark VM, I get an error even though the kafk
+1
build/mvn clean package -DskipTests -Pyarn -Phadoop-2.6
OK
Basic graph tests
Load graph using edgeListFile...SUCCESS
Run PageRank...SUCCESS
Minimum Spanning Tree Algorithm
Run basic Minimum Spanning Tree algorithm...SUCCESS
Run Minimum Spanning Tree taxonomy creation...SUCCESS
--
Vi
That is an interesting point; I run the driver as a background process
on the master node so that I can still pipe the stdout/stderr
filestreams to the (network) filesystem.
I should mention that the master is connected to the slaves with a 10
Gb link on the same managed switch that the slaves use.
Off the top of my head, I'm not sure, but it looks like virtually all the
extra time between each stage is accounted for with T_{io} in your plot,
which I'm guessing is time spent communicating results over the network? Is
your driver running on the master or is it on a different node? If you look
Hi Evan,
(I just realized my initial email was a reply to the wrong thread; I'm
very sorry about this).
Thanks for your email, and your thoughts on the sampling. That the
gradient computations are essentially the cost of a pass through each
element of the partition makes sense, especially given t
Mike,
I believe the reason you're seeing near-identical performance on the
gradient computations is twofold:
1) Gradient computations for GLM models are computationally pretty cheap
from a FLOPs/byte read perspective. They are essentially a BLAS "gemv" call
in the dense case, which is well known to
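The "essentially a gemv" observation above can be sketched in plain Scala (no Spark or BLAS; squared loss assumed, which is one of the GLM cases discussed here). The point is that the dense gradient `X^T (X w - y)` is two matrix-vector products, each of which reads every entry of `X` exactly once, so the FLOPs-per-byte-read ratio is low and the computation is memory-bound. The object and function names below are illustrative, not from MLlib:

```scala
// Minimal sketch: a dense squared-loss GLM gradient is gemv-shaped.
// grad = X^T (X w - y): two matrix-vector passes over X, ~2 flops per entry read.
object GlmGradientSketch {
  // y = X * v for a row-major n x d matrix (a naive gemv)
  def gemv(x: Array[Array[Double]], v: Array[Double]): Array[Double] =
    x.map(row => row.zip(v).map { case (a, b) => a * b }.sum)

  // X^T * r, accumulating one row of X at a time (a naive transposed gemv)
  def gemvT(x: Array[Array[Double]], r: Array[Double]): Array[Double] = {
    val d = x(0).length
    val out = new Array[Double](d)
    for (i <- x.indices; j <- 0 until d) out(j) += x(i)(j) * r(i)
    out
  }

  // Squared-loss gradient: X^T (X w - y)
  def gradient(x: Array[Array[Double]], y: Array[Double], w: Array[Double]): Array[Double] = {
    val residual = gemv(x, w).zip(y).map { case (p, t) => p - t }
    gemvT(x, residual)
  }
}
```

With `x = [[1,0],[0,2]]`, `y = (1,2)`, and `w = (0,0)`, the residual is `(-1,-2)` and the gradient is `(-1,-4)`; nothing in the computation reuses an entry of `x`, which is why a fast BLAS cannot make it much cheaper than the cost of streaming the data through memory.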
Hello Devs,
This email concerns some timing results for a treeAggregate in
computing a (stochastic) gradient over an RDD of labelled points, as
is currently done in the MLlib optimization routine for SGD.
In SGD, the underlying RDD is downsampled by a fraction f \in (0,1],
and the subgradients ov
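The treeAggregate pattern described above can be simulated in plain Scala to show the shape of the computation (this is a sketch, not the MLlib code: partitions are modeled as nested `Seq`s, a squared-loss subgradient is assumed, and the sampling fraction is effectively f = 1 so the example is deterministic):

```scala
// Sketch of the seqOp/combOp pair an SGD-style treeAggregate uses:
// each partition folds its points into a running (gradientSum, count) pair,
// and the per-partition partials are then merged pairwise.
object TreeAggregateSketch {
  type Acc = (Array[Double], Long) // (gradient sum, number of points seen)

  // Fold one labelled point (label, features) into the accumulator.
  def seqOp(acc: Acc, point: (Double, Array[Double]), w: Array[Double]): Acc = {
    val (label, features) = point
    val pred = features.zip(w).map { case (a, b) => a * b }.sum
    val grad = features.map(_ * (pred - label)) // squared-loss subgradient
    (acc._1.zip(grad).map { case (a, b) => a + b }, acc._2 + 1)
  }

  // Merge two partial results (what treeAggregate applies up the tree).
  def combOp(a: Acc, b: Acc): Acc =
    (a._1.zip(b._1).map { case (x, y) => x + y }, a._2 + b._2)

  // Sequential stand-in for rdd.treeAggregate(zero)(seqOp, combOp).
  def aggregate(partitions: Seq[Seq[(Double, Array[Double])]], w: Array[Double]): Acc = {
    val zero: Acc = (Array.fill(w.length)(0.0), 0L)
    partitions
      .map(part => part.foldLeft(zero)((acc, p) => seqOp(acc, p, w)))
      .reduce(combOp)
  }
}
```

In Spark the `reduce(combOp)` step is where the tree depth matters: partials are combined on executors in rounds rather than all at the driver, which trades extra stages for less data converging on one node.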
Sorry for the delay; yes, it still happens.
I'm still trying to figure out whether it comes from bad data, and trying
to isolate the bug itself...
2015-09-11 0:28 GMT+02:00 Reynold Xin :
> Does this still happen on 1.5.0 release?
>
>
> On Mon, Aug 31, 2015 at 9:31 AM, Olivier Girardot
> wrote:
>
>> tested now aga
> On 25 Sep 2015, at 19:11, Marcelo Vanzin wrote:
>
> - People who ship the assembly with their application. As Matei
> suggested (and I agree), that is kinda weird. But currently that is
> the easiest way to embed Spark and get, for example, the YARN backend
> working. There are ways around tha
Anchit,
please ignore my inputs. You are right. Thanks.
> On Sep 26, 2015, at 17:27, Fengdong Yu wrote:
>
> Hi Anchit,
>
> this is not what I expected, because you specified the HDFS directory in your
> code.
> I've solved it like this:
>
>val text = sc.hadoopFile(Args.input,
>
Hi Anchit,
this is not what I expected, because you specified the HDFS directory in your code.
I've solved it like this:
val text = sc.hadoopFile(Args.input,
  classOf[TextInputFormat], classOf[LongWritable],
  classOf[Text], 2)
val hadoopRdd = text.asInstanceOf[HadoopRDD[LongWritable, Text]]