Hello, I'm using Spark 1.4.2-SNAPSHOT.
I'm running in YARN mode :-)
I wonder whether spark.shuffle.memoryFraction and spark.shuffle.manager work?
How do I set these parameters...
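Both are ordinary configuration properties; a minimal sketch of how they might be set (the values are illustrative, not recommendations):

```
# spark-defaults.conf, or pass each one as --conf key=value to spark-submit
spark.shuffle.manager           sort    # "sort" (default since 1.2) or "hash"
spark.shuffle.memoryFraction    0.3     # fraction of executor heap for shuffle aggregation buffers
```

They can also be set programmatically on the SparkConf before the SparkContext is created.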
On Jul 1, 2015, at 1:32 AM, Ted Yu yuzhih...@gmail.com wrote:
Which Spark release are you using?
Are you running in standalone
/LzoTextInputFormat.java
the class. You can read more here
https://github.com/twitter/hadoop-lzo#maven-repository
Thanks
Best Regards
On Thu, May 14, 2015 at 1:22 PM, lisendong lisend...@163.com wrote:
I have one hdfs dir, which contains many files:
/user/root/1.txt
/user/root/2.txt
/user/root/3.txt
/user/root/4.txt
and there is a daemon process which adds one file per minute to this dir
(e.g., 5.txt, 6.txt, 7.txt...).
I want to start a Spark Streaming job which loads 3.txt, 4.txt, and then
the pseudo code:
object myApp {
var myStaticRDD: RDD[Int]
def main() {
... //init streaming context, and get two DStream (streamA and streamB)
from two hdfs path
//complex transformation using the two DStream
val new_stream = streamA.transformWith(streamB, (a, b, t) => {
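A sketch of how the pseudo code above might look when filled in (the HDFS paths and the union transform are placeholder assumptions, not the poster's actual logic):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object myApp {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("myApp"), Seconds(60))
    // two DStreams from two HDFS paths (placeholder paths)
    val streamA = ssc.textFileStream("hdfs:///user/root/dirA")
    val streamB = ssc.textFileStream("hdfs:///user/root/dirB")
    // transformWith exposes both batches' RDDs (and the batch time) each interval
    val new_stream = streamA.transformWith(streamB,
      (a: RDD[String], b: RDD[String], t: Time) => a.union(b))
    new_stream.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```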
yes!
thank you very much :-)
On Apr 2, 2015, at 7:13 PM, Sean Owen so...@cloudera.com wrote:
Right, I asked because in your original message, you were looking at
the initialization to a random vector. But that is the initial state,
not final state.
On Thu, Apr 2, 2015 at 11:51 AM, lisendong lisend
to the initialization, not the result, right? It's possible
that the resulting weight vectors are sparse although this looks surprising
to me. But it is not related to the initial state, right?
On Thu, Apr 2, 2015 at 10:43 AM, lisendong lisend...@163.com wrote:
I found
On Mar 31, 2015, at 12:11 AM, Xiangrui Meng men...@gmail.com wrote:
setCheckpointInterval was added in the current master and branch-1.3. Please
help check whether it works. It will be included in the 1.3.1 and 1.4.0
release. -Xiangrui
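If it helps with checking, usage might look like the sketch below (the rank, iteration count, and interval are placeholder values; setCheckpointInterval assumes a build that contains the new method):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

sc.setCheckpointDir("hdfs:///tmp/als-checkpoints") // must be set before checkpointing
val model = new ALS()
  .setRank(10)
  .setIterations(30)
  .setCheckpointInterval(10) // checkpoint every 10 iterations to cut the lineage
  .run(ratings)              // ratings: RDD[Rating]
```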
On Mon, Mar 30, 2015 at 7:27 AM, lisendong lisend...@163.com wrote:
Hi Xiangrui,
I found that the ALS in Spark 1.3.0 forgets to call checkpoint() in explicit ALS. Is that correct?
Best,
Xiangrui
On Tue, Mar 31, 2015 at 8:58 AM, lisendong lisend...@163.com wrote:
GuoQiang's method works very well…
it only takes 1TB of disk now.
thank you very much!
On Mar 31, 2015, at 4:47 PM, GuoQiang Li wi...@qq.com wrote:
You
On Mon, Mar 30, 2015 at 7:27 AM, lisendong lisend...@163.com wrote:
Hi Xiangrui,
I found that the ALS in Spark 1.3.0 forgets to call checkpoint() in explicit ALS:
the code is:
https://github.com/apache/spark
You see, the core of ALS 1.0.0 is the following code:
there should be flatMap and groupByKey tasks when running ALS iterations, right?
but when I run the ALS iterations, there are ONLY flatMap tasks...
do you know why?
private def updateFeatures(
products: RDD[(Int,
I found my tasks take a long time in YoungGen GC. I set the young gen size
to about 1.5G; I wonder why it takes so long?
Not all tasks take such a long time, only about 1% of them...
180.426: [GC [PSYoungGen: 9916105K->1676785K(14256640K)] 26201020K->18690057K(53403648K), 17.3581500 secs]
I'm sorry, but how do I look at the Mesos logs?
Where are they?
On Mar 4, 2015, at 6:06 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
You can check in the Mesos logs and see what's really happening.
Thanks
Best Regards
On Wed, Mar 4, 2015 at 3:10 PM, lisendong lisend...@163.com wrote:
I'm using Spark 1.0.0 with Cloudera,
but I want to use the new ALS code which supports more features, such as RDD
cache level (MEMORY_ONLY), checkpoint, and so on.
What is the easiest way to use the new ALS code?
I only need the MLlib ALS code, so maybe I don't need to update all of
Spark MLlib
15/03/04 09:26:36 INFO ClientCnxn: Client session timed out, have not heard
from server in 26679ms for sessionid 0x34bbf3313a8001b, closing socket
connection and attempting reconnect
15/03/04 09:26:36 INFO ConnectionStateManager: State change: SUSPENDED
15/03/04 09:26:36 INFO
In ALS, I guess each iteration's RDDs are referenced by the next
iteration's RDDs, so none of the shuffle data will be deleted until the ALS job
finishes…
I guess checkpoint could solve my problem; do you know about checkpoint?
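For reference, a minimal sketch of checkpointing inside an iterative loop (the interval of 10 and the toy RDD are arbitrary examples):

```scala
sc.setCheckpointDir("hdfs:///tmp/checkpoints")
var rdd = sc.parallelize(1 to 1000)
for (i <- 1 to 30) {
  rdd = rdd.map(_ + 1)
  if (i % 10 == 0) {
    rdd.checkpoint() // write to HDFS and truncate the lineage
    rdd.count()      // an action is needed to actually materialize the checkpoint
  }
}
```

Once a checkpointed RDD is materialized, the lineage behind it is dropped, so its shuffle ancestry becomes eligible for cleanup.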
On Mar 3, 2015, at 4:18 PM, nitin [via Apache Spark User List]
As long as I set spark.local.dir to multiple disks, the job
fails; the errors are as follows:
(if I set spark.local.dir to only 1 dir, the job succeeds...)
Exception in thread "main" org.apache.spark.SparkException: Job cancelled
because SparkContext was shut down
at
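For reference, multiple disks are given as a single comma-separated list (the paths here are illustrative):

```
# spark-defaults.conf (or SPARK_LOCAL_DIRS); one directory per physical disk
spark.local.dir  /mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark
```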
Why is the GC time so long?
I'm using ALS in MLlib, and the garbage collection time is too long
(about 1/3 of the total time).
I have tried some measures from the Tuning Spark guide, and tried to set the
new generation memory, but it still does not work...
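A sketch of how new-generation settings might be passed to executors (the flag values are illustrative; the heap size itself must go through spark.executor.memory, not these options):

```
spark.executor.extraJavaOptions  -XX:NewRatio=3 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```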
I'm using Spark ALS.
I set the iteration number to 30,
and in each iteration, tasks produce nearly 1TB of shuffle write.
To my surprise, this shuffle data is not cleaned until the whole job
finishes, which means I need 30TB of disk to store the shuffle data.
I think after each
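One blunt workaround available in Spark 1.0 is the periodic metadata cleaner; a sketch (the 3600s value is an arbitrary example, and the cleaner deletes data older than the TTL even if it is still referenced, so use with care):

```
spark.cleaner.ttl  3600   # seconds; periodically clean old shuffle files and metadata
```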
I'm using ALS with Spark 1.0.0; the code should be:
https://github.com/apache/spark/blob/branch-1.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
I think the following two methods should produce the same (or nearly the same) result:
MatrixFactorizationModel model =
a Rating for these data points. What then?
Also, would you care to bring this to the user@ list? It's kind of interesting.
On Thu, Feb 26, 2015 at 2:02 PM, lisendong lisend...@163.com wrote:
I set the score of the '0'-interaction user-item pairs to 0.0;
the code is as follows:
if (ifclick > 0
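The condition reads as if zero-click pairs get an explicit 0.0 rating; a minimal sketch of that scoring rule (the helper name and the 1.0 score are hypothetical):

```scala
// Hypothetical helper: explicit 0.0 for pairs the user never clicked
def clickScore(clicks: Int): Double =
  if (clicks > 0) 1.0 else 0.0

val scores = Seq(0, 3, 1, 0).map(clickScore) // 0.0, 1.0, 1.0, 0.0
```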