Re: How to save mllib model to hdfs and reload it

2014-09-13 Thread Yanbo Liang
Shixiong, these two snippets behave differently in Scala. In the second snippet, you define a variable named m, and the right-hand side is evaluated as part of the definition. In other words, the variable is replaced by the pre-computed value of Array(1.0) in the subsequent code. So in the second

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Hoai-Thu Vuong
Someone in this community gave me a video: https://www.youtube.com/watch?v=sPhyePwo7FA. I asked the same question in this community, and others helped me solve the problem. I'm trying to load a MatrixFactorizationModel from an object file, but the compiler says that I cannot create the object because the

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Christopher Nguyen
Hi Hoai-Thu, the issue of the private default constructor is unlikely to be the cause here, since Lance was already able to load/deserialize the model object. And on that side topic, I wish all serdes libraries would just use constructor.setAccessible(true) by default :-) Most of the time that privacy is

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Shixiong Zhu
I think I can reproduce this error. The following code does not work and reports that Foo cannot be serialized (log in gist https://gist.github.com/zsxwing/4f9f17201d4378fe3e16): class Foo { def foo() = Array(1.0) } val t = new Foo val m = t.foo val r1 = sc.parallelize(List(1, 2, 3)) val r2 = r1.map(_

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread lancezhange
The following code works, too: class Foo1 extends Serializable { def foo() = Array(1.0) } val t1 = new Foo1 val m1 = t1.foo val r11 = sc.parallelize(List(1, 2, 3)) val r22 = r11.map(_ + m1(0)) r22.toArray On Thu, Aug 14, 2014 at 10:55 PM, Shixiong Zhu [via Apache Spark User List]

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Shixiong Zhu
I think that in the following case: class Foo { def foo() = Array(1.0) } val t = new Foo val m = t.foo val r1 = sc.parallelize(List(1, 2, 3)) val r2 = r1.map(_ + m(0)) r2.toArray Spark should not serialize t, but it looks like it does. Best Regards, Shixiong Zhu 2014-08-14 23:22 GMT+08:00 lancezhange

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread lancezhange
I finally solved the problem with the following code: var m: org.apache.spark.mllib.classification.LogisticRegressionModel = null m = newModel // newModel is the loaded one, see my post above val labelsAndPredsOnGoodData = goodDataPoints.map { point => val prediction =

Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread lancezhange
Let's say you have a model of class org.apache.spark.mllib.classification.LogisticRegressionModel. You can save the model to disk as follows: import java.io.FileOutputStream import java.io.ObjectOutputStream val fos = new FileOutputStream("e:/model.obj") val oos = new
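The save step described above, together with the matching load, can be sketched with plain Java serialization. This is only a minimal sketch: ToyModel and ModelIO are hypothetical names used for illustration, and a local file path stands in for wherever the model is actually stored.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a trained model; any java.io.Serializable object
// is handled the same way. Case classes are serializable by default.
case class ToyModel(weights: Array[Double], intercept: Double)

object ModelIO {
  // Write the object to a local file with Java serialization.
  def save(model: AnyRef, path: String): Unit = {
    val oos = new ObjectOutputStream(new FileOutputStream(path))
    try oos.writeObject(model) finally oos.close()
  }

  // Read the object back and cast it to the type the caller expects.
  def load[T](path: String): T = {
    val ois = new ObjectInputStream(new FileInputStream(path))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }
}
```

Usage would look like ModelIO.save(model, path) followed later by ModelIO.load[ToyModel](path). The cast in load is unchecked, so a wrong type parameter only fails when the result is used.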

Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Jaideep Dhok
Hi, I have faced a similar issue when trying to run a map function with predict. In my case I had some non-serializable fields in my calling class. After making those fields transient, the error went away. On Wed, Aug 13, 2014 at 6:39 PM, lancezhange lancezha...@gmail.com wrote: let's say you

Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Sean Owen
PS I think that solving not serializable exceptions by adding 'transient' is usually a mistake. It's a band-aid on a design problem. transient causes the default serialization mechanism to not serialize the field when the object is serialized. When deserialized, this field will be null, which
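The effect of transient described above can be seen in a small round trip through Java serialization. This is a sketch under assumptions: Holder and TransientDemo are hypothetical names, and an in-memory byte array replaces any real storage.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical class: `cache` is marked transient, so default Java
// serialization skips it.
class Holder(val name: String) extends Serializable {
  @transient val cache: java.util.ArrayList[String] = new java.util.ArrayList[String]()
}

object TransientDemo {
  // Serialize to an in-memory buffer and deserialize again.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytes)
    try oos.writeObject(obj) finally oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }
}
```

After TransientDemo.roundTrip(new Holder("a")), name is restored but cache is null, because the constructor body is not run again during deserialization. Any code that later touches cache has to handle that.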

Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread lancezhange
My prediction code is simple enough: val labelsAndPredsOnGoodData = goodDataPoints.map { point => val prediction = model.predict(point.features); (point.label, prediction) } When model is the loaded one, the above code just doesn't work. Can you spot the error? Thanks. PS. I use

Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Christopher Nguyen
+1 what Sean said. And if there are too many state/argument parameters for your taste, you can always create a dedicated (serializable) class to encapsulate them. Sent while mobile. Pls excuse typos etc. On Aug 13, 2014 6:58 AM, Sean Owen so...@cloudera.com wrote: PS I think that solving not

Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Christopher Nguyen
Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to isolate the cause to serialization of the loaded model. And also try to serialize the deserialized (loaded) model manually to see if that throws any visible exceptions. Sent while mobile. Pls excuse typos etc. On Aug 13,
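The second debugging idea, serializing the loaded object manually to surface any exception, could look roughly like this. SerCheck, Plain, and Wrapper are hypothetical names for illustration; the sketch only uses java.io and does not involve Spark.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical classes: Wrapper is Serializable, but it holds a field
// whose class is not.
class Plain
class Wrapper(val p: Plain) extends Serializable

object SerCheck {
  // Returns None if obj serializes cleanly; otherwise the exception message,
  // which for NotSerializableException names the offending class.
  def firstProblem(obj: AnyRef): Option[String] = {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    try { oos.writeObject(obj); None }
    catch { case e: NotSerializableException => Some(e.getMessage) }
    finally oos.close()
  }
}
```

Calling SerCheck.firstProblem on the loaded model would report which class in its object graph is not serializable, if any.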

Re: How to save mllib model to hdfs and reload it

2014-08-12 Thread Xiangrui Meng
For linear models, the constructors are now public. You can save the weights to HDFS, then load the weights back and use the constructor to create the model. -Xiangrui On Mon, Aug 11, 2014 at 10:27 PM, XiaoQinyu xiaoqinyu_sp...@outlook.com wrote: hello: I want to know,if I use history data to
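The approach of saving only the weights and rebuilding the model through its constructor can be sketched as follows. This is an illustration under assumptions: LinearModel is a hypothetical stand-in with a public (weights, intercept) constructor, WeightsIO is a made-up helper, and a local text file is used in place of HDFS.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical stand-in for a linear model with a public constructor.
class LinearModel(val weights: Array[Double], val intercept: Double) {
  // Dot product of weights and features, plus the intercept.
  def predict(x: Array[Double]): Double =
    weights.zip(x).map { case (w, v) => w * v }.sum + intercept
}

object WeightsIO {
  // Store the intercept followed by the weights as one comma-separated line.
  def save(m: LinearModel, path: String): Unit = {
    val line = (m.intercept +: m.weights).mkString(",")
    Files.write(Paths.get(path), line.getBytes(StandardCharsets.UTF_8))
  }

  // Parse the line back and rebuild the model through its constructor.
  def load(path: String): LinearModel = {
    val text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    val nums = text.trim.split(",").map(_.toDouble)
    new LinearModel(nums.tail, nums.head)
  }
}
```

Storing plain numbers avoids depending on the serialized form of the model class, so the saved file stays readable even if the class itself changes.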

How to save mllib model to hdfs and reload it

2014-08-11 Thread XiaoQinyu
Hello: I want to know, if I use historical data to train a model and want to use this model in another app, how should I do that? Should I save the model to disk, and then load it from disk when I use it? But I don't know how to save the MLlib model and reload it. I would be very pleased if