Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Jaideep Dhok
Hi, I have faced a similar issue when trying to run a map function with predict. In my case I had some non-serializable fields in my calling class. After making those fields transient, the error went away. On Wed, Aug 13, 2014 at 6:39 PM, lancezhange wrote: > let's say you have a model which is

Re: Spark Installation

2014-07-07 Thread Jaideep Dhok
Hi Srikrishna, You can use the make-distribution script in Spark to generate the binary, for example: ./make-distribution.sh --tgz --hadoop HADOOP_VERSION The script calls Maven, so you can look into it to get the exact mvn command too. Thanks, Jaideep On Tue, Jul 8, 2014 at 8:37 AM, Srikris

Re: TaskNotSerializable when invoking KMeans.run

2014-06-30 Thread Jaideep Dhok
Hi Daniel, I also faced the same issue when using the Naive Bayes classifier in MLlib. I was able to solve it by making all fields in the calling object either transient or serializable. Spark prints the class of the object it was not able to serialize in the error message, which can give you a hint.
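To illustrate the transient-or-serializable fix, here is a minimal sketch in plain Java serialization, without Spark. It assumes the closure's enclosing object is shipped using standard Java serialization, which is Spark's default for closures. All class names below (TransientFieldDemo, Logger, BrokenCaller, FixedCaller) are hypothetical and stand in for your own calling class and its fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientFieldDemo {

    // Hypothetical helper that does NOT implement Serializable.
    static class Logger {
        void log(String msg) { System.out.println(msg); }
    }

    // Stand-in for a calling class with a non-serializable field.
    static class BrokenCaller implements Serializable {
        private static final long serialVersionUID = 1L;
        final Logger logger = new Logger(); // dragged into serialization, fails
        final double threshold;
        BrokenCaller(double threshold) { this.threshold = threshold; }
    }

    // Same class with the offending field marked transient.
    static class FixedCaller implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Logger logger = new Logger(); // skipped by serialization
        final double threshold;
        FixedCaller(double threshold) { this.threshold = threshold; }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Returns false when Java serialization rejects the object graph.
    static boolean canSerialize(Object value) {
        try {
            serialize(value);
            return true;
        } catch (NotSerializableException e) {
            // The message names the class that could not be serialized.
            System.out.println("Not serializable: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes and deserializes a value, as happens when it is shipped to a worker.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialize(value)))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("broken serializes: " + canSerialize(new BrokenCaller(0.5)));
        FixedCaller copy = roundTrip(new FixedCaller(0.5));
        System.out.println("fixed threshold after round trip: " + copy.threshold);
        System.out.println("fixed logger is null after round trip: " + (copy.logger == null));
    }
}
```

Note that a transient field comes back as null after deserialization, so code running on the worker side has to re-create it (for example lazily) before using it.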

Callbacks on freeing up of RDDs

2014-06-30 Thread Jaideep Dhok
Hi all, I am trying to create a custom RDD class for the result sets of queries supported in InMobi Grill (http://inmobi.github.io/grill/). Each result set has a schema (similar to Hive's TableSchema) and a path in HDFS containing the result set data. An easy way of doing this would be to create a temp