Re: feature selection and sparse vector support

2014-04-10 Thread Xiangrui Meng
Hi Ignacio, Please create a JIRA and send a PR for the information gain computation, so it is easy to track the progress. The sparse vector support for NaiveBayes is already implemented in branch-1.0 and master. You only need to provide an RDD of sparse vectors (created from Vectors.sparse). MLU
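Vectors.sparse takes the vector size plus two parallel arrays: strictly increasing indices and the values stored at them. The pure-Scala sketch below shows how a tokenized document could be turned into that (size, indices, values) triple. It assumes a hypothetical fixed vocabulary map and a hypothetical helper name. The actual Vectors.sparse call is only mentioned in a comment, so the snippet runs without Spark on the classpath.

```scala
object SparseFeatures {
  // Hypothetical helper: map a tokenized document to the (size, indices, values)
  // triple that org.apache.spark.mllib.linalg.Vectors.sparse expects.
  // Tokens that are not in the vocabulary are ignored.
  def toSparseTriple(tokens: Seq[String],
                     vocab: Map[String, Int]): (Int, Array[Int], Array[Double]) = {
    // Count occurrences of each known term, keyed by its column index.
    val counts: Map[Int, Int] =
      tokens.flatMap(vocab.get).groupBy(identity).map { case (i, occ) => i -> occ.size }
    // Sparse vectors expect their indices in increasing order.
    val sorted = counts.toSeq.sortBy(_._1)
    (vocab.size, sorted.map(_._1).toArray, sorted.map(_._2.toDouble).toArray)
  }

  def main(args: Array[String]): Unit = {
    val vocab = Map("spark" -> 0, "mllib" -> 1, "vector" -> 2, "sparse" -> 3)
    val (size, indices, values) =
      toSparseTriple(Seq("sparse", "vector", "sparse", "unknown"), vocab)
    // With spark-mllib available: Vectors.sparse(size, indices, values)
    println(s"$size ${indices.mkString(",")} ${values.mkString(",")}") // 4 2,3 1.0,2.0
  }
}
```

An RDD of such vectors (wrapped in labeled points) is what the NaiveBayes trainer would consume. Only the non-zero term counts are stored, which is why the sparse form suits text features.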

Re: RFC: varargs in Logging.scala?

2014-04-10 Thread witgo
There is related discussion in the following PR: https://github.com/apache/spark/pull/332 -- Original -- From: "Marcelo Vanzin"; Date: Fri, Apr 11, 2014 08:16 AM To: "dev"; Subject: RFC: varargs in Logging.scala? Hey there, While going through the

Building Spark AMI

2014-04-10 Thread Jim Ancona
Are there scripts to build the AMI used by the spark-ec2 script? Alternatively, is there a place to download the AMI? I'm interested in using it to deploy into an internal OpenStack cloud. Thanks, Jim

Re: RFC: varargs in Logging.scala?

2014-04-10 Thread Michael Armbrust
BTW... You can do calculations in string interpolation: s"Time: ${timeMillis / 1000}s" Or use format strings. f"Float with two decimal places: $floatValue%.2f" More info: http://docs.scala-lang.org/overviews/core/string-interpolation.html On Thu, Apr 10, 2014 at 5:46 PM, Michael Armbrust wrote
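The sketch below wraps the two interpolators Michael mentions in small methods. The method names and sample values are illustrative only. Both forms build the string eagerly wherever they are evaluated, so they improve readability but do not by themselves avoid building an unused log message.

```scala
object InterpolationDemo {
  // s-interpolator: arbitrary expressions are allowed inside ${...}.
  def elapsed(timeMillis: Long): String = s"Time: ${timeMillis / 1000}s"

  // f-interpolator: printf-style format specifiers, checked at compile time
  // against the type of the interpolated value.
  def twoDecimals(floatValue: Double): String =
    f"Float with two decimal places: $floatValue%.2f"

  def main(args: Array[String]): Unit = {
    println(elapsed(2500L)) // Time: 2s (integer division truncates)
    println(twoDecimals(3.14159))
  }
}
```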

Re: RFC: varargs in Logging.scala?

2014-04-10 Thread Michael Armbrust
Hi Marcelo, Thanks for bringing this up here, as this has been a topic of debate recently. Some thoughts below. > ... all of them suffer from the fact that the log message needs to be built even though it might not be used. This is not true of the current implementation (and this is actually
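Spark's Logging trait takes the message as a by-name parameter (`msg: => String`) and checks the log level before using it, so the message expression is not evaluated when the level is disabled. The sketch below reproduces that idea with a hypothetical MiniLogger, which is not Spark's class. It records emitted messages instead of calling SLF4J so that the laziness is observable.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal logger; Spark's real Logging trait delegates to SLF4J.
class MiniLogger(infoEnabled: Boolean) {
  // Messages that were actually produced, so the laziness can be observed.
  val emitted: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // `msg` is a by-name parameter: the caller's expression is evaluated only
  // where `msg` is referenced, i.e. only after the level check succeeds.
  def logInfo(msg: => String): Unit = {
    if (infoEnabled) emitted += msg
  }
}

object MiniLoggerDemo {
  def main(args: Array[String]): Unit = {
    var builds = 0
    def expensiveMessage(): String = { builds += 1; s"dump #$builds" }

    new MiniLogger(infoEnabled = false).logInfo(expensiveMessage()) // never built
    val on = new MiniLogger(infoEnabled = true)
    on.logInfo(expensiveMessage()) // built exactly once
    println(s"messages built: $builds") // messages built: 1
  }
}
```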

RFC: varargs in Logging.scala?

2014-04-10 Thread Marcelo Vanzin
Hey there, While going through the try to get the hang of things, I've noticed several different styles of logging. They all have some downside (readability being one of them in certain cases), but all of them suffer from the fact that the log message needs to be built even though it might not be u
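The RFC message is cut off before the proposal itself. One common reading of "varargs logging" is a printf/SLF4J-style signature in which the format string is applied only if the level is enabled. The sketch below is hypothetical: VarargsLogger and its method are illustrative and are not Spark's API. It shows the trade-off against a by-name message. Formatting is deferred, but every argument expression is still evaluated at the call site before the level check runs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical varargs-style logger, for comparison with a by-name `msg: => String`.
class VarargsLogger(infoEnabled: Boolean) {
  val emitted: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // The format is applied only when INFO is enabled, but the caller has
  // already evaluated every element of `args` before this method runs.
  def logInfo(fmt: String, args: Any*): Unit = {
    if (infoEnabled) emitted += fmt.format(args: _*)
  }
}

object VarargsDemo {
  def main(args: Array[String]): Unit = {
    val log = new VarargsLogger(infoEnabled = true)
    log.logInfo("Removed %d blocks from %s", 3, "executor-1")
    println(log.emitted.head) // Removed 3 blocks from executor-1
  }
}
```

The varargs form avoids closure allocation and keeps call sites short. The by-name form also skips evaluating costly arguments, which matters when an argument is itself expensive to compute.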

Re: minor optimizations to get my feet wet

2014-04-10 Thread Henry Saputra
You are welcome, thanks again for contributing =) - Henry On Thu, Apr 10, 2014 at 3:17 PM, Ignacio Zendejas wrote: > I don't think there's a noticeable performance hit by the use of reverse in those cases. It was a quick set of changes and it helped me understand what you look for. I didn't int

Re: minor optimizations to get my feet wet

2014-04-10 Thread Ignacio Zendejas
I don't think there's a noticeable performance hit by the use of reverse in those cases. It was a quick set of changes and it helped me understand what you look for. I didn't intend to nitpick, so I'll leave as is. I could have used a scala.Ordering implicitly/explicitly also, but seems overkill and d
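The changed code isn't shown in the thread, so the sketch below is a generic illustration with made-up data. It compares sorting ascending and then calling reverse with passing a reversed scala.Ordering explicitly.

```scala
object DescendingSort {
  val xs: List[Int] = List(3, 1, 4, 1, 5)

  // Sort ascending, then reverse: an extra pass and an intermediate list.
  def viaReverse: List[Int] = xs.sorted.reverse

  // Pass a reversed Ordering explicitly: a single sort, no extra reversal.
  def viaOrdering: List[Int] = xs.sorted(Ordering[Int].reverse)

  // sortBy with a key function and an explicit reversed Ordering for the key.
  def wordsByLengthDesc(ws: List[String]): List[String] =
    ws.sortBy(_.length)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(viaReverse)  // List(5, 4, 3, 1, 1)
    println(viaOrdering) // List(5, 4, 3, 1, 1)
    println(wordsByLengthDesc(List("a", "ccc", "bb"))) // List(ccc, bb, a)
  }
}
```

For small collections the cost difference is negligible, which matches the point above. One subtle difference remains: `sortBy(f).reverse` also reverses the relative order of elements with equal keys, whereas a reversed Ordering keeps the stable sort's original order for ties.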

Re: minor optimizations to get my feet wet

2014-04-10 Thread Henry Saputra
Hi Ignacio, Thank you for your contribution. Just a friendly reminder, in case you have not contributed to Apache Software Foundation projects before please submit ASF ICLA form [1] or if you are sponsored by your company also ask the company to send CCLA [2] to clear the intellectual property fo

feature selection and sparse vector support

2014-04-10 Thread Ignacio Zendejas
Hi, again - As part of the next step, I'd like to make a more substantive contribution and propose some initial work on feature selection, primarily as it relates to text classification. Specifically, I'd like to contribute very straightforward code to perform information gain feature evaluation.
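The message is cut off before the details, so this is not the proposed code. It is a sketch of the standard definition for a binary term-presence feature T and class label C, measured in bits: IG(C; T) = H(C) - P(t) H(C | t) - P(not t) H(C | not t). The object name and the (termPresent, label) input format are assumptions made for illustration.

```scala
object InformationGain {
  // Shannon entropy, in bits, of a distribution given by raw counts.
  def entropy(counts: Iterable[Long]): Double = {
    val total = counts.sum.toDouble
    counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2)
    }.sum
  }

  // Information gain of a binary feature with respect to the class label.
  // Each example is (termPresent, label).
  def infoGain[L](examples: Seq[(Boolean, L)]): Double = {
    val n = examples.size.toDouble
    def classCounts(xs: Seq[(Boolean, L)]): Iterable[Long] =
      xs.groupBy(_._2).values.map(_.size.toLong)
    val (withTerm, withoutTerm) = examples.partition(_._1)
    // Conditional entropy H(C | T), weighted by P(T = present) and P(T = absent).
    val conditional = Seq(withTerm, withoutTerm).filter(_.nonEmpty).map { part =>
      (part.size / n) * entropy(classCounts(part))
    }.sum
    entropy(classCounts(examples)) - conditional
  }

  def main(args: Array[String]): Unit = {
    // The term appears exactly in the "pos" documents, so it carries 1 bit.
    val predictive = Seq((true, "pos"), (true, "pos"), (false, "neg"), (false, "neg"))
    println(infoGain(predictive))
  }
}
```

For feature selection, one would compute this score per term (e.g. from per-term class counts aggregated over an RDD) and keep the top-k terms. The single-machine version above only shows the formula.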

Re: org.apache.spark.util.Vector is deprecated what next ?

2014-04-10 Thread DB Tsai
You can construct the Breeze vector by val breezeVector = breeze.linalg.DenseVector.zeros[Double](length) If you want to convert to mllib vector, you can do val mllibVector = Vectors.fromBreeze(breezeVector) If you want to convert back to breeze vector, val newBreezeVector = mllibV

Re: minor optimizations to get my feet wet

2014-04-10 Thread Reynold Xin
Thanks for contributing! I think often unless the feature is gigantic, you can send a pull request directly for discussion. One rule of thumb in the Spark code base is that we typically prefer readability over conciseness, and thus we tend to avoid using too much Scala magic or operator overloadin

minor optimizations to get my feet wet

2014-04-10 Thread Ignacio Zendejas
Hi, all - First off, I want to say that I love Spark and am very excited about MLBase. I'd love to contribute now that I have some time, but before I do that I'd like to familiarize myself with the process. In looking for a few projects and settling on one which I'll discuss in another thread, I

Re: org.apache.spark.util.Vector is deprecated what next ?

2014-04-10 Thread Patrick Wendell
You'll need to use the associated functionality in Breeze and then create a dense vector from a Breeze vector. I have a JIRA for us to update the examples for 1.0... I'm hoping Xiangrui can take a look at it. https://issues.apache.org/jira/browse/SPARK-1464 https://github.com/scalanlp/breeze/wik

org.apache.spark.util.Vector is deprecated what next ?

2014-04-10 Thread techaddict
org.apache.spark.util.Vector is deprecated, so what should be done if, say, I want to create a vector of zeros (def zeros(length: Int) in util.Vector) using the new mllib.linalg.Vector?
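The sketch below targets the mllib API and assumes spark-mllib is on the classpath. A dense zero vector can be built from a freshly allocated Array[Double], which the JVM zero-initializes. A sparse zero vector can be built from empty index and value arrays. The object name ZeroVectors is illustrative. Newer Spark releases also provide Vectors.zeros(size) directly.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object ZeroVectors {
  // Dense zero vector: a new Array[Double] is already filled with 0.0.
  def denseZeros(length: Int): Vector = Vectors.dense(new Array[Double](length))

  // Sparse zero vector: the right size, but no stored (index, value) entries.
  def sparseZeros(length: Int): Vector =
    Vectors.sparse(length, Array.empty[Int], Array.empty[Double])

  def main(args: Array[String]): Unit = {
    println(denseZeros(3))
    println(sparseZeros(3).size)
  }
}
```

The sparse form is the cheaper choice for long vectors that stay mostly zero, since it stores nothing for the zero entries.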