Re: Saving a Pipeline with DecisionTreeModel Spark ML

2016-02-12 Thread Rakesh Chalasani
There is already a JIRA tracking this: https://issues.apache.org/jira/browse/SPARK-11888 On Fri, Feb 12, 2016 at 2:34 PM gstvolvr wrote: > Hi all, > > I noticed that I cannot save a Pipeline containing a DecisionTree model > similar to the way I can save one with a LogisticRegression model. > It lo

Re: Error aliasing an array column.

2016-02-09 Thread Rakesh Chalasani
://issues.apache.org/jira/browse/SPARK-13253 Thanks for the help. On Tue, Feb 9, 2016 at 5:29 PM Ted Yu wrote: > What's your plan of using the arrayCol ? > It would be part of some query, right ? > > On Tue, Feb 9, 2016 at 2:27 PM, Rakesh Chalasani > wrote: > >> Do you mean

Re: Error aliasing an array column.

2016-02-09 Thread Rakesh Chalasani
-+ > | [0, 1]| > | [1, 2]| > | [2, 3]| > | [3, 4]| > | [4, 5]| > | [5, 6]| > | [6, 7]| > | [7, 8]| > | [8, 9]| > | [9, 10]| > ++ > > FYI > > On Tue, Feb 9, 2016 at 1:38 PM, Rakesh Chalasani > wrote: > >> Sorry, didn

Re: Error aliasing an array column.

2016-02-09 Thread Rakesh Chalasani
Sorry, didn't realize the mail didn't show the code. Using Spark release 1.6.0 Below is an example to reproduce it. import org.apache.spark.sql.SQLContext val sqlContext = new SQLContext(sparkContext) import sqlContext.implicits._ import org.apache.spark.sql.functions case class Test(a:Int, b:In

Re: BlockMatrix multiplication

2015-07-14 Thread Rakesh Chalasani
Hi Alexander: Aw, I missed the 'cogroup' on BlockMatrix multiply! I stand corrected. Check https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala#L361 BlockMatrix multiply uses a custom partiti

Re: BlockMatrix multiplication

2015-07-14 Thread Rakesh Chalasani
BlockMatrix stores the data as key -> Matrix pairs, and multiply does a reduceByKey operation, aggregating matrices per key. Since you said each block resides in a separate partition, reduceByKey might effectively shuffle all of the data. A better way to go about this is to allow multiple b
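
The per-key aggregation described above can be sketched without Spark. This is only a rough illustration in plain Scala, not BlockMatrix's actual implementation: the names `BlockMultiplySketch`, `Block`, `Blocks`, `times`, `plus` and `multiply` are all made up for this example, and a local `groupBy` followed by `reduce` stands in for `reduceByKey`.

```scala
// Plain-Scala stand-in for the key -> Matrix layout; no Spark required.
// All names here are hypothetical and exist only for this illustration.
object BlockMultiplySketch {
  type Block = Vector[Vector[Double]]
  type Blocks = Map[(Int, Int), Block]

  // Dense product of two small local blocks.
  def times(a: Block, b: Block): Block = {
    val cols = b.transpose
    a.map(row => cols.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Element-wise sum of two blocks of the same shape.
  def plus(a: Block, b: Block): Block =
    a.zip(b).map { case (r, s) => r.zip(s).map { case (x, y) => x + y } }

  def multiply(a: Blocks, b: Blocks): Blocks = {
    // Emit one partial product per matching inner index k,
    // keyed by the output block coordinates (i, j).
    val partials = for {
      ((i, k), ab) <- a.toSeq
      ((k2, j), bb) <- b.toSeq
      if k == k2
    } yield ((i, j), times(ab, bb))
    // Local analogue of reduceByKey: sum all partial products sharing a key.
    partials.groupBy(_._1).map { case (key, ps) => key -> ps.map(_._2).reduce(plus) }
  }

  def main(args: Array[String]): Unit = {
    // A is 1x2 blocks, B is 2x1 blocks; each block is a 1x1 matrix.
    val a: Blocks = Map((0, 0) -> Vector(Vector(1.0)), (0, 1) -> Vector(Vector(2.0)))
    val b: Blocks = Map((0, 0) -> Vector(Vector(3.0)), (1, 0) -> Vector(Vector(4.0)))
    println(multiply(a, b))
  }
}
```

Every partial product that contributes to output block (i, j) has to be brought together under that key before it can be summed. In a distributed setting, that gathering step is where the data movement comes from.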

Re: Contribution and choice of language

2015-07-14 Thread Rakesh Chalasani
Here is a more specific MLlib-related umbrella for 1.5 that can help you get started: https://issues.apache.org/jira/browse/SPARK-8445?jql=text%20~%20%22mllib%201.5%22 Rakesh On Tue, Jul 14, 2015 at 6:52 AM Akhil Das wrote: > You can try to resolve some Jira issues, to start with try out some ne

Re: Representing a recursive data type in Spark SQL

2015-05-20 Thread Rakesh Chalasani
Hi Jeremy: Row is a collection of 'Any', so it can be used as a recursive data type. Is this what you were looking for? Example: val x = sc.parallelize(Array.range(0,10)).map(x => Row(Row(x), Row(x.toString))) Rakesh On Wed, May 20, 2015 at 7:23 PM Jeremy Lucas wrote: > Spark SQL has proven

Re: DataFrames equivalent to SQL table namespacing and aliases

2015-05-08 Thread Rakesh Chalasani
To add to the above discussion, pandas allows suffixing and prefixing to solve this issue: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.join.html Rakesh On Fri, May 8, 2015 at 2:42 PM Nicholas Chammas wrote: > DataFrames, as far as I can tell, don’t have an equivalent to

Re: Drop column/s in DataFrame

2015-04-30 Thread Rakesh Chalasani
Sure, I will try sending a PR soon. On Thu, Apr 30, 2015 at 1:42 PM Reynold Xin wrote: > I filed a ticket: https://issues.apache.org/jira/browse/SPARK-7280 > > Would you like to give it a shot? > > > On Thu, Apr 30, 2015 at 10:22 AM, rakeshchalasani > wrote: > >> Hi All: >> >> Is there any plan