[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-07-10 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 @gatorsmile, I'm able to access `groupingExprs` from `SQLUtils.scala` through `val groupingExprs: Seq[Expression],` however, it seems challenging to access the name of the column from pure
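The difficulty described above can be made concrete with a small Catalyst sketch (an assumption about the approach, not the code in the PR): named grouping expressions expose a `name`, but arbitrary expressions may not.

```scala
// Sketch only, not the PR's implementation: recovering column names from
// grouping expressions. Catalyst's NamedExpression carries a name; other
// expression shapes do not, which is where the difficulty above comes from.
import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}

def groupingNames(groupingExprs: Seq[Expression]): Seq[Option[String]] =
  groupingExprs.map {
    case ne: NamedExpression => Some(ne.name) // e.g. an AttributeReference like `col1`
    case _                   => None          // complex/unnamed expression: no plain name
  }
```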

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-07-05 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Alright, give me a couple of days to address those cases. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-07-01 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 I think 'prepend' sounds better. What do you think? Yes, the `key` in `function(key, x) { x }` can be useful for some use cases, but I also think that the user could easily prepend

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-30 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 I think @falaki's approach is good; my only concern is that the `key` passed as an argument together with `x` as function input is a little superfluous.

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-30 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Thank you, @gatorsmile! I'll give it a try.

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-30 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 @falaki, I'd be fine with a separate `gapplyWithKeys()` method too. @shivaram, @felixcheung, what do you think? Should we add a new `gapplyWithKeys()` method?

[GitHub] spark pull request #14431: [SPARK-16258][SparkR] Automatically append the gr...

2017-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14431#discussion_r124327866 --- Diff: R/pkg/R/DataFrame.R --- @@ -1465,10 +1464,10 @@ setMethod("dapplyCollect", #' #' Result #' - -#' Model

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-19 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 yes, but we only need read access.

[GitHub] spark issue #14742: [SPARK-17177][SQL] Make grouping columns accessible from...

2017-06-19 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14742 yes, we can close this, but it would be great if you could help us find a way to access the grouping columns from SparkR in #14431

[GitHub] spark issue #14742: [SPARK-17177][SQL] Make grouping columns accessible from...

2017-06-19 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14742 Hi @gatorsmile, #14431 depends on this. Is there a way I can access the grouping columns from `RelationalGroupedDataset`?

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-19 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Hi everyone, yes it depends on #14742 . I've been asked to close #14742. For this PR I need to access the grouping columns. If you think that there is an alternative way of accessing

[GitHub] spark pull request #10162: [SPARK-11250] [SQL] Generate different alias for ...

2017-06-02 Thread NarineK
Github user NarineK closed the pull request at: https://github.com/apache/spark/pull/10162

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2017-06-02 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 @HyukjinKwon do you mean closing or fixing the PR? As I understand from @gatorsmile, he wants to close it.

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-11-04 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 I'd propose to have: 1. One input argument: suffixes[left, right] (if you want, we can have two, similar to pandas). 2. Default values for suffixes (I think defaults are more convenient

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-27 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 In pandas it has two arguments, lsuffix='' and rsuffix='', for the left and right sides respectively. And it appends the suffixes to all column names regardless of whether they are in the join condition
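For comparison, the pandas behavior described above can be approximated in Spark's Scala API by suffixing every column on each side before the join; `withSuffix` here is a hypothetical helper, not an existing Spark method.

```scala
import org.apache.spark.sql.DataFrame

// Rename every column with a suffix, mirroring pandas' lsuffix/rsuffix, which
// apply to all columns regardless of whether they appear in the join condition.
def withSuffix(df: DataFrame, suffix: String): DataFrame =
  df.columns.foldLeft(df)((d, c) => d.withColumnRenamed(c, c + suffix))

// Usage sketch (assumes DataFrames `left` and `right` with an `id` column):
// val l = withSuffix(left, "_x")
// val r = withSuffix(right, "_y")
// val joined = l.join(r, l("id_x") === r("id_y"))  // no ambiguous column names
```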

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-27 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 Thank you for following up on this, @marmbrus! I looked into two places: R and pandas DataFrames. In R, it seems that they give new names to columns (columns which aren't in the merge/join

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-24 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 I am trying different ways to solve the problem without renaming the columns and it seems that a better place to change the column names would be here: https://github.com/apache/spark/blob/master

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-14 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 I see, I can go over the pull request this weekend. Thanks for the feedback.

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-07 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/10162 I'd be happy to update to the latest master if we want to review this now.

[GitHub] spark issue #14742: [SPARK-17177][SQL] Make grouping columns accessible from...

2016-09-06 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14742 @liancheng, @rxin, do you think adding `columns` to `RelationalGroupedDataset` is reasonable, or should we find a workaround on the R side?

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-21 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Made a pull request for grouping columns: #14742

[GitHub] spark issue #14742: [SPARK-17177][SQL] Make grouping columns accessible from...

2016-08-21 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14742 cc: @shivaram, @liancheng

[GitHub] spark pull request #14742: [SPARK-17177][SQL] Make grouping columns accessib...

2016-08-21 Thread NarineK
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/14742 [SPARK-17177][SQL] Make grouping columns accessible from `RelationalGroupedDataset` ## What changes were proposed in this pull request? Currently, once we create `RelationalGroupedDataset
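In spirit, the change being proposed is a small read-only accessor over the currently private grouping expressions. The sketch below is illustrative only; the class, method name, and return type are assumptions, not the merged API.

```scala
// Illustrative sketch of exposing private grouping expressions as Columns.
// RelationalGroupedDataset keeps groupingExprs private today; a public
// accessor along these lines would let SparkR read the grouping columns.
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Expression

class GroupedHandleSketch(private val groupingExprs: Seq[Expression]) {
  // Proposed read-only view of the grouping columns
  def columns: Seq[Column] = groupingExprs.map(new Column(_))
}
```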

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-12 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74551161 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,61 @@ test_that("spark.survreg", { } }) +test_that(

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-11 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 yes, @shivaram, that would be one way to do it. Basically, adding a new public function to `RelationalGroupedDataset` which would return the column names. If it is fine from the SQL perspective

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-09 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 My point is the following: let's say we have `var relationalGroupedDataset = df.groupBy("col1", "col2");` Now, having `relationalGroupedDataset`, how can I f
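One caller-side workaround, assuming no accessor is added to Spark, is simply to keep the grouping column names alongside the handle:

```scala
// Workaround sketch (assumption, not a Spark API): since the grouping
// expressions inside RelationalGroupedDataset are private, track the names
// on the caller side next to the handle.
import org.apache.spark.sql.functions.col

val groupCols = Seq("col1", "col2")        // known at the call site
// df: some DataFrame in scope
val grouped = df.groupBy(groupCols.map(col): _*)  // keep groupCols for later use
```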

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-09 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Thanks, @shivaram! Yes, we have a handle to RelationalGroupedDataset, but I couldn't access the column fields of a RelationalGroupedDataset instance. Is there a way to access the columns

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-08 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r73999377 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-08 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r73999041 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-08 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r73998522 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-06 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 It seems that, currently, in SparkR the `GroupedData`, which represents Scala's GroupedData object, doesn't have any information about the grouping keys. `RelationalGroupedDataset` has a private

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-02 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Cool! Let me give that option a try.

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-02 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 That's a good point, @shivaram. `worker.R` is the component which has the keys and appends them to the output. I don't see any elegant way of doing it in `worker.R` yet. However, I

[GitHub] spark pull request #14431: [SPARK-16258][SparkR] Automatically append the gr...

2016-07-31 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14431#discussion_r72916339 --- Diff: docs/sparkr.md --- @@ -429,19 +431,19 @@ result <- gapplyCollect( df, "waiting", function(key, x) {

[GitHub] spark pull request #14431: [SPARK-16258][SparkR] Automatically append the gr...

2016-07-31 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14431#discussion_r72916310 --- Diff: docs/sparkr.md --- @@ -398,23 +398,25 @@ and Spark. {% highlight r %} # Determine six waiting times with the largest eruption time

[GitHub] spark pull request #14431: [SPARK-16258][SparkR][WIP] Gapply add key attach ...

2016-07-31 Thread NarineK
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/14431 [SPARK-16258][SparkR][WIP] Gapply add key attach option ## What changes were proposed in this pull request? The following pull request addresses the new feature request described in SPARK

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-07-18 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 @shivaram, @sun-rui, I was wondering if someone created a jira for the issue described here: https://github.com/apache/spark/pull/12836#issuecomment-225403054

[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...

2016-07-16 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14090 Thanks, I've generated the docs the way you suggested, @shivaram, but I'm not sure I see the same thing as you. I still see some '{% highlight r %}' and some formatting issues in general. I

[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...

2016-07-15 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14090 Thanks @shivaram, @felixcheung for the comments. I'll address those today.

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70923645 --- Diff: docs/sparkr.md --- @@ -316,6 +314,139 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70921996 --- Diff: docs/sparkr.md --- @@ -316,6 +314,139 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70920518 --- Diff: docs/sparkr.md --- @@ -316,6 +314,139 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70920244 --- Diff: docs/sparkr.md --- @@ -316,6 +314,139 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset

[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...

2016-07-12 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14090 Added data type description.

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-10 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70202736 --- Diff: docs/sparkr.md --- @@ -306,6 +306,64 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset grouping

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-10 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70202321 --- Diff: docs/sparkr.md --- @@ -306,6 +306,64 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset grouping

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-10 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70198331 --- Diff: docs/sparkr.md --- @@ -306,6 +306,64 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset grouping

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-10 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70194370 --- Diff: docs/sparkr.md --- @@ -306,6 +306,64 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset grouping

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-09 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70168781 --- Diff: docs/sparkr.md --- @@ -306,6 +306,64 @@ head(ldf, 3) {% endhighlight %} + Run a given function on a large dataset grouping

[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-07 Thread NarineK
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/14090 [SPARK-16112][SparkR] Programming guide for gapply/gapplyCollect ## What changes were proposed in this pull request? Updates programming guide for spark.gapply/spark.gapplyCollect

[GitHub] spark issue #13760: [SPARK-16012][SparkR] implement gapplyCollect which will...

2016-06-30 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13760 Do you have any questions on this, @shivaram, @sun-rui?

[GitHub] spark issue #13760: [SPARK-16012][SparkR] implement gapplyCollect which will...

2016-06-28 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13760 @felixcheung, I've addressed the comments or left a reply on the non-addressed ones.

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68571809 --- Diff: R/pkg/R/DataFrame.R --- @@ -1370,14 +1370,22 @@ setMethod("dapplyCollect", #' columns with data types integer and string and the

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68571761 --- Diff: R/pkg/R/group.R --- @@ -198,62 +198,61 @@ createMethods() #' #' Applies a R function to each group in the input GroupedData

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68571781 --- Diff: R/pkg/R/DataFrame.R --- @@ -1419,6 +1427,80 @@ setMethod("gapply", gapply(grouped, fu

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68565249 --- Diff: R/pkg/R/group.R --- @@ -198,62 +198,61 @@ createMethods() #' #' Applies a R function to each group in the input GroupedData

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68563365 --- Diff: R/pkg/R/group.R --- @@ -198,62 +198,61 @@ createMethods() #' #' Applies a R function to each group in the input GroupedData

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-27 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68542491 --- Diff: R/pkg/R/DataFrame.R --- @@ -1370,14 +1370,22 @@ setMethod("dapplyCollect", #' columns with data types integer and string and the

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-23 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68298040 --- Diff: R/pkg/R/group.R --- @@ -243,17 +236,73 @@ setMethod("gapply", signature(x = "GroupedData"), func

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-23 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68291007 --- Diff: R/pkg/R/group.R --- @@ -199,17 +199,10 @@ createMethods() #' Applies a R function to each group in the input GroupedData

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] implement gapplyCollect whi...

2016-06-23 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68227494 --- Diff: R/pkg/R/group.R --- @@ -199,17 +199,10 @@ createMethods() #' Applies a R function to each group in the input GroupedData

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] gapplyCollect - applies a R...

2016-06-22 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68116142 --- Diff: R/pkg/R/group.R --- @@ -199,17 +199,10 @@ createMethods() #' Applies a R function to each group in the input GroupedData

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] gapplyCollect - applies a R...

2016-06-22 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r68114211 --- Diff: R/pkg/R/group.R --- @@ -242,18 +235,73 @@ createMethods() setMethod("gapply", signature(x = "

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67944525 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] gapplyCollect - applies a R...

2016-06-21 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r67936513 --- Diff: R/pkg/R/DataFrame.R --- @@ -1347,6 +1347,65 @@ setMethod("gapply", gapply(grouped, fu

[GitHub] spark issue #13790: [SPARK-16082][SparkR]remove duplicated docs in dapply

2016-06-21 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13790 @shivaram, I noticed that I hadn't associated the pull request with the jira; I've just done it.

[GitHub] spark issue #13790: remove duplicated docs in dapply

2016-06-20 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13790 cc: @sun-rui, @shivaram, @felixcheung

[GitHub] spark pull request #13790: remove duplicated docs in dapply

2016-06-20 Thread NarineK
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/13790 remove duplicated docs in dapply ## What changes were proposed in this pull request? Removed unnecessary duplicated documentation in dapply and dapplyCollect. In this pull request I

[GitHub] spark issue #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-20 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13660 Hi @vectorijk, @felixcheung, as I was looking at the documentation generated in R, I noticed that there is some duplicated information. I'm not sure if this is the right place to ask about

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-19 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Thanks for the quick response. I'll create one.

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-19 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 @vectorijk, should I do the pull request for the same jira - https://issues.apache.org/jira/browse/SPARK-15672, or should I create a new jira for the programming guide?

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R...

2016-06-18 Thread NarineK
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/13760 [SPARK-16012][SparkR] GapplyCollect - applies a R function to each group similar to gapply and collects the result back to R data.frame ## What changes were proposed in this pull request

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-17 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Hi @vectorijk, thanks for asking, I think in a separate PR. Do you think including it in #13660 would be better?

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-15 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r67265581 --- Diff: R/pkg/R/DataFrame.R --- @@ -1266,6 +1266,83 @@ setMethod("dapplyCollect", ldf })

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-15 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Thanks, @shivaram and @sun-rui. Yes, I can work on the programming guide for gapply.

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-15 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r67264756 --- Diff: R/pkg/R/DataFrame.R --- @@ -1266,6 +1266,83 @@ setMethod("dapplyCollect", ldf })

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-15 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r67264555 --- Diff: R/pkg/R/DataFrame.R --- @@ -1266,6 +1266,83 @@ setMethod("dapplyCollect", ldf })

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-15 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r67197168 --- Diff: R/pkg/inst/worker/worker.R --- @@ -79,75 +127,72 @@ if (numBroadcastVars > 0) { # Timing broadcast broadcastElap <- elaps

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-14 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Addressed your comments, @sun-rui; please let me know if you have any further ones.

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-13 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66745283 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala --- @@ -325,6 +330,71 @@ case class MapGroupsExec

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66732763 --- Diff: R/pkg/R/group.R --- @@ -142,3 +142,58 @@ createMethods <- function() { } createMethods() + +#' gap

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66717543 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -381,6 +385,50 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-11 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Thanks, @liancheng and @rxin! With respect to your point, @rxin - "private[sql] signature in public APIs." dapply added that signature to `Dataset.scala` and g

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-11 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66712035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -381,6 +385,50 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark issue #13610: [SPARKR][SQL][SPARK-15884] Overriding stringArgs in MapP...

2016-06-10 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13610 Thanks! Changed the title!

[GitHub] spark issue #13610: overwriting stringArgs in MapPartitionsInR

2016-06-10 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/13610 @sun-rui, @liancheng, @shivaram

[GitHub] spark pull request #13610: overwriting stringArgs in MapPartitionsInR

2016-06-10 Thread NarineK
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/13610 overwriting stringArgs in MapPartitionsInR ## What changes were proposed in this pull request? As discussed in https://github.com/apache/spark/pull/12836 we need to override
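The `stringArgs` override being proposed can be sketched roughly as follows. This is a minimal, self-contained illustration mimicking Catalyst's `TreeNode` pattern, not Spark's actual classes (`SketchNode` and `MapPartitionsInRSketch` are hypothetical names): the base class renders every constructor argument by default, and the R node overrides `stringArgs` to keep the serialized closure out of the rendered plan.

```scala
// Hypothetical TreeNode-like base: by default, simpleString renders every
// constructor argument via productIterator, as Catalyst's TreeNode does.
abstract class SketchNode extends Product {
  protected def stringArgs: Iterator[Any] = productIterator
  def simpleString: String =
    s"${getClass.getSimpleName} ${stringArgs.mkString(", ")}"
}

case class MapPartitionsInRSketch(
    func: Array[Byte],          // serialized R function: large and unreadable
    outputSchema: String) extends SketchNode {
  // Drop the binary payload from the rendered plan string.
  override protected def stringArgs: Iterator[Any] = Iterator(outputSchema)
}

val node = MapPartitionsInRSketch(Array.fill(1024)(0: Byte), "value: int")
println(node.simpleString)  // "MapPartitionsInRSketch value: int"
```

With the override in place, the plan string shows only the readable schema argument instead of a kilobyte of serialized bytes.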

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66672823 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -286,6 +290,9 @@ case class FlatMapGroupsInR

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread NarineK
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66670797 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -286,6 +290,9 @@ case class FlatMapGroupsInR

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-09 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Hi @sun-rui, hi @shivaram, I've overridden stringArgs - I've pushed my changes in the following branch. I haven't created a JIRA yet. https://github.com/apache/spark/commit

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-08 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Sure, let me override stringArgs and give it a try.

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-07 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Thank you for the quick responses, @sun-rui and @shivaram. Here is how the `dataframe.queryExecution.toString` printout starts: == Parsed Logical Plan

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Do you know what exactly caused this?

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Hi @shivaram, hi @sun-rui, Surprisingly the `dataframe.queryExecution.toString` both for dapply and gapply is prepended by a huge array, which I'm not able to understand. It seems that recent
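The symptom described here can be reproduced in miniature. In this hypothetical sketch (the class and field names are illustrative, not Spark's actual ones), a plan node carries the serialized R closure as its first constructor argument, so a product-based renderer that expands array arguments dumps every byte at the front of the plan string:

```scala
// Hypothetical plan node: the serialized R function comes first, so any
// renderer that walks productIterator prints it before anything readable.
case class FlatMapGroupsInRSketch(func: Array[Byte], outputSchema: String)

val node = FlatMapGroupsInRSketch(Array.tabulate(100)(_.toByte), "value: int")

// Expand each constructor argument the way a naive plan printer might:
val rendered = node.productIterator.map {
  case a: Array[_] => a.mkString("[", ",", "]")
  case other       => other.toString
}.mkString(" ")

println(rendered.take(40))  // starts with "[0,1,2,3,..." - the byte dump
```

This is exactly the kind of output that makes the logical plan unreadable, and is what overriding `stringArgs` on the node avoids.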

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 I can print out the query plan on the Scala side and see what it looks like for that example.

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Not sure why it fails. It fails for my new test case on the iris dataset. The resulting dataframe has 35x2 dimensions.

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Locally, run-tests.sh runs successfully, but it fails on Jenkins ...

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 @shivaram, I didn't change the code, but merged with master, because the build was previously failing since some pyspark tests didn't pass. After my merge today, when I run gapply

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-04 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Merged build finished. Test FAILed.

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-04 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59998/
