[jira] [Commented] (SPARK-17865) R API for global temp view

2016-10-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578639#comment-15578639 ] Felix Cheung commented on SPARK-17865: -- I see. So then we could either omit the support for global

Re: [VOTE] Release Apache Zeppelin 0.6.2 (RC2)

2016-10-14 Thread Felix Cheung
+1 Tested source and netinstall Thanks Mina! _ From: Ahyoung Ryu > Sent: Friday, October 14, 2016 5:28 AM Subject: Re: [VOTE] Release Apache Zeppelin 0.6.2 (RC2) To:

[jira] [Commented] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574137#comment-15574137 ] Felix Cheung commented on SPARK-17781: -- Hmm.. I'm not quite sure what it is just yet - not seeing

[jira] [Commented] (SPARK-17919) Make timeout to RBackend configurable in SparkR

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573469#comment-15573469 ] Felix Cheung commented on SPARK-17919: -- Earlier bug: https://issues.apache.org/jira/browse/SPARK

[jira] [Commented] (SPARK-17904) Add a wrapper function to install R packages on each executors.

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572606#comment-15572606 ] Felix Cheung commented on SPARK-17904: -- For reference these are the related PRs for Python

[jira] [Commented] (SPARK-17895) Improve documentation of "rowsBetween" and "rangeBetween"

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572602#comment-15572602 ] Felix Cheung commented on SPARK-17895: -- would you like to fix this? > Improve documentat

[jira] [Comment Edited] (SPARK-17904) Add a wrapper function to install R packages on each executors.

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572541#comment-15572541 ] Felix Cheung edited comment on SPARK-17904 at 10/13/16 5:15 PM: I

[jira] [Comment Edited] (SPARK-17904) Add a wrapper function to install R packages on each executors.

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572541#comment-15572541 ] Felix Cheung edited comment on SPARK-17904 at 10/13/16 5:09 PM: I

[jira] [Commented] (SPARK-17904) Add a wrapper function to install R packages on each executors.

2016-10-13 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572541#comment-15572541 ] Felix Cheung commented on SPARK-17904: -- I somewhat disagree, actually. In R, it is very common

[jira] [Resolved] (SPARK-17790) Support for parallelizing R data.frame larger than 2GB

2016-10-12 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17790. -- Resolution: Fixed Assignee: Hossein Falaki Fix Version/s: 2.1.0

[jira] [Commented] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-12 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569325#comment-15569325 ] Felix Cheung commented on SPARK-17781: -- Thanks for the investigation. This might seem like a R thing

[jira] [Resolved] (SPARK-17817) PySpark RDD Repartitioning Results in Highly Skewed Partition Sizes

2016-10-11 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17817. -- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.1.0 > PySpark

[jira] [Commented] (SPARK-17865) R API for global temp view

2016-10-10 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564558#comment-15564558 ] Felix Cheung commented on SPARK-17865: -- I haven't kept up on SharedState before this but it looks

[jira] [Commented] (SPARK-17865) R API for global temp view

2016-10-10 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564534#comment-15564534 ] Felix Cheung commented on SPARK-17865: -- I can take this. > R API for global temp v

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Felix Cheung
Should we just link to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Sun, Oct 9, 2016 at 10:09 AM -0700, "Hyukjin Kwon" > wrote: Thanks for confirming this, Sean. I filed this in

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Felix Cheung
+1 on this proposal and everyone can contribute to updates and discussions on JIRAs Will be great if this could be put on the Spark wiki. On Sat, Oct 8, 2016 at 9:05 AM -0700, "Ted Yu" > wrote: Makes sense. I trust Hyukjin, Holden and

Re: [DISCUSS] Zeppelin 0.6.2 release

2016-10-08 Thread Felix Cheung
Seems like we have a couple of good fixes. +1 for another release On Fri, Oct 7, 2016 at 11:53 PM -0700, "Ahyoung Ryu" > wrote: +1 On Sat, Oct 8, 2016 at 1:45 PM, Prabhjyot Singh wrote: > +1 > > On 8 Oct 2016

Re: How Apache Zeppelin runs a paragraph

2016-10-08 Thread Felix Cheung
Great post! On Tue, Oct 4, 2016 at 8:56 PM -0700, "Jongyoul Lee" > wrote: Hello DuyHai, Thanks for the fixing the typo. I've fixed it. Concerning the debugging, I think writing posts or updating wiki would be better. I'm willing to write a

[jira] [Resolved] (SPARK-17665) SparkR does not support options in other types consistently other APIs

2016-10-07 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17665. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.1.0

[jira] [Commented] (SPARK-17790) Support for parallelizing R data.frame larger than 2GB

2016-10-05 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550421#comment-15550421 ] Felix Cheung commented on SPARK-17790: -- more discussion on https://issues.apache.org/jira/browse

[jira] [Comment Edited] (SPARK-17790) Support for parallelizing R data.frame larger than 2GB

2016-10-05 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550414#comment-15550414 ] Felix Cheung edited comment on SPARK-17790 at 10/6/16 12:34 AM: Yes

[jira] [Commented] (SPARK-17790) Support for parallelizing R data.frame larger than 2GB

2016-10-05 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550414#comment-15550414 ] Felix Cheung commented on SPARK-17790: -- Yes. > Support for parallelizing R data.frame larger t

[jira] [Resolved] (SPARK-17658) write.df API requires path which is not actually always nessasary in SparkR

2016-10-05 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17658. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.1.0 > write.df

Re: welcoming Xiao Li as a committer

2016-10-04 Thread Felix Cheung
Congrats and welcome, Xiao! _ From: Reynold Xin > Sent: Monday, October 3, 2016 10:47 PM Subject: welcoming Xiao Li as a committer To: Xiao Li >,

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-10-01 Thread Felix Cheung
+1 Tested and didn't find any blocker - found a few minor R doc issues to follow up. _ From: Reynold Xin > Sent: Wednesday, September 28, 2016 7:15 PM Subject: [VOTE] Release Apache Spark 2.0.1 (RC4) To:

Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Felix Cheung
+1 on longer release cycle at schedule and more maintenance releases. _ From: Mark Hamstra > Sent: Tuesday, September 27, 2016 2:01 PM Subject: Re: [discuss] Spark 2.x release cadence To: Reynold Xin

[jira] [Commented] (SPARK-17665) SparkR does not support options in other types consistently other APIs

2016-09-26 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523844#comment-15523844 ] Felix Cheung commented on SPARK-17665: -- supporting just character and logical seem fine. AFAIK we

[jira] [Commented] (SPARK-17665) SparkR supports options in other types consistently other APIs

2016-09-26 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522240#comment-15522240 ] Felix Cheung commented on SPARK-17665: -- for backward compatibility, we might need to support values

[jira] [Commented] (SPARK-17210) sparkr.zip is not distributed to executors when run sparkr in RStudio

2016-09-24 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518507#comment-15518507 ] Felix Cheung commented on SPARK-17210: -- Got it, sorry about that, I should have noticed

[jira] [Commented] (SPARK-17634) Spark job hangs when using dapply

2016-09-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517839#comment-15517839 ] Felix Cheung commented on SPARK-17634: -- Also it would be great if you have a shareable example

[jira] [Commented] (SPARK-17634) Spark job hangs when using dapply

2016-09-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517837#comment-15517837 ] Felix Cheung commented on SPARK-17634: -- I see. Do you know if the partitions are evenly distributed

[jira] [Commented] (SPARK-17210) sparkr.zip is not distributed to executors when run sparkr in RStudio

2016-09-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517224#comment-15517224 ] Felix Cheung commented on SPARK-17210: -- cc [~rxin] - this is merged to master and branch-2.0. If we

[jira] [Comment Edited] (SPARK-17210) sparkr.zip is not distributed to executors when run sparkr in RStudio

2016-09-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517224#comment-15517224 ] Felix Cheung edited comment on SPARK-17210 at 9/23/16 6:41 PM: --- cc [~rxin

[jira] [Resolved] (SPARK-17210) sparkr.zip is not distributed to executors when run sparkr in RStudio

2016-09-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17210. -- Resolution: Fixed Assignee: Jeff Zhang Fix Version/s: 2.1.0

[jira] [Resolved] (SPARK-17499) make the default params in sparkR spark.mlp consistent with MultilayerPerceptronClassifier

2016-09-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17499. -- Resolution: Fixed Assignee: Weichen Xu Fix Version/s: 2.1.0 Target

[jira] [Commented] (SPARK-17634) Spark job hangs when using dapply

2016-09-22 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515176#comment-15515176 ] Felix Cheung commented on SPARK-17634: -- How long have you let it run? > Spark job hangs when us

[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-09-20 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507787#comment-15507787 ] Felix Cheung commented on SPARK-17608: -- This is in fact problematic - R base supports integer in 32

Re: Binary for CDH 5.8.0

2016-09-19 Thread Felix Cheung
We don't have Hadoop distribution specific binary releases - compiling from source with switches would be the best route. On Mon, Sep 19, 2016 at 12:22 PM -0700, "Abhi Basu" <9000r...@gmail.com> wrote: Is there a specific binary for CDH 5.8.0, hadoop 2.6. and

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-18 Thread Felix Cheung
ink a 2.0 uber jar will play nicely on a 1.5 standalone cluster. On Saturday, September 10, 2016, Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>> wrote: You should be able to get it to work with 2.0 as uber jar. What type cluster you are running on? YARN? An

[jira] [Commented] (SPARK-17572) Write.df is failing on spark cluster

2016-09-17 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15499332#comment-15499332 ] Felix Cheung commented on SPARK-17572: -- does it work when you run the hadoop command equivalent

Re: Hbase configuration storage without data

2016-09-13 Thread Felix Cheung
I like that approach - though you should be able to clear result output before exporting the note, if all you want is the config? The should remove all output data, keeping it smaller? _ From: Mohit Jaggi > Sent:

Re: Matplotlib uses tkinter instead of Agg

2016-09-13 Thread Felix Cheung
And matplotlib.use('Agg') Would only work before matplotlib is first used so you would need to restart the interpreter. From error stack below it looks like something might be setting the default backend in matplotlib to TkAgg though. Are you using the Python interpreter or PySpark

Re: SparkR error: reference is ambiguous.

2016-09-10 Thread Felix Cheung
Could you provide more information on how df in your example is created? Also please include the output from printSchema(df)? This example works: > c <- createDataFrame(cars) > c SparkDataFrame[speed:double, dist:double] > c$speed <- c$dist*0 > c SparkDataFrame[speed:double, dist:double] >

Re: questions about using dapply

2016-09-10 Thread Felix Cheung
You might need MARGIN capitalized, this example works though: c <- as.DataFrame(cars) # rename the columns to c1, c2 c <- selectExpr(c, "speed as c1", "dist as c2") cols_in <- dapplyCollect(c, function(x) {apply(x[, paste("c", 1:2, sep = "")], MARGIN=2, FUN = function(y){ y %in% c(61, 99)})}) #

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
How are you calling dirs()? What would be x? Is dat a SparkDataFrame? With SparkR, i in dat[i, 4] should be an logical expression for row, eg. df[df$age %in% c(19, 30), 1:2] On Sat, Sep 10, 2016 at 11:02 AM -0700, "Bene" >

Re: Assign values to existing column in SparkR

2016-09-10 Thread Felix Cheung
If you are to set a column to 0 (essentially remove and replace the existing one) you would need to put a column on the right hand side: > df <- as.DataFrame(iris) > head(df) Sepal_Length Sepal_Width Petal_Length Petal_Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 4.7 3.2

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
Could you include code snippets you are running? On Sat, Sep 10, 2016 at 1:44 AM -0700, "Bene" > wrote: Hi, I am having a problem with the SparkR API. I need to subset a distributed data so I can extract single values from

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-10 Thread Felix Cheung
You should be able to get it to work with 2.0 as uber jar. What type cluster you are running on? YARN? And what distribution? On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" > wrote: You really shouldn't mix different versions of Spark

[jira] [Comment Edited] (SPARK-17428) SparkR executors/workers support virtualenv

2016-09-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475612#comment-15475612 ] Felix Cheung edited comment on SPARK-17428 at 9/9/16 1:59 AM: -- I don't think

[jira] [Commented] (SPARK-17428) SparkR executors/workers support virtualenv

2016-09-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475612#comment-15475612 ] Felix Cheung commented on SPARK-17428: -- I don't think I see a way to specify a version number

[jira] [Commented] (SPARK-17428) SparkR executors/workers support virtualenv

2016-09-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474587#comment-15474587 ] Felix Cheung commented on SPARK-17428: -- Agree with above. And to be clear, packrat is still calling

[jira] [Commented] (SPARK-17428) SparkR executors/workers support virtualenv

2016-09-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472995#comment-15472995 ] Felix Cheung commented on SPARK-17428: -- PySpark in fact has a on-going PR on supporting `virtualenv

Re: No SparkR on Mesos?

2016-09-07 Thread Felix Cheung
This is correct - SparkR is not quite working completely on Mesos. JIRAs and contributions welcome! On Wed, Sep 7, 2016 at 10:21 AM -0700, "Michael Gummelt" > wrote: Quite possibly. I've never used it. I know Python was "unsupported"

[jira] [Assigned] (SPARK-17173) Refactor R mllib for easier ml implementations

2016-09-04 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reassigned SPARK-17173: Assignee: Felix Cheung > Refactor R mllib for easier ml implementati

[jira] [Assigned] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-09-04 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reassigned SPARK-17315: Assignee: Junyang Qian > Add Kolmogorov-Smirnov Test to Spa

[jira] [Resolved] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-09-03 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17315. -- Resolution: Fixed Fix Version/s: 2.1.0 > Add Kolmogorov-Smirnov Test to Spa

[jira] [Created] (SPARK-17376) Spark version should be available in R

2016-09-02 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-17376: Summary: Spark version should be available in R Key: SPARK-17376 URL: https://issues.apache.org/jira/browse/SPARK-17376 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"

2016-09-02 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-15509. -- Resolution: Fixed Fix Version/s: 2.1.0 Target Version/s: 2.1.0 > R ML

[jira] [Resolved] (SPARK-16883) SQL decimal type is not properly cast to number when collecting SparkDataFrame

2016-09-02 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16883. -- Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Target

Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Felix Cheung
There is an Anaconda parcel one could readily install on CDH https://docs.continuum.io/anaconda/cloudera As Sean says it is Python 2.7.x. Spark should work for both 2.7 and 3.5. _ From: Sean Owen > Sent: Friday,

[jira] [Commented] (SPARK-17339) Fix SparkR tests on Windows

2016-08-31 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452998#comment-15452998 ] Felix Cheung commented on SPARK-17339: -- Seems that way. I can help but I don't have Windows to test

[jira] [Resolved] (SPARK-17178) Allow to set sparkr shell command through --conf

2016-08-31 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17178. -- Resolution: Fixed Target Version/s: 2.1.0 > Allow to set sparkr shell comm

[jira] [Updated] (SPARK-17178) Allow to set sparkr shell command through --conf

2016-08-31 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-17178: - Priority: Minor (was: Major) Fix Version/s: 2.1.0 > Allow to set sparkr shell comm

[jira] [Comment Edited] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-28 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442870#comment-15442870 ] Felix Cheung edited comment on SPARK-17214 at 8/28/16 6:14 AM: --- I think

[jira] [Comment Edited] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-28 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442870#comment-15442870 ] Felix Cheung edited comment on SPARK-17214 at 8/28/16 6:15 AM: --- I think

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-28 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442870#comment-15442870 ] Felix Cheung commented on SPARK-17214: -- I think the underlining issue is that we should either

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-28 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442865#comment-15442865 ] Felix Cheung commented on SPARK-17214: -- [~bansalism] what version of Spark + SparkR are you testing

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-25 Thread Felix Cheung
parkR: Error in writeBin(batch, con, endian = "big") To: <user@spark.apache.org<mailto:user@spark.apache.org>>, Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>> I tested both in local and cluster mode and the ‘<<-‘ seemed to work

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-25 Thread Felix Cheung
Cinquegrana, Piero <piero.cinquegr...@neustar.biz<mailto:piero.cinquegr...@neustar.biz>> Sent: Wednesday, August 24, 2016 10:37 AM Subject: RE: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big") To: Cinquegrana, Piero <piero.cinquegr...@neustar.biz<mailto:

[jira] [Resolved] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR

2016-08-24 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16445. -- Resolution: Fixed Fix Version/s: 2.1.0 > Multilayer Perceptron Classifier wrap

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-22 Thread Felix Cheung
How big is the output from score()? Also could you elaborate on what you want to broadcast? On Mon, Aug 22, 2016 at 11:58 AM -0700, "Cinquegrana, Piero" > wrote: Hello, I am using the new R API in SparkR spark.lapply

[jira] [Comment Edited] (SPARK-16578) Configurable hostname for RBackend

2016-08-22 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431495#comment-15431495 ] Felix Cheung edited comment on SPARK-16578 at 8/22/16 7:52 PM: --- +1

[jira] [Commented] (SPARK-16578) Configurable hostname for RBackend

2016-08-22 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431495#comment-15431495 ] Felix Cheung commented on SPARK-16578: -- +1 on this. Some discussions on RBackend API or connect

[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-08-22 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431492#comment-15431492 ] Felix Cheung commented on SPARK-16581: -- Certainly - I don't think we should bite off more than we

[jira] [Commented] (SPARK-17157) Add multiclass logistic regression SparkR Wrapper

2016-08-22 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431484#comment-15431484 ] Felix Cheung commented on SPARK-17157: -- Sounds like good to have. Please check up the latest changes

[jira] [Resolved] (SPARK-17173) Refactor R mllib for easier ml implementations

2016-08-22 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17173. -- Resolution: Fixed Fix Version/s: 2.1.0 Target Version/s: 2.1.0 > Refacto

Re: Disable logger in SparkR

2016-08-22 Thread Felix Cheung
You should be able to do that with log4j.properties http://spark.apache.org/docs/latest/configuration.html#configuring-logging Or programmatically https://spark.apache.org/docs/2.0.0/api/R/setLogLevel.html _ From: Yogesh Vyas

[jira] [Created] (SPARK-17173) Refactor R mllib for easier ml implementations

2016-08-20 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-17173: Summary: Refactor R mllib for easier ml implementations Key: SPARK-17173 URL: https://issues.apache.org/jira/browse/SPARK-17173 Project: Spark Issue Type

Re: Best way to read XML data from RDD

2016-08-19 Thread Felix Cheung
way to read XML data from RDD To: Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>>, user <user@spark.apache.org<mailto:user@spark.apache.org>> Yes . It accepts a xml file as source but not RDD. The XML data embedded inside json is streamed

Re: Best way to read XML data from RDD

2016-08-19 Thread Felix Cheung
Have you tried https://github.com/databricks/spark-xml ? On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi" > wrote: Hi, There is a RDD with json data. I could read json data using rdd.read.json . The json data has

[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-08-19 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427687#comment-15427687 ] Felix Cheung commented on SPARK-16581: -- I think JVM<->R is closely related to RBackend? Beca

Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

2016-08-18 Thread Felix Cheung
Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp To: Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>>, user @spark <user@spark.apache.org<mailto:u

Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

2016-08-18 Thread Felix Cheung
Do you have a file called tmp at / on HDFS? On Thu, Aug 18, 2016 at 2:57 PM -0700, "Andy Davidson" > wrote: For unknown reason I can not create UDF when I run the attached notebook on my cluster. I get the following error

[jira] [Commented] (SPARK-16137) Random Forest wrapper in SparkR

2016-08-18 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426365#comment-15426365 ] Felix Cheung commented on SPARK-16137: -- [~vectorijk] do you still have time for this? > Ran

[jira] [Resolved] (SPARK-16447) LDA wrapper in SparkR

2016-08-18 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16447. -- Resolution: Fixed Fix Version/s: 2.1.0 > LDA wrapper in Spa

[jira] [Comment Edited] (SPARK-16581) Making JVM backend calling functions public

2016-08-18 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426263#comment-15426263 ] Felix Cheung edited comment on SPARK-16581 at 8/18/16 10:59 AM: I think

[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-08-18 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426263#comment-15426263 ] Felix Cheung commented on SPARK-16581: -- I think it'll be great if we could converge

Re: ShortCircuitIfNotCurrentOperator

2016-08-17 Thread Felix Cheung
In fact, NONE in Oozie is also very useful - it means just skip it if missed (expect the next coming up to catch up) _ From: Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>> Sent: Wednesday, August 17, 2016 5:55 PM

Re: ShortCircuitIfNotCurrentOperator

2016-08-17 Thread Felix Cheung
s! On Wed, Aug 17, 2016 at 4:43 PM, Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>> wrote: > Cool. Other scheduler has a concept called LAST_ONLY? > > > > > > > On Wed, Aug 17, 2016 at 2:34 PM -0700, "siddharth anand" &l

[jira] [Resolved] (SPARK-16444) Isotonic Regression wrapper in SparkR

2016-08-17 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16444. -- Resolution: Fixed Fix Version/s: 2.1.0 > Isotonic Regression wrapper in Spa

Re: UDF in SparkR

2016-08-17 Thread Felix Cheung
This is supported in Spark 2.0.0 as dapply and gapply. Please see the API doc: https://spark.apache.org/docs/2.0.0/api/R/ Feedback welcome and appreciated! _ From: Yogesh Vyas > Sent: Tuesday, August 16, 2016 11:39 PM

Re: [VOTE] Apache Zeppelin release 0.6.1 (rc2)

2016-08-14 Thread Felix Cheung
+1 Tested out binaries and netinstall, with spark and a few other interpreters. Thanks Mina! _ From: Alexander Bezzubov > Sent: Sunday, August 14, 2016 12:05 AM Subject: Re: [VOTE] Apache Zeppelin release 0.6.1 (rc2) To:

[jira] [Comment Edited] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-08-11 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417541#comment-15417541 ] Felix Cheung edited comment on SPARK-16519 at 8/11/16 4:40 PM: --- since we

[jira] [Commented] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-08-11 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417541#comment-15417541 ] Felix Cheung commented on SPARK-16519: -- since we are undecided on what to export for RDD, should we

[jira] [Comment Edited] (SPARK-16577) Add check-cran script to Jenkins

2016-08-11 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417536#comment-15417536 ] Felix Cheung edited comment on SPARK-16577 at 8/11/16 4:36 PM: --- I found

[jira] [Commented] (SPARK-16577) Add check-cran script to Jenkins

2016-08-11 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417536#comment-15417536 ] Felix Cheung commented on SPARK-16577: -- I found that to run the cran check on PDF it requires

Re: SparkR error when repartition is called

2016-08-09 Thread Felix Cheung
I think it's saying a string isn't being sent properly from the JVM side. Does it work for you if you change the dapply UDF to something simpler? Do you have any log from YARN? _ From: Shane Lee >

Re: spark.jars option for Zeppelin over Livy

2016-08-02 Thread Felix Cheung
packages" on Livy interpreter properties. Please see the attached screenshot On Tue, Aug 2, 2016 at 1:43 PM, Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>> wrote: Have you tried setting it in the Interpreter menu under Livy? On Tue, Aug 2, 2

[jira] [Comment Edited] (SPARK-16693) Remove R deprecated methods

2016-07-25 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392537#comment-15392537 ] Felix Cheung edited comment on SPARK-16693 at 7/25/16 7:38 PM: --- Deprecated

[jira] [Commented] (SPARK-16693) Remove R deprecated methods

2016-07-25 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392537#comment-15392537 ] Felix Cheung commented on SPARK-16693: -- Deprecated methods because of Spark JVM API changes are only
