[jira] [Resolved] (SPARK-2654) Leveled logging in PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-2654. Resolution: Fixed This has been fixed in SPARK-3444 / ae98eec730125c1153dcac9ea941959cc79e4f42 > Leveled lo

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556582#comment-15556582 ] holdenk commented on SPARK-2868: Is this something we are still interested in pursuing (cc

[jira] [Resolved] (SPARK-2999) Compress all the serialized data

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-2999. Resolution: Fixed Fixed in b5c51c8df480f1a82a82e4d597d8eea631bffb4e > Compress all the serialized data > --

[jira] [Commented] (SPARK-4488) Add control over map-side aggregation

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556650#comment-15556650 ] holdenk commented on SPARK-4488: So while the associated PR is closed, we ended up adding

[jira] [Closed] (SPARK-5160) Python module in jars

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-5160. -- Resolution: Fixed This is now supported. > Python module in jars > - > > Ke

[jira] [Commented] (SPARK-1425) PySpark can crash Executors if worker.py fails while serializing data

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556680#comment-15556680 ] holdenk commented on SPARK-1425: Is this still an issue or do we have a repro case for it?

[jira] [Resolved] (SPARK-4851) "Uninitialized staticmethod object" error in PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-4851. Resolution: Fixed The provided repro now runs (although we need to provide it with the correct number of ar

[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556714#comment-15556714 ] holdenk commented on SPARK-5981: I'm not sure porting the models to Python sounds like a g

[jira] [Commented] (SPARK-6174) Improve doc: Python ALS, MatrixFactorizationModel

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556720#comment-15556720 ] holdenk commented on SPARK-6174: I think Bryan did a good job of this; I'd be in favour of

[jira] [Commented] (SPARK-7638) Python API for pmml.export

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556732#comment-15556732 ] holdenk commented on SPARK-7638: Do we still want to do this, or focus on adding PMML expor

[jira] [Closed] (SPARK-7613) Serialization fails in pyspark for lambdas referencing class data members

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-7613. -- Resolution: Won't Fix I believe this is expected behaviour and the current best practice is simply to make a lo

[jira] [Closed] (SPARK-6780) Add saveAsTextFileByKey method for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-6780. -- Resolution: Won't Fix Since SPARK-3533 is WON'T FIX, this one should be too. > Add saveAsTextFileByKey method for

[jira] [Commented] (SPARK-6831) Document how to use external data sources

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556758#comment-15556758 ] holdenk commented on SPARK-6831: Is this something we are planning to do at all? It doesn'

[jira] [Commented] (SPARK-8780) Move Python doctest code example from models to algorithms

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556763#comment-15556763 ] holdenk commented on SPARK-8780: Is this something we still want to do? This could be a gr

[jira] [Updated] (SPARK-8780) Move Python doctest code example from models to algorithms

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-8780: --- Labels: starter (was: ) > Move Python doctest code example from models to algorithms > --

[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556775#comment-15556775 ] holdenk commented on SPARK-7941: Are you still experiencing this issue [~cqnguyen] or woul

[jira] [Commented] (SPARK-7177) Create standard way to wrap Spark CLI scripts for external projects

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556780#comment-15556780 ] holdenk commented on SPARK-7177: I've run into similar challenges when working on Sparklin

[jira] [Commented] (SPARK-8605) Exclude files in StreamingContext. textFileStream(directory)

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556785#comment-15556785 ] holdenk commented on SPARK-8605: This is semi-documented (namely only atomic moves are sup

[jira] [Updated] (SPARK-8605) Exclude files in StreamingContext. textFileStream(directory)

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-8605: --- Component/s: (was: PySpark) Streaming > Exclude files in StreamingContext. textFileStream

[jira] [Closed] (SPARK-8719) Adding Python support for 1-sample, 2-sided Kolmogorov Smirnov Test

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8719. -- Resolution: Duplicate > Adding Python support for 1-sample, 2-sided Kolmogorov Smirnov Test > --

[jira] [Closed] (SPARK-8757) Check missing and add user guide for MLlib Python API

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8757. -- Resolution: Fixed All sub-issues fixed, and well past the 1.5 release. > Check missing and add user guide for MLlib

[jira] [Closed] (SPARK-8760) allow moving and symlinking binaries

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8760. -- Resolution: Fixed This is "partially fixed", but I think Fixed is a close enough description. We don't use rea

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557018#comment-15557018 ] holdenk commented on SPARK-9487: This may break some tests in the process, but it wo

[jira] [Updated] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-9487: --- Labels: starter (was: ) > Use the same num. worker threads in Scala/Python unit tests > -

[jira] [Commented] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557068#comment-15557068 ] holdenk commented on SPARK-10161: - I think this is an issue across cluster modes, maybe

[jira] [Commented] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557069#comment-15557069 ] holdenk commented on SPARK-10161: - That being said - I'm not sure I see the value of this

[jira] [Commented] (SPARK-1762) Add functionality to pin RDDs in cache

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557133#comment-15557133 ] holdenk commented on SPARK-1762: Is this something we are still interested in? I could see

[jira] [Commented] (SPARK-1792) Missing Spark-Shell Configure Options

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557146#comment-15557146 ] holdenk commented on SPARK-1792: It feels like we've already got a pretty good mechanism f

[jira] [Commented] (SPARK-1865) Improve behavior of cleanup of disk state

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557149#comment-15557149 ] holdenk commented on SPARK-1865: So ALS specifically has a workaround for this with clean

[jira] [Commented] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557153#comment-15557153 ] holdenk commented on SPARK-2032: I'm assuming since there hasn't been any activity for awh

[jira] [Commented] (SPARK-2722) Mechanism for escaping spark configs is not consistent

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557158#comment-15557158 ] holdenk commented on SPARK-2722: I think at this point trying to change the escaping of th

[jira] [Commented] (SPARK-3132) Avoid serialization for Array[Byte] in TorrentBroadcast

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557178#comment-15557178 ] holdenk commented on SPARK-3132: Is there any progress on this or would it be ok for me to

[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557182#comment-15557182 ] holdenk commented on SPARK-3312: I'm going to go ahead and close this, now that `Datasets`

[jira] [Closed] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-3312. -- Resolution: Won't Fix > Add a groupByKey which returns a special GroupBy object like in pandas > ---

[jira] [Commented] (SPARK-3348) Support user-defined SparkListeners properly

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557186#comment-15557186 ] holdenk commented on SPARK-3348: Is there still interest in seeing this happen? Should we

[jira] [Commented] (SPARK-3513) Provide a utility for running a function once on each executor

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557193#comment-15557193 ] holdenk commented on SPARK-3513: This seems closely related to SPARK-650 and SPARK-636 as

[jira] [Commented] (SPARK-3600) RDD[Double] doesn't use primitive arrays for caching

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557195#comment-15557195 ] holdenk commented on SPARK-3600: Is this something we still want to work on or does `Datas

[jira] [Commented] (SPARK-11571) Twitter Api for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557205#comment-15557205 ] holdenk commented on SPARK-11571: - Is there anything you are looking to do with this API?

[jira] [Closed] (SPARK-12774) DataFrame.mapPartitions apply function operates on Pandas DataFrame instead of a generator or rows

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12774. --- Resolution: Won't Fix In some ways, yes, avoiding unnecessary iteration can be good, but allowing Spark to spil

[jira] [Commented] (SPARK-11874) DistributedCache for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557217#comment-15557217 ] holdenk commented on SPARK-11874: - I think this is not intended to be supported, although

[jira] [Commented] (SPARK-12100) bug in spark/python/pyspark/rdd.py portable_hash()

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557219#comment-15557219 ] holdenk commented on SPARK-12100: - Just noting related progress in https://github.com/apa

[jira] [Commented] (SPARK-12776) Implement Python API for Datasets

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557224#comment-15557224 ] holdenk commented on SPARK-12776: - Just re-opening discussion here - the migration to dat

[jira] [Commented] (SPARK-11722) Rdds could be different between original one and save-out-then-read-in one

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557226#comment-15557226 ] holdenk commented on SPARK-11722: - Is this still an issue you are experiencing and if so

[jira] [Commented] (SPARK-13303) Spark fails with pandas import error when pandas is not explicitly imported by user

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557230#comment-15557230 ] holdenk commented on SPARK-13303: - What about if we added a requirements file? We have on

[jira] [Commented] (SPARK-13368) PySpark JavaModel fails to extract params from Spark side automatically

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557239#comment-15557239 ] holdenk commented on SPARK-13368: - It seems that we don't have this in the example anymor

[jira] [Closed] (SPARK-13368) PySpark JavaModel fails to extract params from Spark side automatically

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-13368. --- Resolution: Fixed > PySpark JavaModel fails to extract params from Spark side automatically > ---

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557242#comment-15557242 ] holdenk commented on SPARK-13534: - For people following along, Arrow is in the middle of v

[jira] [Commented] (SPARK-13606) Error from python worker: /usr/local/bin/python2.7: undefined symbol: _PyCodec_LookupTextEncoding

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557244#comment-15557244 ] holdenk commented on SPARK-13606: - Are you still experiencing this? > Error from python

[jira] [Commented] (SPARK-13585) addPyFile behavior change between 1.6 and before

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557248#comment-15557248 ] holdenk commented on SPARK-13585: - What is the use case for overwriting the old pyFile? T

[jira] [Closed] (SPARK-14229) PySpark DataFrame.rdd's can't be saved to an arbitrary Hadoop OutputFormat

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-14229. --- Resolution: Won't Fix I don't think this is really a bug - if you want to save from dataframes there is the

[jira] [Commented] (SPARK-20087) Include accumulators / taskMetrics when sending TaskKilled to onTaskEnd listeners

2018-02-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352980#comment-16352980 ] holdenk commented on SPARK-20087: - I've given up on changing the accumulator API until Sp

[jira] [Created] (SPARK-23672) Support returning lists in Arrow UDFs

2018-03-13 Thread holdenk (JIRA)
holdenk created SPARK-23672: --- Summary: Support returning lists in Arrow UDFs Key: SPARK-23672 URL: https://issues.apache.org/jira/browse/SPARK-23672 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2018-03-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-15009: --- Assignee: Bryan Cutler > PySpark CountVectorizerModel should be able to construct from vocabulary li

[jira] [Resolved] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2018-03-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-15009. - Resolution: Fixed Fix Version/s: 2.4.0 Target Version/s: 2.4.0 Thanks for fixing this Br

[jira] [Commented] (SPARK-23672) Support returning lists in Arrow UDFs

2018-03-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405574#comment-16405574 ] holdenk commented on SPARK-23672: - cc [~bryanc] for thoughts. > Support returning lists

[jira] [Resolved] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2018-03-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-21685. - Resolution: Fixed Fix Version/s: 2.4.0 > Params isSet in scala Transformer triggered by _setDefaul

[jira] [Assigned] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2018-03-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-21685: --- Assignee: Bryan Cutler > Params isSet in scala Transformer triggered by _setDefault in pyspark > ---

[jira] [Created] (SPARK-23783) Add new generic export trait for ML pipelines

2018-03-23 Thread holdenk (JIRA)
holdenk created SPARK-23783: --- Summary: Add new generic export trait for ML pipelines Key: SPARK-23783 URL: https://issues.apache.org/jira/browse/SPARK-23783 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-11239) PMML export for ML linear regression

2018-03-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-11239. - Resolution: Fixed Fix Version/s: 2.4.0 > PMML export for ML linear regression > --

[jira] [Assigned] (SPARK-11239) PMML export for ML linear regression

2018-03-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-11239: --- Assignee: holdenk > PMML export for ML linear regression > > >

[jira] [Resolved] (SPARK-23783) Add new generic export trait for ML pipelines

2018-03-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-23783. - Resolution: Fixed Fix Version/s: 2.4.0 > Add new generic export trait for ML pipelines > -

[jira] [Assigned] (SPARK-23783) Add new generic export trait for ML pipelines

2018-03-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-23783: --- Assignee: holdenk > Add new generic export trait for ML pipelines >

[jira] [Updated] (SPARK-23672) Support returning lists in Arrow UDFs

2018-03-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-23672: Description: Consider adding support for returning lists for individual inputs on non-grouped data inside o

[jira] [Updated] (SPARK-23672) Document Support returning lists in Arrow UDFs

2018-03-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-23672: Summary: Document Support returning lists in Arrow UDFs (was: Support returning lists in Arrow UDFs) > Do

[jira] [Updated] (SPARK-23672) Document Support returning lists in Arrow UDFs

2018-03-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-23672: Description: Documenting the support for returning lists for individual inputs on non-grouped data inside o

[jira] [Commented] (SPARK-22809) pyspark is sensitive to imports with dots

2018-03-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416038#comment-16416038 ] holdenk commented on SPARK-22809: - This _should_ be resolved by SPARK-23169 but I'll doub

[jira] [Created] (SPARK-23836) Support returning StructType & MapType in Arrow's "scalar" UDFS (or similar)

2018-03-30 Thread holdenk (JIRA)
holdenk created SPARK-23836: --- Summary: Support returning StructType & MapType in Arrow's "scalar" UDFS (or similar) Key: SPARK-23836 URL: https://issues.apache.org/jira/browse/SPARK-23836 Project: Spark

[jira] [Commented] (SPARK-23836) Support returning StructType & MapType in Arrow's "scalar" UDFS (or similar)

2018-03-30 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421044#comment-16421044 ] holdenk commented on SPARK-23836: - cc [~bryanc] > Support returning StructType & MapType

[jira] [Commented] (SPARK-23836) Support returning StructType & MapType in Arrow's "scalar" UDFS (or similar)

2018-04-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422922#comment-16422922 ] holdenk commented on SPARK-23836: - [~hyukjin.kwon] it's a good question; that one seems to

[jira] [Commented] (SPARK-23836) Support returning StructType & MapType in Arrow's "scalar" UDFS (or similar)

2018-04-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422926#comment-16422926 ] holdenk commented on SPARK-23836: - I'm going to take a quick crack at this this week. >

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-04-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422938#comment-16422938 ] holdenk commented on SPARK-21187: - So Arrays are listed as crossed off but it seems like

[jira] [Commented] (SPARK-23836) Support returning StructType & MapType in Arrow's "scalar" UDFS (or similar)

2018-04-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422942#comment-16422942 ] holdenk commented on SPARK-23836: - Oh wait, I misunderstood our support of StructType - I

[jira] [Updated] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2018-04-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-23836: Summary: Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

[jira] [Commented] (SPARK-23851) Investigate pip install edit mode unicode errors

2018-04-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423081#comment-16423081 ] holdenk commented on SPARK-23851: - Happening with pip 9.0.3 > Investigate pip install ed

[jira] [Created] (SPARK-23851) Investigate pip install edit mode unicode errors

2018-04-02 Thread holdenk (JIRA)
holdenk created SPARK-23851: --- Summary: Investigate pip install edit mode unicode errors Key: SPARK-23851 URL: https://issues.apache.org/jira/browse/SPARK-23851 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-23853) Skip doctests which require hive support built in PySpark

2018-04-02 Thread holdenk (JIRA)
holdenk created SPARK-23853: --- Summary: Skip doctests which require hive support built in PySpark Key: SPARK-23853 URL: https://issues.apache.org/jira/browse/SPARK-23853 Project: Spark Issue Type: B

[jira] [Updated] (SPARK-26343) Speed up running the kubernetes integration tests locally

2018-12-11 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-26343: Summary: Speed up running the kubernetes integration tests locally (was: Running the kubernetes ) > Spee

[jira] [Created] (SPARK-26343) Running the kubernetes

2018-12-11 Thread holdenk (JIRA)
holdenk created SPARK-26343: --- Summary: Running the kubernetes Key: SPARK-26343 URL: https://issues.apache.org/jira/browse/SPARK-26343 Project: Spark Issue Type: Improvement Components: K

[jira] [Created] (SPARK-26497) Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script.

2018-12-28 Thread holdenk (JIRA)
holdenk created SPARK-26497: --- Summary: Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script. Key: SPARK-26497 URL: https://issues.apache.org/jira/browse/SPARK-26497

[jira] [Assigned] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering

2019-01-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-24489: --- Assignee: shahid > No check for invalid input type of weight data in ml.PowerIterationClustering >

[jira] [Resolved] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering

2019-01-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-24489. - Resolution: Fixed Fix Version/s: 3.0.0 Thanks for working on this; I've merged the fix into mast

[jira] [Created] (SPARK-12428) Write a script to run all PySpark MLlib examples for testing

2015-12-18 Thread holdenk (JIRA)
holdenk created SPARK-12428: --- Summary: Write a script to run all PySpark MLlib examples for testing Key: SPARK-12428 URL: https://issues.apache.org/jira/browse/SPARK-12428 Project: Spark Issue Typ

[jira] [Commented] (SPARK-12428) Write a script to run all PySpark MLlib examples for testing

2015-12-18 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064610#comment-15064610 ] holdenk commented on SPARK-12428: - I can start working on this a bit over the holidays :)

[jira] [Created] (SPARK-12432) Make parts of the Spark testing API public to assist developers making their own tests.

2015-12-18 Thread holdenk (JIRA)
holdenk created SPARK-12432: --- Summary: Make parts of the Spark testing API public to assist developers making their own tests. Key: SPARK-12432 URL: https://issues.apache.org/jira/browse/SPARK-12432 Project

[jira] [Created] (SPARK-12433) Make parts of the core Spark testing API public to assist developers making their own tests.

2015-12-18 Thread holdenk (JIRA)
holdenk created SPARK-12433: --- Summary: Make parts of the core Spark testing API public to assist developers making their own tests. Key: SPARK-12433 URL: https://issues.apache.org/jira/browse/SPARK-12433 Pr

[jira] [Created] (SPARK-12434) Make parts of the streaming testing API public to assist developers making their own tests.

2015-12-18 Thread holdenk (JIRA)
holdenk created SPARK-12434: --- Summary: Make parts of the streaming testing API public to assist developers making their own tests. Key: SPARK-12434 URL: https://issues.apache.org/jira/browse/SPARK-12434 Pro

[jira] [Created] (SPARK-12469) Consistent Accumulators for Spark

2015-12-21 Thread holdenk (JIRA)
holdenk created SPARK-12469: --- Summary: Consistent Accumulators for Spark Key: SPARK-12469 URL: https://issues.apache.org/jira/browse/SPARK-12469 Project: Spark Issue Type: Improvement Com

[jira] [Commented] (SPARK-12469) Consistent Accumulators for Spark

2015-12-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072987#comment-15072987 ] holdenk commented on SPARK-12469: - Thanks for the link to the previous attempt :) I think

[jira] [Created] (SPARK-12587) Make parts of the Spark SQL testing API public

2015-12-30 Thread holdenk (JIRA)
holdenk created SPARK-12587: --- Summary: Make parts of the Spark SQL testing API public Key: SPARK-12587 URL: https://issues.apache.org/jira/browse/SPARK-12587 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12611) test_infer_schema_to_local depended on old handling of missing value in row

2016-01-03 Thread holdenk (JIRA)
holdenk created SPARK-12611: --- Summary: test_infer_schema_to_local depended on old handling of missing value in row Key: SPARK-12611 URL: https://issues.apache.org/jira/browse/SPARK-12611 Project: Spark

[jira] [Created] (SPARK-12731) PySpark docstring cleanup

2016-01-08 Thread holdenk (JIRA)
holdenk created SPARK-12731: --- Summary: PySpark docstring cleanup Key: SPARK-12731 URL: https://issues.apache.org/jira/browse/SPARK-12731 Project: Spark Issue Type: Improvement Components:

[jira] [Commented] (SPARK-12731) PySpark docstring cleanup

2016-01-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090280#comment-15090280 ] holdenk commented on SPARK-12731: - I've got a shell script started to do this; I'll just t

[jira] [Commented] (SPARK-12731) PySpark docstring cleanup

2016-01-09 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090507#comment-15090507 ] holdenk commented on SPARK-12731: - cc [~josephkb] based on our chat on my other PR > PyS

[jira] [Commented] (SPARK-12731) PySpark docstring cleanup

2016-01-21 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111301#comment-15111301 ] holdenk commented on SPARK-12731: - So is this a thing we should consider pursuing or mayb

[jira] [Commented] (SPARK-12684) Matrix.toString should take a format for how each cell should be printed

2016-01-21 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111304#comment-15111304 ] holdenk commented on SPARK-12684: - So I think this issue is probably related to https://

[jira] [Commented] (SPARK-10498) Add requirements file for create dev python tools

2016-01-21 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111306#comment-15111306 ] holdenk commented on SPARK-10498: - Sounds good - I'll give this a shot > Add requirement

[jira] [Closed] (SPARK-12151) Improve PySpark MLLib prediction performance when using pickled vectors

2016-01-22 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12151. --- Resolution: Not A Problem Checked the models; all of the ones not doing these were doing their prediction in

[jira] [Commented] (SPARK-12986) Fix pydoc warnings in mllib/regression.py

2016-01-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116085#comment-15116085 ] holdenk commented on SPARK-12986: - Do you think we should maybe add make html to the lint

[jira] [Commented] (SPARK-12684) Matrix.toString should take a format for how each cell should be printed

2016-01-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118311#comment-15118311 ] holdenk commented on SPARK-12684: - [~chris.roberts] would the toString with maxLineWidth

[jira] [Created] (SPARK-13025) Allow user to specify the initial model when training LogisticRegression

2016-01-26 Thread holdenk (JIRA)
holdenk created SPARK-13025: --- Summary: Allow user to specify the initial model when training LogisticRegression Key: SPARK-13025 URL: https://issues.apache.org/jira/browse/SPARK-13025 Project: Spark
