[jira] [Reopened] (SPARK-20765) Cannot load persisted PySpark ML Pipeline that includes 3rd party stage (Transformer or Estimator) if the package name of stage is not "org.apache.spark" and "pyspark"

2019-09-08 Thread Ilya Matiach (Jira)
[ https://issues.apache.org/jira/browse/SPARK-20765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach reopened SPARK-20765: -- I'm still seeing this issue, please see this thread: [https://github.com/Azure/mmlspark/issues/61

[jira] [Created] (SPARK-26498) Integrate barrier execution with MMLSpark's LightGBM

2018-12-28 Thread Ilya Matiach (JIRA)
Ilya Matiach created SPARK-26498: Summary: Integrate barrier execution with MMLSpark's LightGBM Key: SPARK-26498 URL: https://issues.apache.org/jira/browse/SPARK-26498 Project: Spark Issue Ty

[jira] [Commented] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage

2018-12-14 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721841#comment-16721841 ] Ilya Matiach commented on SPARK-24942: -- Would really like to see this resolved.  It

[jira] [Commented] (SPARK-24103) BinaryClassificationEvaluator should use sample weight data

2018-05-14 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474754#comment-16474754 ] Ilya Matiach commented on SPARK-24103: -- This issue is fixed by this PR: https://git

[jira] [Updated] (SPARK-24101) MulticlassClassificationEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach updated SPARK-24101: - Description: The LogisticRegression and LinearRegression models support training with a weight co

[jira] [Updated] (SPARK-24102) RegressionEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach updated SPARK-24102: - Description: The LogisticRegression and LinearRegression models support training with a weight co

[jira] [Updated] (SPARK-24103) BinaryClassificationEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach updated SPARK-24103: - Description: The LogisticRegression and LinearRegression models support training with a weight co

[jira] [Commented] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454435#comment-16454435 ] Ilya Matiach commented on SPARK-18693: -- [~josephkb] sure, I've added 3 JIRAs for tra

[jira] [Created] (SPARK-24103) BinaryClassificationEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
Ilya Matiach created SPARK-24103: Summary: BinaryClassificationEvaluator should use sample weight data Key: SPARK-24103 URL: https://issues.apache.org/jira/browse/SPARK-24103 Project: Spark

[jira] [Updated] (SPARK-24102) RegressionEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach updated SPARK-24102: - Issue Type: Improvement (was: Bug) > RegressionEvaluator should use sample weight data > ---

[jira] [Updated] (SPARK-24101) MulticlassClassificationEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach updated SPARK-24101: - Issue Type: Improvement (was: Bug) > MulticlassClassificationEvaluator should use sample weight

[jira] [Created] (SPARK-24102) RegressionEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
Ilya Matiach created SPARK-24102: Summary: RegressionEvaluator should use sample weight data Key: SPARK-24102 URL: https://issues.apache.org/jira/browse/SPARK-24102 Project: Spark Issue Type:

[jira] [Created] (SPARK-24101) MulticlassClassificationEvaluator should use sample weight data

2018-04-26 Thread Ilya Matiach (JIRA)
Ilya Matiach created SPARK-24101: Summary: MulticlassClassificationEvaluator should use sample weight data Key: SPARK-24101 URL: https://issues.apache.org/jira/browse/SPARK-24101 Project: Spark

[jira] [Commented] (SPARK-18755) Add Randomized Grid Search to Spark ML

2017-10-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221687#comment-16221687 ] Ilya Matiach commented on SPARK-18755: -- I've created a PR that adds randomized grid

[jira] [Commented] (SPARK-22357) SparkContext.binaryFiles ignore minPartitions parameter

2017-10-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221388#comment-16221388 ] Ilya Matiach commented on SPARK-22357: -- binaryFiles ignores the number of partitions

[jira] [Commented] (SPARK-18755) Add Randomized Grid Search to Spark ML

2017-10-06 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195145#comment-16195145 ] Ilya Matiach commented on SPARK-18755: -- This is a very interesting issue. I am thin

[jira] [Comment Edited] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-10-05 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193691#comment-16193691 ] Ilya Matiach edited comment on SPARK-21742 at 10/5/17 9:12 PM:

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-10-05 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193691#comment-16193691 ] Ilya Matiach commented on SPARK-21742: -- [~podongfeng] The test was just validating t

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2017-10-05 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193675#comment-16193675 ] Ilya Matiach commented on SPARK-16473: -- [~podongfeng] interesting - it looks like th

[jira] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-01-30 Thread Ilya Matiach (JIRA)
Ilya Matiach commented on SPARK-19208

[jira] [Commented] (SPARK-17975) EMLDAOptimizer fails with ClassCastException on YARN

2017-01-26 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840151#comment-15840151 ] Ilya Matiach commented on SPARK-17975: -- [~josephkb] I was able to verify that this i

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-17 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826354#comment-15826354 ] Ilya Matiach commented on SPARK-19208: -- [~srowen] Good point, with something like ha

[jira] [Commented] (SPARK-11520) RegressionMetrics should support instance weights

2017-01-13 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822556#comment-15822556 ] Ilya Matiach commented on SPARK-11520: -- I've sent a pull request that includes this

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-13 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822549#comment-15822549 ] Ilya Matiach commented on SPARK-19208: -- [~srowen] isn't feature hashing (eg HashingT

[jira] [Comment Edited] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion

2017-01-09 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812367#comment-15812367 ] Ilya Matiach edited comment on SPARK-19053 at 1/9/17 6:09 PM: -

[jira] [Commented] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion

2017-01-09 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812367#comment-15812367 ] Ilya Matiach commented on SPARK-19053: -- The only problem with the change proposed is

[jira] [Commented] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion

2017-01-09 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812352#comment-15812352 ] Ilya Matiach commented on SPARK-19053: -- I like your second api (setMetrics). That w

[jira] [Commented] (SPARK-17975) EMLDAOptimizer fails with ClassCastException on YARN

2017-01-06 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806368#comment-15806368 ] Ilya Matiach commented on SPARK-17975: -- I was able to reproduce the issue based on y

[jira] [Commented] (SPARK-6099) Stabilize mllib ClassificationModel, RegressionModel APIs

2017-01-06 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805232#comment-15805232 ] Ilya Matiach commented on SPARK-6099: - It doesn't look like the API's are experimental

[jira] [Commented] (SPARK-11968) ALS recommend all methods spend most of time in GC

2017-01-06 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805169#comment-15805169 ] Ilya Matiach commented on SPARK-11968: -- Can someone with permissions change the stat

[jira] [Commented] (SPARK-17975) EMLDAOptimizer fails with ClassCastException on YARN

2017-01-05 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803673#comment-15803673 ] Ilya Matiach commented on SPARK-17975: -- Thank you for sending the dataset, I'm worki

[jira] [Commented] (SPARK-11569) StringIndexer transform fails when column contains nulls

2017-01-05 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803656#comment-15803656 ] Ilya Matiach commented on SPARK-11569: -- @jliwork @srowen are you currently working o

[jira] [Commented] (SPARK-14975) Predicted Probability per training instance for Gradient Boosted Trees in mllib.

2016-12-30 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787910#comment-15787910 ] Ilya Matiach commented on SPARK-14975: -- I can take a look into this issue. It looks

[jira] [Commented] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data

2016-12-29 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786013#comment-15786013 ] Ilya Matiach commented on SPARK-18693: -- Many classifiers in ml don't seem to support

[jira] [Commented] (SPARK-18698) public constructor with uid for IndexToString-class

2016-12-29 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785558#comment-15785558 ] Ilya Matiach commented on SPARK-18698: -- This looks like a minor bug... similar trans

[jira] [Commented] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data

2016-12-29 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785469#comment-15785469 ] Ilya Matiach commented on SPARK-18693: -- I can take a look into fixing this issue. >

[jira] [Commented] (SPARK-18054) Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type

2016-12-22 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770991#comment-15770991 ] Ilya Matiach commented on SPARK-18054: -- It looks like I can still repro the error wi

[jira] [Commented] (SPARK-18054) Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type

2016-12-22 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770981#comment-15770981 ] Ilya Matiach commented on SPARK-18054: -- Actually, that error message above looks dif

[jira] [Commented] (SPARK-18054) Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type

2016-12-22 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770972#comment-15770972 ] Ilya Matiach commented on SPARK-18054: -- It looks like this is already fixed in the l

[jira] [Commented] (SPARK-18054) Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type

2016-12-22 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770690#comment-15770690 ] Ilya Matiach commented on SPARK-18054: -- I can try to repro this and add in a better

[jira] [Commented] (SPARK-17801) [ML]Random Forest Regression fails for large input

2016-12-22 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770463#comment-15770463 ] Ilya Matiach commented on SPARK-17801: -- Taking a look into the error > [ML]Random F

[jira] [Commented] (SPARK-17975) EMLDAOptimizer fails with ClassCastException on YARN

2016-12-22 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770446#comment-15770446 ] Ilya Matiach commented on SPARK-17975: -- Could you send a link to the repro dataset?

[jira] [Commented] (SPARK-18036) Decision Trees do not handle edge cases

2016-12-21 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768372#comment-15768372 ] Ilya Matiach commented on SPARK-18036: -- Thanks, I've sent a pull request to fix this

[jira] [Commented] (SPARK-18301) VectorAssembler does not support StructTypes

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765611#comment-15765611 ] Ilya Matiach commented on SPARK-18301: -- For example, when I use HashingTF I get the

[jira] [Commented] (SPARK-18301) VectorAssembler does not support StructTypes

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765596#comment-15765596 ] Ilya Matiach commented on SPARK-18301: -- I am able to reproduce this, but I'm not sur

[jira] [Commented] (SPARK-12965) Indexer setInputCol() doesn't resolve column names like DataFrame.col()

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765124#comment-15765124 ] Ilya Matiach commented on SPARK-12965: -- Can the ML component be removed from this Ji

[jira] [Commented] (SPARK-18036) Decision Trees do not handle edge cases

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765032#comment-15765032 ] Ilya Matiach commented on SPARK-18036: -- Weichen Xu, are you working on this issue or

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765027#comment-15765027 ] Ilya Matiach commented on SPARK-16473: -- Do you have a smaller dataset than the one i

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764964#comment-15764964 ] Ilya Matiach commented on SPARK-16473: -- If you could put the sample dataset on googl

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764909#comment-15764909 ] Ilya Matiach commented on SPARK-16473: -- I've added a pull request here: https://gith

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764892#comment-15764892 ] Ilya Matiach commented on SPARK-16473: -- I will start a pull request for the change.

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-19 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762532#comment-15762532 ] Ilya Matiach commented on SPARK-16473: -- I'm interested in looking into this issue.