[jira] [Commented] (SPARK-11215) Add multiple columns support to StringIndexer

2018-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591667#comment-16591667 ] Barry Becker commented on SPARK-11215: -- Is the main motivation for this feature per

[jira] [Commented] (SPARK-9610) Class and instance weighting for ML

2018-08-15 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581253#comment-16581253 ] Barry Becker commented on SPARK-9610: - All ML models should support having an option

[jira] [Commented] (SPARK-21986) QuantileDiscretizer picks wrong split point for data with lots of 0's

2018-08-03 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568718#comment-16568718 ] Barry Becker commented on SPARK-21986: -- Here are a couple more test cases that show

[jira] [Created] (SPARK-24394) Nodes in decision tree sometimes have negative impurity values

2018-05-25 Thread Barry Becker (JIRA)
Barry Becker created SPARK-24394: Summary: Nodes in decision tree sometimes have negative impurity values Key: SPARK-24394 URL: https://issues.apache.org/jira/browse/SPARK-24394 Project: Spark

[jira] [Commented] (SPARK-24019) AnalysisException for Window function expression to compute derivative

2018-04-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444202#comment-16444202 ] Barry Becker commented on SPARK-24019: -- Lowering to minor because I found a way to s

[jira] [Comment Edited] (SPARK-24019) AnalysisException for Window function expression to compute derivative

2018-04-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444202#comment-16444202 ] Barry Becker edited comment on SPARK-24019 at 4/19/18 3:07 PM:

[jira] [Updated] (SPARK-24019) AnalysisException for Window function expression to compute derivative

2018-04-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-24019: - Priority: Minor (was: Major) > AnalysisException for Window function expression to compute deriv

[jira] [Created] (SPARK-24019) AnalysisException for Window function expression to compute derivative

2018-04-18 Thread Barry Becker (JIRA)
Barry Becker created SPARK-24019: Summary: AnalysisException for Window function expression to compute derivative Key: SPARK-24019 URL: https://issues.apache.org/jira/browse/SPARK-24019 Project: Spark

[jira] [Created] (SPARK-23824) Make impurityStats publicly accessible in ml.tree.Node

2018-03-29 Thread Barry Becker (JIRA)
Barry Becker created SPARK-23824: Summary: Make impurityStats publicly accessible in ml.tree.Node Key: SPARK-23824 URL: https://issues.apache.org/jira/browse/SPARK-23824 Project: Spark Issue

[jira] [Commented] (SPARK-6162) Handle missing values in GBM

2018-03-27 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415600#comment-16415600 ] Barry Becker commented on SPARK-6162: - If we all agree that this is something that would

[jira] [Commented] (SPARK-8529) Set metadata for MinMaxScaler

2018-02-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369194#comment-16369194 ] Barry Becker commented on SPARK-8529: - Complementing the output metadata in what way?

[jira] [Comment Edited] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-11-07 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064972#comment-16064972 ] Barry Becker edited comment on SPARK-20226 at 11/7/17 6:09 PM:

[jira] [Commented] (SPARK-9610) Class and instance weighting for ML

2017-10-25 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219005#comment-16219005 ] Barry Becker commented on SPARK-9610: - Frequent item sets (associations) could use it

[jira] [Commented] (SPARK-7276) withColumn is very slow on dataframe with large number of columns

2017-09-15 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167814#comment-16167814 ] Barry Becker commented on SPARK-7276: - Isn't there still a problem with withColumn per

[jira] [Commented] (SPARK-21986) QuantileDiscretizer picks wrong split point for data with lots of 0's

2017-09-12 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163768#comment-16163768 ] Barry Becker commented on SPARK-21986: -- But wait, the dataset I discovered the probl

[jira] [Created] (SPARK-21986) QuantileDiscretizer picks wrong split point for data with lots of 0's

2017-09-12 Thread Barry Becker (JIRA)
Barry Becker created SPARK-21986: Summary: QuantileDiscretizer picks wrong split point for data with lots of 0's Key: SPARK-21986 URL: https://issues.apache.org/jira/browse/SPARK-21986 Project: Spark

[jira] [Commented] (SPARK-14155) Hide UserDefinedType in Spark 2.0

2017-09-06 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155790#comment-16155790 ] Barry Becker commented on SPARK-14155: -- Does it work with datasets now in 2.1? > Hi

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-06-27 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064972#comment-16064972 ] Barry Becker commented on SPARK-20226: -- Calling cache() on the dataframe on the afte

[jira] [Commented] (SPARK-16845) "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2017-05-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018840#comment-16018840 ] Barry Becker commented on SPARK-16845: -- I checked out the v2.1.1 tag of spark fr

[jira] [Commented] (SPARK-20542) Add an API into Bucketizer that can bin a lot of columns all at once

2017-05-10 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004688#comment-16004688 ] Barry Becker commented on SPARK-20542: -- @viirya, your implementation of MultipleBuck

[jira] [Commented] (SPARK-20542) Add an API into Bucketizer that can bin a lot of columns all at once

2017-05-09 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002609#comment-16002609 ] Barry Becker commented on SPARK-20542: -- This is a great improvement, @viirya! Accord

[jira] [Commented] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-05-09 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002582#comment-16002582 ] Barry Becker commented on SPARK-19581: -- I think it's just a matter of sending a featu

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-05-08 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001388#comment-16001388 ] Barry Becker commented on SPARK-13747: -- Good to hear that your workaround was succes

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-05-08 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001303#comment-16001303 ] Barry Becker commented on SPARK-13747: -- @saif1988, just to clarify, did you add the

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-05-08 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001186#comment-16001186 ] Barry Becker commented on SPARK-13747: -- I also tried the "thread-pool-executor" work

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-05-08 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000830#comment-16000830 ] Barry Becker commented on SPARK-13747: -- There seems to be some related discussion he

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-27 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987706#comment-15987706 ] Barry Becker commented on SPARK-20392: -- Thanks for working on a fix. Do you have any

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20392: - Attachment: model_9756.zip blockbuster_fewCols.csv attaching blockbuster_fewCols.

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981386#comment-15981386 ] Barry Becker commented on SPARK-20392: -- [~viirya] that is correct. If I reduce the d

[jira] [Comment Edited] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979049#comment-15979049 ] Barry Becker edited comment on SPARK-20392 at 4/21/17 4:49 PM:

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20392: - Attachment: model_9754.zip Attaching the parquet pipeline (as zip). > Slow performance when call

[jira] [Comment Edited] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979049#comment-15979049 ] Barry Becker edited comment on SPARK-20392 at 4/21/17 4:46 PM:

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979049#comment-15979049 ] Barry Becker commented on SPARK-20392: -- Yes [~kiszk], I was able to create a simple

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20392: - Attachment: giant_query_plan_for_fitting_pipeline.txt Giant nested query plan using when calling

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20392: - Attachment: blockbuster.csv Attaching blockbuster.csv data file with many columns, but few rows.

[jira] [Created] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-19 Thread Barry Becker (JIRA)
Barry Becker created SPARK-20392: Summary: Slow performance when calling fit on ML pipeline for dataset with many columns but few rows Key: SPARK-20392 URL: https://issues.apache.org/jira/browse/SPARK-20392

[jira] [Commented] (SPARK-6509) MDLP discretizer

2017-04-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972744#comment-15972744 ] Barry Becker commented on SPARK-6509: - As further proof of relevance, I will be giving

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-07 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960868#comment-15960868 ] Barry Becker commented on SPARK-20226: -- Only 11 columns. I did not want to wait for

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-07 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960806#comment-15960806 ] Barry Becker commented on SPARK-20226: -- OK, I set the flag using sqlContext.setConf

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-06 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959134#comment-15959134 ] Barry Becker commented on SPARK-20226: -- Yes. We are running through spark job-server

[jira] [Comment Edited] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-06 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959024#comment-15959024 ] Barry Becker edited comment on SPARK-20226 at 4/6/17 2:45 PM: -

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-06 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959024#comment-15959024 ] Barry Becker commented on SPARK-20226: -- I set spark.sql.constraintPropagation.enable

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Attachment: profile_indexer2.PNG A snapshot of the hotspot sampler from JVisualVM while cacheTabl

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957732#comment-15957732 ] Barry Becker commented on SPARK-20226: -- I did some profiling using the sampler in JV

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957489#comment-15957489 ] Barry Becker commented on SPARK-20226: -- I thought the problem was in the cacheTable

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957457#comment-15957457 ] Barry Becker commented on SPARK-20226: -- It seems like it has to do with the interact

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Labels: cache (was: ) > Call to sqlContext.cacheTable takes an incredibly long time in some case

[jira] [Comment Edited] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957296#comment-15957296 ] Barry Becker edited comment on SPARK-20226 at 4/5/17 5:36 PM: -

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957296#comment-15957296 ] Barry Becker commented on SPARK-20226: -- We noticed that this is reproducible just by

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Attachment: xyzzy.csv Attaching the datafile, but I don't think it is significant. This problem c

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Description: I have a case where the call to sqlContext.cacheTable can take an arbitrarily long

[jira] [Created] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
Barry Becker created SPARK-20226: Summary: Call to sqlContext.cacheTable takes an incredibly long time in some cases Key: SPARK-20226 URL: https://issues.apache.org/jira/browse/SPARK-20226 Project: Sp

[jira] [Commented] (SPARK-20071) StringIndexer overflows Kryo serialization buffer when run on column with many long distinct values

2017-03-23 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938475#comment-15938475 ] Barry Becker commented on SPARK-20071: -- Yes. I agree. I wanted to report the issue,

[jira] [Created] (SPARK-20071) StringIndexer overflows Kryo serialization buffer when run on column with many long distinct values

2017-03-23 Thread Barry Becker (JIRA)
Barry Becker created SPARK-20071: Summary: StringIndexer overflows Kryo serialization buffer when run on column with many long distinct values Key: SPARK-20071 URL: https://issues.apache.org/jira/browse/SPARK-2007

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-03-22 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936997#comment-15936997 ] Barry Becker commented on SPARK-13747: -- We have hit this on rare instances in our pr

[jira] [Created] (SPARK-19699) createOrReplaceTable does not always replace an existing table of the same name

2017-02-22 Thread Barry Becker (JIRA)
Barry Becker created SPARK-19699: Summary: createOrReplaceTable does not always replace an existing table of the same name Key: SPARK-19699 URL: https://issues.apache.org/jira/browse/SPARK-19699 Proje

[jira] [Commented] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-02-13 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863819#comment-15863819 ] Barry Becker commented on SPARK-19581: -- I agree with minor prioritization, since the

[jira] [Created] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-02-13 Thread Barry Becker (JIRA)
Barry Becker created SPARK-19581: Summary: running NaiveBayes model with 0 features can crash the executor with D rorreGEMV Key: SPARK-19581 URL: https://issues.apache.org/jira/browse/SPARK-19581 Proj

[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2017-01-25 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838617#comment-15838617 ] Barry Becker commented on SPARK-4049: - I read the comments, but I'm still not really s

[jira] [Commented] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

2017-01-23 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834790#comment-15834790 ] Barry Becker commented on SPARK-19317: -- I figured out a workaround for this problem.

[jira] [Updated] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

2017-01-23 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-19317: - Priority: Minor (was: Major) > UnsupportedOperationException: empty.reduceLeft in LinearSeqOptim

[jira] [Updated] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

2017-01-23 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-19317: - Description: I wish I had more of a simple reproducible case to give, but I got the below except

[jira] [Commented] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

2017-01-23 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834700#comment-15834700 ] Barry Becker commented on SPARK-19317: -- As far as I can tell, this only occurs when

[jira] [Updated] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

2017-01-23 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-19317: - Description: I wish I had more of a simple reproducible case to give, but I got the below except

[jira] [Created] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

2017-01-20 Thread Barry Becker (JIRA)
Barry Becker created SPARK-19317: Summary: UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized Key: SPARK-19317 URL: https://issues.apache.org/jira/browse/SPARK-19317 Project: Spark

[jira] [Created] (SPARK-19245) Cannot build spark-assembly jar

2017-01-16 Thread Barry Becker (JIRA)
Barry Becker created SPARK-19245: Summary: Cannot build spark-assembly jar Key: SPARK-19245 URL: https://issues.apache.org/jira/browse/SPARK-19245 Project: Spark Issue Type: Documentation

[jira] [Comment Edited] (SPARK-16845) "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-12-20 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762805#comment-15762805 ] Barry Becker edited comment on SPARK-16845 at 12/20/16 9:24 PM: ---

[jira] [Commented] (SPARK-11293) ExternalSorter and ExternalAppendOnlyMap should free shuffle memory in their stop() methods

2016-12-20 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765096#comment-15765096 ] Barry Becker commented on SPARK-11293: -- Not sure if this is related, but I am runnin

[jira] [Commented] (SPARK-16845) "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-12-19 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762805#comment-15762805 ] Barry Becker commented on SPARK-16845: -- I found a workaround that allows me to avoid

[jira] [Commented] (SPARK-11215) Add multiple columns support to StringIndexer

2016-12-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722696#comment-15722696 ] Barry Becker commented on SPARK-11215: -- This would be a good feature. It might be ni

[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2016-11-29 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706389#comment-15706389 ] Barry Becker commented on SPARK-18502: -- Is there a way to escape the backtick when i

[jira] [Comment Edited] (SPARK-13913) DataFrame.withColumn fails when trying to replace existing column with dot in name

2016-11-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677144#comment-15677144 ] Barry Becker edited comment on SPARK-13913 at 11/18/16 5:02 PM: ---

[jira] [Commented] (SPARK-13913) DataFrame.withColumn fails when trying to replace existing column with dot in name

2016-11-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677144#comment-15677144 ] Barry Becker commented on SPARK-13913: -- I can still reproduce this using spark 1.6.3

[jira] [Created] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2016-11-18 Thread Barry Becker (JIRA)
Barry Becker created SPARK-18502: Summary: Spark does not handle columns that contain backquote (`) Key: SPARK-18502 URL: https://issues.apache.org/jira/browse/SPARK-18502 Project: Spark Issu

[jira] [Commented] (SPARK-11977) Support accessing a DataFrame column using its name without backticks if the name contains '.'

2016-11-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15676856#comment-15676856 ] Barry Becker commented on SPARK-11977: -- I would also like to know how to handle colu

[jira] [Commented] (SPARK-12965) Indexer setInputCol() doesn't resolve column names like DataFrame.col()

2016-11-17 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674129#comment-15674129 ] Barry Becker commented on SPARK-12965: -- This is a big issue for us because we don't

[jira] [Commented] (SPARK-16845) "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-11-14 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664866#comment-15664866 ] Barry Becker commented on SPARK-16845: -- I am encountering a similar exception in spa

[jira] [Commented] (SPARK-14138) Generated SpecificColumnarIterator code can exceed JVM size limit for cached DataFrames

2016-11-14 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664842#comment-15664842 ] Barry Becker commented on SPARK-14138: -- I am using spark 1.6.3 on a DataFrame with 2

[jira] [Commented] (SPARK-8443) GenerateMutableProjection Exceeds JVM Code Size Limits

2016-11-14 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664779#comment-15664779 ] Barry Becker commented on SPARK-8443: - I see the same error in spark 1.6.3. Is there a

[jira] [Commented] (SPARK-18181) Huge managed memory leak (2.7G) when running reduceByKey

2016-10-31 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623018#comment-15623018 ] Barry Becker commented on SPARK-18181: -- For this case to leak a lot of memory, I bin

[jira] [Created] (SPARK-18181) Huge managed memory leak (2.7G) when running reduceByKey

2016-10-31 Thread Barry Becker (JIRA)
Barry Becker created SPARK-18181: Summary: Huge managed memory leak (2.7G) when running reduceByKey Key: SPARK-18181 URL: https://issues.apache.org/jira/browse/SPARK-18181 Project: Spark Issu

[jira] [Commented] (SPARK-14363) Executor OOM due to a memory leak in Sorter

2016-10-31 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622975#comment-15622975 ] Barry Becker commented on SPARK-14363: -- I am hitting this issue in 1.6.2. In fact, I

[jira] [Commented] (SPARK-18054) Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type

2016-10-22 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15597816#comment-15597816 ] Barry Becker commented on SPARK-18054: -- Ah. That is quite likely the problem. I will

[jira] [Commented] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-10-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596382#comment-15596382 ] Barry Becker commented on SPARK-16216: -- Yes, That worked. Thanks for the workaround!

[jira] [Created] (SPARK-18054) Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type

2016-10-21 Thread Barry Becker (JIRA)
Barry Becker created SPARK-18054: Summary: Unexpected error from UDF that gets an element of a vector: argument 1 requires vector type, however, '`_column_`' is of vector type Key: SPARK-18054 URL: https://issues.

[jira] [Commented] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-10-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595619#comment-15595619 ] Barry Becker commented on SPARK-16216: -- If timezone is not specified, the date shoul

[jira] [Comment Edited] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-10-21 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595619#comment-15595619 ] Barry Becker edited comment on SPARK-16216 at 10/21/16 4:41 PM: ---

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1968#comment-1968 ] Barry Becker commented on SPARK-17219: -- I'll make another attempt to clarify my use

[jira] [Commented] (SPARK-14234) Executor crashes for TaskRunner thread interruption

2016-08-31 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453414#comment-15453414 ] Barry Becker commented on SPARK-14234: -- Is it a lot of work to backport this fix 1.6

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436819#comment-15436819 ] Barry Becker commented on SPARK-17219: -- In my opinion, yes. It is something that app

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436767#comment-15436767 ] Barry Becker commented on SPARK-17219: -- If you support the different strategies as R

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435651#comment-15435651 ] Barry Becker commented on SPARK-17219: -- If the decision is to have an additional nul

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435484#comment-15435484 ] Barry Becker commented on SPARK-17219: -- Nulls were not accepted in the column. I had

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435403#comment-15435403 ] Barry Becker commented on SPARK-17219: -- There needs to be some way to handle null va

[jira] [Created] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
Barry Becker created SPARK-17219: Summary: QuantileDiscretizer does strange things with NaN values Key: SPARK-17219 URL: https://issues.apache.org/jira/browse/SPARK-17219 Project: Spark Issue

[jira] [Updated] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-17086: - Attachment: titanic.csv > QuantileDiscretizer throws InvalidArgumentException (parameter splits g

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435236#comment-15435236 ] Barry Becker commented on SPARK-17086: -- Thanks. BTW, I hope there are some test cas

[jira] [Comment Edited] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434805#comment-15434805 ] Barry Becker edited comment on SPARK-17086 at 8/24/16 12:18 PM: ---

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434805#comment-15434805 ] Barry Becker commented on SPARK-17086: -- Is it possible to get this fix into 2.0.1? M

[jira] [Commented] (SPARK-6509) MDLP discretizer

2016-08-22 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431341#comment-15431341 ] Barry Becker commented on SPARK-6509: - I may have missed the reasoning somewhere, but
