[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...

2017-09-27 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/19184 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-27 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/19184 Thanks @mridulm , @jerryshao , @viirya . closing this PR.

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/19184 Thanks @viirya . I have updated the patch to address your comments. This fixes the "too many files open" issue for (e.g Q67, Q72, Q14 etc) which involves window

[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...

2017-09-10 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137973976 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java --- @@ -104,6 +124,10 @@ public void

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/19184 I ran into this with the limit of 32K. "unlimited" is another option that can serve as a workaround. But that may not be a preferable option in production systems. For e.g,

[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...

2017-09-10 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/19184 [SPARK-21971][CORE] Too many open files in Spark due to concurrent fi… …les being opened ## What changes were proposed in this pull request? In UnsafeExternalSorter

[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...

2016-09-22 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 @cloud-fan . Failure is related to the parquet changes introduced for returning metastoreSchema (it has issues with complex types). I am not very comfortable with the Parquet codepath

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Use metastore schema instead o...

2016-09-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r79972251 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -237,21 +237,27 @@ private[hive] class

[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...

2016-09-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 @cloud-fan >> For branch 2.0, we should open another PR to fix the OrcFileFormat.inferSchema, to not throw FileNotFoundException for empty table. >> Code for

[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...

2016-09-02 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Sorry about the delay. Updated the PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-25 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thanks @gatorsmile . Removed the changes related to OrcFileFormat

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-25 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Fixed the test case name. I haven't changed the Parquet code path as I wasn't sure whether it would break backward compatibility.

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-24 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thanks @gatorsmile, it would be good to retain the change in OrcFileInputFormat's inferschema (just in case it is referenced later).

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-24 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r76179877 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala --- @@ -54,10 +57,12 @@ class OrcFileFormat extends FileFormat

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-24 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 ok, reverted the changes related to physical schema changes. In both cases, it returns metastoreschema, and mismatches can be handled separately.

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-24 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 For non-partitioned ORC, it is currently using the metastore schema and is not inferring the schema currently in HiveMetastoreCatalog, and hence not an issue. But the problem of wrong

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75967137 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -237,21 +237,26 @@ private[hive] class

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75902767 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -237,21 +237,26 @@ private[hive] class

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-23 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thanks @gatorsmile. Addressed review comments

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-22 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 For latest ORC, if the data was written out by Hive, it would have the same mapping.

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-22 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Right, for Parquet this could be part of initial codebase (from Spark-1251 I believe) which merges any metastore conflicts with parq files. But in the case of ORC, this inference is still

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-22 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thanks @rxin . Incorporated review comments.

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-12 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 @rxin Can you please review when you find time?

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-12 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thank you thejas and @mallman

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-12 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 @tejasapatil, @mallman - Can you please review when you find time?

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-09 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thanks @mallman . Fixed review comments in latest commit.

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 They take longer to clean up. If queries are executed continuously, the thrift server spends a major portion of its time in GC. In any case, I have removed the HadoopRDD in the recent commit

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 SoftRef causes lots of mem-pressure on thrift server. To be precise, when executing query with large dataset, it can very soon run at 1200% CPU and all threads carrying out just GC

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-08 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/14537 [SPARK-16948][SQL] Querying empty partitioned orc tables throws excep… ## What changes were proposed in this pull request? Querying empty partitioned ORC tables from spark-sql throws

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-07 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 - Rebased to master and changed title.

[GitHub] spark issue #10846: SPARK-12920. [SQL]. Spark thrift server can run at very ...

2016-08-05 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 Sorry about the delay. Missed this one. Haven't tested this recently. But yes, this would be a problem in master as well. Please let me know if I need to rebase this for master

[GitHub] spark issue #14471: [SPARK-14387][SQL] Enable Hive-1.x ORC compatibility wit...

2016-08-03 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14471 Thanks @rxin. Changes: 1. Added test case. Also added sample orc file (392 bytes) from Hive 1.x with format "Type: struct<_col0:int,_col1:string>". Withou

[GitHub] spark issue #14471: [SPARK-14387][SQL] Exceptions thrown when querying ORC t...

2016-08-02 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14471 Fixed scalastyle issues

[GitHub] spark pull request #12293: [SPARK-14387][SQL] Exceptions thrown when queryin...

2016-08-02 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/12293

[GitHub] spark issue #12293: [SPARK-14387][SQL] Exceptions thrown when querying ORC t...

2016-08-02 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/12293 @yuananf Thanks for trying it out. I have rebased it and created https://github.com/apache/spark/pull/14471. Closing this one.

[GitHub] spark pull request #14471: [SPARK-14387][SQL] Exceptions thrown when queryin...

2016-08-02 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/14471 [SPARK-14387][SQL] Exceptions thrown when querying ORC tables ## What changes were proposed in this pull request? This PR improves ORCFileFormat to handle cases when schema stored

[GitHub] spark issue #13522: [SPARK-14321][SQL] Reduce date format cost and string-to...

2016-06-07 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/13522 Thank you. I have pushed the fixes in the recent commit.

[GitHub] spark pull request #12105: [SPARK-14321][SQL] Reduce date format cost and st...

2016-06-07 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/12105

[GitHub] spark issue #12105: [SPARK-14321][SQL] Reduce date format cost and string-to...

2016-06-06 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/12105 The patch went stale against the master branch and got a little messy on my system. I have created https://github.com/apache/spark/pull/13522, which is rebased to master. Will close this after view

[GitHub] spark pull request #12105: [SPARK-14321][SQL] Reduce date format cost and st...

2016-06-06 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/12105#discussion_r65885590 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -391,21 +393,24 @@ abstract class

[GitHub] spark issue #13522: [SPARK-14321][SQL] Reduce date format cost and string-to...

2016-06-06 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/13522 @cloud-fan - Sorry about the delay. Rebased SPARK-14321 for master. https://github.com/apache/spark/pull/12105 had become stale and got a little messy on my system. Ended up creating this PR

[GitHub] spark pull request #13522: [SPARK-14321][SQL] Reduce date format cost and st...

2016-06-06 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/13522 [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in… ## What changes were proposed in this pull request? Here is the generated code snippet when executing date

[GitHub] spark pull request: [SPARK-14321][SQL] Reduce date format cost and...

2016-05-29 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-222408035 Sorry about the delay in responding to this. Will try to rebase and post the patch asap.

[GitHub] spark pull request: SPARK-12998 [SQL]. Enable OrcRelation when con...

2016-05-03 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/10938

[GitHub] spark pull request: [SPARK-14387][SQL] Exceptions thrown when quer...

2016-05-01 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12293#issuecomment-216082665 \cc @liancheng , @rxin - Can you please review when you find time?

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-05-01 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/11978

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-04-26 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-214705705 @srowen - With the master code base & the changes that went in (FileSourceStrategy to be specific), this PR would no longer be very relevant in ma

[GitHub] spark pull request: [SPARK-14521][SQL] StackOverflowError in Kryo ...

2016-04-26 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/12514

[GitHub] spark pull request: [SPARK-14752][SQL] LazilyGenerateOrdering thro...

2016-04-25 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/12661 [SPARK-14752][SQL] LazilyGenerateOrdering throws NullPointerException ## What changes were proposed in this pull request? LazilyGenerateOrdering throws NullPointerException when clubbed

[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-24 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-214132553 Thanks @liancheng , @rxin

[GitHub] spark pull request: [SPARK-14387][SQL] Exceptions thrown when quer...

2016-04-22 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12293#issuecomment-213275126 \cc @liancheng

[GitHub] spark pull request: [SPARK-14521][SQL] StackOverflowError in Kryo ...

2016-04-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12514#issuecomment-213215983 sure @yzhou2001 . Please go ahead.

[GitHub] spark pull request: [SPARK-14387][SQL] Exceptions thrown when quer...

2016-04-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12293#issuecomment-213207842 Changes: - Rebased patch to master branch - Removed OrcTableScan as it is not used anywhere.

[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-212766510 Thanks for the review @liancheng Latest commit addresses the review comments. Changes are as follows - Moved OrcRecordReader changes

[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-20 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-212393980 Thanks for the review @liancheng . Should i create separate PR for OrcRecordReader in https://github.com/pwendell/hive/tree/release-1.2.1-spark providing

[GitHub] spark pull request: [SPARK-14521][SQL] StackOverflowError in Kryo ...

2016-04-20 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12514#issuecomment-212392651 Sure, will check on removing the circular reference. Took the reference tracking approach, as it is enabled by default with Spark's KryoSerializer

[GitHub] spark pull request: [SPARK-14521][SQL] StackOverflowError in Kryo ...

2016-04-19 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12514#issuecomment-212191093 \cc @JoshRosen

[GitHub] spark pull request: [SPARK-14521][SQL] StackOverflowError in Kryo ...

2016-04-19 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/12514 [SPARK-14521][SQL] StackOverflowError in Kryo when executing TPC-DS Q… ## What changes were proposed in this pull request? Observed stackOverflowError in Kryo when executing TPC-DS

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-18 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-211286426 The generated code already returns null if constFormat == null, so it does not need to be changed.

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-04-17 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-211175800 @srowen - As per andrew's comment, I thought it was fine to make the change given that HadoopRDD is marked as DeveloperAPI. Please let me know if any

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-17 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/12105#discussion_r60001514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -368,7 +369,10 @@ abstract class

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-15 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-210384204 Revised the patch addressing comments. Fixed eval() of UnixTime, FromUnixTime. Haven't changed eval in DateFormatClass as i am not sure if format can change

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-14 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-210202377 Sorry about the delay. I will share the updated patch today

[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-12 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/12319#discussion_r59328799 --- Diff: sql/core/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordReader.java --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-11 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-208694717 Sure @rxin. makes sense.

[GitHub] spark pull request: SPARK-14551. [SQL] Reduce number of NN calls i...

2016-04-11 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/12319 SPARK-14551. [SQL] Reduce number of NN calls in OrcRelation with File… ## What changes were proposed in this pull request? When FileSourceStrategy is used, record reader is created

[GitHub] spark pull request: SPARK-14387. [SQL] Exceptions thrown when quer...

2016-04-11 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/12293 SPARK-14387. [SQL] Exceptions thrown when querying ORC tables stored … ## What changes were proposed in this pull request? Physical files stored in Hive as ORC would have internal

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-03 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-205122837 Agreed. Thanks @srowen . Reverted calendar changes in DateTimeUtils in recent commit.

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-03 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-205082821 The SDF declared in the generated code is not shared across multiple threads.

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-04-01 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-204307805 @andrewor14 - Not sure if I understood your last comment. Currently no direct invocation to HadoopRDD (with initLocalJobConfFuncOpt) is made in Spark. Later

[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-03-31 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/12105 SPARK-14321. [SQL] Reduce date format cost and string-to-date cost i… ## What changes were proposed in this pull request? Here is the generated code snippet when executing date

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-03-31 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-204200260 Thanks @andrewor14 . Addressed your review comments in latest commit.

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-03-27 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/11978#discussion_r57537799 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-03-27 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-202048734 Tested with following suites along with the earlier sql suites org.apache.spark.FileSuite org.apache.spark.SparkContextSuite

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

2016-03-26 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/11978 SPARK-14113. Consider marking JobConf closure-cleaning in HadoopRDD a… ## What changes were proposed in this pull request? In HadoopRDD, the following code was introduced

[GitHub] spark pull request: SPARK-14091 [core] Consider improving performa...

2016-03-23 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11911#issuecomment-200552367 Thanks @JoshRosen and @srowen . Retested with "lazy val" which has the same perf improvement. Added "lazy val" in latest commit.

[GitHub] spark pull request: SPARK-14091 [core] Consider improving performa...

2016-03-23 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/11911#discussion_r57150734 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1745,11 +1745,16 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-14091 [core] Consider improving performa...

2016-03-23 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/11911#discussion_r57148521 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1745,10 +1745,11 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-14091 [core] Consider improving performa...

2016-03-22 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/11911 SPARK-14091 [core] Consider improving performance of SparkContext.get… ## What changes were proposed in this pull request? Currently SparkContext.getCallSite() makes a call

[GitHub] spark pull request: SPARK-12925. Improve HiveInspectors.unwrap for...

2016-03-03 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11477#issuecomment-192146953 Thanks @srowen . Incorporated the changes. This was tested with HiveCompatibilitySuite, HiveQuerySuite. These tests ran fine in master branch without

[GitHub] spark pull request: SPARK-12925. Improve HiveInspectors.unwrap for...

2016-03-02 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/11477 SPARK-12925. Improve HiveInspectors.unwrap for StringObjectInspector.… Earlier fix did not copy the bytes and it is possible for higher level to reuse Text object. This was causing

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2016-02-17 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10842#issuecomment-185519589 Closed #10375 which was the dup of this pull request. Review can be done on this pull request. Thanks @JoshRosen --- If your project is set up for it, you

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2016-02-17 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/10375

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2016-02-17 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10375#issuecomment-185519219 Closing this as dup of #10842

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2016-02-17 Thread rajeshbalamohan
GitHub user rajeshbalamohan reopened a pull request: https://github.com/apache/spark/pull/10842 SPARK-12417. [SQL] Orc bloom filter options are not propagated during… Add option to have bloom filters in ORC write codepath. Added changes to apply cleanly in master branch. You

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2016-02-17 Thread rajeshbalamohan
Github user rajeshbalamohan closed the pull request at: https://github.com/apache/spark/pull/10842

[GitHub] spark pull request: SPARK-12920. [SQL]. Spark thrift server can ru...

2016-02-15 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10846#issuecomment-184185874 @JoshRosen - Can you please let me know on proceeding with this patch? Patch reduces the CPU utilization of spark-thrift server in multi-user environment

[GitHub] spark pull request: SPARK-12948. [SQL]. Consider reducing size of ...

2016-01-27 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10861#issuecomment-175584028 @JoshRosen - Please let me know if my latest comment on the usecase addresses your question. Can you. >> may be worth a holistic design

[GitHub] spark pull request: SPARK-12998 [SQL]. Enable OrcRelation when con...

2016-01-26 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10938 SPARK-12998 [SQL]. Enable OrcRelation when connecting via spark thrif… When a user connects via spark-thrift server to execute SQL, it does not enable PPD with ORC. It ends up creating

[GitHub] spark pull request: SPARK-12948. [SQL]. Consider reducing size of ...

2016-01-24 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10861#issuecomment-174355176 **Usecase**: User tries to map the dataset which is partitioned (e.g., TPC-DS dataset at 200 GB scale) & runs a query in spark-shell.

[GitHub] spark pull request: SPARK-12920. [SQL]. Spark thrift server can ru...

2016-01-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10846#issuecomment-173741417 Thanks @JoshRosen . The current patch is based on flagging approach (in case of retaining caching) which would be safe for 1.6.x.

[GitHub] spark pull request: SPARK-12948. [SQL]. Consider reducing size of ...

2016-01-20 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10861 SPARK-12948. [SQL]. Consider reducing size of broadcasts in OrcRelation Size of broadcasted data in OrcRelation was significantly higher when running query with large number of partitions

[GitHub] spark pull request: SPARK-12920. [SQL]. Spark thrift server can ru...

2016-01-20 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10846 SPARK-12920. [SQL]. Spark thrift server can run at very high CPU with… Spark thrift server runs at very high CPU when concurrent users submit queries to the system over a period of time

[GitHub] spark pull request: SPARK-12925. [SQL]. Improve HiveInspectors.unw...

2016-01-20 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10848 SPARK-12925. [SQL]. Improve HiveInspectors.unwrap for StringObjectIns… Text is in UTF-8 and converting it via "UTF8String.fromString" incurs decoding and encoding, which

[GitHub] spark pull request: SPARK-12898. Consider having dummyCallSite for...

2016-01-19 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10825#issuecomment-173037438 getCallSite gets the thread stack trace (+ additional processing). This is executed numerous times when running a query on TPC-DS (with 1800

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2016-01-19 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10842 SPARK-12417. [SQL] Orc bloom filter options are not propagated during… Add option to have bloom filters in ORC write codepath. Added changes to apply cleanly in master branch. You can

[GitHub] spark pull request: SPARK-12898. Consider having dummyCallSite for...

2016-01-19 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10825#issuecomment-173106715 Thanks for review. I have added a comment in the code for the same.

[GitHub] spark pull request: SPARK-12898. Consider having dummyCallSite for...

2016-01-18 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10825 SPARK-12898. Consider having dummyCallSite for HiveTableScan Currently, HiveTableScan runs with getCallSite which is really expensive and shows up when scanning through large table

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2015-12-21 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/10375#issuecomment-166486869 Thanks @zhzhan. Enabled orc PPD by default and also added a test case for bloom filters in the latest commit. ORC RecordReaderImpl is not public in the version

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2015-12-18 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/10375 SPARK-12417. [SQL] Orc bloom filter options are not propagated during file … You can merge this pull request into a Git repository by running: $ git pull https://github.com
