Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/19184
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/19184
Thanks @mridulm , @jerryshao , @viirya . closing this PR.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/19184
Thanks @viirya. I have updated the patch to address your comments.
This fixes the "too many files open" issue for queries (e.g. Q67, Q72, Q14)
which involve window
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19184#discussion_r137973976
--- Diff:
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
---
@@ -104,6 +124,10 @@ public void
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/19184
I ran into this with the limit of 32K. "unlimited" is another option that
can work around this, but it may not be a preferable option in
production systems. For e.g,
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/19184
[SPARK-21971][CORE] Too many open files in Spark due to concurrent files being opened
## What changes were proposed in this pull request?
In UnsafeExternalSorter
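The truncated description above concerns UnsafeExternalSorter keeping one open file per spill during a merge. A minimal, hypothetical sketch of the lazy-open idea (the class and method names are illustrative, not Spark's actual UnsafeSorterSpillReader API): do not acquire the file descriptor in the constructor, so N pending spill readers do not pin N open files at once.

```scala
import java.io.{BufferedInputStream, DataInputStream, File, FileInputStream}

// Hypothetical sketch, not Spark's actual spill-reader code: the file
// descriptor is acquired only when data is first requested, so readers
// that are queued behind a merge do not hold open files.
class LazySpillReader(file: File) extends AutoCloseable {
  private var in: DataInputStream = null

  private def stream(): DataInputStream = {
    if (in == null) { // open the fd lazily, on first read
      in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))
    }
    in
  }

  def readInt(): Int = stream().readInt()

  override def close(): Unit = if (in != null) in.close()
}
```

With eager opening, a merge over hundreds of spill files holds hundreds of descriptors simultaneously; opening on first read bounds that to the files actively being consumed.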
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
@cloud-fan . Failure is related to the parquet changes introduced for
returning metastoreSchema (it has issues with complex types). I am not very
comfortable with the Parquet codepath
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14537#discussion_r79972251
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,27 @@ private[hive] class
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
@cloud-fan
>>
For branch 2.0, we should open another PR to fix the
OrcFileFormat.inferSchema, to not throw FileNotFoundException for empty table.
>>
Code for
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Sorry about the delay. Updated the PR.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Thanks @gatorsmile . Removed the changes related to OrcFileFormat
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Fixed the test case name. I haven't changed the parquet code path as I
wasn't sure on whether it would break any backward compatibility.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Thanks @gatorsmile, it would be good to retain the change in
OrcFileInputFormat's inferschema (just in case it is referenced later).
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14537#discussion_r76179877
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -54,10 +57,12 @@ class OrcFileFormat extends FileFormat
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
OK, reverted the physical schema changes. In both cases,
it returns the metastore schema, and mismatches can be handled separately.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
For non-partitioned ORC, HiveMetastoreCatalog currently uses the metastore
schema and does not infer the schema, so this is not an
issue there. But the problem of wrong
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14537#discussion_r75967137
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,26 @@ private[hive] class
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14537#discussion_r75902767
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,26 @@ private[hive] class
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Thanks @gatorsmile. Addressed review comments
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
For latest ORC, if the data was written out by Hive, it would have the same
mapping.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Right, for Parquet this could be part of initial codebase (from Spark-1251
I believe) which merges any metastore conflicts with parq files. But in the
case of ORC, this inference is still
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Thanks @rxin . Incorporated review comments.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
@rxin Can you please review when you find time?
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Thank you thejas and @mallman
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
@tejasapatil, @mallman - Can you please review when you find time?
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14537
Thanks @mallman . Fixed review comments in latest commit.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/10846
They take longer to clean up. If queries are executed continuously, a major
portion of the thrift server's time is wasted in GC.
In any case, I have removed the HadoopRDD in the recent commit
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/10846
SoftRef causes a lot of memory pressure on the thrift server. To be precise, when
executing a query with a large dataset, it can very soon run at 1200% CPU with all
threads doing nothing but GC
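The GC pressure described here comes from soft-referenced cache entries lingering until the heap is nearly full. A self-contained sketch of the two eviction styles (illustrative only, not Spark's actual cache code):

```scala
import java.lang.ref.SoftReference
import scala.collection.mutable

// Illustrative cache, not Spark's: soft references keep entries alive until
// the JVM is under memory pressure, which can shift cleanup cost into long
// GC pauses; an explicit remove() frees the entry deterministically.
class SoftCache[K, V] {
  private val map = mutable.Map.empty[K, SoftReference[V]]

  def put(k: K, v: V): Unit = map(k) = new SoftReference(v)

  // None if the key was never cached, or if the collector already cleared
  // the soft reference in response to memory demand.
  def get(k: K): Option[V] = map.get(k).flatMap(r => Option(r.get()))

  // Deterministic eviction, the alternative argued for in the comment.
  def remove(k: K): Unit = map.remove(k)
}
```

Soft references are guaranteed to be cleared before an OutOfMemoryError, but a busy thrift server reaches that pressure constantly, so the collector ends up doing the eviction work on every cycle.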
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/14537
[SPARK-16948][SQL] Querying empty partitioned orc tables throws excep…
## What changes were proposed in this pull request?
Querying empty partitioned ORC tables from spark-sql throws
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/10846
- Rebased to master and changed title.
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/10846
Sorry about the delay. Missed this one. Haven't tested this recently. But
yes, this would be a problem in master as well. Please let me know if I need to
rebase this for master
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14471
Thanks @rxin.
Changes:
1. Added test case. Also added sample orc file (392 bytes) from Hive 1.x
with format "Type: struct<_col0:int,_col1:string>". Withou
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/14471
Fixed scalastyle issues
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/12293
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/12293
@yuananf Thanks for trying it out. I have rebased it and created
https://github.com/apache/spark/pull/14471. Closing this one.
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/14471
[SPARK-14387][SQL] Exceptions thrown when querying ORC tables
## What changes were proposed in this pull request?
This PR improves ORCFileFormat to handle cases when schema stored
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/13522
Thank you. I have pushed the fixes in the recent commit.
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/12105
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/12105
The patch went stale for the master branch and got a little messy in my system. I
have created https://github.com/apache/spark/pull/13522 which is rebased to
master. Will close this after review
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12105#discussion_r65885590
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
---
@@ -391,21 +393,24 @@ abstract class
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/13522
@cloud-fan - Sorry about the delay. Rebased SPARK-14321 for master.
https://github.com/apache/spark/pull/12105 had become stale and got a little
messy in my system. Ended up creating this PR
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/13522
[SPARK-14321][SQL] Reduce date format cost and string-to-date cost in…
## What changes were proposed in this pull request?
Here is the generated code snippet when executing date
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12105#issuecomment-222408035
Sorry about the delay in responding to this. Will try to rebase and post
the patch asap.
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/10938
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12293#issuecomment-216082665
\cc @liancheng , @rxin - Can you please review when you find time?
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/11978
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11978#issuecomment-214705705
@srowen - With the master code base & the changes that went in
(FileSourceStrategy to be specific), this PR would no longer be very relevant
in ma
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/12514
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/12661
[SPARK-14752][SQL] LazilyGenerateOrdering throws NullPointerException
## What changes were proposed in this pull request?
LazilyGenerateOrdering throws NullPointerException when clubbed
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12319#issuecomment-214132553
Thanks @liancheng , @rxin
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12293#issuecomment-213275126
\cc @liancheng
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12514#issuecomment-213215983
sure @yzhou2001 . Please go ahead.
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12293#issuecomment-213207842
Changes:
- Rebased patch to master branch
- Removed OrcTableScan as it is not used anywhere.
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12319#issuecomment-212766510
Thanks for the review @liancheng
Latest commit addresses the review comments. Changes are as follows
- Moved OrcRecordReader changes
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12319#issuecomment-212393980
Thanks for the review @liancheng. Should I create a separate PR for
OrcRecordReader in https://github.com/pwendell/hive/tree/release-1.2.1-spark
providing
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12514#issuecomment-212392651
Sure, will check on removing the circular reference. Took the reference
tracking approach, as it is enabled by default with Spark's KryoSerializer
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12514#issuecomment-212191093
\cc @JoshRosen
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/12514
[SPARK-14521][SQL] StackOverflowError in Kryo when executing TPC-DS Q…
## What changes were proposed in this pull request?
Observed stackOverflowError in Kryo when executing TPC-DS
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12105#issuecomment-211286426
In the generated code, it returns null if constFormat == null. So it is
not required to change the generated code.
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11978#issuecomment-211175800
@srowen - As per Andrew's comment, I thought it was fine to make the
change given that HadoopRDD is marked as DeveloperAPI. Please let me know if
any
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12105#discussion_r60001514
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
---
@@ -368,7 +369,10 @@ abstract class
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12105#issuecomment-210384204
Revised the patch addressing comments. Fixed eval() of UnixTime and
FromUnixTime. Haven't changed eval in DateFormatClass as I am not sure if the
format can change
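The per-row formatter cost being fixed here can be sketched as follows; `UnixTimeLike` is a hypothetical stand-in for the UnixTime/FromUnixTime expressions, assuming a constant format string (not Spark's actual catalyst code):

```scala
import java.text.SimpleDateFormat
import java.util.Locale

// Hypothetical stand-in for UnixTime/FromUnixTime with a constant pattern:
// the formatter is built lazily, once per expression instance, instead of
// being constructed anew inside eval() for every row.
class UnixTimeLike(pattern: String) {
  private lazy val formatter = new SimpleDateFormat(pattern, Locale.US)

  // Seconds since the epoch, as UnixTime returns.
  def eval(dateString: String): Long = formatter.parse(dateString).getTime / 1000L
}
```

Note that SimpleDateFormat is not thread-safe, which is why (as a later comment in this thread points out) the cached instance in the generated code must not be shared across threads.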
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12105#issuecomment-210202377
Sorry about the delay. I will share the update patch today
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12319#discussion_r59328799
--- Diff:
sql/core/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordReader.java ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12319#issuecomment-208694717
Sure @rxin. makes sense.
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/12319
SPARK-14551. [SQL] Reduce number of NN calls in OrcRelation with File…
## What changes were proposed in this pull request?
When FileSourceStrategy is used, record reader is created
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/12293
SPARK-14387. [SQL] Exceptions thrown when querying ORC tables stored …
## What changes were proposed in this pull request?
Physical files stored in Hive as ORC would have internal
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12105#issuecomment-205122837
Agreed. Thanks @srowen . Reverted calendar changes in DateTimeUtils in
recent commit.
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/12105#issuecomment-205082821
SDF declared in the generated code is not shared in multiple threads.
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11978#issuecomment-204307805
@andrewor14 - Not sure if I understood your last comment. Currently no
direct invocation to HadoopRDD (with initLocalJobConfFuncOpt) is made in
Spark. Later
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/12105
SPARK-14321. [SQL] Reduce date format cost and string-to-date cost i…
## What changes were proposed in this pull request?
Here is the generated code snippet when executing date
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11978#issuecomment-204200260
Thanks @andrewor14 . Addressed your review comments in latest commit.
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11978#discussion_r57537799
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11978#issuecomment-202048734
Tested with following suites along with the earlier sql suites
org.apache.spark.FileSuite
org.apache.spark.SparkContextSuite
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/11978
SPARK-14113. Consider marking JobConf closure-cleaning in HadoopRDD a…
## What changes were proposed in this pull request?
In HadoopRDD, the following code was introduced
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11911#issuecomment-200552367
Thanks @JoshRosen and @srowen . Retested with "lazy val" which has the same
perf improvement. Added "lazy val" in latest commit.
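The "lazy val" change discussed here can be illustrated with a minimal example (names are illustrative, not SparkContext's actual fields): the expensive work runs once, on first access, and the memoized result is reused afterwards.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Counts how many times the expensive computation actually runs.
val stackWalks = new AtomicInteger(0)

class JobContext {
  // lazy val: computed on first access only, then cached for reuse.
  lazy val callSite: String = {
    stackWalks.incrementAndGet()
    // Stand-in for the real stack-walking work done by getCallSite().
    Thread.currentThread().getStackTrace.take(3).mkString(" <- ")
  }
}
```

Repeated reads of `callSite` on the same instance leave the counter at 1, which is exactly the perf win over recomputing the call site on every use.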
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11911#discussion_r57150734
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1745,11 +1745,16 @@ class SparkContext(config: SparkConf) extends
Logging
Github user rajeshbalamohan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11911#discussion_r57148521
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1745,10 +1745,11 @@ class SparkContext(config: SparkConf) extends
Logging
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/11911
SPARK-14091 [core] Consider improving performance of SparkContext.get…
## What changes were proposed in this pull request?
Currently SparkContext.getCallSite() makes a call
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/11477#issuecomment-192146953
Thanks @srowen . Incorporated the changes.
This was tested with HiveCompatibilitySuite, HiveQuerySuite. These tests
ran fine in master branch without
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/11477
SPARK-12925. Improve HiveInspectors.unwrap for StringObjectInspector.…
Earlier fix did not copy the bytes and it is possible for higher level to
reuse Text object. This was causing
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10842#issuecomment-185519589
Closed #10375, which was a dup of this pull request. Review can be done on
this pull request. Thanks @JoshRosen
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/10375
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10375#issuecomment-185519219
Closing this as dup of #10842
GitHub user rajeshbalamohan reopened a pull request:
https://github.com/apache/spark/pull/10842
SPARK-12417. [SQL] Orc bloom filter options are not propagated during…
Add option to have bloom filters in ORC write codepath. Added changes to
apply cleanly in master branch.
You
Github user rajeshbalamohan closed the pull request at:
https://github.com/apache/spark/pull/10842
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10846#issuecomment-184185874
@JoshRosen - Can you please let me know how to proceed with this patch?
The patch reduces the CPU utilization of the spark-thrift server in a multi-user
environment
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10861#issuecomment-175584028
@JoshRosen - Please let me know if my latest comment on the use case
addresses your question. Can you.
>>
may be worth a holistic design
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10938
SPARK-12998 [SQL]. Enable OrcRelation when connecting via spark thrif…
When a user connects via spark-thrift server to execute SQL, it does not
enable PPD with ORC. It ends up creating
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10861#issuecomment-174355176
**Use case**: A user maps a dataset which is partitioned (e.g. the TPC-DS
dataset at 200 GB scale) & runs a query in spark-shell.
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10846#issuecomment-173741417
Thanks @JoshRosen . The current patch is based on flagging approach (in
case of retaining caching) which would be safe for 1.6.x.
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10861
SPARK-12948. [SQL]. Consider reducing size of broadcasts in OrcRelation
Size of broadcasted data in OrcRelation was significantly higher when
running query with large number of partitions
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10846
SPARK-12920. [SQL]. Spark thrift server can run at very high CPU with…
Spark thrift server runs at very high CPU when concurrent users submit
queries to the system over a period of time
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10848
SPARK-12925. [SQL]. Improve HiveInspectors.unwrap for StringObjectIns…
Text is in UTF-8 and converting it via "UTF8String.fromString" incurs
decoding and encoding, which
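The cost described in this snippet can be sketched with plain byte arrays; the helper names are illustrative, not the actual HiveInspectors code. Hadoop's Text already holds UTF-8 bytes, so copying them directly skips a decode to java.lang.String followed by a re-encode back to UTF-8.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Arrays

// The path being avoided: decode UTF-8 bytes to a String, then re-encode.
def viaString(utf8Bytes: Array[Byte]): Array[Byte] =
  new String(utf8Bytes, UTF_8).getBytes(UTF_8)

// The cheaper path: one straight byte copy of the already-UTF-8 payload.
def viaByteCopy(utf8Bytes: Array[Byte], length: Int): Array[Byte] =
  Arrays.copyOfRange(utf8Bytes, 0, length)
```

Both paths yield identical bytes for valid UTF-8 input; the byte-copy path simply avoids two character-set conversions per value.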
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10825#issuecomment-173037438
getCallSite gets the thread stack trace (+ additional processing). This is
executed numerous times when running a query on TPC-DS (with 1800
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10842
SPARK-12417. [SQL] Orc bloom filter options are not propagated during…
Add option to have bloom filters in ORC write codepath. Added changes to
apply cleanly in master branch.
You can
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10825#issuecomment-173106715
Thanks for review. I have added a comment in the code for the same.
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10825
SPARK-12898. Consider having dummyCallSite for HiveTableScan
Currently, HiveTableScan runs with getCallSite which is really expensive
and shows up when scanning through large table
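The dummy-call-site idea can be sketched as follows; the names mirror but do not claim to match Spark's actual Utils helpers. A scan that does not need a real call site skips the stack walk entirely.

```scala
// Illustrative shape of a call site record.
case class CallSite(shortForm: String, longForm: String)

// The expensive path: walk the current thread's stack.
def realCallSite(): CallSite = {
  val trace = Thread.currentThread().getStackTrace
  CallSite(trace.take(2).mkString(" <- "), trace.mkString("\n"))
}

// A fixed placeholder, built once, with no stack walk.
val dummyCallSite = CallSite("<dummy>", "<dummy>")

// Hot paths like a table scan can opt into the dummy.
def getCallSite(useDummy: Boolean): CallSite =
  if (useDummy) dummyCallSite else realCallSite()
```

For a scan touching thousands of partitions, replacing per-partition stack walks with the constant dummy removes the overhead the comment describes.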
Github user rajeshbalamohan commented on the pull request:
https://github.com/apache/spark/pull/10375#issuecomment-166486869
Thanks @zhzhan. Enabled orc PPD by default and also added a test case for
bloom filters in the latest commit. ORC RecordReaderImpl is not public in the
version
GitHub user rajeshbalamohan opened a pull request:
https://github.com/apache/spark/pull/10375
SPARK-12417. [SQL] Orc bloom filter options are not propagated during file…
You can merge this pull request into a Git repository by running:
$ git pull https://github.com