Re: Default stripe size in ORC

2023-03-28 Thread Pavan Lanka
rwise we > may get small stripes, especially when many columns enable dictionary > encoding. > > Best, > Gang > > On Fri, Mar 17, 2023 at 1:41 AM Pavan Lanka > wrote: > >> Thanks Dongjoon, please see my responses below. >> >>> Is the bench

Re: Default stripe size in ORC

2023-03-16 Thread Pavan Lanka
l of the benchmark and how to reproduce it in > the community? > > BTW, it's one of the user configurations. We can change it at Apache ORC > 2.0 or add simple documentation. > > Bests. > Dongjoon. > > > On Thu, Mar 16, 2023 at 10:22 AM Pavan Lanka > wrote: > >>

Default stripe size in ORC

2023-03-16 Thread Pavan Lanka
Hi, I wanted to call out one observation we have seen when performing some benchmarks on ORC. I remember there was a time when the default stripe size was 256MB; now the default is 64MB. We see a big penalty from staying with the default stripe size of 64MB, especially when you compare with

Re: Apache ORC 1.8.3 Release?

2023-03-10 Thread Pavan Lanka
Thanks Dongjoon. +1 to the PRs. Regards, Pavan > On Mar 9, 2023, at 5:33 PM, William H. wrote: > > Thank you Dongjoon for volunteering as the release manager! > > +1 for the PRs and feel free to delay the release a couple of days. > > Thank you, > William > > On Wed, Mar 8, 2023 at 10:30

Re: Row level filter question, ORC-743, ORC-577

2023-03-03 Thread Pavan Lanka
Hi Zoltan, There are a few configuration properties that need to be enabled to activate LazyIO; they are off by default. https://orc.apache.org/develop/design/lazy_filter/#Configuration https://github.com/apache/orc/blob/main/java/core/src/java/org/apache/orc/OrcConf.java#L179 > On Feb 24,

Re: [VOTE] Release Apache ORC 1.7.8 (RC0)

2023-01-19 Thread Pavan Lanka
en for implementing that missing features in >> new releases. Feel free to make a PR if you are interested. >> >> Dongjoon >> >> On Wed, Jan 18, 2023 at 8:30 PM Pavan Lanka >> wrote: >> >>> I wanted to follow up on ORC-1343 (Igno

Re: [VOTE] Release Apache ORC 1.7.8 (RC0)

2023-01-18 Thread Pavan Lanka
I wanted to follow up on ORC-1343 (Ignore orc.create.index). * The discussion in the Jira seems to indicate that we want to revert ORC-1283, but the included changes are not a full revert, was that intentional? * I think any files created with the previous version of the writer with index as

Re: Apache ORC Docker Hub?

2022-07-29 Thread Pavan Lanka
+1, very good idea. Regards, Pavan > On Jul 29, 2022, at 6:43 AM, Dongjoon Hyun wrote: > > +1, it will improve the efficiency of Apache ORC development. > > We can also use them to improve our Github Action running time. > > Thank you, William. > > Dongjoon > > On Thu, Jul 28, 2022 at 9:46

[jira] [Created] (ORC-1202) Documentation for LazyIO not accessible from the site

2022-06-13 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-1202: Summary: Documentation for LazyIO not accessible from the site Key: ORC-1202 URL: https://issues.apache.org/jira/browse/ORC-1202 Project: ORC Issue Type: Bug

[jira] [Created] (ORC-1139) Benchmark for Seek vs Read

2022-03-28 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-1139: Summary: Benchmark for Seek vs Read Key: ORC-1139 URL: https://issues.apache.org/jira/browse/ORC-1139 Project: ORC Issue Type: Sub-task Components: Java

[jira] [Created] (ORC-1140) Documentation for Seek vs Read

2022-03-28 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-1140: Summary: Documentation for Seek vs Read Key: ORC-1140 URL: https://issues.apache.org/jira/browse/ORC-1140 Project: ORC Issue Type: Sub-task Components

[jira] [Created] (ORC-1138) Seek vs Read Optimization

2022-03-28 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-1138: Summary: Seek vs Read Optimization Key: ORC-1138 URL: https://issues.apache.org/jira/browse/ORC-1138 Project: ORC Issue Type: Sub-task Components: Java

[jira] [Created] (ORC-1136) Optimize reads by combining multiple reads without significant separation into a single read

2022-03-25 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-1136: Summary: Optimize reads by combining multiple reads without significant separation into a single read Key: ORC-1136 URL: https://issues.apache.org/jira/browse/ORC-1136

Re: GitHub issues tab

2022-01-21 Thread Pavan Lanka
+1 to this. In terms of integration, are we saying that once a Jira is created, we will include the URL of the Jira in the GitHub issue and from that point forward all updates/tracking happens in the Jira? Regards, Pavan > On Jan 15, 2022, at 4:18 PM, Dongjoon Hyun wrote: > > Thank you for

Re: [VOTE] Release Apache ORC 1.7.1 (RC0)

2021-11-08 Thread Pavan Lanka
Sorry I am late to this vote. +1 * Built and tested Java using OpenJDK 8 * Built and tested C++ on OSX * Verified ORC-1027: Allows the discovery of filters via the plugin interface on this release, needed only a minor adjustment of `com.google.auto.service:auto-service ` Regards, Pavan > On

Re: [DISCUSS] Apache ORC Release Cadence

2021-10-21 Thread Pavan Lanka
Thanks Dongjoon for initiating this. I wanted to include the release policy into this discussion as that will influence what kind of cadence we might need. I would like to understand the differentiation between a Patch Release X.Y.Z and a Minor Release X.Y. Are there any scenarios where a

[jira] [Created] (ORC-1027) Filter processing to allow filter injections that cannot be represented via SArgs

2021-10-12 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-1027: Summary: Filter processing to allow filter injections that cannot be represented via SArgs Key: ORC-1027 URL: https://issues.apache.org/jira/browse/ORC-1027 Project: ORC

EOFException when performing ORC Reads on AWS S3 using s3a://

2021-09-30 Thread Pavan Lanka
Wanted to share this information in case anyone else runs into a similar problem. Problem —— I was getting the following exception when an ORC Read was taking place ```text Caused by: java.io.IOException: Problem opening stripe 0 footer in s3a://. at

Re: [VOTE] Release Apache ORC 1.7.0 (RC0)

2021-09-15 Thread Pavan Lanka
+1 (non-binding) Performed the following: * C++ build and test * Java build and test * Manual testing of the filters * Built and tested Spark 3.x with 1.7.0 - there are a couple of minor test failures, one a result of SearchArgument String representation and another a result of the minor

Re: [VOTE] Release Apache ORC 1.6.11 (RC0)

2021-09-14 Thread Pavan Lanka
+1 (non-binding) Performed the following: * C++ build and test * Java build and test using OpenJDK 8 Regards, Pavan > On Sep 12, 2021, at 8:04 PM, Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache ORC version > 1.6.11. > > [ ] +1 Release this package as

[jira] [Created] (ORC-983) Lower the log level of some messages related to filter processing

2021-09-03 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-983: --- Summary: Lower the log level of some messages related to filter processing Key: ORC-983 URL: https://issues.apache.org/jira/browse/ORC-983 Project: ORC Issue Type

[jira] [Created] (ORC-980) Filter processing ignores the schema case-sensitivity flag

2021-09-02 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-980: --- Summary: Filter processing ignores the schema case-sensitivity flag Key: ORC-980 URL: https://issues.apache.org/jira/browse/ORC-980 Project: ORC Issue Type: Bug

Re: Apache ORC 1.7.0 QA status

2021-07-26 Thread Pavan Lanka
1. After branching branch-1.7, I shared Apache Iceberg integration test > >>> failure. > >>> 2. Kyle Bendickson reported the same issue. In addition, when he >> reverted > >>> to rbtree, he met NPE in some cases. He is investigating it. > >>> 3. Da

[jira] [Created] (ORC-811) Benchmarks for Filters

2021-06-04 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-811: --- Summary: Benchmarks for Filters Key: ORC-811 URL: https://issues.apache.org/jira/browse/ORC-811 Project: ORC Issue Type: Sub-task Components: Java

Re: [VOTE] Should we release ORC 1.6.8rc0?

2021-05-20 Thread Pavan Lanka
+1 (non-binding) I verified C++ and Java builds and tests Regards, Pavan > On May 19, 2021, at 11:22 PM, Dongjoon Hyun wrote: > > Thank you so much, Gang, Panos, William! > > Bests, > Dongjoon > > > > On Wed, May 19, 2021 at 2:39 PM William Hyun wrote: > >> +1 >> >> I tested with Java

Re: Null filesystem when using ORC writer

2021-04-16 Thread Pavan Lanka
Hi Ryan, In case you have not checked, this might be a good starting point for you. https://orc.apache.org/docs/core-java.html#simple-example When I follow the code you shared, I don’t quite follow why you are creating and passing the

Re: [DRAFT] ORC Board Report April 2021

2021-04-13 Thread Pavan Lanka
Looks good Owen. One minor comment inline below. > On Apr 12, 2021, at 4:49 PM, Owen O'Malley wrote: > > All, > Please send me any feedback. Thanks! > > ## Description: > The mission of ORC is the creation and maintenance of software related to > the > smallest, fastest columnar storage for

[jira] [Created] (ORC-759) StructBatchReader should always skip processing on the rootReader

2021-03-12 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-759: --- Summary: StructBatchReader should always skip processing on the rootReader Key: ORC-759 URL: https://issues.apache.org/jira/browse/ORC-759 Project: ORC Issue Type

[jira] [Created] (ORC-758) Avoid decompressing compressed streams if already decompressed

2021-03-11 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-758: --- Summary: Avoid decompressing compressed streams if already decompressed Key: ORC-758 URL: https://issues.apache.org/jira/browse/ORC-758 Project: ORC Issue Type: Sub

[jira] [Created] (ORC-755) Introduce OrcFilterContext

2021-03-01 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-755: --- Summary: Introduce OrcFilterContext Key: ORC-755 URL: https://issues.apache.org/jira/browse/ORC-755 Project: ORC Issue Type: Sub-task Components: Java

[jira] [Created] (ORC-754) Code cleanup changes

2021-02-23 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-754: --- Summary: Code cleanup changes Key: ORC-754 URL: https://issues.apache.org/jira/browse/ORC-754 Project: ORC Issue Type: Sub-task Components: Java Affects

Clarifications needed on some development standards

2021-02-19 Thread Pavan Lanka
Dear ORC Community, I have been working on ORC-742, and as part of the review there were some comments on empty lines and indentation. I have been addressing them individually. I thought I would reach out to the community for guidelines on this that I can

[jira] [Created] (ORC-744) LazyIO of non-filter columns

2021-01-25 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-744: --- Summary: LazyIO of non-filter columns Key: ORC-744 URL: https://issues.apache.org/jira/browse/ORC-744 Project: ORC Issue Type: Improvement Components

[jira] [Created] (ORC-743) Conversion of SArg into Filters, to take advantage of LazyIO

2021-01-25 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-743: --- Summary: Conversion of SArg into Filters, to take advantage of LazyIO Key: ORC-743 URL: https://issues.apache.org/jira/browse/ORC-743 Project: ORC Issue Type: New

[jira] [Created] (ORC-742) LazyIO of non-filter columns in the presence of filters

2021-01-25 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-742: --- Summary: LazyIO of non-filter columns in the presence of filters Key: ORC-742 URL: https://issues.apache.org/jira/browse/ORC-742 Project: ORC Issue Type: Improvement

[jira] [Created] (ORC-741) Schema Evolution missing column is not handled in the presence of filters

2021-01-25 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-741: --- Summary: Schema Evolution missing column is not handled in the presence of filters Key: ORC-741 URL: https://issues.apache.org/jira/browse/ORC-741 Project: ORC Issue

[jira] [Created] (ORC-622) Refactoring of TreeReader into TypeReader and BatchReader

2020-04-22 Thread Pavan Lanka (Jira)
Pavan Lanka created ORC-622: --- Summary: Refactoring of TreeReader into TypeReader and BatchReader Key: ORC-622 URL: https://issues.apache.org/jira/browse/ORC-622 Project: ORC Issue Type