Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread Padma Penumarthy
With average row size method, since I know number of rows and the average size for each column, I am planning to use that information to allocate required memory for each vector upfront. This should help avoid copying every time we double and also improve memory utilization. Thanks Padma >

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread Paul Rogers
Hi Salim. Thanks much for the detailed explanation! You clearly have developed a deep understanding of the Parquet code and its impact on CPU and I/O performance. My comments are more from the holistic perspective as Drill as a whole. Far too much to discuss on the dev list. I've added your comme

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread Paul Rogers
Parth notes: Also note that memory allocations by Netty greater than the 16MB chunk sizeare returned to the OS when the memory is free'd. Both this document andthe original document on memory fragmentation state incorrectly that suchmemory is not released back to the OS. A quick thought experim

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread salim achouche
Paul, I cannot thank you enough for your help and guidance! You are right that columnar readers will have a harder time balancing resource requirements and performance. Nevertheless, DRILL-6147 is a starting point; it should allow us to gain knowledge and accordingly refine our strategy as we g

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread Paul Rogers
One more thought: > > 3) Assuming that you go with the average batch size calculation approach, The average batch size approach is a quick and dirty approach for non-leaf operators that can observe an incoming batch to estimate row width. Because Drill batches are large, the law of large numbers

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread salim achouche
Thanks Parth for your feedback! I am planning to enhance the document based on the received feedback and the prototype I am currently working on! Regards, Salim On Sun, Feb 11, 2018 at 2:36 PM, salim achouche wrote: > Thanks Paul for your feedback! let me try to answer some of your questions >

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread salim achouche
Thanks Paul for your feedback! let me try to answer some of your questions / comments: Duplicate Implementation - I am not contemplating two different implementations; one for Parquet and another for the rest of the code - Instead, I am reacting to the fact that we have two different processing

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread Paul Rogers
Hi All, Perhaps this topic needs just a bit more thought and discussion to avoid working at cross purposes. I've outlined the issues, and a possible path forward, in a comment to DRILL-6147. Quick summary: creating a second batch size implementation just for Parquet will be very difficult once w

[GitHub] drill issue #1110: DRILL-6115: SingleMergeExchange is not scaling up when ma...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1110 @HanumathRao I have a few comments in the JIRA for the overall design; we can discuss. ---

[GitHub] drill pull request #1110: DRILL-6115: SingleMergeExchange is not scaling up ...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1110#discussion_r167447863 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestOrderedMuxExchange.java --- @@ -0,0 +1,116 @@ +/** + * Licensed

[GitHub] drill pull request #1110: DRILL-6115: SingleMergeExchange is not scaling up ...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1110#discussion_r167447232 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/InsertLocalExchangeVisitor.java --- @@ -20,133 +20,34 @@ imp

[GitHub] drill pull request #1110: DRILL-6115: SingleMergeExchange is not scaling up ...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1110#discussion_r167448218 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/OrderedMuxExchange.java --- @@ -0,0 +1,64 @@ +/** + * Licensed to

[GitHub] drill pull request #1110: DRILL-6115: SingleMergeExchange is not scaling up ...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1110#discussion_r167448543 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestOrderedMuxExchange.java --- @@ -0,0 +1,116 @@ +/** + * Licensed

[GitHub] drill issue #1110: DRILL-6115: SingleMergeExchange is not scaling up when ma...

2018-02-11 Thread HanumathRao
Github user HanumathRao commented on the issue: https://github.com/apache/drill/pull/1110 @vrozov Thank you for reviewing the code. I have incorporated all the review comments. Please let me know if anything needs to be changed. ---

[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1117#discussion_r167441911 --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHashJoinOrdering.java --- @@ -0,0 +1,63 @@ +/** + * Licensed to the

[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1117#discussion_r167442088 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java --- @@ -246,4 +246,17 @@ public RelNode convertChild(f

[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1117#discussion_r167442059 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java --- @@ -246,4 +246,17 @@ public RelNode convertChild(f

[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1117#discussion_r167441485 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentsRunner.java --- @@ -61,7 +61,7 @@ private static final org.sl

[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1117#discussion_r167441417 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java --- @@ -197,4 +198,24 @@ public void emptyPar

Re: Batch Sizing for Parquet Flat Reader

2018-02-11 Thread Parth Chandra
Thanks Salim. Can you add this to the JIRA/design doc. Also, I would venture to suggest that the section on predicate pushdown can be made clearer. Also, Since you're proposing the average batch size approach with overflow handling, some detail on the proposed changes to the framework would be usef