With the average row size method, since I know the number of rows and the
average size of each column,
I plan to use that information to allocate the required memory for each
vector upfront.
This should help avoid copying every time we double a vector and also improve
memory utilization.
Thanks
Padma
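A minimal sketch of that preallocation idea (the class and helper names here are illustrative assumptions, not Drill's actual ValueVector API): size the buffer once from the known row count and average value width, rounding up to a power of two as pooled allocators typically do, so loading the batch never triggers a copy-on-double.

```java
import java.nio.ByteBuffer;

// Sketch: size a vector's buffer up front from the known row count and the
// average per-column value size, instead of starting small and doubling.
// All names are hypothetical, not Drill's actual ValueVector API.
public class PreallocationSketch {

    /** Round the needed capacity up to the next power of two, as pooled
     *  allocators typically do, so one allocation suffices. */
    static int capacityFor(int rowCount, int avgValueSize) {
        int needed = rowCount * avgValueSize;
        int cap = Integer.highestOneBit(needed);
        return (cap == needed) ? cap : cap << 1;
    }

    public static void main(String[] args) {
        int rowCount = 4096;       // known from row-group metadata
        int avgVarcharSize = 27;   // average column width from statistics
        int cap = capacityFor(rowCount, avgVarcharSize);
        // One upfront allocation; no copy-on-double while loading the batch.
        ByteBuffer data = ByteBuffer.allocateDirect(cap);
        System.out.println(cap);   // prints 131072
    }
}
```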
>
Hi Salim.
Thanks much for the detailed explanation! You have clearly developed a deep
understanding of the Parquet code and its impact on CPU and I/O performance. My
comments are more from a holistic perspective of Drill as a whole.
Far too much to discuss on the dev list. I've added your comme
Parth notes:
Also note that memory allocations by Netty greater than the 16MB chunk size are
returned to the OS when the memory is free'd. Both this document and the
original document on memory fragmentation incorrectly state that such memory is
not released back to the OS. A quick thought experim
Paul,
I cannot thank you enough for your help and guidance! You are right that
columnar readers will have a harder time balancing resource requirements and
performance. Nevertheless, DRILL-6147 is a starting point; it should allow us
to gain knowledge and accordingly refine our strategy as we g
One more thought:
> > 3) Assuming that you go with the average batch size calculation approach,
The average batch size approach is a quick-and-dirty technique for non-leaf
operators, which can observe an incoming batch to estimate row width. Because
Drill batches are large, the law of large numbers
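As a rough illustration of that calculation (the method names and the 16 MiB budget below are assumptions for the sketch, not Drill's actual constants or internals): a non-leaf operator observes one incoming batch, derives an average row width from it, and caps the output batch's row count so the batch stays within a memory budget.

```java
// Sketch of the average-batch-size calculation for a non-leaf operator.
// Names and the budget value are illustrative, not Drill's internals.
public class BatchSizer {

    /** Average row width observed from an incoming batch. */
    static int avgRowWidth(long batchBytes, int batchRowCount) {
        return (int) Math.max(1, batchBytes / batchRowCount);
    }

    /** Rows that fit in the output memory budget at that width. */
    static int outputRowLimit(long budgetBytes, int rowWidth) {
        return (int) Math.max(1, budgetBytes / rowWidth);
    }

    public static void main(String[] args) {
        long incomingBytes = 8L * 1024 * 1024;  // observed 8 MiB incoming batch
        int incomingRows = 65536;
        int width = avgRowWidth(incomingBytes, incomingRows);  // 128 bytes/row
        int limit = outputRowLimit(16L * 1024 * 1024, width);  // rows per 16 MiB batch
        System.out.println(width + " " + limit);               // prints 128 131072
    }
}
```

Because batches carry many rows, per-batch averages tend to be stable estimates of true row width, which is what makes this simple calculation workable.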
Thanks Parth for your feedback! I am planning to enhance the document based
on the received feedback and the prototype I am currently working on!
Regards,
Salim
On Sun, Feb 11, 2018 at 2:36 PM, salim achouche
wrote:
> Thanks Paul for your feedback! let me try to answer some of your questions
>
Thanks Paul for your feedback! Let me try to answer some of your questions /
comments:
Duplicate Implementation
- I am not contemplating two different implementations: one for Parquet and
another for the rest of the code.
- Instead, I am reacting to the fact that we have two different processing
Hi All,
Perhaps this topic needs just a bit more thought and discussion to avoid
working at cross purposes. I've outlined the issues, and a possible path
forward, in a comment to DRILL-6147.
Quick summary: creating a second batch size implementation just for Parquet
will be very difficult once w
Github user amansinha100 commented on the issue:
https://github.com/apache/drill/pull/1110
@HanumathRao I have a few comments in the JIRA for the overall design; we
can discuss.
---
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1110#discussion_r167447863
--- Diff:
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestOrderedMuxExchange.java
---
@@ -0,0 +1,116 @@
+/**
+ * Licensed
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1110#discussion_r167447232
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/InsertLocalExchangeVisitor.java
---
@@ -20,133 +20,34 @@
imp
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1110#discussion_r167448218
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/OrderedMuxExchange.java
---
@@ -0,0 +1,64 @@
+/**
+ * Licensed to
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1110#discussion_r167448543
--- Diff:
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestOrderedMuxExchange.java
---
@@ -0,0 +1,116 @@
+/**
+ * Licensed
Github user HanumathRao commented on the issue:
https://github.com/apache/drill/pull/1110
@vrozov Thank you for reviewing the code. I have incorporated all the
review comments. Please let me know if anything needs to be changed.
---
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1117#discussion_r167441911
--- Diff:
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHashJoinOrdering.java
---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1117#discussion_r167442088
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
---
@@ -246,4 +246,17 @@ public RelNode convertChild(f
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1117#discussion_r167442059
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
---
@@ -246,4 +246,17 @@ public RelNode convertChild(f
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1117#discussion_r167441485
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentsRunner.java
---
@@ -61,7 +61,7 @@
private static final org.sl
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/1117#discussion_r167441417
--- Diff:
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
---
@@ -197,4 +198,24 @@ public void emptyPar
Thanks Salim.
Can you add this to the JIRA/design doc? Also, I would venture to suggest
that the section on predicate pushdown can be made clearer.
And since you're proposing the average batch size approach with overflow
handling, some detail on the proposed changes to the framework would be
useful.