Re: How to generate hash code for each build side one of the hash join columns

2018-06-01 Thread weijie tong
> From: weijie tong > Sent: Friday, June 1, 2018 7:27 AM > To: dev@drill.apache.org > Subject: Re: How to generate hash code for each build side one of the hash > join columns > > Could someone explain the theory of SelectionVector2? I was confused about

Re: How to generate hash code for each build side one of the hash join columns

2018-06-01 Thread Sorabh Hamirwasia
tong Sent: Friday, June 1, 2018 7:27 AM To: dev@drill.apache.org Subject: Re: How to generate hash code for each build side one of the hash join columns Could someone explain the theory of SelectionVector2? I was confused about the code implementation. I think it acts as an indirect index to th

Re: How to generate hash code for each build side one of the hash join columns

2018-06-01 Thread weijie tong
Could someone explain the theory behind SelectionVector2? I am confused by its implementation. I think it acts as an indirect index into the filtered RecordBatch. FilterRecordBatch filters the RecordBatch and stores the indexes of the qualifying rows in the SelectionVector2. To ProjectReco
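The "indirect index" idea described in this message can be illustrated with a minimal, self-contained sketch. It is not Drill's SelectionVector2 class: `Sv2Sketch`, `filterGreaterThan`, and `materialize` are hypothetical names invented for illustration, and a plain `int[]` stands in for a column of a record batch. The only assumptions taken from the thread are that each entry is a 2-byte (char) row index and that a batch holds at most 2^16 rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not Drill's SelectionVector2.
final class Sv2Sketch {
    // With 2-byte (char) entries, a row index must fit in 0..65535.
    static final int MAX_BATCH_SIZE = 1 << 16;

    private final char[] indices; // positions of the surviving rows in the batch
    private int count;            // number of surviving rows recorded so far

    Sv2Sketch(int capacity) {
        if (capacity < 0 || capacity > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        this.indices = new char[capacity];
    }

    void add(int rowIndex) {
        if (rowIndex < 0 || rowIndex >= MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("row index out of range: " + rowIndex);
        }
        indices[count++] = (char) rowIndex;
    }

    int getCount() {
        return count;
    }

    int getIndex(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("no entry at " + i);
        }
        return indices[i];
    }

    // Filter step: remember which rows pass the predicate; no values are copied.
    static Sv2Sketch filterGreaterThan(int[] column, int threshold) {
        Sv2Sketch sv = new Sv2Sketch(column.length);
        for (int row = 0; row < column.length; row++) {
            if (column[row] > threshold) {
                sv.add(row);
            }
        }
        return sv;
    }

    // Downstream step: read the original column through the indirection.
    static List<Integer> materialize(int[] column, Sv2Sketch sv) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sv.getCount(); i++) {
            out.add(column[sv.getIndex(i)]);
        }
        return out;
    }
}
```

In this sketch the filter only writes row positions, so the underlying column is left untouched and a later operator decides whether to copy the surviving rows.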

Re: How to generate hash code for each build side one of the hash join columns

2018-06-01 Thread weijie tong
I found the answer: a RecordBatch holds at most 2^16 records, as defined by RecordBatch's MAX_BATCH_SIZE. On Fri, Jun 1, 2018 at 3:36 PM weijie tong wrote: > Some questions about SelectionVector2 and SelectionVector4: > > I want to create SelectionVector4 or SelectionVector2 to represent the > fil

Re: How to generate hash code for each build side one of the hash join columns

2018-06-01 Thread weijie tong
Some questions about SelectionVector2 and SelectionVector4: I want to create a SelectionVector4 or SelectionVector2 to represent the filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch does not support SelectionVector4, and SelectionVector2's record count is limited to the size of a char .

Re: How to generate hash code for each build side one of the hash join columns

2018-05-31 Thread weijie tong
Hi Boaz: Your proposal is valuable, though I have already implemented the dynamic code generation logic. If a ``` long hash64(int index, long seed) ``` method were added to ValueVector, it would also help others implement a storage plugin's filter logic using the pushed down bloom filt
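To make the proposed `long hash64(int index, long seed)` signature concrete, here is a minimal sketch of how a per-row hash method on a column might be used to hash every build-side key value without generated code. Everything except the method signature is an assumption: `Hashable64`, `LongColumnSketch`, and `hashAll` are hypothetical names, a `long[]` stands in for a value vector, and the MurmurHash3 64-bit finalizer is only a stand-in for whatever hash function the real implementation would use.

```java
// Hypothetical interface for illustration; only the method signature comes from the thread.
interface Hashable64 {
    long hash64(int index, long seed);
}

// Hypothetical column of long values standing in for a value vector.
final class LongColumnSketch implements Hashable64 {
    private final long[] values;

    LongColumnSketch(long... values) {
        this.values = values.clone();
    }

    int getValueCount() {
        return values.length;
    }

    // Hash the value at the given row, folding in the seed first.
    @Override
    public long hash64(int index, long seed) {
        return mix(values[index] ^ seed);
    }

    // MurmurHash3 64-bit finalizer, used here purely as a stand-in hash.
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Hash every row of one build-side key column with plain Java, no code generation.
    static long[] hashAll(Hashable64 column, int rowCount, long seed) {
        long[] hashes = new long[rowCount];
        for (int row = 0; row < rowCount; row++) {
            hashes[row] = column.hash64(row, seed);
        }
        return hashes;
    }
}
```

A caller could feed the resulting hashes into a bloom filter; varying the seed yields independent hash values for the same row, which is one common way to derive the several hash functions a bloom filter needs.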

Re: How to generate hash code for each build side one of the hash join columns

2018-05-31 Thread Boaz Ben-Zvi
Hi Weijie, Another option is to totally avoid the generated code. We were considering the idea of replacing the generated code used for computing hash values with “real java” code. This idea is analogous to the usage of the copyEntry() method in the ValueVector interface (that Paul added l

Re: How to generate hash code for each build side one of the hash join columns

2018-05-30 Thread weijie tong
Hi Aman: Thanks for your tips. I have rebased onto the latest code from the master branch. Yes, the spill-to-disk feature did change the original implementation. I have adjusted my implementation to fit the new feature. But as you say, the integration will be somewhat challenging, as I noticed

Re: How to generate hash code for each build side one of the hash join columns

2018-05-30 Thread Aman Sinha
Hi Weijie, I was hoping you could leverage the existing methods, so it's good that you found the ones that work for your use case. One thing I want to point out (maybe you're already aware): the Hash Join code has changed significantly in the master branch due to the spill-to-disk feature. So, thi

Re: How to generate hash code for each build side one of the hash join columns

2018-05-29 Thread weijie tong
I found ClassGenerator's nestEvalBlock(JBlock block) and unNestEvalBlock(), which have the same effect as the change I made to ClassGenerator. So I am dropping my change to ClassGenerator, and I hope this helps someone else. On Tue, May 29, 2018 at 1:53 PM weijie tong wrote: > The code formatt

Re: How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread weijie tong
The code formatting was poor. Here it is again: private void setupGetBuild64Hash(ClassGenerator cg, MappingSet incomingMapping, VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds) throws SchemaChangeException { cg.setMappingSet(incomingMapping); if (keyExprs ==

Re: How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread weijie tong
Hi Paul: Thanks for your enthusiasm. I have already picked up this technique, since you mentioned it to me in another mail thread. It's really helpful, thanks for your valuable work. Now I have solved this tough problem by adding a customized JBlock member field to the ClassGenerator. So once you want the getEva

Re: How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread Paul Rogers
Hi Weijie, Seeing the discussion about the details of JCodeModel suggests you may be trying to debug your generated code at the level of the code generator. Some time ago we added the ability to step through the generated code. Look for the following line in the generator code: // Uncomme

Re: How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread weijie tong
@aman thanks for your reply. "For the ifBlock, do you need an _else() block also ?" I provide a default return at the end of the method, so I don't need the _else() block. I have noticed that the IfExpression evaluation method in EvaluationVisitor also uses JConditional. But that also doesn't mat

Re: How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread Aman Sinha
Sorry, the previous email was incomplete. For the ifBlock, do you also need an _else() block? I have sometimes found that 'JConditional' is a good way to break down the logic further. Please see example usages of JConditional here [1]. -Aman [1] https://www.programcreek.com/java-api-examples/?

Re: How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread Aman Sinha
Hi Weijie, It would be a little cumbersome to debug such issues over email, since one has to look at the generated code output and debug iteratively. A couple of thoughts that might help: For this particular if-then block, should you also JBlock ifBlock = cg.getEvalBlock()._if(fieldIdParamHold

How to generate hash code for each build side one of the hash join columns

2018-05-28 Thread weijie tong
Hi all: While implementing the JPPD feature (https://issues.apache.org/jira/browse/DRILL-6385), I got blocked on one problem: how to get the hash code of each of the build-side hash join columns through dynamically generated Java code. I hope someone can offer some advice. I supposed to add