Understanding of Drill Architecture regd

Tamil selvan R.S Sat, 14 Feb 2015 07:22:13 -0800

Hi,
As the project description says, I understand Drill to be an open source
implementation of Dremel. Basically, Dremel optimizes ad hoc queries on
unstructured data by storing it in a columnar layout instead of record-wise.
I assume Drill does the same. I saw that Drill supports a wide variety of
data sources such as JSON, MongoDB, etc. How does Drill transform the
source data into a columnar representation so that it can optimize the
queries?


For Example:
Data [Assume it to be in mongo]:
{"idtype":"ca","id":3,"metric":"purchases","time":"Y14/M0/D0","device":"nexus","devicegrp":"tablet","source":"minewhat","sourcegrp":"email","dofw":"weekend","tofd":"morning","browser":"chrome","engage":"return","location":"mumbai","locationgrp":"maharashtra","usertag":"frequent","search":"sony tab","total":56263}

And for a query like below:
select test.device, count(*) from mongo.mydata test where test.idtype='b'
and test.id=10 group by test.device, test.idtype, test.id;

Will Drill load *all documents* from the mydata collection every time this
query is fired, and only then map the data to a columnar layout? I'm 100%
sure this can't be the implementation, as it looks like it would make
things worse [load the data, transform it (which has to go row by row),
and then query the transformed data].
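
To make the concern concrete, here is a minimal sketch in plain Python of
the naive pipeline described above: load every document, pivot the rows
into columns, then run the filter and group-by on the columns. This is not
Drill code and says nothing about how Drill actually works; to_columns and
device_counts are hypothetical helpers, and the sample documents are made
up for illustration.

```python
from collections import Counter


def to_columns(docs):
    """Pivot row-wise documents into one list per field, going row by row."""
    fields = sorted({key for doc in docs for key in doc})
    columns = {field: [] for field in fields}
    for doc in docs:
        for field in fields:
            # Missing fields become None so all columns keep the same length.
            columns[field].append(doc.get(field))
    return columns


def device_counts(columns, idtype, id_):
    """Count matching rows per device, reading only three of the columns.

    The query also groups by idtype and id, but both are fixed by the
    filter, so grouping by device alone yields the same groups.
    """
    counts = Counter()
    rows = zip(columns["idtype"], columns["id"], columns["device"])
    for row_idtype, row_id, device in rows:
        if row_idtype == idtype and row_id == id_:
            counts[device] += 1
    return dict(counts)


# Made-up sample documents standing in for the mydata collection.
docs = [
    {"idtype": "b", "id": 10, "device": "nexus", "total": 5},
    {"idtype": "b", "id": 10, "device": "ipad", "total": 7},
    {"idtype": "ca", "id": 3, "device": "nexus", "total": 56263},
    {"idtype": "b", "id": 10, "device": "nexus", "total": 1},
]

columns = to_columns(docs)               # full scan plus row-by-row pivot
result = device_counts(columns, "b", 10)
print(result)
```

The pivot step touches every field of every document before the query can
skip any of them, which is exactly the overhead I am asking about.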

It would be really helpful if someone could shed some light on this area,
as I could not find any material on it in the documentation.

Regards,
Tamil.s
