Drill Hangout 2015-07-21

Participants: Jacques, Parth (scribe), Sudheesh, Hakim, Khurram, Aman, Jinfeng, Kristine, Sean
Feature list for Drill 1.2 was discussed. The following items were considered (discussion/comments, if any, are summarized with each item):

1. Memory allocator improvements - Spillover from 1.1

2. Faster reading of Hive parquet tables - Spillover from 1.1

3. Enhance/cleanup test framework & publish to community
   The dev team has a set of tests that must pass before committing. These tests should be made available so that community contributors can run them independently.

4. Additional window functions - Dev - NTILE, LEAD, LAG, FIRST_VALUE, LAST_VALUE

5. Faster metadata read for 1000s of Parquet files

6. Rowkey filter pushdown improvements

7. Support "Insert into <T> Select From"
   A longish discussion on this. (I might have missed some points.) Concerns about how to maintain the previous table metadata, especially partition metadata. There was a discussion around allowing a table to contain files whose partitions differ from, or are missing relative to, the partitions defined when the table was first created. Suggestion to incorporate the metadata in a .drill file. Some more clarity is to be provided about the functionality. Discussion to continue in the JIRA.

8. Support "Drop table"
   A small discussion on some restrictions that will need to be imposed. In particular, not allowing access to root (/), and we should probably also validate that the table being dropped is, in fact, a table.

9. Security for the Web UI
   Points to consider are whether we need to enable SSL and some form of authentication/authorization for the web UI, and whether we need the same for the REST API. A second question is whether we should also limit access to workspaces defined in the web UI. One suggestion was to create workspaces similar to views (i.e. defined in a .drill file) and then use the same access control mechanism (i.e. the one provided by the file system).

10. JDBC driver

11. Super bugs - flatten
    There was a discussion on whether we need to consider flatten to be an instance of a User Defined Table Function (UDTF). The issues we see in flatten are related less to the flatten logic and more to handling edge cases of batch boundaries and vectors. The idea behind supporting UDTFs would be to write a framework that handles the complexity of handling input and producing output; the UDTF itself would only need to implement an input and an output operator. Flatten can then be reimplemented as a UDTF.

12. Super bugs - convert function/

13. Super bug - 2010: MergeJoin incorrect results. (Suggestion that the solution might be to go the UDTF way.)

14. Super bug - DRILL-3121 Support interpreter based execution for hive partition pruning.
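
Illustration of the UDTF idea raised under the flatten item. This is only a rough sketch of how such a split could look, not something that was presented in the meeting. The names TableFunction, TableFunctionRunner and Flatten are hypothetical and are not Drill APIs, and rows are modeled as plain Java lists rather than value vectors. The runner owns all batch-boundary handling; the function implements just an input side and an output side.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical UDTF contract: one input side, one output side.
interface TableFunction<I, O> {
    void input(I row);      // accept a single input row
    boolean hasOutput();    // true while rows produced from earlier input remain
    O output();             // return the next produced row
}

// Hypothetical framework piece: it alone knows about batches.
final class TableFunctionRunner {
    static <I, O> List<List<O>> run(List<List<I>> inputBatches,
                                    TableFunction<I, O> fn,
                                    int maxBatchSize) {
        List<List<O>> outBatches = new ArrayList<>();
        List<O> current = new ArrayList<>();
        for (List<I> batch : inputBatches) {
            for (I row : batch) {
                fn.input(row);
                // Drain the function; start a new output batch whenever one fills up.
                while (fn.hasOutput()) {
                    current.add(fn.output());
                    if (current.size() == maxBatchSize) {
                        outBatches.add(current);
                        current = new ArrayList<>();
                    }
                }
            }
        }
        if (!current.isEmpty()) {
            outBatches.add(current); // final, possibly partial, batch
        }
        return outBatches;
    }
}

// Flatten written against the contract: no batch logic in here at all.
// Only the repeated column is modeled; other columns are left out for brevity.
final class Flatten implements TableFunction<List<Integer>, Integer> {
    private final Deque<Integer> pending = new ArrayDeque<>();

    @Override public void input(List<Integer> row) { pending.addAll(row); }
    @Override public boolean hasOutput() { return !pending.isEmpty(); }
    @Override public Integer output() { return pending.remove(); }
}

final class FlattenSketch {
    public static void main(String[] args) {
        // Two input batches; the second starts with an empty repeated value.
        List<List<List<Integer>>> input = Arrays.asList(
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4)),
            Arrays.asList(Arrays.<Integer>asList(), Arrays.asList(5, 6)));
        // Output batches are capped at 4 rows regardless of input batch layout.
        System.out.println(TableFunctionRunner.run(input, new Flatten(), 4));
        // prints [[1, 2, 3, 4], [5, 6]]
    }
}
```

In this sketch the output batch layout depends only on maxBatchSize, not on where the input batches happened to be cut, which is the kind of edge case the notes attribute to the framework rather than to flatten itself.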