Drill Hangout (2015-07-21) minutes

Parth Chandra Thu, 23 Jul 2015 11:47:19 -0700

Drill Hangout 2015-07-21

Participants: Jacques, Parth (scribe), Sudheesh, Hakim, Khurram, Aman,
Jinfeng, Kristine, Sean


The feature list for Drill 1.2 was discussed. The following items were
considered (discussion and comments, if any, are summarized with each item):


   1. Memory allocator improvements - spillover from 1.1.
   2. Faster reading of Hive Parquet tables - spillover from 1.1.
   3. Enhance/clean up the test framework and publish it to the community.

The dev team has a set of tests that must pass before a change can be
committed. These tests should be made available so that community
contributors can run them independently.

   4. Additional window functions (Dev): NTILE, LEAD, LAG, FIRST_VALUE,
      LAST_VALUE.
   5. Faster metadata reads for thousands of Parquet files.
   6. Rowkey filter pushdown improvements.
   7. Support "Insert into <T> Select From".

There was a long discussion on this (I might have missed some points).
Concerns were raised about how to maintain the previous table metadata,
especially partition metadata. There was also a discussion about allowing a
table to contain files that have different partitions, or no partitions,
compared with the partitions defined when the table was first created. One
suggestion was to store the metadata in a .drill file.

The functionality still needs to be specified more clearly. Discussion will
continue in the JIRA.

   8. Support "Drop table"

There was a short discussion of some restrictions that will need to be
imposed. In particular, we should not allow access to the root (/), and we
should probably validate that the table being dropped is, in fact, a table.

   9. Security for the web UI

Points to consider are whether we need to enable SSL and some form of
authentication/authorization for the web UI, and whether the same is needed
for the REST API. A second question is whether we should also limit access
to workspaces defined in the web UI. One suggestion was to create workspaces
the same way as views (i.e., defined in a .drill file) and then use the same
access control mechanism (i.e., the one provided by the file system).

   10. JDBC driver
   11. Super bugs - flatten

There was a discussion on whether flatten should be treated as an instance
of a User Defined Table Function (UDTF). The issues we see in flatten are
related less to the flatten logic itself and more to handling edge cases
around batch boundaries and vectors. The idea behind supporting UDTFs would
be to write a framework that handles the complexity of consuming input and
producing output, while the UDTF itself would only need to implement an
input and an output operator. Flatten could then be reimplemented as a UDTF.
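A minimal sketch of the UDTF split discussed above, in Java (Drill's implementation language). All names here (TableFunction, FlattenFunction, TableFunctionRunner, UdtfSketch) are hypothetical and are not Drill's actual API. As simplifying assumptions, rows are plain Java objects rather than value vectors, a "batch" is just a list, and the input/output operators are collapsed into a single per-row callback that writes to an output sink. The point is only that the function sees one row at a time while the framework owns the batch boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical UDTF contract: handle one input row, emit zero or more output rows.
interface TableFunction<I, O> {
    void process(I input, List<O> sink);
}

// Flatten as a UDTF: a row holding a list becomes one output row per element.
class FlattenFunction<T> implements TableFunction<List<T>, T> {
    @Override
    public void process(List<T> input, List<T> sink) {
        sink.addAll(input);
    }
}

// Hypothetical framework side: feeds rows to the function and cuts the output
// into batches of at most maxBatchSize rows. A single input row that expands
// into many outputs may therefore span several output batches.
class TableFunctionRunner<I, O> {
    private final TableFunction<I, O> fn;
    private final int maxBatchSize;

    TableFunctionRunner(TableFunction<I, O> fn, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.fn = fn;
        this.maxBatchSize = maxBatchSize;
    }

    List<List<O>> run(List<List<I>> inputBatches) {
        List<List<O>> outputBatches = new ArrayList<>();
        List<O> current = new ArrayList<>();
        List<O> scratch = new ArrayList<>();
        for (List<I> batch : inputBatches) {
            for (I row : batch) {
                scratch.clear();
                fn.process(row, scratch);
                for (O out : scratch) {
                    current.add(out);
                    if (current.size() == maxBatchSize) {
                        // Batch is full: emit it and start a new one.
                        outputBatches.add(current);
                        current = new ArrayList<>();
                    }
                }
            }
        }
        if (!current.isEmpty()) {
            outputBatches.add(current); // flush the last, partially filled batch
        }
        return outputBatches;
    }
}

public class UdtfSketch {
    public static void main(String[] args) {
        TableFunctionRunner<List<Integer>, Integer> runner =
                new TableFunctionRunner<>(new FlattenFunction<>(), 2);
        // Two input batches: [[1, 2, 3]] and [[], [4]].
        List<List<List<Integer>>> input = List.of(
                List.of(List.of(1, 2, 3)),
                List.of(List.<Integer>of(), List.of(4)));
        System.out.println(runner.run(input)); // prints [[1, 2], [3, 4]]
    }
}
```

Save it as UdtfSketch.java (Java 9+ for List.of). Keeping the batch-splitting logic in the runner means flatten never has to reason about batch boundaries; a real implementation would also have to manage vector allocation, which this sketch ignores.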

   12. Super bugs - convert function/
   13. Super bug - 2010: MergeJoin incorrect results (suggestion that the
       solution might be to go the UDTF way).
   14. Super bug - DRILL-3121: Support interpreter-based execution for Hive
       partition pruning.
