Paul Rogers created DRILL-7567:
----------------------------------

             Summary: Metastore enhancements
                 Key: DRILL-7567
                 URL: https://issues.apache.org/jira/browse/DRILL-7567
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Paul Rogers


The Metastore feature shipped as a Beta. Review of the documentation identified 
a number of opportunities for improvement before the feature leaves Beta.

* Should the Metastore be configured in its own file? Does this push us in the 
direction of each feature having its own set of config files? Or should its 
config move into the normal Drill config files?
* Provide a detailed schema and description of Metadata entities, like the Hive 
metadata schema.
* Provide an out-of-the-box sample Metastore for some of Drill's demo tables.
* Provide a Metastore tutorial that uses the sample Metastore. Many folks learn 
best by trying things hands-on.
* Solve the read/write consistency issues to avoid the need for the error 
recovery described for {{metastore.metadata.fallback_to_file_metadata}}.
* Boot-time config is stored in the {{drill.metastore}} namespace, but 
Metastore SYSTEM/SESSION options are in the {{drill.exec}} namespace. This is 
confusing. Let's be consistent.
* {{drill.exec.storage.implicit.last_modified_time.column.label}} is a bug: 
Drill internal names should never conflict with user-defined column names. 
Figure out where they conflict and fix the issue. No user can ever guarantee 
that a given name will never be used in their tables, nor can users easily fix 
the issue if it occurs. (Note: our implicit columns have the same flaw.)
* Provide a form of ANALYZE TABLE that automatically reuses settings from any 
previous run. Otherwise, users must find somewhere to store their ANALYZE TABLE 
command so that they can submit exactly the same one each time, which is very 
user-unfriendly. In fact, experience with Impala suggests that end users have 
no idea about schema; they just want the latest metadata. Such users won't even 
know the details of a command that some other user might have submitted.
* The Iceberg metastore requires atomic rename, but the most common use case 
for Drill today is the cloud, and S3 does not support atomic rename. We need to 
fix this.
* The documentation says we use the "plugin name" as part of the table key. But 
for DFS, say, the user can have dozens of plugin configs, each with a distinct 
name, and each can reuse the same workspace name, say "foo". Thus "dfs/foo" is 
ambiguous, whereas "hdfs1/foo" and "local/foo" are unique if we use storage 
plugin config names.
* It is not clear whether the Iceberg metastore supports HDFS security and 
Kerberos tickets. If not, it won't work in a production deployment.
* The metastore is meant to store schema, and a key use is when schema is 
ambiguous. But the metastore gathers schema the same way that Drill queries 
tables, so if the schema is ambiguous, ANALYZE TABLE will fail. Thus we do not 
actually solve the ambiguous-schema problem. We need a solution.
* Better partition support. Drill has a long-standing usability issue: users 
must do their own partition coding. If I want data from 2018-11 through 2019-01 
(one quarter's worth of data), I have to write the very ugly

{code:sql}
WHERE (dir0 = 2018 AND dir1 >= 11)
        OR (dir0 = 2019 AND dir1 <= 1)
{code}

With Hive/Impala/Presto I can just write:

{code:sql}
WHERE transDate BETWEEN '2018-11-01' AND '2019-01-31'
{code}
* Allow staged gathering of stats. Let me first gather stats and review them 
for quality before my users start using them. Currently there is no way to 
gather them, enable the option in a single session for testing, verify that 
things work right, and then turn it on for everyone. In a shared system, all 
heck can break loose with the current implementation.
* Review the internal Metastore tables. See the many comments about their 
structure in the Metastore documentation PR.
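
For the table-key item above, here is a minimal sketch of the collision. It assumes a hypothetical key made of (storage plugin config name, workspace, table). The {{TableKey}} class, the {{sales}} table, and the config names {{hdfs1}} and {{local}} are illustrative only and are not the Metastore's actual key structure.

{code:python}
from dataclasses import dataclass


@dataclass(frozen=True)
class TableKey:
    """Hypothetical Metastore table key. Identifying the storage plugin by
    its config name (e.g. "hdfs1") rather than its plugin type (e.g. "dfs")
    keeps keys unique across plugin configs."""
    storage_config: str
    workspace: str
    table: str

    def path(self) -> str:
        return f"{self.storage_config}/{self.workspace}/{self.table}"


# Two file-system plugin configs that both define a workspace named "foo".
configs = {"hdfs1": "dfs", "local": "dfs"}  # config name -> plugin type

# Keying on the plugin type collapses both tables onto a single key...
by_type = {f"{ptype}/foo/sales" for ptype in configs.values()}
# ...while keying on the config name keeps them distinct.
by_config = {TableKey(name, "foo", "sales") for name in configs}

print(len(by_type), len(by_config))  # 1 2
{code}

One consequence of keying on config names is that renaming a plugin config changes the key of every table under it, so a fix along these lines would also need a rename policy.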
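
For the partition item above, this is a minimal sketch of the mechanical rewrite users currently do by hand. It turns an inclusive month range into a {{dir0}}/{{dir1}} predicate. It assumes a hypothetical layout of one directory per year ({{dir0}}) containing one directory per month ({{dir1}}), named with unpadded numbers. {{month_range_predicate}} is an illustrative name, not a Drill API.

{code:python}
from datetime import date


def month_range_predicate(start: date, end: date) -> str:
    """Return a WHERE clause selecting every month from start through end
    (inclusive), for a hypothetical <year>/<month> layout read as dir0/dir1."""
    if (start.year, start.month) > (end.year, end.month):
        raise ValueError("start must not be after end")

    terms = []
    for year in range(start.year, end.year + 1):
        # Month bounds that apply within this year.
        first = start.month if year == start.year else 1
        last = end.month if year == end.year else 12
        if first == 1 and last == 12:
            terms.append(f"(dir0 = {year})")  # whole year: no month filter
        elif first == 1:
            terms.append(f"(dir0 = {year} AND dir1 <= {last})")
        elif last == 12:
            terms.append(f"(dir0 = {year} AND dir1 >= {first})")
        elif first == last:
            terms.append(f"(dir0 = {year} AND dir1 = {first})")
        else:
            terms.append(f"(dir0 = {year} AND dir1 BETWEEN {first} AND {last})")
    return "WHERE " + "\n   OR ".join(terms)


print(month_range_predicate(date(2018, 11, 1), date(2019, 1, 31)))
# WHERE (dir0 = 2018 AND dir1 >= 11)
#    OR (dir0 = 2019 AND dir1 <= 1)
{code}

Because the rewrite is purely mechanical, a Metastore that recorded how directory levels map to a logical column such as {{transDate}} could, in principle, derive it from the simple BETWEEN form.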
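
For the staged-stats item above, here is a minimal sketch of one possible workflow: gather, test in a single session, then publish. The {{StagedStats}} class, its methods, and the {{rowCount}} field are hypothetical. The only point is that gathering and publishing are separate steps.

{code:python}
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StagedStats:
    """Hypothetical stats store that separates gathering from publishing.
    New snapshots stay invisible to ordinary sessions until published."""
    versions: Dict[str, List[dict]] = field(default_factory=dict)
    published: Dict[str, int] = field(default_factory=dict)

    def gather(self, table: str, stats: dict) -> int:
        """Record a new stats snapshot and return its version number."""
        self.versions.setdefault(table, []).append(stats)
        return len(self.versions[table]) - 1

    def publish(self, table: str, version: int) -> None:
        """Make a reviewed version the default for every session."""
        if not 0 <= version < len(self.versions.get(table, [])):
            raise KeyError(f"no stats version {version} for {table}")
        self.published[table] = version

    def lookup(self, table: str, session_version: Optional[int] = None) -> Optional[dict]:
        """Stats the planner would see. A test session may pin an unpublished
        version; everyone else sees the published one, if any."""
        if session_version is None:
            session_version = self.published.get(table)
        if session_version is None:
            return None
        return self.versions[table][session_version]


store = StagedStats()
v0 = store.gather("sales", {"rowCount": 1000})
store.publish("sales", v0)
v1 = store.gather("sales", {"rowCount": 1200})  # staged, not yet public
print(store.lookup("sales")["rowCount"])        # 1000: other users unaffected
print(store.lookup("sales", v1)["rowCount"])    # 1200: test session only
store.publish("sales", v1)                      # roll out to everyone
print(store.lookup("sales")["rowCount"])        # 1200
{code}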



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
