Hi Benj, Creating partitions as in your first example won't work. >From the docs: "During partitioning, Drill creates separate files, but not separate directories, for different partitions." ( https://drill.apache.org/docs/how-to-partition-data/). Also, Drill doesn't write additional metadata regarding partitioning, when it reads parquet files it determines partitions using min/max values. That means that if you want for example to partition using the first letter, you'll need to create a corresponding column. Or you can create partitions manually as directories.
On Wed, Dec 5, 2018 at 10:07 PM <[email protected]> wrote: > In would like to create a parquet with a partition on computed data > (without to have to put the result of the computation in the parquet) : > The goal is to optimize the parquet for typical expecting queries. > > Imaginary example : > CREATE TABLE `mytable` > PARTITION BY (substr(name,1,1)) AS > SELECT name, birthdate, birthcity > ORDER BY bithdate; > > So, if I do that I obtain a VALIDATION ERROR: Partition column ... is not > in the SELECT list of CTAS > > And the comment of the code of the function > "public static RelNode qualifyPartitionCol(RelNode input, List<String> > partitionColumns)" > confirms that it's not possible actually : > " A partition column is resolved, either (1) the same column appear in the > select list of CTAS or (2) CTAS has a * in select list" > > But what is the reason of this limitation ? > Is there exists any tricks to do it right now, or can we expect an > evolution to allow this possibilities. > > I just imagine to do (with the data of the example) > CREATE TABLE `mytable` > PARTITION BY (sname) AS > SELECT substr(name,1,1) sname, name, birthdate, birthcity > ORDER BY bithdate; > Then, next, request each partition file to remove the useless data > , like > CREATE TABLE `mytable_2/partition_x` > SELECT name, birthdate, birthcity > ORDER BY bithdate; > but it's not really satisfying... > > I would appreciate yours comments, > Regards, > > benj > -- Sincerely, Anton Gozhiy [email protected]
