Hi,

I have a directory of 387 Parquet files that amount to a single data set of 131 GB. Querying them with Drill works nicely. When I try to collect metadata for this table with |analyze table columns none refresh metadata|, that command uses a mind-boggling amount of CPU time: at least on the order of 10 CPU-hours, and probably on the order of 100 CPU-hours [1]. Surely it cannot require that much CPU time to collect metadata from a few hundred Parquet files? I'd /like/ to collect statistics for some columns too, but I've had to forgo that so far because of how slow this command is.

[1] This is on a VMware guest with 10 vCPUs that are reported as Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
