Hi,
I am also facing a similar issue to JinFeng's when querying data on the
columns year, month and day, which are also the partitioning columns.
Partitioning created lots of small files, and querying them took almost
20x longer than reading the non-partitioned data.
Another issue I faced when I query the data by just f
Hello,
The metadata cache is indeed being used. The issue is that most of the
time is spent in planning. It is not a huge amount of time (around 10s),
but that seems like a lot just to handle 50k files.
The cardinality of key1 and key2 is around 300 each, so key1*key2 puts
the number of files in the tens of thousands. But
I am facing a problem running a query that uses Alluxio after
configuring Alluxio with Drill.
I am not able to connect to Alluxio due to java.io.IOException: Frame size
(67108864) larger than max length (16777216)!
Any input on this will be useful.
Error Id: d4431c8b-d51f-4015-8be7-27a693252
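In case it helps, this kind of Thrift frame-size error often means the
client is pointed at the wrong Alluxio port (e.g. the web UI port rather
than the RPC port), or that the default 16 MB frame limit is too small.
A minimal sketch, assuming an Alluxio 1.x setup where the property below
exists (please verify the property name against your Alluxio version),
would be to raise the limit in alluxio-site.properties on both the
Alluxio servers and on the client side used by Drill:

  # assumed Alluxio 1.x property; the default is 16MB
  alluxio.network.thrift.frame.size.bytes.max=128MB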
If you have a small cardinality for the partitioning columns, yet still
end up with 50k different small files, it's possible that you have many
parallel writer minor fragments (threads). By default, each writer minor
fragment works independently. If you have cardinality C and N writer
minor fragments, you can end up with up to C*N output files, since each
fragment writes its own file for every partition value it receives.
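One possible way to reduce the file count, sketched below on the
assumption that the table is written with CTAS ... PARTITION BY, is to
hash-distribute rows on the partition keys so that each key combination
goes to a single writer. Drill has a store.partition.hash_distribute
option for this (documented as an alpha option in some versions, so
check it in yours); the table and column names here are placeholders:

  ALTER SESSION SET `store.partition.hash_distribute` = true;

  CREATE TABLE dfs.tmp.`mytable_partitioned`
  PARTITION BY (key1, key2) AS
  SELECT key1, key2, col3, col4
  FROM dfs.`/data/source_table`;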
Have you tried building the metadata cache file using the "refresh table
metadata" command? That will help reduce the planning time. Is most of
the time spent in planning or in execution?
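For reference, a minimal sketch of that command, assuming the table
lives under a dfs workspace (the path is a placeholder):

  REFRESH TABLE METADATA dfs.`/path/to/mytable`;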
Pruning is done at the row-group level, i.e. at the file level (we
create one file per row group).
We do not support pruning a
Hello,
I have a dataset that I always query on 2 columns that don't have a
high cardinality. To benefit from pruning, I tried to partition the
files on these keys, but I end up with 50k different small files (30 MB
each), and queries against them spend most of their time in the planning
phase, decoding the metadata.