You said there are 2144 parquet files, but the plan suggests that you have only a single parquet file. In any case, that is a long time to plan the query. Did you try the metadata caching feature [1]?
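In case it is useful, here is a minimal sketch of how the metadata cache is usually (re)built, assuming the table can be addressed exactly as in your query (the `<plugin>` placeholder is taken from your message and left as-is). Your plan reports usedMetadataFile=true, so a cache file seems to exist already; this would only regenerate it.

```sql
-- Rebuild the Parquet metadata cache file for the table directory.
-- Assumption: `testdata` is reachable through the same <plugin>.root
-- workspace that the original query uses.
REFRESH TABLE METADATA <plugin>.root.`testdata`;
```

The refresh itself has to read the footer of every file, so it may be slow the first time; the expectation is that later planning reads the cache file rather than the individual footers.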
Also, how many row groups and columns are present in the parquet file?

[1] https://drill.apache.org/docs/optimizing-parquet-metadata-reading/

- Rahul

On Thu, Feb 23, 2017 at 4:24 PM, Jeena Vinod <[email protected]> wrote:

> Hi,
>
> Drill is taking 23 minutes for a simple select * query with limit 100 on
> 1GB uncompressed parquet data. EXPLAIN PLAN for this query is also taking
> that long (~23 minutes).
>
> Query: select * from <plugin>.root.`testdata` limit 100;
>
> Query Plan:
>
> 00-00 Screen : rowType = RecordType(ANY *): rowcount = 100.0, cumulative cost = {32810.0 rows, 33110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1429
> 00-01   Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 100.0, cumulative cost = {32800.0 rows, 33100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1428
> 00-02     SelectionVectorRemover : rowType = (DrillRecordRow[*]): rowcount = 100.0, cumulative cost = {32800.0 rows, 33100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1427
> 00-03       Limit(fetch=[100]) : rowType = (DrillRecordRow[*]): rowcount = 100.0, cumulative cost = {32700.0 rows, 33000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1426
> 00-04         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/testdata/part-r-00000-097f7399-7bfb-4e93-b883-3348655fc658.parquet]], selectionRoot=/testdata, numFiles=1, usedMetadataFile=true, cacheFileRoot=/testdata, columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 32600.0, cumulative cost = {32600.0 rows, 32600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1425
>
> I am using Drill 1.8 and it is set up on a 5 node 32GB cluster and the data is
> in Oracle Storage Cloud Service. When I run the same query on a 1GB TSV file
> in this location it is taking only 38 seconds.
>
> Also testdata contains around 2144 .parquet files, each around 500KB.
>
> Is there any additional configuration required for parquet?
>
> Kindly suggest how to improve the response time here.
>
> Regards
> Jeena
