Greetings!
I'm a new user of Drill 1.9.0 under Windows 10 with Java 1.8.0_121 (x64). I am
trying to configure drill-embedded to have more direct memory available to
it than the default 7GB I see when starting on my 32GB-equipped
workstation. Uncommenting the DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY
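For reference, a minimal sketch of those overrides in conf/drill-env.sh; the 8G/16G values are only illustrative, and it's an assumption that the Windows launcher picks the variables up from the environment:

    # conf/drill-env.sh -- values illustrative, sized for a 32GB workstation
    export DRILL_HEAP="8G"                  # JVM heap (-Xmx/-Xms)
    export DRILL_MAX_DIRECT_MEMORY="16G"    # off-heap (-XX:MaxDirectMemorySize)

On Windows, you can try setting the same variables in the shell before starting embedded mode:

    set DRILL_MAX_DIRECT_MEMORY=16G
    sqlline.bat -u "jdbc:drill:zk=local"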
You can try the things below. For each of them, check the planning time
individually:
1. Run explain plan for a simple "select * from `/scratch/localdisk/drill/testdata/Cust_1G_tsv`"
2. Replace the '*' in your query with explicit column names (see the sketch after this list)
3. Remove the extractHeader option from the text format configuration in your storage plugin
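A sketch of steps 1 and 2, assuming the dfs plugin prefix and the hypothetical aliases cust_id/cust_name (without extractHeader, Drill exposes a TSV file as the columns array):

    EXPLAIN PLAN FOR
    SELECT * FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;

    -- step 2: name the columns explicitly instead of *
    EXPLAIN PLAN FOR
    SELECT columns[0] AS cust_id, columns[1] AS cust_name
    FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;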
For an explanation of why we are rebuilding the metadata cache, take a
look at Padma's previous email. Most likely, there is a data change in the
folder. If not, then we should not be refreshing the metadata cache, and it's a bug.
Drill currently does not do incremental metadata refreshes. Now let's say
you
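For reference, the full refresh is the documented command run against the directory (the Parquet path below is hypothetical); since there is no incremental mode, it re-reads the footers of every file under that path:

    REFRESH TABLE METADATA dfs.`/data/parquet/orders`;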
Can you please provide the drillbit.log file?
Gautam
From: PROJJWAL SAHA
Sent: Monday, March 6, 2017 1:45:38 AM
To: user@drill.apache.org
Subject: Fwd: Minimise query plan time for dfs plugin for local file system on tsv file
Hi All
Any inputs on this?
Why should creating the metadata files recursively take 1457445 ms when a
metadata refresh on this path has already been done?
Regards
Chetan
-----Original Message-----
From: Jeena Vinod
Sent: Sunday, March 5, 2017 10:44 PM
To:
There is no need to refresh the metadata for every query. You only need to
generate the metadata cache once for each folder. Now if your data gets
updated, then any subsequent query you submit will automatically refresh
the metadata cache. Again, you need not run the "refresh table metadata"
command for every query.
Hi All
As I understand, we can trigger generation of the Parquet Metadata Cache File
by using REFRESH TABLE METADATA.
It seems we need to run this command on a directory, nested or flat, once
during the session.
Why do we need to run it for every session? That implies that if I use the REST API to
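For the REST route, a sketch against Drill's documented query endpoint (default web port 8047; the table path is hypothetical):

    curl -X POST -H "Content-Type: application/json" \
      -d '{"queryType": "SQL", "query": "REFRESH TABLE METADATA dfs.`/data/parquet/orders`"}' \
      http://localhost:8047/query.json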
I can see both sides. But Ted is right, this won't hurt anything from a
performance perspective; even if they put War and Peace in there 30 times,
that's 100 MB of information to serve. People may choose to use formatting
languages like Markup or something. I do think we should have a limit so we
Have you tried disabling hash joins or hash agg on the query or changing
the planning width? Here are some docs to check out:
https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/
https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
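A sketch of those knobs at session scope (the option names are Drill's documented system options; the width value is only illustrative):

    ALTER SESSION SET `planner.enable_hashjoin` = false;
    ALTER SESSION SET `planner.enable_hashagg` = false;
    -- reduce per-node parallelism while testing plan changes
    ALTER SESSION SET `planner.width.max_per_node` = 4;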
All, please help me with suggestions on what areas I can look into to understand
why the query planning time is taking so long for files which are local to the
Drill machines. I have the same directory structure copied on all 5
nodes of the cluster. I am accessing the source files using the out-of-the-box