Configuring Drill Memory Usage under Windows

2017-03-06 Thread David F. Severski
Greetings! I'm a new user of Drill 1.9.0 under Windows 10 with Java 1.8.0_121 (x64). I am trying to configure drill-embedded to have more direct memory available to it than the default 7GB I see when starting on my 32GB-equipped workstation. Uncommenting the DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY
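The two variables mentioned above are normally set in Drill's environment configuration. A minimal sketch, assuming the `conf/drill-env.sh` syntax used on Linux/macOS (on Windows the same variables can instead be set as system environment variables before launching `sqlline.bat`; the exact mechanism may vary by Drill version, and the values below are illustrative, not recommendations):

```
# conf/drill-env.sh -- uncomment and adjust; values are examples only
export DRILL_HEAP="8G"                 # JVM heap for the drillbit
export DRILL_MAX_DIRECT_MEMORY="16G"   # cap on off-heap direct memory
```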

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread rahul challapalli
You can try the things below. For each one, check the planning time individually:
1. Run explain plan for a simple "select * from `/scratch/localdisk/drill/testdata/Cust_1G_tsv`"
2. Replace the '*' in your query with explicit column names
3. Remove the extract header from your storage
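The first two suggestions can be sketched as Drill SQL statements (the `dfs` workspace prefix and column indices are assumptions for illustration; without an extracted header, Drill exposes TSV fields through the `columns` array):

```sql
-- 1. Measure planning time for the simplest possible scan
EXPLAIN PLAN FOR
SELECT * FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;

-- 2. Compare against an explicit column list
EXPLAIN PLAN FOR
SELECT columns[0], columns[1]
FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;
```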

Re: Explain Plan for Parquet data is taking a lot of time

2017-03-06 Thread rahul challapalli
For an explanation of why we are rebuilding the metadata cache, take a look at Padma's previous email. Most likely, there is a data change in the folder. If not, and we still refresh the metadata cache, it's a bug. Drill currently does not do incremental metadata refreshes. Now let's say you

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread Gautam Parai
Can you please provide the drillbit.log file? Gautam From: PROJJWAL SAHA Sent: Monday, March 6, 2017 1:45:38 AM To: user@drill.apache.org Subject: Fwd: Minimise query plan time for dfs plugin for local file system on tsv file all, please

RE: Explain Plan for Parquet data is taking a lot of time

2017-03-06 Thread Chetan Kothari
Hi All Any inputs on this? Why should creating metadata files recursively take 1457445 ms when a metadata refresh on this path has already been done? Regards Chetan - -Original Message- From: Jeena Vinod Sent: Sunday, March 5, 2017 10:44 PM To:

Re: Metadata Caching

2017-03-06 Thread rahul challapalli
There is no need to refresh the metadata for every query. You only need to generate the metadata cache once for each folder. Now if your data gets updated, then any subsequent query you submit will automatically refresh the metadata cache. Again you need not run the "refresh table metadata "

Metadata Caching

2017-03-06 Thread Chetan Kothari
Hi All As I understand, we can trigger generation of the Parquet Metadata Cache File by using REFRESH TABLE METADATA . It seems we need to run this command on a directory, nested or flat, once during the session. Why do we need to run it for every session? That implies if I use REST API to
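For reference, the command under discussion looks like this — a sketch with a hypothetical workspace and directory path:

```sql
-- Build (or rebuild) the Parquet metadata cache for a directory tree
REFRESH TABLE METADATA dfs.`/data/parquet/sales`;
```

As Rahul notes in the reply above, this only needs to be run once per folder; subsequent queries refresh the cache automatically when the underlying data changes.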

Re: Discussion: Comments in Drill Views

2017-03-06 Thread John Omernik
I can see both sides. But Ted is right, this won't hurt anything from a performance perspective; even if they put War and Peace in there 30 times, that's 100 MB of information to serve. People may choose to use formatting languages like Markup or something. I do think we should have a limit so we

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

2017-03-06 Thread John Omernik
Have you tried disabling hash joins or hash agg on the query or changing the planning width? Here are some docs to check out: https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/ https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
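The toggles John mentions can be set per session. A sketch using Drill's standard planner session options (the specific values are illustrative only):

```sql
-- Disable hash-based operators so the planner falls back to
-- merge join / streaming aggregate, which use less direct memory
ALTER SESSION SET `planner.enable_hashjoin` = false;
ALTER SESSION SET `planner.enable_hashagg` = false;

-- Optionally reduce per-node parallelism as well
ALTER SESSION SET `planner.width.max_per_node` = 2;
```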

Fwd: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread PROJJWAL SAHA
all, please help by suggesting what areas I can look into to understand why query planning is taking so long for files that are local to the Drill machines. I have the same directory structure copied on all 5 nodes of the cluster. I am accessing the source files using the out-of-the-box