Re: Best way to partition the data

2017-09-01 Thread Jinfeng Ni
If you have small cardinality for partitioning column, yet still end up with 50k different small files, it's possible that you have many parallel writer minor-fragment (threads). By default, each writer minor-fragment will work independently. If you have cardinailty C and N writer minor fragment,

Re: System Error PhysicalOperatorSetupException: Can not parallelize fragment

2017-08-02 Thread Jinfeng Ni
Are you querying a new storage plugin (I saw the stack trace shows "indexR"), and that storage plugin is using HardAffinityFragmentParallelizer? As far as I know, the hard assignment is only used for distributed system tables, or some special operator (Screen) in Drill codebase. The majority cases

Re: Drill performance tuning parquet

2017-07-28 Thread Jinfeng Ni
The number you posted seems to show that the query elapse time is highly impacted by the number of scan minor fragments (scan parallelization degree). In Drill, scan parallelization degree is capped at minimum of # of parquet row groups, or 70% of cpu cores. In your original configuration, since

Re: predicate push down behavior

2017-07-27 Thread Jinfeng Ni
Your interpretation of the physical plan is correct. The "IS NOT NULL" predicate is pushed down to Oracle, while aggregation / sort / Limit is handled by Drill, although they could be pushed down as well. The JDBC storage plugin rules may need some enhancements. On Thu, Jul 27, 2017 at 5:15 AM,

Re: 1.11.0 RC question

2017-07-26 Thread Jinfeng Ni
Hi Bob, Is DRILL-5432 the one you are talking about? I saw it's merged and should have been put in the release candidate. What type of error did you see when you tried to query a PCAP? Also, it may help to provide the commit id of your build, by run the following query: SELECT * from

Re: Drill and Tableau

2017-07-25 Thread Jinfeng Ni
Some of questions you have are related to which type of storage you are querying with Drill. If you are querying hive table, Drill uses Hive metastore API to get the metadata, and internally maintain a cache for speedup [1]. If you are querying parquet files through file system plugin, Drill has

Re: Question about Drill aggregate queries and schema change

2017-07-25 Thread Jinfeng Ni
o be ignored downstream (to avoid any > unintended consequences)? > > Thanks for the help! > > On Mon, Jul 24, 2017 at 7:06 PM, Jinfeng Ni <j...@apache.org> wrote: > > > Based on my limited understanding of Drill's KuduRecordReader, the > problem > > seems to be

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Jinfeng Ni
>> is no error. I'll watch that jira expectantly because Kudu + Drill would be >> an awseome combo. But without the pruning it's useless to us. >> >> -Cliff >> >> On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <j...@apache.org> wrote: >> >>> If yo

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Jinfeng Ni
If you see such errors only when you enable predicate pushdown, it might be related to a known issue; schema change failure caused by empty batch [1]. This happened when predicate prunes everything, and kudu reader did not return a RowResult with a schema. In such case, Drill would interprete the

Re: Error when timestamp IN clause contains more elements

2017-07-14 Thread Jinfeng Ni
The error would encounter when IN clause is rewritten into Join, and planner chooses to use distributed join (involves a Hash Exchange). For distributed join, when two join keys do not have same types, only certain type of implicit cast will be allowed. In Khurram's case, I think the query is

Re: Drill Query Engine for nested json or parquet

2017-07-03 Thread Jinfeng Ni
get to the root of this one? > > -Original Message- > From: Jinfeng Ni [mailto:j...@apache.org] > Sent: Wednesday, June 28, 2017 11:42 AM > To: user <user@drill.apache.org> > Subject: Re: Drill Query Engine for nested json or parquet > > You are right th

Re: Drill Query Engine for nested json or parquet

2017-06-28 Thread Jinfeng Ni
You are right that Q2 has to first apply flatten the field "details" before applying filter on top of the flattened rows. I do not think Drill currently supports to push filter down. The difference between Q1 and Q2: Q1 is querying a 4-row table, while Q2 is querying a 1-row table; the flatten

Re: Column alias are ignored when Storage Plugin is enabled

2017-06-09 Thread Jinfeng Ni
I feel DRILL-5577 is more likely to be related to DRILL-5538. The cause is an optimizer rule ProjectRemoveRule, which is added to query planner by JDBC storage plugin if enabled. I believe Arina is looking for a fix. On Fri, Jun 9, 2017 at 2:22 AM, Rahul Raj

Re: Partitioning for parquet

2017-05-31 Thread Jinfeng Ni
You may want to check if query on the second table is slower because of planning time or execution time. That could be determined by looking at the query profile in web-UI. Two factors might impact the planning time for second table having 11837: 1. Reading parquet metadata from those parquet

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-05-31 Thread Jinfeng Ni
Kunal is correct that Drill currently supports filter pruning at parquet row group level, using min/max statistics. Such support is limited to numeric/timestamp type, due to the potential corrupted varchar min/max issue as Kunal mentioned. For now Drill does not support dictionary-based pruning.

Re: [DRILL HANGOUT] Topics for 5/16/2017

2017-05-16 Thread Jinfeng Ni
We will start hangout shortly. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc On Mon, May 15, 2017 at 9:53 PM, Jinfeng Ni <j...@apache.org> wrote: > My feeling is that either temp table or putting 100k values into a > separate parquet files makes more sense

Re: [DRILL HANGOUT] Topics for 5/16/2017

2017-05-15 Thread Jinfeng Ni
> > Is there any better way to go around this problem or can we just solve this > problem with simple configuration changes ? > > Regards, > Jasbir Singh > > > -Original Message- > From: Jinfeng Ni [mailto:j...@apache.org] > Sent: Tuesday, May 16, 2017 2:

[DRILL HANGOUT] Topics for 5/16/2017

2017-05-15 Thread Jinfeng Ni
Hi All, Out bi-weekly Drill hangout is tomorrow (5/16/2017, 10AM PDT). Please respond with suggestion of topics for discussion. We will also collect topics at the beginning of handout tomorrow. Thanks, Jinfeng

Re: Field name is not expected when use `AS` in sql expression

2017-05-03 Thread Jinfeng Ni
Looks like this user mailing list does not allow attached file. Can you copy the output in text format and paste in email, so that people know what you are referring to? On Wed, May 3, 2017 at 7:53 PM, 温晨欣 wrote: > > > > As shown above, when using the `SUM` expression in

Re: Parquet, Arrow, and Drill Roadmap

2017-05-02 Thread Jinfeng Ni
> > - What the two readers are (is one a special drill thing, is the other a > standard reader from the parquet project?) > - What is the eventual goal here... to be able to use and switch between > both? To provide the option? To have code parity with another project? Both readers were for

Re: Apache Drill Query Planning Performance

2017-04-26 Thread Jinfeng Ni
Try to increase config `hive.metastore.cache-ttl-seconds`. The default is 60 seconds. I guess the reason you saw long planning time after another query is probably the cache from the first run expired. 1. https://drill.apache.org/docs/hive-metadata-caching/ On Wed, Apr 26, 2017 at 3:47 PM,

Re: Failed to query AQI files when some files have double quotes in lat/lng

2017-04-19 Thread Jinfeng Ni
You may try turn on this option `store.json.all_text_mode`, and then cast into numeric value in the query explicitly. alter session set `store.json.all_text_mode` = true; On Wed, Apr 19, 2017 at 8:49 AM, Dela Cruz, Vergel wrote: > Hi, > > > > I have a case where 2

Re: Drill Parquet Partitioning Method

2017-04-03 Thread Jinfeng Ni
That's a good idea. Let me clarify one thing first. Drill has two kinds of partitions: auto partition, or directory-based partition. The first one is a result of using drill's CTAS partition by statement [1]. Both partition column name and column value are written and encoded in the output

Re: [ANNOUNCE] Apache Drill 1.10.0 Released

2017-03-23 Thread Jinfeng Ni
nk you for your combined time & effort. It's greatly appreciated! > > On Thu, Mar 16, 2017 at 12:53 PM, Jinfeng Ni <j...@apache.org> wrote: >> Thanks Abhishek for the reminder. >> >> I just pushed 1.10.0 branch to Github. >> >> On Thu, Mar 16, 2017 at 9:

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

2017-03-21 Thread Jinfeng Ni
Very interesting findings, Francois. Thanks for sharing them with the community. The change of max_per_node and affinity_factor seems to reduce the possibility of one drillbit was hitting overload issue because of either CPU or Network contention. In our in-house testing, we also noticed that

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

2017-03-21 Thread Jinfeng Ni
Very interesting findings, Francois. Thanks for sharing them with the community. The change of max_per_node and affinity_factor seems to reduce the possibility of one drillbit was hitting overload issue because of either CPU or Network contention. In our in-house testing, we also noticed that

Re: Issue with Retrieving Lat/Long from Map

2017-03-17 Thread Jinfeng Ni
The flatten given you access of the list of locations. You can use the following to put it into two columns: SELECT DateTime, TaxCount, Location[0] as Latittude, Locations[1] as Longtitude FROM ( SELECT tbl.features[0].properties.`timestamp` as DateTime, tbl.features[0].properties.taxi_count as

Re: [ANNOUNCE] Apache Drill 1.10.0 Released

2017-03-16 Thread Jinfeng Ni
h creation still pending? > > -Abhishek > ________ > From: Jinfeng Ni <j...@apache.org> > Sent: Thursday, March 16, 2017 8:40:07 AM > To: dev; user > Subject: [ANNOUNCE] Apache Drill 1.10.0 Released > > On behalf of the Apache Drill community, I

Re: Reg: Column name is not passed to RDBMS

2017-03-16 Thread Jinfeng Ni
Can you please first check the EXPLAIN PLAN output for your query? The first step is to figure out whether it's planner issue, or execution issue. On Thu, Mar 16, 2017 at 7:39 AM, Nishith Kumar Nayak wrote: > Hi, > > While trying to run SQL query for RDBMS source like

[ANNOUNCE] Apache Drill 1.10.0 Released

2017-03-16 Thread Jinfeng Ni
On behalf of the Apache Drill community, I am happy to announce the release of Apache Drill 1.10.0. For information about Apache Drill, and to get involved, visit the project website [1]. This release introduces new features and enhancements, including CREATE TEMPORARY TABLE AS command, Kerberos

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-09 Thread Jinfeng Ni
DRILL-5089 has hit many drill users for very long time; it was reported several times on this drill user list. It would make sense to bump it's priority and get it fixed in the next Drill release. On Wed, Mar 8, 2017 at 8:45 AM, Zelaine Fong wrote: > The slowness you’re seeing

Re: Dealing with bad data when trying to do date computations

2017-02-28 Thread Jinfeng Ni
sys.functions using Drill. On Tue, Feb 28, 2017 at 9:02 AM, Jinfeng Ni <j...@apache.org> wrote: >> 4. I think as part of developer review and pull requests that add >> functions/functionality should require a pull request to also provide a >> documentation upda

Re: Dealing with bad data when trying to do date computations

2017-02-28 Thread Jinfeng Ni
> 4. I think as part of developer review and pull requests that add > functions/functionality should require a pull request to also provide a > documentation update. This helps to ensure that the docs keep up to date, > as well as keeping users appraised of what is happening... i.e. it's a good >

Re: Excessive Logging Messages

2017-02-26 Thread Jinfeng Ni
use low-case ones. I tried the above approach, and it seemed to work. On Sun, Feb 26, 2017 at 8:40 PM, Charles Givre <cgi...@gmail.com> wrote: > Hi Jinfeng, > That didn’t seem to do the trick. Any other suggestions? > >> On Feb 26, 2017, at 23:23, Jinfeng Ni <j...@apache.

Re: Excessive Logging Messages

2017-02-26 Thread Jinfeng Ni
You can modify logback.xml in your drill conf directory (DRILL_HOME/conf/logback.xml). --> Change INFO to either WARN or ERROR On Sun, Feb 26, 2017 at 7:17 PM, Charles Givre wrote: > Hello all, > I was doing some unrelated work and

Re: Explain Plan for Parquet data is taking a lot of timre

2017-02-23 Thread Jinfeng Ni
The reason the plan shows only one single parquet file is because "LIMIT 100" is applied and filter out the rest of them. Agreed that parquet metadata caching might help reduce planning time, when there are large number of parquet files. On Thu, Feb 23, 2017 at 4:44 PM, rahul challapalli

Re: Drill query planning taking a LONG time

2017-02-17 Thread Jinfeng Ni
end and not something more mysterious happening in Drill itself. > > Thanks, > > Dave > > On Wed, Feb 15, 2017 at 12:37 PM, Jinfeng Ni > <j...@apache.org<mailto:j...@apache.org>> wrote: > > Can you help try one more thing if you can? > > Run jstack on t

Re: Drill query planning taking a LONG time

2017-02-15 Thread Jinfeng Ni
il.com> wrote: > I ran that EXPLAIN that you suggested against the entire 100 file table and > it takes about 3 seconds. I will try to get a defect written up in the next > few days. > > - Dave > > On Tue, Feb 14, 2017 at 9:06 PM, Jinfeng Ni <j...@apache.org> wrote: &g

Re: Drill query planning taking a LONG time

2017-02-14 Thread Jinfeng Ni
hat. What does that mean to you > all? > > - Dave > > On Tue, Feb 14, 2017 at 12:37 PM, Jinfeng Ni <j...@apache.org> wrote: > >> Normally, the slow query planning could be caused by : >> >> 1. Some planner rule hit a bug when processing certain operato

Re: Drill query planning taking a LONG time

2017-02-14 Thread Jinfeng Ni
What are my next steps? Shall I create a defect in the Drill Jira? > > - Dave > > On Mon, Feb 13, 2017 at 5:13 PM, Jinfeng Ni <j...@apache.org> wrote: > >> The size of parquet files will matter in terms of meta data access >> time, which is just 212 ms according to

Re: Drill query planning taking a LONG time

2017-02-13 Thread Jinfeng Ni
ong in the planning phase for a > query? Is there anything else I can try or information I could provide to > help identify the bug (seems like a bug to me)? I really appreciate you > guys helping out so quickly this afternoon. > > - Dave > > On Mon, Feb 13, 2017 at 4:13 PM, Jinfeng

Re: Drill query planning taking a LONG time

2017-02-13 Thread Jinfeng Ni
Yes, the log confirmed that the planning, especially physical planning, is the one that took most of the time. If the definition of view s3.cisexport.transactionView is not very complicated (involves large # of tables), then it's possible that some planner rules have a bug. (In the past, we once

Re: Drill UDF input - pass variable list of strings

2017-02-09 Thread Jinfeng Ni
AFAIK, Drill currently does not allow variable size of parameters in UDF in general (built-in function concat is specially handled internally). Someone once intended to add such support. Looks like that work has not been completed yet. On Thu, Feb 9, 2017 at 6:51 AM, Sandeep Dugar

Re: Issue with drill query

2017-02-07 Thread Jinfeng Ni
Can you file a Drill JIRA? From the stack trace and the relevant code, seems there is a bug in HiveMetadataProvider. sizeInBytes = Long.valueOf(numRowsProp) ==> sizeInBytes = Long.valueOf(sizeInBytesProp); 1.

Drill Hangout 2/7/2017

2017-02-06 Thread Jinfeng Ni
Hi drillers, We are going to have Drill Hangout tomorrow (02/07/2017, 10 AM PT). If you have any suggestions for hangout topics, you can add them to this thread. We will also ask around at the beginning of the hangout for topics. Thank you, Jinfeng

Re: IndexR, a new storage plugin for Drill

2017-01-03 Thread Jinfeng Ni
Forward to drill dev list. People on dev list might be interested in this as well. On Tue, Jan 3, 2017 at 8:22 AM, Jinfeng Ni <j...@apache.org> wrote: > Looks like IndexR is very interesting storage plugin. Although I have > not looked into the detail, I'm looking forward to s

Re: IndexR, a new storage plugin for Drill

2017-01-03 Thread Jinfeng Ni
Looks like IndexR is very interesting storage plugin. Although I have not looked into the detail, I'm looking forward to seeing the PR and hopefully getting this into Drill! Thanks, Jinfeng On Tue, Jan 3, 2017 at 7:30 AM, WeiWan wrote: > Hi Charles, > > It would be great

Re: Performance degradation for UNION ALL parquet data sources

2016-12-08 Thread Jinfeng Ni
Can you please check the Explain plan output for the original query and the query against view, and see if there is any difference in the two query plans? The difference might be caused by UNION ALL operator, which might lead to different parallelization mode. On Thu, Dec 8, 2016 at 9:08 AM,

Re: SYSTEM ERROR: CompileException

2016-11-11 Thread Jinfeng Ni
Is this same issue as DRILL-4971? https://issues.apache.org/jira/browse/DRILL-4971 On Fri, Nov 11, 2016 at 10:53 AM, rahul challapalli wrote: > This is a bug and its weird that changing the literal in the condition > makes it work. Can you go ahead and raise a jira

Re: Drill Join query optimization

2016-11-04 Thread Jinfeng Ni
I'm not familiar with Mongo storage plugin. But looking at the code, seems there is a MongoPushDownFilterForScan rule, which should be able to push the filter ta.Id ='123' into TableA. If your query plan does not show this filter is pushed down into Mongo Scan operator, there might be a bug in

Re: Drill Hangout will start in couple of minutes.

2016-10-18 Thread Jinfeng Ni
to > join the video call... > > Kind regards > Arina > > On Tue, Oct 18, 2016 at 8:22 PM, Jinfeng Ni <j...@apache.org> wrote: > >> I copied the link from Drill webpage [1]. >> >> There was a connection issue in the middle of today's short hangout.

Re: Drill Hangout will start in couple of minutes.

2016-10-18 Thread Jinfeng Ni
com> wrote: > Jinfeng, > > is link correct? Can't join video call. > > Kind regards > Arina > > On Tue, Oct 18, 2016 at 7:55 PM, Jinfeng Ni <j...@apache.org> wrote: > >> Link : https://hangouts.google.com/hangouts/_/maprtech.com/ >> drillbi-weeklyhangout >> >> >> Thanks, >> Jinfeng >>

Drill Hangout will start in couple of minutes.

2016-10-18 Thread Jinfeng Ni
Link : https://hangouts.google.com/hangouts/_/maprtech.com/drillbi-weeklyhangout Thanks, Jinfeng

Topics for next Drill hangout (10/18/2016)

2016-10-17 Thread Jinfeng Ni
Hi everyone, The next Drill hangout is tomorrow, Oct 18 2016, 10:00AM PDT. If you have any suggestions for hangout topics, you can add them to this thread. You can always join and bring up new topics at the last minute, as we will collect the topics at the beginning of hangout. Thank you,

Re: IN operator can take how many inputs ?

2016-10-07 Thread Jinfeng Ni
; I am in 1.6 Drill version with mapr distribution 5.1. >> I get this error : >> Error: VALIDATION ERROR: The option 'planner.in_subquery_threshold' does >> not exist >> >> >> Le lun. 3 oct. 2016 à 17:18, Jinfeng Ni <j...@apache.org> a écrit : >> >> > Yo

Re: IN operator can take how many inputs ?

2016-10-03 Thread Jinfeng Ni
You can modify option `planner.in_subquery_threshold`. By default, it's set to be 20. That's the threshold when planner decides to convert IN-list to a subquery. select * from sys.options where name like '%in_subquery%';

Re: UDF IndexOutOfBoundsException for large String

2016-09-26 Thread Jinfeng Ni
It might be related to the way you allocate buffer for your text input in your UDF. It would be helpful if you can share the UDF code to help people understand the problem. On Mon, Sep 26, 2016 at 9:55 AM, Sandeep Dugar wrote: > Hi , > > I am trying to write a UDF for

Re: LIMIT push down to parquet row group

2016-09-19 Thread Jinfeng Ni
Drill applies LIMIT filtering at row group level. For LIMIT n, it will scan the first m row groups that have at least n rows, and discard the rest of row groups. In your case, since you have only 1 row group, it does not have any row group filtering for LIMIT 1. I'm not sure how 32767 comes

[ANNOUNCE] Apache Drill 1.8.0 released

2016-08-31 Thread Jinfeng Ni
On behalf of the Apache Drill community, I am happy to announce the release of Apache Drill 1.8.0. This release of Drill fixes 47 JIRA issues and introduces a number of enhancements, including metadata cache pruning, DESCRIBE SCHEMA command, multi-byte delimiter support, and launch script

Re: Drill Queries Timing Out

2016-08-31 Thread Jinfeng Ni
Can you please check the query profiles for both cases, and see if the query plans are different? With more nodes, it's possible that the same query might use different plan. Is your query like "select count(*) from bigTable"? What is the storage plugin/format for the bigTable? On Wed, Aug 31,

Re: Facing issue with drill web UI.

2016-08-26 Thread Jinfeng Ni
That's probably because your login is not an admin user, or does not belong to a group having the admin authorization. Take a look at [1] [1] https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/ On Fri, Aug 26, 2016 at 12:10 AM, Sonali Ghorpade

Re: show tables taking long time on hive plugin

2016-08-18 Thread Jinfeng Ni
You are right that there is improvement room for "Show table" command. Today, Drill process "Show table" as a query against INFORMATION_SCHEMA.TABLE WHERE schema = 'Hive_schema', which will not only get the table names but also table instances. In fact, "Show Table" should only return table

Re: EMR + Error closing operators

2016-08-18 Thread Jinfeng Ni
) >> ~[drill-java-exec-1.6.0.jar:1.6.0] >> at >> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) >> ~[drill-java-exec-1.6.0.jar:1.6.0] >> at java.security.AccessController.doPrivileged(Native Method) >> ~[na:1.7.0_71] >

Re: Drill 1.6 Cartesian or an inequality joins are not supporting ?

2016-08-05 Thread Jinfeng Ni
AFAIK, Drill does not support Cartesian join for now. For inequality join, as long as the join condition has at least one equality condition, plus inequality condition, Drill breaks into an equality join and a filter, and it should work fine. For example, Join ( T1.C1 = T2.C2 and T1.C3 > T2.C4)

Re: Hbase Prefix Filter

2016-07-19 Thread Jinfeng Ni
Will 'like' predicate work for you? Take a look at the example in Drill unit test [1] [1] https://github.com/apache/drill/blob/master/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java#L464 On Tue, Jul 19, 2016 at 2:37 AM, ankit beohar

Drill Hangout

2016-07-12 Thread Jinfeng Ni
We will start Drill Hangout soon: https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Drill Hangout Topics for 07/12/2016

2016-07-11 Thread Jinfeng Ni
Hey, Just a reminder that we will have Drill hangout at 10am-11am, 07/12/16 PST. If you have any topics to discuss on tomorrow's hangout, please reply this thread and list the topics you want to discuss. I'll also collect topics at the beginning of the hangout. Thank you, Jinfeng

Re: Critical Bug with Column Name Clash

2016-06-29 Thread Jinfeng Ni
Can you please run EXPLAIN PLAN FOR query_statement, and post the plan for your query here? That will help us figure out whether something is wrong with the query planner or something else. I tried a similar query against file system storage plugin, and did not see the problem you experienced.

Re: Merging files

2016-06-23 Thread Jinfeng Ni
thing > from table1 a LEFT JOIN table2 b on a.id = b.id > where a.field1 = '2015-05-05' > group by a.field1, b.field2 > > It's not a very complicated query, but it doesn't like the hash_distribute > :) > > On Thu, Jun 23, 2016 at 1:46 PM, Jinfeng Ni <jinfengn...@gmail.com>

Re: Is this normal view behavior?

2016-06-23 Thread Jinfeng Ni
Tried on a commit on 1.7.0-SNAPSHOT. Looks like I could not re-produce the problem. Which version are u using? create view dfs.tmp.myview as select dir0 as p_day, l_partkey, l_orderkey, l_suppkey from dfs.tmp.t2; +---+-+ | ok |

Re: Merging files

2016-06-23 Thread Jinfeng Ni
is is Drill 1.6, so maybe the issue isn't fixed in my version? > do you know the original JIRA for the really long error? > > On Thu, Jun 23, 2016 at 12:56 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote: > >> This "CannotPlanException" definitely is a bug in qu

Re: Merging files

2016-06-23 Thread Jinfeng Ni
s all the verbose errors I've had) > I've ever seen. I'd post it here, but I am not unsure if my Google has > enough storage to handle this message > > (kidding... sorta) > > John > > > > On Thu, Jun 23, 2016 at 12:37 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:

Re: Merging files

2016-06-23 Thread Jinfeng Ni
Do you partition by day in your CTAS? If that's the case, CTAS will produce at least one parquet file for each value of "day". If you have 100 days, then you will end up at least 100 files. However, in case the query is executed in distributed mode, there could be more than one file per value.

Re: apache-drill-1.6.0: failed to connect to hdfs 1.0.4

2016-06-16 Thread Jinfeng Ni
The error seems to be thrown by hdfs, when Drill tries to get list of file status. This might be caused by incompatibility of hdfs version. Per Drill's doc [1] , HDFS 2.3 + API is required. Did you say you are running on hdfs 1.0.4? " Hadoop: All Hadoop distributions (HDFS API 2.3+), "

Re: Pivot in Apache Drill

2016-06-13 Thread Jinfeng Ni
Drill currently does not support Sql Pivot. Seems this is not a functionality defined in SQL standard ISO/SEC9075-2. The SQL planner that Drill uses, Apache Calcite, also does not have support of this syntax. If this is a very commonly used functionality, we may consider adding support in future.

Re: How to extend system or session options

2016-06-13 Thread Jinfeng Ni
Are you talking about the configuration property for storage plugin? You can add whatever configuration to storage plugin, but it's storage plugin's job to parse and recognize the configuration. For example, hive storage plugin may have: { ... "hive.metastore.sasl.enabled": "false"

Re: rewriting table for joining logical partitions

2016-06-02 Thread Jinfeng Ni
type that rule/join to not just be the > logical so it can convert only this generated base case. > > Thanks! > --Jesse > > ps. Updated the gist > <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-drill-to-rel-table-example-java> > to cover the offset cha

Re: rewriting table for joining logical partitions

2016-06-01 Thread Jinfeng Ni
I'm not sure if I understand your problem correctly. Are you trying to build some non-SQL interface on top of Drill, to join a set of dynamic tables? Can you give more concrete example? When Drill handles join over two dynamic tables, except for * column query, the dynamic tables have a list of

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2016-05-25 Thread Jinfeng Ni
Big congratulations, Parth! Thank you, Jacques, for your contribution and leadership over the last few years! On Wed, May 25, 2016 at 8:35 AM, Jacques Nadeau wrote: > I'm pleased to announce that the Drill PMC has voted to elect Parth Chandra > as the new PMC chair of

Re: "user" as a reserved word

2016-05-24 Thread Jinfeng Ni
iding the current drill user, and if we can make it align >> with out SQL systems (like how Postgres behaves with ITS quoted identifier) >> than I think we have a compelling case for changing how drill reacts here. >> >> Thoughts? >> >> (Once again, thanks for c

Re: "user" as a reserved word

2016-05-23 Thread Jinfeng Ni
> wrote: > >> Hmm, you are correct, I don't have to like it :) but there is both logic >> and precedence here. Thanks for following up >> >> John >> >> On Monday, May 23, 2016, Jinfeng Ni <jinfengn...@gmail.com> wrote: >> >> > An quoted id

Re: "user" as a reserved word

2016-05-23 Thread Jinfeng Ni
current setup there is lots of possibility for assumptions on calling back > tick user back tick and without an error users may have wrong, but "error" > free results. > On May 23, 2016 4:54 PM, "Jinfeng Ni" <jinfengn...@gmail.com> wrote: > >> The problem

Re: "user" as a reserved word

2016-05-23 Thread Jinfeng Ni
The problem here is that identifier 'user' is not only a reserved word, but also represents a special function == current_user() call. The identifier 'user', whether it's quoted or not, could mean either column name or the function call. Without the table alias, it could be ambiguous to sql

Re: Multiple Column having same name & same data type problem

2016-04-05 Thread Jinfeng Ni
Can you please post the explain plan output? EXPLAIN PLAN FOR query; My understanding is that Drill should return ID, ID0, in case there are naming conflicts in the output columns. If it's not working that way, that's a bug and we need figure out what went wrong. thanks. On Mon, Apr 4,

Re: Quick Query help

2016-04-01 Thread Jinfeng Ni
This seems to be a bug in the query planner. I could re-produce this problem, with similar query, even on 1.4 and 1.5. select rtrim(s.name2) as name3 from ( select ltrim(n_name) as `name2`, count(*) as num_creates from cp.`tpch/nation.parquet` group by ltrim(n_name) order by name2

Re: Drill indexes

2016-03-30 Thread Jinfeng Ni
Drill is mainly a SQL query engine, and it has very limited support for DDL (only has CTAS for parquet files). As such, Drill does not have the ability to create index or unique constraint. However, if your underneath storage is a RDBMS, you could put the index / unique constraint in RDBMS, and

Re: Making Query to Remote MongoDB Faster

2016-03-01 Thread Jinfeng Ni
If the query time is mainly spent on query planning, then running distributed mode in a cluster will not help shorten the latency, as the query planning is done on Foreman, which is just a single node in the cluster. Can you please try to query a mongodb database with fewer collections? >From the

Re: One single query for more files JSON

2016-02-16 Thread Jinfeng Ni
Could you please turn on verbose error mode by running the following [1] then re-run the query and post the error message, or copy & post the drillbit log when the query was executed? ALTER SESSION SET `exec.errors.verbose` = true; That will give us more information about the error you saw.

Re: querying json with arrays of varying dimensionality fails

2016-02-15 Thread Jinfeng Ni
pe: BIGINT > mode: OPTIONAL > ], actual type: [minor_type: UNION > mode: OPTIONAL > sub_type: BIGINT > sub_type: LIST > ] > > Also it fails as before (unsupported operation) in udf where I'm using > JsonReader and ComplexWriter to read json from string. > Thank yo

Re: querying json with arrays of varying dimensionality fails

2016-02-12 Thread Jinfeng Ni
For such use case, you need to turn on union vector type support (https://issues.apache.org/jira/browse/DRILL-3229) 0: jdbc:drill:zk=local> alter session set `exec.enable_union_type` = true; +---+--+ | ok | summary |

Re: Query Planning and Directory Pruning

2016-02-09 Thread Jinfeng Ni
Hi John, I think the patch for DRILL-2517 [1] would help a bit, if you use parquet files. DRILL-2517 would save the overhead of reading parquet metadata from parquet footer, by pruning the directory first. (It list some preliminary performance results, similar to the setup you had ) However,

Re: Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Jinfeng Ni
Hi Stefán, Can you post the commit id for your build, by run: select * from sys.version; The unit test has similar query (select * from avro_table) [1], and did not hit this problem. I just want to make sure that we are using the same commit id. Thanks [1]

Re: Dynamic Schema Discovery is not done in case of Drill- Hive

2016-01-19 Thread Jinfeng Ni
Drill uses cache in its HiveMetaStoreClient, in order to reduce the overhead to access HiveMetaStore. By default, the cache TTL is 60 seconds [1]. If you want to make the cache TTL shorter, you could configure hive storage plugin. The following example reduces TTL to 5 seconds. See [2]

Re: Classpath scanning & udfs

2016-01-11 Thread Jinfeng Ni
If what Rahul described is true, I think at minimum we should document the backward compatible issue in Drill new release doc. For today, drill doc seems to still use the old way [1] [1] https://drill.apache.org/docs/adding-custom-functions-to-drill/ On Mon, Jan 11, 2016 at 10:17 AM, rahul

Re: Drill Query Problem

2015-12-17 Thread Jinfeng Ni
@Nirav, I tried your query on the sample dataset you sent, with the latest Drill master branch. The query is successful, either as a standalone query, or embedded in a CTAS. Can you please double check if you have JDK install on "every" node in your drill cluster? The run-time code generation

Re: Drill Query Problem

2015-12-08 Thread Jinfeng Ni
@Nirav, If possible, could you please post a small sample dataset? For the compilation problem of run-time generated code, the thing matters is the input column type. It would be nice if you could share some sample data. Also, which version of Drill are you using? I tried to use the following

Re: Drill Query Problem

2015-12-04 Thread Jinfeng Ni
ould test this out in various > environments and make sure that the action to take is clear in the > exception messages. > > https://issues.apache.org/jira/browse/DRILL-1919 > > On Fri, Dec 4, 2015 at 11:01 AM, Jinfeng Ni <jinfengn...@gmail.com> wrote: > >> You m

Re: Drill Query Problem

2015-12-04 Thread Jinfeng Ni
You may consider switching the java_compiler from DEFALT to JDK, by : alter session set `exec.java_compiler` = 'JDK'; All Drill will automatically switch fro janino compiler to JDK, when the source code length is beyond certain limit. If your query happens to have smaller code than that limit,

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Jinfeng Ni
For each column in the parquet files, Drill will check column metadata and see if min == max across all parquet files. If yes, that indicates this column has a unique value for all the files, and Drill will use that column as partitioning columns. The partitioning column could be a column

Re: Drill WITH clause syntax

2015-09-23 Thread Jinfeng Ni
Is it something related to the json file you used? I tried the following query, replacing "business.json' with cp.`employee.json`. It works fine. (no row returned, which is expected). 0: jdbc:drill:zk=local> WITH X1 . . . . . . . . . . . > AS . . . . . . . . . . . > (SELECT city, . . . . . . . .

  1   2   >