New Drillbits joining cluster causes severe performance spike

2015-04-21 Thread Adam Gilmore
Hey guys, I'm troubleshooting some issues with our cluster under some production load and scaling. If we add a new drillbit to a cluster, as soon as it joins the cluster, performance degrades severely (queries that usually take 1s take 60s, for example). After a few minutes, it recovers jus

Re: Documentation for Query Profile page in Web UI

2015-04-21 Thread Alexander Zarei
And thanks very much, Andries, for the detailed information answering my questions. I really appreciate it. The tables are stored in HDFS on the EMR cluster, not on S3, and then loaded into Hive as external tables. Thanks, Alex On Tue, Apr 21, 2015 at 3:08 PM, Andries Engelbrecht < aeng

Re: Query performance Comparison - Drill and Impala

2015-04-21 Thread Hao Zhu
Correction for item 1: Change "FAA" => "FCD1". On Tue, Apr 21, 2015 at 3:11 PM, Hao Zhu wrote: > Hi Team, > > Besides the 300+ seconds planning time, here are the performance > differences in execution phase also. > > In general, this SQL contains 2 fact table joins, and with another 10+ > dime

Re: Query performance Comparison - Drill and Impala

2015-04-21 Thread Hao Zhu
Hi Team, Besides the 300+ seconds planning time, here are the performance differences in execution phase also. In general, this SQL contains 2 fact table joins, and with another 10+ dimension table joins. The 2 fact tables are: - fact_agent_activity_detail_12m_partparq AS FAA - fact_contac

Re: Documentation for Query Profile page in Web UI

2015-04-21 Thread Andries Engelbrecht
Alex, It definitely looks like the majority of the time is by far spent on reading the Hive data (Hive_Sub_Scan). Not sure how well the storage environment is configured, and it may very likely be that the nodes are just waiting on storage IO. The more nodes will simply just wait longer to actually ge

Re: Documentation for Query Profile page in Web UI

2015-04-21 Thread Alexander Zarei
Sorry about the inconvenience. The Web UI output is printed in a PDF file here: https://drive.google.com/file/d/0B24zVBhi8pQ3aDRfRllFVUh2eEE/view?usp=sharing Thanks, Alex On Tue, Apr 21, 2015 at 2:40 PM, Jason Altekruse wrote: > The attachment for the json profile made it to the list because i

Re: Documentation for Query Profile page in Web UI

2015-04-21 Thread Jason Altekruse
The attachment for the json profile made it to the list because it is ASCII, but the screenprint was blocked as a binary file. We can take a look at the profile by loading the json into an instance of Drill, but just a reminder about binary attachments for everyone, please upload to a public host a

Documentation for Query Profile page in Web UI

2015-04-21 Thread Alexander Zarei
Hi Team Drill! While performing performance testing on Drill clusters on AWS EMR, with TPC-H data of scale factor 100, I observed the results for a cluster of 3 nodes are similar to a cluster of 13 nodes. Hence, I am investigating how the query is being carried out and which part of the query ha

Re: Query performance Comparison - Drill and Impala

2015-04-21 Thread Jinfeng Ni
The query you used is a 15-table join query. We know that Drill's cost based optimizer will see performance overhead increase significantly with increased # of tables joined, due to the increased search space. I'm not surprised to see that you had 306.812 seconds for planning for such 15 table joi

RE: Query performance Comparison - Drill and Impala

2015-04-21 Thread Sivasubramaniam, Latha
Hao, I have copied both query profiles to the link https://drive.google.com/folderview?id=0ByB1-EsAGxA8fkRCSDhVcGtQS0NGTjRuSlpGelVLMkxIVVFxYXMtU2JtQ3FaN2t3UTZpUUE&usp=sharing Please let me know if you cannot access. Appreciate any help. Thanks, Latha -Original Message- From: Hao Zhu [

RE: Query performance Comparison - Drill and Impala

2015-04-21 Thread Sivasubramaniam, Latha
Hao, Thanks for checking on this. I will try to get both the profiles as soon as I can. I am using the same test bed, I need to switch it back to Impala. Thanks, Latha -Original Message- From: Hao Zhu [mailto:h...@maprtech.com] Sent: Tuesday, April 21, 2015 11:31 AM To: user@drill.apache

Re: Query performance Comparison - Drill and Impala

2015-04-21 Thread Hao Zhu
Hi Latha, Do you have the complete SQL profiles for the same query on Impala and Drill? For Impala, you can run the "profile;" command after the SQL has finished. For Drill, you can go to the web GUI and copy/paste the complete "Full JSON Profile". I want to compare what is the major performance differe

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Jacques Nadeau
If you want pushdown, you'd either need to add a new optimizer rule that understood your function or use a function that supports udf. On Tue, Apr 21, 2015 at 9:24 AM, Christopher Matta wrote: > Thanks Jacques, I'll take a look. In this case would predicate queries end > up doing full scans of th

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Ted Dunning
If you blow apart the data as a list, you have scanned the data. Flattening from there will give you a sample-per-row representation. There won't be any pushdown of filtering into the UDF, but this should be really, really fast anyway. On Tue, Apr 21, 2015 at 12:24 PM, Christopher Matta wrote: >

RE: Query performance Comparison - Drill and Impala

2015-04-21 Thread Sivasubramaniam, Latha
Neeraja, I removed some columns from select list and shortened aliases to get the query manageable and got the explain plan working finally. The errors were really random. Jacques, Most of it seems to be Hashjoin and yes the planning time took 306 seconds. Is there a way to improve the query

Re: Drill and Python

2015-04-21 Thread Jim Bates
I was using a macbook with an iPython notebook. Below is the info I used on mac to set up the DSN Here is an example config for a zookeeper connection: [ODBC Data Sources] Drill 0.8 sandbox = MapR Drill ODBC Driver [Drill 0.8 sandbox] Driver = /opt/mapr/drillodbc/lib/universal/libmaprdrillodbc.d
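For readers following the DSN setup above, a minimal Python sketch of connecting through that DSN with PyODBC might look like the following. It assumes the DSN is named `Drill 0.8 sandbox` as in the config above and that the `pyodbc` module and the ODBC driver are installed; the helper names `build_connection_string` and `run_query` are hypothetical and not part of the original message. `autocommit=True` is passed on the assumption that the driver does not support transactions.

```python
def build_connection_string(dsn):
    """Return an ODBC connection string that refers to a configured DSN."""
    return "DSN={}".format(dsn)


def run_query(dsn, sql):
    """Run one SQL statement through the given DSN and return all rows.

    pyodbc is a third-party module; it is imported here so the rest of the
    sketch can be loaded without it.
    """
    import pyodbc

    # Assumption: the Drill ODBC driver has no transaction support,
    # so the connection is opened in autocommit mode.
    conn = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

Calling `run_query("Drill 0.8 sandbox", "SELECT ...")` with a real query only works on a machine where the DSN above is actually configured.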

Community Hangout happening now

2015-04-21 Thread Abdel Hakim Deneche
Feel free to join the hangout: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc Thanks! -- Abdelhakim Deneche Software Engineer

Re: Drill and Python

2015-04-21 Thread Charles Givre
Thanks Chris. That's exactly what I was looking for. On Tue, Apr 21, 2015 at 12:45 PM, Christopher Matta wrote: > Charles, I have an iPython notebook using PyODBC: > https://github.com/cjmatta/drill_ipython_notebook > > On Tuesday, April 21, 2015, Charles Givre wrote: > >> All, >> Forgive the

Re: Drill and Python

2015-04-21 Thread Christopher Matta
Charles, I have an iPython notebook using PyODBC: https://github.com/cjmatta/drill_ipython_notebook On Tuesday, April 21, 2015, Charles Givre wrote: > All, > Forgive the n00b question, but I've been trying to write a Python > script to interact with Apache Drill. I've installed the JayDeBeAPI,

Drill and Python

2015-04-21 Thread Charles Givre
All, Forgive the n00b question, but I've been trying to write a Python script to interact with Apache Drill. I've installed the JayDeBeAPI, and PyODBC modules, but I've not been able to successfully connect. (I also installed the ODBC driver). Does anyone have working code that establishes the co

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Christopher Matta
Thanks Jacques, I'll take a look. In this case would predicate queries end up doing full scans of the data? Chris Matta cma...@mapr.com 215-701-3146 On Tue, Apr 21, 2015 at 11:55 AM, Jacques Nadeau wrote: > It doesn't look like the key has any UTF8 data in it. I recommend you > create a UDF th

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Andries Engelbrecht
Maybe look at using BYTE_SUBSTR to deconstruct the row_key in the elements of openTSDB, and then convert each element as appropriate. http://drill.apache.org/docs/string-manipulation/#byte_substr —Andries On Apr 21, 2015, at 8:56 AM, Jacques Nadeau wrote: > Quick edit, my query should hav

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Jacques Nadeau
Quick edit, my query should have used "tags" instead of "tag". Corrected below: SELECT row.metric, row.tags[0].key, row.tags[0].value FROM ( SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1 )x On Tue, Apr 21, 2015 at 8:55 AM, Jacques Nadeau wrote: > It doesn't look like the key has any UTF8 data

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Jacques Nadeau
It doesn't look like the key has any UTF8 data in it. I recommend you create a UDF that breaks the bytes apart into the separate sections using a complex output. This way you could write something like this: select row.metric, row.tag[0].key, row.tag[0].value from ( SELECT OPENTSDB_ROW(rowkey) as row
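The byte-splitting that such a UDF would perform can be sketched in Python (not as an actual Drill UDF). This is only an illustration under stated assumptions: an unsalted OpenTSDB row key with 3-byte UIDs for the metric, tag keys, and tag values, and a 4-byte big-endian base timestamp. Those widths and the `parse_row_key` name are assumptions, not taken from the thread; check them against the table's actual configuration.

```python
import struct

UID_WIDTH = 3        # assumed width of metric, tag-key, and tag-value UIDs
TIMESTAMP_WIDTH = 4  # assumed width of the big-endian base timestamp


def parse_row_key(key):
    """Split a row key into metric UID, base timestamp, and tag UID pairs."""
    header = UID_WIDTH + TIMESTAMP_WIDTH
    pair_width = 2 * UID_WIDTH
    if len(key) < header or (len(key) - header) % pair_width != 0:
        raise ValueError("unexpected row key length: %d" % len(key))

    metric = key[:UID_WIDTH]
    # ">I" reads an unsigned 32-bit integer in big-endian byte order.
    (base_time,) = struct.unpack(">I", key[UID_WIDTH:header])

    tags = []
    for offset in range(header, len(key), pair_width):
        tags.append({
            "key": key[offset:offset + UID_WIDTH],
            "value": key[offset + UID_WIDTH:offset + pair_width],
        })
    return {"metric": metric, "base_time": base_time, "tags": tags}
```

The result mirrors the `row.metric` / `row.tag[0].key` shape of the query above; the UIDs stay as raw bytes because resolving them to names requires a lookup this sketch does not attempt.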

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Carol McDonald
You can use the CONVERT_TO and CONVERT_FROM functions to encode and decode data that is binary or complex. For example, HBase stores data as encoded VARBINARY data. To read HBase data in Drill, convert every column of an HBase table *from* binary to an SQL data type while selecting the data. http:/

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Carol McDonald
I don't have the OpenTSDB table to test, but for byte number conversion something like convert_from(row_key , 'BIGINT_BE') https://cwiki.apache.org/confluence/display/DRILL/SQL+Functions The following table provides the data types that you use with the CONVERT_TO and CONVERT_FROM functions: *Type**
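To make the byte-order point concrete, here is a small Python sketch of what decoding a big-endian 64-bit integer means. It assumes that `BIGINT_BE` denotes an 8-byte signed big-endian value; the `decode_bigint_be` helper is hypothetical and only mirrors the idea, not Drill's implementation.

```python
import struct


def decode_bigint_be(raw):
    """Decode exactly 8 bytes as a signed 64-bit big-endian integer."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(raw))
    # ">q" means big-endian (most significant byte first), signed 64-bit.
    (value,) = struct.unpack(">q", raw)
    return value
```

Reading the same bytes in little-endian order would generally give a different number, which is why the encoding name passed to the conversion has to match how the bytes were written.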

Querying OpenTDSB data stored in HBase

2015-04-21 Thread Christopher Matta
I’m trying to use Drill to query time-series data stored in OpenTSDB. The row keys are supposed to be byte array encoded according to this schema: http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html When trying to do a simple CONVERT_FROM I get the following r