Re: Understanding the concept of Drill fragment

2016-03-31 Thread Steven Phillips
I am not familiar with the Spark terminology, but I think you more or less have it correct. Your example query does not necessarily involve a hash-exchange (roughly the equivalent of a shuffle), because it's possible to run the entire execution in a single fragment. In this case, it would probably

Re: self hosting drill in a java application

2016-03-30 Thread Steven Phillips
Yes, it is possible to run an embedded drillbit in an application. A simple example is this tool: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/client/QuerySubmitter.java If the local option is specified, it will start up one or more drillbits. On We

Re: Question on nested JSON behavior

2016-03-10 Thread Steven Phillips
t.batter.id from (select flatten(batters.batter)) as batter from `sample.json`) t; On Thu, Mar 10, 2016 at 6:25 PM, Steven Phillips wrote: > Yeah, it's definitely a bug. Could you please file a jira? > > On Thu, Mar 10, 2016 at 6:19 PM, Jiang Wu wrote: > >> Here are the complete

Re: Question on nested JSON behavior

2016-03-10 Thread Steven Phillips
ode simply picks the > first N values from the union of all values across all rows. The N is the > number of rows in the result. > > For example, if I give this query: > > 0: jdbc:drill:zk=local> select id, t.batters.batter.id from > dfs.`c:\tmp\sample.json` t; > +---

Re: Question on nested JSON behavior

2016-03-10 Thread Steven Phillips
I am surprised that you are getting that result. I would have expected the query to fail. Since batter is an array, you should specify the index of the array if you want to access lower level elements. A way to access all of the sub-fields of a repeated map is something we've discussed, but never i

Re: Reg: Drill-override.conf parameter

2016-03-08 Thread Steven Phillips
This parameter is under the sort namespace. It applies to the TopN sort operator. While computing the TopN, we hold onto incoming batches and maintain a list of references to the records which make up the current TopN. Periodically, we will copy the records that we want to keep, and release the bat
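The bookkeeping described above (hold incoming batches, track references to the current top N records, periodically copy the survivors and release the rest) can be sketched roughly as follows. This is a minimal illustration, not Drill's implementation: the class name `TopN`, the `compact_after` threshold, and the use of plain Python lists as "batches" are all assumptions made for the example.

```python
import heapq


class TopN:
    """Keep the N largest values while holding references into incoming batches."""

    def __init__(self, n, compact_after=4):
        self.n = n
        self.compact_after = compact_after  # hypothetical threshold, not a Drill option
        self.batches = {}   # batch id -> list of values still held in memory
        self.next_id = 0
        self.heap = []      # min-heap of (value, batch id, row index) references

    def add_batch(self, values):
        bid = self.next_id
        self.next_id += 1
        self.batches[bid] = values
        for row, value in enumerate(values):
            ref = (value, bid, row)
            if len(self.heap) < self.n:
                heapq.heappush(self.heap, ref)
            elif value > self.heap[0][0]:
                # Evict the smallest of the current top N.
                heapq.heapreplace(self.heap, ref)
        if len(self.batches) >= self.compact_after:
            self._compact()

    def _compact(self):
        # Copy only the surviving records into one fresh batch, then drop
        # every old batch so its memory can be released.
        kept = [self.batches[b][r] for _, b, r in self.heap]
        bid = self.next_id
        self.next_id += 1
        self.batches = {bid: kept}
        self.heap = [(v, bid, i) for i, v in enumerate(kept)]
        heapq.heapify(self.heap)

    def result(self):
        return sorted((v for v, _, _ in self.heap), reverse=True)
```

A smaller threshold frees memory sooner at the cost of more copying; a larger one does the opposite.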

Re: expected behavior when using wild cards in table name

2016-02-11 Thread Steven Phillips
I don't understand why they wouldn't be allowed. They seem perfectly valid. On Thu, Feb 11, 2016 at 9:42 AM, Abdel Hakim Deneche wrote: > I have the following table tpch100/lineitem that contains 97 parquet files: > > tpch100/lineitem/part-m-0.parquet > tpch100/lineitem/part-m-1.parquet

Re: Questions about configuration and deployment

2015-12-10 Thread Steven Phillips
Hi Burton, Like John said, I would strongly recommend using the same configuration for each drillbit. The reason memory settings are in drill-env is that we use the standard java options to limit the amount of memory the jvm can use. drill-override.conf contains Drill specific settings. Let me k

Re: Having difficulties using CASE statement to manage heterogeneous schemas

2015-12-02 Thread Steven Phillips
You can file a jira at https://issues.apache.org/jira/browse/DRILL/ When you file, go ahead and assign it to me. On Wed, Dec 2, 2015 at 1:41 PM, Jacques Nadeau wrote: > Steven, can you look at this? > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Wed, Dec 2, 2015 at 10:05 AM, John S

Re: Text Vectorisation

2015-11-18 Thread Steven Phillips
Could you elaborate a bit on what it is you are trying to do, as well as what you have tried and what result you saw? Thanks. On Tue, Nov 17, 2015 at 8:43 AM, Boris Chmiel < boris.chm...@yahoo.com.invalid> wrote: > hello users, > I'm trying to vectorize a text field of a CSV file and planning to

Re: Help with Troubleshooting dense error message

2015-11-04 Thread Steven Phillips
This looks like DRILL-4006, a fix for which just went in. https://issues.apache.org/jira/browse/DRILL-4006 On Wed, Nov 4, 2015 at 12:16 PM, John Omernik wrote: > I am on MapR's 1.2.1 Package. > > > > > On Wed, Nov 4, 2015 at 2:14 PM, Abdel Hakim Deneche > > wrote: > > > One last thing, what v

Re: Overriding delimiter at runtime

2015-11-01 Thread Steven Phillips
The "table with options" feature is currently in the works, and it will address this use case. Currently, the only way to do this is to create two different workspaces, each with a different default format. On Sun, Nov 1, 2015 at 1:45 PM, William Witt wrote: > Is there a way to override the del

Re: Apache Drill to query OpenTSDB

2015-10-20 Thread Steven Phillips
1. You might be able to run a query against OpenTSDB, but I'm not sure if you will really be able to easily do anything useful right now. Every column qualifier in an HBase table results in a column in Drill. In the OpenTSDB format, the column qualifiers are simply time offsets from the base timest

Re: Stop Drill querying .tmp files

2015-10-13 Thread Steven Phillips
DRILL-2424 has a comment from Mehant that this should be fixed, but that there was some sort of merge conflict. Was this ever resolved? Or a new jira filed? On Tue, Oct 13, 2015 at 10:13 AM, Rajkumar Singh wrote: > There is related jira already filed > https://issues.apache.org/jira/browse/DRILL

Re: Convert from Array to String

2015-10-12 Thread Steven Phillips
convert_to is the correct function in this case. convert_to converts the Drill type into some encoding. The output of the convert_to function is VarBinary. Can you try wrapping cast( ... as varchar(255)) and see if that displays it correctly? On Mon, Oct 12, 2015 at 1:37 PM, John Omernik wrote:

Re: Parquet #Files and closing accountor error

2015-10-08 Thread Steven Phillips
In answer to the other part of your question: yes, by default, each fragment will write into its own set of files, so you could be looking at (# unique values) * (number of fragments) files being created. There is an option to shuffle the data before writing, so that each value will be written by only
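To make the arithmetic above concrete, here is a small Python sketch of the worst-case file count. The function name and the example numbers are hypothetical, and the shuffled case assumes that each distinct value ends up being written by a single fragment, which is how the truncated sentence above appears to continue.

```python
def estimate_output_files(unique_values: int, fragments: int, shuffled: bool = False) -> int:
    """Rough upper bound on files created by a partitioned write (illustrative only)."""
    if shuffled:
        # Assumption: after shuffling on the partition key, one fragment owns each value.
        return unique_values
    # Without a shuffle, every fragment may see every value and writes its own file for it.
    return unique_values * fragments


print(estimate_output_files(200, 24))
print(estimate_output_files(200, 24, shuffled=True))
```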

Re: repeated_contains - intended behaviour?

2015-10-05 Thread Steven Phillips
same name. On Sun, Oct 4, 2015 at 3:34 PM, Stefán Baxter wrote: > Hi, > > For me the wild card functionality is fine and functions as expected. > It's partly because of it that I expected an exact match when no operator > was in play. > > Regards, > -Stefan > > O

Re: repeated_contains - intended behaviour?

2015-10-04 Thread Steven Phillips
Repeated_contains originally worked as Jason describes, exact matching. At some point, someone thought that it should allow wildcards and do substring matching. There was never any real discussion on what this function should do, though. It would probably be a good idea for someone to come up with a

Re: help with ApacheDrill and S3

2015-09-22 Thread Steven Phillips
You need to change s3 to s3n in the URI: See the discussion in the comments of this blog post: http://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/ Hopefully that helps. Let me know if you are still having problems. On Tue, Sep 22, 2015 at 8:47 AM, Andries Engelbrecht < aen

Re: BlockMissingException

2015-09-09 Thread Steven Phillips
The errors like: "java.io.FileNotFoundException: Path is not a file: /warehouse2/completed/events/connection_events/1441290600" are really just noise, and aren't related to the failure. We should probably clean them up so that we aren't attempting to open directories, but they are not causing the q

Re: Standard of Parquet nested data types

2015-09-01 Thread Steven Phillips
":2},{"array":3}]} | > +----+ > 1 row selected (0.335 seconds) > > > Is there any trick of reading the list properly in Drill? > > Thanks, > Hao > > > > > > On Fri, Aug 28, 2015 at 4:20 PM, Steven Phillips

Re: Standard of Parquet nested data types

2015-08-28 Thread Steven Phillips
Both the Parquet and Drill internal data models are based on protobuf, meaning there are required, optional, and repeated fields. In this model, repeated fields cannot be null, nor can they have null elements. The 3-layer nested structure is necessary to represent a field where the array itself is nullab

Re: Type confusion and number formatting exceptions

2015-07-28 Thread Steven Phillips
+-+ > > |type| EXPR$1 | > > ++-+ > > | plan.item.removed | 947 | > > | plan.item.added| 40342 | > > ++-+ > > 2 rows selected (0.508 seconds) > > > > > > *2. Same query but involves dimension.type as well* > > > > select p.type, coalesce(p.dimensions.dim_type, p.dimensions.type) > > dimensions_type, count(*) from > > dfs.tmp.`/analytics/processed//events` as p where > occurred_at > > > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed') > group > > by p.type, coalesce(p.dimensions.dim_type, p.dimensions.type); > > > > Error: SYSTEM ERROR: NumberFormatException: To See > > Fragment 2:0 > > [Error Id: 4756f549-cc47-43e5-899e-10a11efb60ea on localhost:31010] > > (state=,code=0) > > > > > > I can provide test data if this is not enough to reproduce this bug. > > > > Regards, > > -Stefán > > > -- Steven Phillips Software Engineer mapr.com

Re: Hash Agg vs Streaming Agg for a smaller data set

2015-07-10 Thread Steven Phillips
; 00-07 Filter(condition=[AND(=($0, 1992-01-01), =($1, > 1992-01-01))]) > 00-08Project(l_moddate=[$2], l_shipdate=[$1], > l_modline=[$0]) > 00-09 Scan. > > - Rahul > -- Steven Phillips Software Engineer mapr.com

Re: DRILL on HBSE: SELECT COUNT(*) issue

2015-07-10 Thread Steven Phillips
this e-mail is > strictly prohibited. If you have received this e-mail in error, please > immediately inform us by returning e-mail, and thereafter, proceed to > permanently delete the entire e-mail sent in error. Thank you. > > > -- Steven Phillips Software Engineer mapr.com

Re: Drill 1.1 and partition by

2015-07-07 Thread Steven Phillips
post which describes the option well enough > > http://mail-archives.apache.org/mod_mbox/drill-commits/201506.mbox/%3c38571170b14d484bba843f1a513be...@git.apache.org%3E > -- Steven Phillips Software Engineer mapr.com

Re: Dependency on zookeeper

2015-06-23 Thread Steven Phillips
: > Hi All, > > Wanted to ask if there is dependency on number of zookeeper instances. > > For e.g. if I am running 25 nodes for drill bits - will 1 zookeeper > suffice. > > Also, any performance tips for reading data from S3. > > Regards, > Sarvesh > > --

Re: timestamp string to epoch time

2015-06-16 Thread Steven Phillips
tamp(1432912733)) > > > from `sys`.`version`; > > > > > > Error: SYSTEM ERROR: java.lang.IllegalArgumentException: Invalid > format: > > > "2015-05-29 15:18:53.000" is malformed at ".000” > > > > > > —Andries > > > > > > > > > > > > On Jun 15, 2015, at 7:18 AM, Christopher Matta > wrote: > > > > > > > Is there a way to convert a timestamp string to unix time? > > > > > > > > Chris Matta > > > > cma...@mapr.com > > > > 215-701-3146 > > > > > > > > > -- Steven Phillips Software Engineer mapr.com

Re: Drill vs MapR-DB/HBASE : fragments per region for table-scan

2015-06-09 Thread Steven Phillips
emand Training > < > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > -- Steven Phillips Software Engineer mapr.com

Re: Custom UDFS slow

2015-05-28 Thread Steven Phillips
out is also quite a likely > possibility. > > Seeing your code (put it in a gist, don't attach it) would help a lot. > Seeing queries and query plans would help as well. > -- Steven Phillips Software Engineer mapr.com

Re: How to deploy Drill to achieve optimal performance

2015-05-05 Thread Steven Phillips
like to ask for > help > > how to achieve that? How to make Drill run queries in low-latency (in > > seconds not minutes)? > > > > Any suggestions are welcome! > > > > Thanks! > > > > George > > > -- Steven Phillips Software Engineer mapr.com

Re: Do I need to install Drill on each of the Hadoop data nodes or HBase RegionServer

2015-05-03 Thread Steven Phillips
it drill-env.sh, located in > /opt/drill/conf/, and define HADOOP_HOME:" > > What is external JAR files? What is the purpose if I set the HADOOP_HOME? > > Thanks! > > George Lu > -- Steven Phillips Software Engineer mapr.com

Re: drill gives first result for repeated values

2015-04-30 Thread Steven Phillips
": "xyz" > > } > > } > > ] > > } > > When i query > > select t.company.`modelName` from hdfs.`autom.json` t ; > > it gives result > > {"name":"abc"} > > However, The expected result was both entries. > > {"name":"abc"} > > {"name":"xyz"} > > Even when I query > > select t.company.`modelName` from hdfs.`autom.json` t where > > t.company.`modelName`.`name`='xyz' ; > > it does not find anything. > > > > > > > -- Steven Phillips Software Engineer mapr.com

Re: Desired Behavior when a table has both files and folders?

2015-04-29 Thread Steven Phillips
> > +++ > > |dir0|col2| > > +++ > > | folder1 | 2 | > > | null | null | > > +++ > > > > Looks like drill ignored the columns from the first file. > > > > - Rahul > > > -- Steven Phillips Software Engineer mapr.com

Re: Two-stage aggregation functions

2015-04-16 Thread Steven Phillips
face > similar to, for example, that implemented in Impala? The scenario I'm > evaluating involves approximate unique value counting with hyperloglog, > which would benefit from the ability to perform the counting locally by > each drillbit folowed by a hyperloglog state merge f

Re: Is there any way to create a parquet file with multiple parquet blocks using Drill?

2015-04-08 Thread Steven Phillips
`sometable`); > > All resulting files are with size 10M(Same as parquet block size). > > My question is: > Is there any way to create a parquet file with multiple parquet blocks? > > Thanks, > Hao > -- Steven Phillips Software Engineer mapr.com

Re: Drill to query Client-side encrypted data from S3

2015-04-07 Thread Steven Phillips
keys to decrypt the data are custom > >> controlled. Is there a way I can use drill with this data given that I > have > >> a java module that can be called that will provide the master key to > >> decrypt the data on the fly? > >> > > My situation: A lot of the use cases that we have might work well > >> with the new approach of S3 client-side encryption, but for using drill > to > >> explore that data. So any pointers/help here will be much appreciated. > >> > > Thanks! > >> > > -Ganesh > >> > > >> > >> > > > > > -- Steven Phillips Software Engineer mapr.com

Re: Parquet File Weirdness

2015-04-03 Thread Steven Phillips
When I query it in hive, it works fine, when I run a > > count(*) on it drill it works (fast) but when I run a query, it seems > > to return the same number of results, but it look likes this... > > thoughts? (These should be strings with emails, domains, etc) > > > >

Re: more on parsing timestamps

2015-04-02 Thread Steven Phillips
0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1; > >>>>> ++ > >>>>> | t | > >>>>> ++ > >>>>> | 2015-01-27T13:43:53.000Z | > >>>>> ++ > >>>>> 1 row selected (0.119 seconds) > >>>>> > >>>>> The below queries, identical apart from the limit clause, behave > >>>>> differently. The one with the limit clause works, the one without > >>>> doesn't. > >>>>> The limit is larger than the total number of rows, so in both cases > we > >>>>> should be processing all rows. > >>>>> > >>>>> No limit clause. It fails: > >>>>> > >>>>> ``` > >>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t, > >>>>> '-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) > as > >>>> t; > >>>>> Query failed: RemoteRpcException: Failure while trying to start > remote > >>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T' > >>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on > >>>>> 192.168.99.1:31010 ] > >>>>> ``` > >>>>> > >>>>> Limit clause in the subselect (larger than the number of rows in the > >>>> table) > >>>>> succeeds. > >>>>> > >>>>> ``` > >>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t, > >>>>> '-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test > limit > >>>>> 1) as t; > >>>>> ... > >>>>> | 2015-02-17 07:18:00.0 | > >>>>> ++ > >>>>> 13,015,350 rows selected (105.257 seconds) > >>>>> ``` > >>>>> > >>>>> Data can be downloaded here: > >>>>> > >>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz > >>>> > >>>> > >> > > > > -- Steven Phillips Software Engineer mapr.com

Re: Question on SQL over CSV

2015-03-30 Thread Steven Phillips
as it reads the CSV in > such cases? I am trying to use Drill for data exploration purposes and > mostly to get a peek into the data set from my data lake before running > bigger queries/analytics on this data set. > > Regards, > Ganesh > -- Steven Phillips Software Engineer mapr.com

Re: Drill favouring a particular Drillbit

2015-03-25 Thread Steven Phillips
nicely. The minute > we start the drillbit on that node again, it starts swamping it with work. > > I'll shoot through the JSON profiles and some more information on the > dataset etc. later today (Australian time!). > > On Thu, Mar 26, 2015 at 5:31 AM, Steven Phillips >

Re: Drill favouring a particular Drillbit

2015-03-25 Thread Steven Phillips
's point, the node that the client connects to is not currently > randomized. Given your description of behavior, I'm not sure that you're > hitting 2512 or just general undesirable distribution. > > On Wed, Mar 25, 2015 at 10:18 AM, Steven Phillips > wrote: > > >

Re: Drill favouring a particular Drillbit

2015-03-25 Thread Steven Phillips
t this essentially swamping this data node with 100% CPU > usage > > while leaving the others barely doing any work. > > > > As soon as we shut down the Drillbit on this data node, query performance > > increases significantly. > > > > Any thoughts on how I can troubleshoot why Drill is picking that > particular > > node? > > > -- Steven Phillips Software Engineer mapr.com

Re: Drill favouring a particular Drillbit

2015-03-25 Thread Steven Phillips
know that Drill tries to get data locality, so I'm wondering if this is > > the cause, but this essentially swamping this data node with 100% CPU > usage > > while leaving the others barely doing any work. > > > > As soon as we shut down the Drillbit on this data node, query performance > > increases significantly. > > > > Any thoughts on how I can troubleshoot why Drill is picking that > particular > > node? > > -- Steven Phillips Software Engineer mapr.com

Re: Drill file encondig

2015-03-12 Thread Steven Phillips
_replace on the text field I get an error for some files. > (Encountered an illegal char on line 1, column 38: ‘’) > > Thanks > —Andries -- Steven Phillips Software Engineer mapr.com

Re: Using CTAS with nested structures

2015-02-23 Thread Steven Phillips
s decimal(5,2)) , cast(d.map.col2 as double)) from > `data.json`; > > > I am looking for something on the below lines : > > select cast(m1 as map(col1:decimal, col2:double)) from `data.json`; > > - Rahul > -- Steven Phillips Software Engineer mapr.com

Re: Storage Plugin Config for XML

2015-02-23 Thread Steven Phillips
dissemination, or reproduction of this > message is strictly prohibited and may be unlawful. If you are not the > intended recipient, please contact the sender by return e-mail and destroy > all copies of the original message. > > -- Steven Phillips Software Engineer mapr.com

Re: best way to query hbase dynamic columns

2015-02-20 Thread Steven Phillips
nM=","4007774":"FUM="} | > > > +++ > > > 5 rows selected (0.766 seconds) > > > 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid, > > kvgen(t.price) > > > as price from dfs.`/tables/trades_flat` t limit 5; > > > +++ > > > |tid | price| > > > +++ > > > | AMZN_2013102107 | [{"key":"3901713","value":"JUA="}] | > > > | AMZN_2013102108 | [{"key":"4600159","value":"E6o="}] | > > > | AMZN_2013102109 | > > > > > > > > > [{"key":"3136026","value":"HL4="},{"key":"3448092","value":"JjI="},{"key":"3926121","value":"Hq0="}] > > > | > > > | AMZN_2013102111 | > > > > > > > > > [{"key":"1149689","value":"Iuo="},{"key":"3023456","value":"HRs="}] > > > | > > > | AMZN_2013102112 | > > > > > > > > > [{"key":"0705787","value":"InM="},{"key":"4007774","value":"FUM="}] > > > | > > > +++ > > > > > > -- Steven Phillips Software Engineer mapr.com

Re: Dates from CSV files in WHERE clauses?

2015-02-12 Thread Steven Phillips
What exactly was the result? I would expect it would implicitly cast the string to a date for the comparison. On Thu, Feb 12, 2015 at 2:38 PM, Minnow Noir wrote: > Yes. > On Feb 12, 2015 5:36 PM, "Steven Phillips" wrote: > > > did you try the form: > > wher

Re: Dates from CSV files in WHERE clauses?

2015-02-12 Thread Steven Phillips
te(, ) seems to work, although it's verbose. > > > > Do I have this right, and is there a less verbose way to handle this? > > > > Thanks > > > -- Steven Phillips Software Engineer mapr.com

Re: Drill - Flatten function - help please

2015-02-03 Thread Steven Phillips
t; >>> 6 rows selected (0.163 seconds) > >>> > >>> Listing:3 > >>> > >>> 0: jdbc:drill:> SELECT t.a,t.b,t.c.x, t.c.y, flatten(t.c.z) from > dfs.`/data/nested/clicks/sthota_test_1.json` as t; > >>> ++++++ > >>> | a | b | EXPR$2 | EXPR$3 | EXPR$4 | > >>> ++++++ > >>> | r1cl1 | r1c2 | 1 | a string | 1 | > >>> | r2cl1 | r2c2 | 2 | a string | 1 | > >>> | r2cl1 | r2c2 | 2 | a string | 2 | > >>> | r3cl1 | r3c2 | 3 | a string | 1 | > >>> | r3cl1 | r3c2 | 3 | a string | 2 | > >>> | r3cl1 | r3c2 | 3 | a string | 3 | > >>> | r4cl1 | r4c2 | 4 | a string | 1 | > >>> | r4cl1 | r4c2 | 4 | a string | 2 | > >>> | r4cl1 | r4c2 | 4 | a string | 3 | > >>> | r4cl1 | r4c2 | 4 | a string | 4 | > >>> | r5cl1 | r5c2 | 5 | a string | 1 | > >>> | r5cl1 | r5c2 | 5 | a string | 2 | > >>> | r5cl1 | r5c2 | 5 | a string | 3 | > >>> | r5cl1 | r5c2 | 5 | a string | 4 | > >>> | r5cl1 | r5c2 | 5 | a string | 5 | > >>> ++++++ > >>> 15 rows selected (0.171 seconds) > >>> > >>> Thanks > >>> Sudhakar Thota > > > -- Steven Phillips Software Engineer mapr.com

Re: JSON file size vs number of files

2015-02-02 Thread Steven Phillips
d be the maximum advisable size > for a single JSON file? As at some point there will be tradeoff with > reduced # of files vs maximum size of a single file. > > > > Something to consider when using Flume or another tool as data source > for eventual Drill consumption. > > > > —Andries > > > > > > -- Steven Phillips Software Engineer mapr.com

Re: question for pre-filter parquet file data

2015-01-27 Thread Steven Phillips
nBytes method, > if avgRowSizeInBytes is to large, the return value will be out of int > range. So the code should be fixed like "return > ((long)avgRowSizeInBytes)*1024L*1024L". > Thanks&Regards -- Steven Phillips Software Engineer mapr.com
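The overflow described in the quoted message is easy to reproduce outside Java. The Python sketch below emulates 32-bit signed multiplication to show why widening to `long` before multiplying matters. The helper names are hypothetical and the sample inputs are arbitrary; none of this is taken from the Drill source.

```python
def to_int32(n: int) -> int:
    """Wrap n to a signed 32-bit value, the way Java int arithmetic does."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def int_product(avg_row_size: int) -> int:
    # Emulates Java `avgRowSizeInBytes * 1024 * 1024` with int operands:
    # each intermediate result wraps at 32 bits.
    return to_int32(to_int32(avg_row_size * 1024) * 1024)


def long_product(avg_row_size: int) -> int:
    # Emulates `((long) avgRowSizeInBytes) * 1024L * 1024L`:
    # the operands are widened first, so nothing wraps for these inputs.
    return avg_row_size * 1024 * 1024


print(int_product(2048), long_product(2048))
```

With an input of 2048 the int version wraps to a negative number, while the widened version gives the intended 2 GiB value.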

Re: Filter out empty arrays in JSON

2015-01-21 Thread Steven Phillips
nswer-23195243 > > > > If others agree, I think it would be appropriate for Hao to file a JIRA > to > > make sure we follow and check this convention. > > > > J > > > > On Wed, Jan 21, 2015 at 5:07 PM, Steven Phillips > > > wrote: >

Re: Filter out empty arrays in JSON

2015-01-21 Thread Steven Phillips
t;>>>>> > > >>>>>>> > > >>>>>>> Error: exception while executing query: Failure while executing > > >> query. > > >>>>>>> (state=,code=0) > > >>>>>>> > > >>>>>>> > > >>>>>>> { > > >>>>>>> "entities": { > > >>>>>>> "trends": [], > > >>>>>>> "symbols": [], > > >>>>>>> "urls": [], > > >>>>>>> "hashtags": [], > > >>>>>>> "user_mentions": [] > > >>>>>>> }, > > >>>>>>> "entities": { > > >>>>>>> "trends": [1,2,3], > > >>>>>>> "symbols": [4,5,6], > > >>>>>>> "urls": [7,8,9], > > >>>>>>> "hashtags": [ > > >>>>>>> { > > >>>>>>> "text": "GoPatriots", > > >>>>>>> "indices": [] > > >>>>>>> } > > >>>>>>> ], > > >>>>>>> "user_mentions": [] > > >>>>>>> } > > >>>>>>> } > > >>>>>>> > > >>>>>>> The issue seems to be that if some records have arrays with maps > in > > >> them > > >>>>>>> and others are empty. > > >>>>>>> > > >>>>>>> —Andries > > >>>>>>> > > >>>>>>> > > >>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu wrote: > > >>>>>>> > > >>>>>>>> Seems it works for below json file: > > >>>>>>>> { > > >>>>>>>> "entities": { > > >>>>>>>> "trends": [], > > >>>>>>>> "symbols": [], > > >>>>>>>> "urls": [], > > >>>>>>>> "hashtags": [ > > >>>>>>>> { > > >>>>>>>> "text": "GoPatriots", > > >>>>>>>> "indices": [ > > >>>>>>>> 83, > > >>>>>>>> 94 > > >>>>>>>> ] > > >>>>>>>> } > > >>>>>>>> ], > > >>>>>>>> "user_mentions": [] > > >>>>>>>> }, > > >>>>>>>> "entities": { > > >>>>>>>> "trends": [1,2,3], > > >>>>>>>> "symbols": [4,5,6], > > >>>>>>>> "urls": [7,8,9], > > >>>>>>>> "hashtags": [ > > >>>>>>>> { > > >>>>>>>> "text": "GoPatriots", > > >>>>>>>> "indices": [] > > >>>>>>>> } > > >>>>>>>> ], > > >>>>>>>> "user_mentions": [] > > >>>>>>>> } > > >>>>>>>> } > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as > t > > >> where > > >>>>>>>> t.entities.urls is not null; > > >>>>>>>> ++ > > >>>>>>>> | EXPR$0 | > > >>>>>>>> ++ > > >>>>>>>> | [7,8,9]| > > >>>>>>>> ++ > > >>>>>>>> 1 row selected (0.139 seconds) > > >>>>>>>> 0: jdbc:drill:> select 
t.entities.urls from dfs.tmp.`a.json` as > t > > >> where > > >>>>>>>> t.entities.urls is null; > > >>>>>>>> ++ > > >>>>>>>> | EXPR$0 | > > >>>>>>>> ++ > > >>>>>>>> ++ > > >>>>>>>> No rows selected (0.158 seconds) > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> Hao > > >>>>>>>> > > >>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya < > adityakish...@gmail.com> > > >>>>> wrote: > > >>>>>>>> > > >>>>>>>>> I believe that this works if the array contains homogeneous > > >> primitive > > >>>>>>>>> types. In your example, it appears from the error, the array > > field > > >>>>>>> 'member' > > >>>>>>>>> contained maps for at least one record. > > >>>>>>>>> > > >>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < > > >> cma...@mapr.com> > > >>>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Trying that locally did not work for me (drill 0.7.0): > > >>>>>>>>>> > > >>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from > > >>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; > > >>>>>>>>>> Query failed: Query stopped., Failure while trying to > > materialize > > >>>>>>>>> incoming schema. Errors: > > >>>>>>>>>> > > >>>>>>>>>> Error in expression at index -1. Error: Missing function > > >>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full > expression: > > >>>>>>> --UNKNOWN > > >>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on > > >>>>>>> bullseye-3:31010 ] > > >>>>>>>>>> > > >>>>>>>>>> Error: exception while executing query: Failure while > executing > > >>>>> query. 
> > >>>>>>>>> (state=,code=0) > > >>>>>>>>>> > > >>>>>>>>>> ​ > > >>>>>>>>>> > > >>>>>>>>>> Chris Matta > > >>>>>>>>>> cma...@mapr.com > > >>>>>>>>>> 215-701-3146 > > >>>>>>>>>> > > >>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya < > > adityakish...@gmail.com > > >>> > > >>>>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> repeated_count('entities.urls') > 0 > > >>>>>>>>>>> > > >>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < > > >>>>>>>>>>> aengelbre...@maprtech.com> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> How do you filter out records with an empty array in drill? > > >>>>>>>>>>>> i.e some records have "url":[] and some will have an array > > >> with > > >>>>> data > > >>>>>>>>> in > > >>>>>>>>>>>> it. When trying to read records with data in the array drill > > >> fails > > >>>>>>> due > > >>>>>>>>>>> to > > >>>>>>>>>>>> records missing any data in the array. Trying a filter > with/* > > >> where > > >>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying url > is > > >> not > > >>>>>>>>> null. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Note some of the arrays contains maps, using twitter data as > > an > > >>>>>>>>> example > > >>>>>>>>>>>> below. Some records have an empty array with “hashtags”:[] > > and > > >>>>>>> others > > >>>>>>>>>>> will > > >>>>>>>>>>>> look similar to what is listed below. 
> > >>>>>>>>>>>> > > >>>>>>>>>>>> "entities": { > > >>>>>>>>>>>> "trends": [], > > >>>>>>>>>>>> "symbols": [], > > >>>>>>>>>>>> "urls": [], > > >>>>>>>>>>>> "hashtags": [ > > >>>>>>>>>>>> { > > >>>>>>>>>>>> "text": "GoPatriots", > > >>>>>>>>>>>> "indices": [ > > >>>>>>>>>>>> 83, > > >>>>>>>>>>>> 94 > > >>>>>>>>>>>> ] > > >>>>>>>>>>>> } > > >>>>>>>>>>>> ], > > >>>>>>>>>>>> "user_mentions": [] > > >>>>>>>>>>>> }, > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thanks > > >>>>>>>>>>>> —Andries > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>> > > >>>>> > > >>> > > >>> > > >>> > > >> > > >> > > > > > -- Steven Phillips Software Engineer mapr.com

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Steven Phillips
e drill-bit > nodes > > > > participated in the query execution? > > > > > > > > > > > > --- > > > > Mufeed Usman > > > > My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | > My > > > > So

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Steven Phillips
node MapR cluster. > > > --- > Mufeed Usman > My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My > Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal > <http://mufeed.livejournal.com> > -- Steven Phillips Software Engineer mapr.com

Re: create table as default to parquet?

2015-01-07 Thread Steven Phillips
>>> > >>> to change the storage format. > >>> > >>> > >>> > >>> On Tue, Jan 6, 2015 at 12:35 PM, Sungwook Yoon > >> wrote: > >>> > >>>> Hi > >>>> > >>>> I am trying to save the query as csv > >>>> > >>>> So, I am doing > >>>> > >>>> create table as dfs.tmp.`/tmp.csv` select .. > >>>> > >>>> It creates a parquet file. > >>>> Why did it not create csv file? > >>>> > >>>> Thanks, > >>>> > >>>> Sungwook > >>>> > >>> > >> > > -- Steven Phillips Software Engineer mapr.com

Re: Query gz compressed JSON files with Apache Drill

2014-12-17 Thread Steven Phillips
> > Thanks! > Paul > > This email and any attachments may contain confidential and proprietary > information of Blackboard that is for the sole use of the intended > recipient. If you are not the intended recipient, disclosure, copying, > re-distribution or other use of any of this information is strictly > prohibited. Please immediately notify the sender and delete this > transmission if you received this email in error. > -- Steven Phillips Software Engineer mapr.com

Re: maprdb setting up foreman error

2014-12-15 Thread Steven Phillips
q-core-0.9-drill-r8.jar:na] > >>> > > > at > >>> > > > > >>> > > > >>> > > >>> > org.eigenbase.relopt.volcano.RelSubset.propagateCostImprovements(RelSubset.java:314) > >>> > > > ~[optiq-

Re: Drill plugin for zipped files

2014-12-12 Thread Steven Phillips
zip command before get > >> data? > >> Or can you add plugin for zipped files? > >> --- > >> Best regards, > >> Dima Pl > > > > > > -- > > *Jim Scott* > > Director, Enterprise Strategy & Architecture > > +1 (347) 746-9281 > > > > <http://www.mapr.com/> > > [image: MapR Technologies] <http://www.mapr.com> > -- Steven Phillips Software Engineer mapr.com

Re: maprdb setting up foreman error

2014-12-12 Thread Steven Phillips
le executing query: Failure while executing query. >> (state=,code=0) >> >> >> What's going on? >> Drillbit reads the column family information correctly, just does not go >> to >> columns. >> Thanks, >> >> Sungwook >> >> > -- Steven Phillips Software Engineer mapr.com

Re: maprdb hbase drill does not get any result

2014-12-11 Thread Steven Phillips
t 12:50 PM, Aditya > wrote: > > > On Thu, Dec 11, 2014 at 12:47 PM, Sungwook Yoon > > > wrote: > > > > > The sqlline is executed under root too. > > > > > > ​What about DrillBits?​ Or are you running Drill in embedded mode? > > > -- Steven Phillips Software Engineer mapr.com

Re: maprdb hbase drill does not get any result

2014-12-11 Thread Steven Phillips
> > Sungwook > > > On Thu, Dec 11, 2014 at 12:43 PM, Steven Phillips > > wrote: > > > If it's returning 0 records, but no exception is thrown, my first guess > > would be that there is a permission issue. > > > > On Thursday, December 11, 2014

Re: maprdb hbase drill does not get any result

2014-12-11 Thread Steven Phillips
urns correct column family names > > > > But, select * from excel; > > does not work. > > > > What should I look at? > > > > Thanks, > > > > Sungwook > > > -- Steven Phillips Software Engineer mapr.com

Re: Is there a simple way to let drill rediscovery views?

2014-12-09 Thread Steven Phillips
r the directory '/tmp'. But drill seems > > cannot discovery it automatically. We must recreate the view based on the > > contents of view file and register the views. > > > > Is there some simpler way to force drill to register all of them? > > > > Thanks. > > > -- Steven Phillips Software Engineer mapr.com

Re: JDBC with SQuirreL, anybody made it work?

2014-12-03 Thread Steven Phillips
gt;> >> net.sourceforge.squirrel_sql.client.Version.(Version.java:34) > >> >> >>at net.sourceforge.squirrel_sql.client.Main.main(Main.java:60) > >> >> >> > >> >> >> If I replace the log4j.jar with the one from drill, I get > >> >> >> Exception in thread "main" java.lang.IncompatibleClassChangeError: > >> >> >> Implementing class > >> >> >>at java.lang.ClassLoader.defineClass1(Native Method) > >> >> >>at java.lang.ClassLoader.defineClass(ClassLoader.java:800) > >> >> >>at > >> >> > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > >> >> >>at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) > >> >> >>at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > >> >> >>at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > >> >> >>at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > >> >> >>at java.security.AccessController.doPrivileged(Native Method) > >> >> >>at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > >> >> >>at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > >> >> >>at > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > >> >> >>at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > >> >> >>at > >> >> > >> > net.sourceforge.squirrel_sql.client.SquirrelLoggerFactory.(SquirrelLoggerFactory.java:47) > >> >> >>at > net.sourceforge.squirrel_sql.client.Main.startApp(Main.java:80) > >> >> >>at net.sourceforge.squirrel_sql.client.Main.main(Main.java:73) > >> >> >> > >> >> >> > >> >> >> Any suggestions? > >> >> >> > >> >> >> Thanks in advance. > >> >> > > >> >> > >> > -- Steven Phillips Software Engineer mapr.com

Re: Best practices for DRILL_CLASSPATH setting

2014-12-01 Thread Steven Phillips
? The only reason I can think of for doing this is if you are adding UDFs, or have implemented your own storage plugin. On Mon, Dec 1, 2014 at 1:21 PM, Steven Phillips wrote: > Yes, drill-env.sh would be the place to put this, regardless of how Drill > is deployed. > > On Mon, Dec 1,

Re: Best practices for DRILL_CLASSPATH setting

2014-12-01 Thread Steven Phillips
it is properly available to the service ? > > — David > > -- Steven Phillips Software Engineer mapr.com