Re: Avro - Let's talk Avro again

2017-08-18 Thread Stefán Baxter
I was guessing you would chime in with a response ;) > > > > Are you still using Drill w/ Avro how has things been lately? > > > > On Thu, Aug 17, 2017 at 8:00 AM, Stefán Baxter < > ste...@activitystream.com> > > wrote: > > > >

Re: Drill schema handling [Was: Avro - Let's talk Avro again]

2017-08-19 Thread Stefán Baxter
trying to prioritize > > given what we have. But we do not have to feel constrained. We can get > more > > developers to participate in this and help out. And I am very positive > > about that approach-I know that I helped a user here to get help on using > > Apache Drill insid

Re: Avro - Let's talk Avro again

2017-08-19 Thread Stefán Baxter
may not sound nice to the ears but is exactly the kind of feedback > that will make this project truly successful. > > Best, > Saurabh > > > > I > > On Fri, Aug 18, 2017 at 1:42 PM, Stefán Baxter > wrote: > > > Hi John, > > > > Love Drill but

Various ramblings of a newbie

2015-07-11 Thread Stefán Baxter
Hi, I'm new to Drill and Parquet and the following are questions/observations I made during my initial discovery phase. I'm sharing them here for other newbies but also to see if some of these concerns are invalid or based on misunderstanding. I made no list of the things that I like of what I h

Re: Various ramblings of a newbie

2015-07-11 Thread Stefán Baxter
Hi Jacques, and thank you for answering swiftly and clearly :). Some additional questions did arise (see inline): >- *Foreign key lookups (joins)* > I'm guessing my fk_lookup scenario would/could benefit from using other storage options for that. Currently most of this is in Postgres and a

querying not that complex data

2015-07-12 Thread Stefán Baxter
Hi, I'm continuing my Drill discovery and trying to build queries against some of the data we have. Our analytics data is a flat structure and I have had no issues working with that (apart from parsing dates and dates being stored as binary (more on this later)) Now I'm just trying to discover if

Re: querying not that complex data

2015-07-12 Thread Stefán Baxter
s more examples than the > doc page you might be using. > > Kristine Hahn > Sr. Technical Writer > 415-497-8107 @krishahn > > > On Sun, Jul 12, 2015 at 1:41 PM, Stefán Baxter > wrote: > > > Hi, > > > > I'm continuing my Drill discovery and trying to

SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
Hi, I have a json file that contains a SQL timestamp. When I use it to create a Parquet file it seems to become a INT64: Jul 12, 2015 3:34:59 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 153,728B for [occurred_at] INT64: 28,910 values, 231,288B raw, 153,681B comp, 1 pages, encoding

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
ormat/#sql-types-to-parquet-logical-types > says > that the timestamp type is mapped to the Parquet TIMESTAMP_MILLI, which is > a Unix timestamp (int64). Take a look at > https://drill.apache.org/docs/data-type-conversion/#to_timestamp and the > Timezone Limitations section. > >

Re: Various ramblings of a newbie

2015-07-13 Thread Stefán Baxter
Hi and thanks, Regarding "/part2": I think that append table would allow for a "cleaner" setup. Adding data once a day would lead to a fairly messy directory structure (perhaps irrelevant). We are dealing with multi tenancy and Partition by sounds like a good way for that. I'm guessing Partition

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
> and logical annotation. A timestamp should be stored as a physical INT64 > with the TIMESTAMP_MILLI annotation. See here: > > > https://github.com/apache/parquet-format/blob/master/src/thrift/parquet.thrift#L105 > > On Mon, Jul 13, 2015 at 7:47 AM, Stefán Baxter > wrote:
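A minimal Java sketch of the mapping described in this thread, assuming the JSON holds UTC timestamps written as `yyyy-MM-dd HH:mm:ss[.SSS]`: the value stored in a Parquet INT64 column carrying the TIMESTAMP_MILLI annotation is the number of milliseconds since the Unix epoch. `TimestampMillis` and its methods are hypothetical names; this is not Drill's conversion code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch only: assumes SQL-style timestamp strings that are already in UTC.
class TimestampMillis {
    // Optional fraction: "2015-07-12 15:34:59" and "2015-07-12 15:34:59.123" both parse.
    private static final DateTimeFormatter IN =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]");
    private static final DateTimeFormatter OUT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** SQL timestamp string (UTC) -> epoch millis, i.e. the raw INT64 value. */
    static long toEpochMillis(String sqlTimestamp) {
        return LocalDateTime.parse(sqlTimestamp, IN).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    /** Raw INT64 epoch millis -> SQL-style string in UTC. */
    static String fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis).atOffset(ZoneOffset.UTC).toLocalDateTime().format(OUT);
    }
}
```

Reading the INT64 back as a plain number therefore shows these millisecond values; how Drill itself presents the column depends on the annotation being present, as discussed above.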

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
l solve this. I can't > > remember which one off hand. > > > > > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/test/resources/vector/complex/extended.json#L30 > > > > On Mon, Jul 13, 2015 at 9:54 AM, Stefán Baxter < > ste...@activitystream.c

Optimizing S3 access for Drill using Parquet files

2015-07-14 Thread Stefán Baxter
Hi, I'm wondering if the people that use Drill with S3 are using some sort of local cache on the drillbit-nodes for historical, non changing, Parquet segments. I'm pretty sure that I'm not using the correct terminology and that the correct question is this: Are there any ways to optimize S3 with

Re: Optimizing S3 access for Drill using Parquet files

2015-07-14 Thread Stefán Baxter
edded mode. If you can get a caching system to expose NFS, you could > mount this to the same path on all of your nodes and it should be able to > read from that path mounted on your local FS. > > > > On Tue, Jul 14, 2015 at 1:06 AM, Stefán Baxter > wrote: > > > Hi,

Re: Optimizing S3 access for Drill using Parquet files

2015-07-14 Thread Stefán Baxter
) Regards, -Stefan On Tue, Jul 14, 2015 at 7:13 PM, Paul Mogren wrote: > Stefan, > > You might be interested in http://tachyon-project.org > > > > > On 7/14/15, 1:12 PM, "Stefán Baxter" wrote: > > >Hi, > > > >Thank you. > > > >

Digging deeper

2015-07-15 Thread Stefán Baxter
Hi, We are slowly gaining some Drill/Parquet familiarity as we research it as a potential replacement/addition for/to Druid (which we also like a lot). We have, as stated earlier, come across many things that we like regarding Drill/Parquet and the "speed to value" is a killer aspect when dealing

Re: Digging deeper

2015-07-15 Thread Stefán Baxter
Hi again, I was overlooking the handy UNION operator when I was noting the combination part in my previous email. (Feel free to ignore it) Regards, -Stefan On Wed, Jul 15, 2015 at 3:56 PM, Stefán Baxter wrote: > Hi, > > We are slowly gaining some Drill/Parquet familiarity as we re

Rrounding timestamps to nearest period interval

2015-07-15 Thread Stefán Baxter
Hi, I don't seem to find a handy way to round timestamps to nearest period interval (PT5M / PT15M) and DATE_DIFF seems to be missing for simple calculation of it. It seems like a too common use case for me to write a UDF for it but if it's missing then we will happily contribute a simple implemen
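Rounding to the nearest PT5M / PT15M interval can be expressed without `DATE_DIFF` as integer arithmetic on epoch milliseconds. The Java sketch below assumes buckets aligned to the Unix epoch (so PT15M boundaries fall on :00, :15, :30 and :45 UTC) and that exact midpoints round up. `PeriodRounding` is a hypothetical name, and this is not the `roundTimeStamp` UDF discussed later in the thread.

```java
import java.time.Duration;

// Sketch: epoch-aligned rounding of epoch-millisecond timestamps.
class PeriodRounding {
    /** Start of the enclosing interval (floor); floorDiv keeps pre-1970 values correct. */
    static long floorToInterval(long epochMillis, Duration interval) {
        long step = interval.toMillis();
        return Math.floorDiv(epochMillis, step) * step;
    }

    /** Nearest interval boundary; an exact midpoint rounds up. */
    static long roundToInterval(long epochMillis, Duration interval) {
        long step = interval.toMillis();
        return Math.floorDiv(epochMillis + step / 2, step) * step;
    }

    public static void main(String[] args) {
        long t = java.time.Instant.parse("2015-07-15T19:57:00Z").toEpochMilli();
        Duration p15 = Duration.parse("PT15M");
        System.out.println(java.time.Instant.ofEpochMilli(floorToInterval(t, p15)));
        System.out.println(java.time.Instant.ofEpochMilli(roundToInterval(t, p15)));
    }
}
```

With PT15M, 19:57 UTC floors to 19:45 and rounds to 20:00; with PT5M both give 19:55. Whether "nearest" or "floor" is wanted depends on how the time buckets are meant to be labelled.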

Re: Rrounding timestamps to nearest period interval

2015-07-15 Thread Stefán Baxter
l 15, 2015 at 7:57 PM, Mehant Baid wrote: > Hey Stefan, > > Could you clarify with an example what is the input and expected output > for the UDF you are looking for. > > Thanks > Mehant > > > On 7/15/15 11:59 AM, Stefán Baxter wrote: > >> Hi, >> >

jdbc connection problems

2015-07-15 Thread Stefán Baxter
Hi, I'm trying to establish a JDBC connection via zookeeper running on localhost but I get an exception when trying to connect. Setup: - Drill 1.1 (using the standard drill-override.conf (unmodified)) - zookeeper is running on localhost (default config) - drillbit is running correctly - dril

Re: jdbc connection problems

2015-07-15 Thread Stefán Baxter
btw. I tried all versions of the connection string I could find references to. (Just to make sure it was not a strange exception for a bad connection string) -Stefan On Wed, Jul 15, 2015 at 10:01 PM, Stefán Baxter wrote: > Hi, > > I'm trying to establish a JDBC connection

Re: jdbc connection problems

2015-07-15 Thread Stefán Baxter
7 and add guava-14 to the > class path. > > Rajkumar Singh > MapR Technologies > > > > On Jul 15, 2015, at 3:46 PM, Stefán Baxter > wrote: > > > > btw. I tried all versions of the connection string I could find > references > > to. (Just to make s

Re: jdbc connection problems

2015-07-15 Thread Stefán Baxter
gards, -Stefan On Wed, Jul 15, 2015 at 10:55 PM, Stefán Baxter wrote: > ok, I see. Thank you! > > On Wed, Jul 15, 2015 at 10:47 PM, Rajkumar Singh > wrote: > >> After looking at the stack I believe google guava-17 is available in the >> class path, guava 17 deprecated the

Re: jdbc connection problems

2015-07-15 Thread Stefán Baxter
ade Drill's dependencies > in the full jar so you don't have a conflict. Can you file a Jira? > On Jul 15, 2015 5:16 PM, "Stefán Baxter" > wrote: > > > Hi, > > > > This continues :(. > > > > Either I try to use the full-jdbc driver and have the
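For reference, a minimal JDBC sketch of connecting through ZooKeeper. It assumes the Drill JDBC driver jar is on the classpath without a conflicting Guava version (the problem discussed in this thread), ZooKeeper on `localhost:2181`, and the defaults of an unmodified drill-override.conf (ZooKeeper root `drill`, cluster id `drillbits1`). For an embedded drillbit, `jdbc:drill:zk=local` is used instead. `DrillJdbcSketch` and `zkUrl` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class DrillJdbcSketch {
    /** Builds jdbc:drill:zk=<host:port>/<zk root>/<cluster id>. */
    static String zkUrl(String zkHostPort, String zkRoot, String clusterId) {
        return "jdbc:drill:zk=" + zkHostPort + "/" + zkRoot + "/" + clusterId;
    }

    public static void main(String[] args) {
        String url = zkUrl("localhost:2181", "drill", "drillbits1");
        try {
            // Explicit registration, for driver versions that do not self-register.
            Class.forName("org.apache.drill.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.err.println("Drill JDBC driver not on the classpath");
            return;
        }
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM sys.drillbits")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // first column of each drillbit row
            }
        } catch (SQLException e) {
            System.err.println("Connection via " + url + " failed: " + e.getMessage());
        }
    }
}
```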

using the REST API

2015-07-16 Thread Stefán Baxter
Hi, I have a few questions regarding the rest API. - Is it possible that the rest api (query.json) should return numeric values as strings? - count(*) being an example of that - calls for conversion on the browser side - I find no obvious setting for this - Is there any other

Re: using the REST API

2015-07-16 Thread Stefán Baxter
thanks Sudheesh, it's appreciated. On Thu, Jul 16, 2015 at 4:58 PM, Sudheesh Katkam wrote: > See inline. > > > On Jul 16, 2015, at 4:36 AM, Stefán Baxter > wrote: > > > > Hi, > > > > I have a few questions regarding the rest API. > > > >
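When numeric values such as `count(*)` arrive from query.json as strings, the conversion has to happen on the client. A heuristic Java sketch, assuming each row has already been parsed into a `Map<String, String>` (JSON parsing is left out so the example needs no third-party library); `RestRowCoercion` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RestRowCoercion {
    /** "42" -> Long, "3.5" -> Double, anything else (including null) unchanged. */
    static Object coerce(String raw) {
        if (raw == null) return null;
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException notALong) { /* fall through */ }
        try {
            double d = Double.parseDouble(raw);
            if (!Double.isNaN(d) && !Double.isInfinite(d)) return d; // skip "NaN"/"Infinity"
        } catch (NumberFormatException notADouble) { /* fall through */ }
        return raw;
    }

    /** Applies coerce() to every column, preserving column order. */
    static Map<String, Object> coerceRow(Map<String, String> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            out.put(e.getKey(), coerce(e.getValue()));
        }
        return out;
    }
}
```

Because it guesses from content, a zero-padded identifier such as "007" would become 7; coercing only the columns known to be numeric is the safer variant.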

empty results

2015-07-16 Thread Stefán Baxter
Hi, What can be happening if a drillbit (local) starts returning empty results (from a Parquet query) and does not return proper results unless it's restarted? (I noticed this started happening when I began using the REST API but I have no direct link to that) Regards, -Stefán

Re: empty results

2015-07-16 Thread Stefán Baxter
both are empty sets (with headers) On Jul 16, 2015 5:31 PM, "Sudheesh Katkam" wrote: > Is it returning empty results only through REST API? Did you try sqlline? > > Do you have a simple repro? If so, can you file a ticket? > > Thank you, > Sudheesh > > >

Re: Rrounding timestamps to nearest period interval

2015-07-16 Thread Stefán Baxter
) > > DATE_SUB(NOW(), interval '1' month) > ) x > group by x.`timestamp`, x.`user` > order by x.`timestamp` asc > > I’m of course starting with BIGINT unix timestamps, so you’ll have to > convert using unix_timestamp(). > > Chris Matta cma...@ma

Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
Hi, I'm trying to deploy a UDF that I have written according to the documentation. I have also: 1. Copied the jar file to jars/3rdparty 2. Changed the config "conf/drill-override.conf" to include: drill.logical.function.package += ["org.apache.drill.exec.expr.fn.impl","com.activi

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
NO! Thank you, will do that right now :) On Sun, Jul 19, 2015 at 5:38 PM, Jim Bates wrote: > Did you include a file drill-module.conf in your jar along with source > files? > On Jul 19, 2015 12:20 PM, "Stefán Baxter" > wrote: > > > Hi, > > > > I

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
Hi Jim, Now I have added the file to the jar (both root and resources folder) but that does not seem to change anything. Any additional ideas? Regards, -Stefan On Sun, Jul 19, 2015 at 5:40 PM, Stefán Baxter wrote: > NO! > > Thank you , will do that right now :) > > On Sun, Ju

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
that works with the drill-config counterpart would be a welcome addition to the documentation. Regards, -Stefan On Sun, Jul 19, 2015 at 5:53 PM, Stefán Baxter wrote: > Hi Jim, > > Now I have added the file to the jar (both root and resources folder) but > that does not seem to change

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
s welcomed :). Regards, -Stefan On Sun, Jul 19, 2015 at 5:59 PM, Stefán Baxter wrote: > Hi again, > > Going over the documentation once more I came across this: > >- Add the sources and classes JAR files to Drill’s classpath. > > I'm only including a standard .jar (w

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
roject we are working on to > include several examples to simplify the learning curve. If your > interested... I'd love to have you add your udf. > On Jul 19, 2015 12:59 PM, "Stefán Baxter" > wrote: > > > Hi again, > > > > Going over the documentatio

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
als with dates. > > Check the drill logs. It is likely that drill is grumpy about something > in your udf or packaging. > > Also, feel free to snitch the pom from the simple examples in order to get > the pieces assembled and packaged correctly. > > Sent from my iPhone > >

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
, Ted Dunning wrote: > Stefan, > > Have you seen this github project: > > https://github.com/mapr-demos/simple-drill-functions > > ? > > > On Sun, Jul 19, 2015 at 2:14 PM, Stefán Baxter > wrote: > > > Hi Jim, > > > > I'm still not able to m

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
ll wrote: > Hi Stefan, > > Do you think you can share your complete project ? > > This will help to debug it for you. > > T > > On Sunday, July 19, 2015, Stefán Baxter wrote: > > > Hi Ted, > > > > I fetched this, built it and deployed it without

Re: Drill not picking up a UDF

2015-07-19 Thread Stefán Baxter
Hi, The project can be found here: https://github.com/acmeguy/asdrill Thank you, -Stefán On Sun, Jul 19, 2015 at 11:57 PM, Stefán Baxter wrote: > Hi, > > I'm more than happy to share the little that is there (I will publish it > on github and send link tomorrow). > >

Re: Drill not picking up a UDF

2015-07-20 Thread Stefán Baxter
, DateTime will need to be org.joda.time.DateTime and > roundTimeStamp will need to be > com.activitystream.drill.udfs.ASUserDefinedFunctions.roundTimeStamp. > > > > On Sun, Jul 19, 2015 at 7:02 PM, Stefán Baxter > wrote: > > > Hi, > > > > The project can

Re: Drill not picking up a UDF

2015-07-20 Thread Stefán Baxter
need to be > org.joda.time.Period, DateTime will need to be org.joda.time.DateTime and > roundTimeStamp will need to be > com.activitystream.drill.udfs.ASUserDefinedFunctions.roundTimeStamp. > > > > On Sun, Jul 19, 2015 at 7:02 PM, Stefán Baxter > wrote: > > > Hi,

Re: Drill not picking up a UDF

2015-07-20 Thread Stefán Baxter
58 PM, Jacques Nadeau wrote: > Can you enable verbose errors at the session level? It may reveal more > about what is failing. > On Jul 20, 2015 5:32 AM, "Stefán Baxter" > wrote: > > > Hi Jim, > > > > I have made those changes and I'm wondering if y

Re: Drill not picking up a UDF

2015-07-20 Thread Stefán Baxter
2:07 PM, Stefán Baxter wrote: > Hi, > > After going through the log this is clear what is happening (once the > Drill picked up the UDF a bit earlier this morning). > > I'm calling the VarCharHolder.toString() to get the text value for the > parameter and that is throwi

Re: Drill not picking up a UDF

2015-07-20 Thread Stefán Baxter
also use the StringFunctionHelpers. > > > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(in > .start, in.end, in.buffer) > > Where 'in' is > > @Param NullableVarCharHolder in > > or > > @Param VarCharHolder in > > On Mon, Jul 20, 2015 at 9:07 AM, Stef
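Per the thread, calling `toString()` on a `VarCharHolder` throws, and the suggested fix is `StringFunctionHelpers.toStringFromUTF8(in.start, in.end, in.buffer)`. The standalone Java sketch below only illustrates what that call does conceptually: a holder points at a buffer plus `[start, end)` byte offsets, and the text must be decoded from that slice as UTF-8. `java.nio.ByteBuffer` stands in for Drill's own buffer type and `Utf8Slice` is a hypothetical name; inside a real UDF, use the Drill helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Utf8Slice {
    /** Decodes bytes [start, end) of the buffer as UTF-8 without moving its position. */
    static String decode(int start, int end, ByteBuffer buffer) {
        byte[] bytes = new byte[end - start];
        ByteBuffer view = buffer.duplicate(); // independent position, shared content
        view.position(start);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Decoding by byte offsets, not characters, matters for multi-byte text: "h\u00e9llo" occupies six bytes, not five.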

combining the results of two queries (union) before grouping (derived grouping / re-grouping)

2015-07-20 Thread Stefán Baxter
Hi, Does Drill support grouping on "post union" result sets (derived)? I'm fetching data from two sources and currently all groups that can be found in both sets appear twice (understandably with different counts etc.) in the final output. Regards, -Stefan

Re: combining the results of two queries (union) before grouping (derived grouping / re-grouping)

2015-07-20 Thread Stefán Baxter
Hi, I used the With clause to achieve this, thanks. -Stefan On Mon, Jul 20, 2015 at 8:34 PM, Stefán Baxter wrote: > Hi, > > Does Drill support grouping on "post union" result sets (derived)? > > I'm fetching data from two sources and currently all groups, th
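The WITH-clause fix works because the aggregation runs once over the combined rows instead of once per source. The same idea in plain Java, assuming each source has already produced per-group counts (the input maps and `RegroupAfterUnion` are hypothetical): shared keys are summed instead of being listed twice.

```java
import java.util.Map;
import java.util.TreeMap;

class RegroupAfterUnion {
    /** Merges two per-group count maps; a key present in both gets the summed count. */
    static Map<String, Long> regroup(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> merged = new TreeMap<>(a); // sorted output, like ORDER BY key
        b.forEach((key, count) -> merged.merge(key, count, Long::sum));
        return merged;
    }
}
```

Summing only re-aggregates additive measures such as counts and sums; averages need their sums and counts carried separately. In SQL, the combining step should be UNION ALL rather than UNION, since UNION would drop one of two identical (group, count) rows before they are summed.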

describe for parquet, parquet-tools or other alternatives

2015-07-21 Thread Stefán Baxter
Hi, I'm wondering what tools people are using to get metadata and structure information from Parquet files. Describe can not be used for that and parquet-tools seem to be the only viable option. Are there any other alternatives that I should know about? Regards, -Stefan

Re: describe for parquet, parquet-tools or other alternatives

2015-07-21 Thread Stefán Baxter
u're looking for that is not available using > parquet-tools? > > Parth > > On Tue, Jul 21, 2015 at 3:07 AM, Stefán Baxter > wrote: > > > Hi, > > > > I'm wondering what tools people are using to get metadata and structure > > information from Par

Re: describe for parquet, parquet-tools or other alternatives

2015-07-21 Thread Stefán Baxter
thank you :) On Tue, Jul 21, 2015 at 5:25 PM, Hao Zhu wrote: > Here step by step: > How to build and use parquet-tools to read parquet files > <http://www.openkb.info/2015/02/how-to-build-and-use-parquet-tools-to.html > > > > Thanks, > Hao > > On Tue, Jul 2

The mysterious life of mr. Trim (aka. the value displacement trick)

2015-07-21 Thread Stefán Baxter
Hi, Here is a small trick I think you will like :). - With this minimal dataset as /tmp/test.json: {"dimensions":{"adults":"A"}} - Running this: select lower(p.dimensions.budgetLevel) as `field1`, lower(p.dimensions.adults) as `field2` from dfs.tmp.`/test.json` as p; - To no sur

Re: The mysterious life of mr. Trim (aka. the value displacement trick)

2015-07-21 Thread Stefán Baxter
:
+--------+--------+
| field1 | field2 |
+--------+--------+
| a      | null   |
+--------+--------+
I'm just as puzzled though :) Regards, -Stefan On Tue, Jul 21, 2015 at 5:32 PM, Stefán Baxter wrote: > Hi, > > Here is small trick I think you will like :). > >- With this minimal dataset as

Re: The mysterious life of mr. Trim (aka. the value displacement trick)

2015-07-21 Thread Stefán Baxter
Hi, all kidding aside then this renders Drill+Parquet unusable for us at the moment. If there is a quick fix then please let me know. :) Regards, -Stefan On Tue, Jul 21, 2015 at 5:37 PM, Stefán Baxter wrote: > Well, > > After some more testing it appears that this has nothing t

Re: The mysterious life of mr. Trim (aka. the value displacement trick)

2015-07-21 Thread Stefán Baxter
b | a | > +---+---+ > | null | null | > +---+---+ > > << b is wrong, a is correct>> > > Stefan, not sure what is going on here. It looks like an issue with > resolution of a nonexistent field in a multilevel structure. It exists > w

Re: The mysterious life of mr. Trim (aka. the value displacement trick)

2015-07-21 Thread Stefán Baxter
nexist as a from dfs.tmp.test2 p; > +---+---+ > | b | a | > +---+---+ > | null | null | > +---+---+ > > << b is wrong, a is correct>> > > Stefan, not sure what is going on here. It looks like an issue with > resolution o

Re: The mysterious life of mr. Trim (aka. the value displacement trick)

2015-07-21 Thread Stefán Baxter
Thank you! :) On Tue, Jul 21, 2015 at 10:26 PM, Jinfeng Ni wrote: > I'm taking a look at the JIRA you filed :DRILL-3533. If I could not find a > fix soon, I'll ask someone in our team to take a further look. > > > On Tue, Jul 21, 2015 at 2:50 PM, Stefán Baxter >

Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-22 Thread Stefán Baxter
Hi, I keep coming across *quirks* in Drill that are quite time consuming to deal with and are now causing mounting concerns. This last one though is far more serious than the previous ones because it deals with loss of data. I'm working with a small(ish) dataset of around 1m records (which I'm m

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-22 Thread Stefán Baxter
- never returns this: "yes", {"other":"true","all":" false","sometimes":"yes"} should have been: - never returns this: "yes", {"other":"true","all":" false","sometime

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-22 Thread Stefán Baxter
r values of the sub structure. - Stefan On Wed, Jul 22, 2015 at 10:53 PM, Stefán Baxter wrote: > - never returns this: "yes", {"other":"true","all":" > false","sometimes":"yes"} > > should have been: > >

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-23 Thread Stefán Baxter
bit rude and I know I could focus on appreciating all the effort but that will have to wait just a bit longer) - Stefan On Wed, Jul 22, 2015 at 11:01 PM, Stefán Baxter wrote: > in addition to this. > > selecting: select some, t.others, t.others.additional from > dfs.tmp.`/test.jso

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-23 Thread Stefán Baxter
"operator X doesn't > support schema changes", so I am not sure why you are getting incorrect > results in this case. > > You should definitely fill a JIRA for this and mark it as critical. We try > to fix cases where a query returns incorrect results as soon as possible. >

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-23 Thread Stefán Baxter
Hi, The only right answer to this question must be to a) "adapt to additional information" and b) "try the hardest to accommodate changes". The current behavior must be seen as completely worthless (sorry for the strong language). Regards, -Stefan On Thu, Jul 23, 2015 at 4:16 PM, Matt wrote:

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-23 Thread Stefán Baxter
Thank you. On Thu, Jul 23, 2015 at 7:24 PM, Ted Dunning wrote: > On Thu, Jul 23, 2015 at 3:55 AM, Stefán Baxter > wrote: > > > Someone must review the underlying optimization errors to prevent this > from > > happening to others. > > > > Jinfeng and Part

Re: Inaccurate data representation when selecting from json sub structures and loss of data creating Parquet files from it

2015-07-23 Thread Stefán Baxter
urn the missing data from the generated parquet file. > > > 82400 +--+---+ > 82401 | some | others | > 82402 +--+---+ > 82403 | y

IPv6 in Drill/Parquet

2015-07-24 Thread Stefán Baxter
Hi, Does anyone here have opinions/ideas on how IPv6 addresses might be stored efficiently in Parquet via Drill? The Java BigInteger class handles the 128-bit variant but the BigIntHolder in Drill relies on a Long. Storing it in two longs is not optimal and it would surprise me if the variable binary field

storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
Hi, I would like to share our intentions for organizing our data and how we plan to construct queries for it. There are four main reasons for sharing this: a) I would like to sanity check the approach b) I'm having a hard time writing a UDF to optimize this and need a bit of help. c) This can p

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
rt of a new directory" - This function always return 1 and should never match dir0 - For some strange reason this query sometimes returns results even though it should never do that (dir0 != 1) Regards, -Stefan On Fri, Jul 24, 2015 at 12:12 PM, Stefán Baxter wrote: > Hi, &g

Re: IPv6 in Drill/Parquet

2015-07-24 Thread Stefán Baxter
ately. > > Other than that tidbit, I cannot speak to Drill's capability to leverage > said data. > > On Fri, Jul 24, 2015 at 4:20 AM, Stefán Baxter > wrote: > > > Hi, > > > > Has anyone here opinion/ideas on how ipv6 addresses might be stored > > effi

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
; } > > > > Let me know if this works for you, I didn't work on UDFs for quite some > > time now and they may have slightly changed since then. > > > > Thanks > > > > > > On Fri, Jul 24, 2015 at 7:37 AM, Stefán Baxter < > ste...@activitystream.com

Re: IPv6 in Drill/Parquet

2015-07-24 Thread Stefán Baxter
ion you want to group by. Thusly, > storing it in two parts would be optimal for the use case. > > On Fri, Jul 24, 2015 at 9:44 AM, Stefán Baxter > wrote: > > > Well, that is only true if you dont have a BigInteger to hold it :) > > > > see: > > > > >
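A minimal Java sketch of the two-column layout discussed here: a 128-bit IPv6 address split into two signed 64-bit halves (which map onto two BIGINT/INT64 columns) and rebuilt as an unsigned `BigInteger`. It assumes the input is an IPv6 literal (`InetAddress.getByName` does no DNS lookup for literals); IPv4-mapped literals come back as 4-byte addresses and are rejected. `Ipv6Halves` is a hypothetical name.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;

class Ipv6Halves {
    /** Returns {high 64 bits, low 64 bits} of an IPv6 literal. */
    static long[] split(String ipv6Literal) {
        byte[] bytes;
        try {
            bytes = InetAddress.getByName(ipv6Literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + ipv6Literal, e);
        }
        if (bytes.length != 16) {
            throw new IllegalArgumentException("not an IPv6 address: " + ipv6Literal);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        return new long[] { buf.getLong(), buf.getLong() };
    }

    /** Rebuilds the address as a non-negative 128-bit BigInteger. */
    static BigInteger join(long high, long low) {
        byte[] bytes = ByteBuffer.allocate(16).putLong(high).putLong(low).array();
        return new BigInteger(1, bytes); // signum 1: treat the bytes as unsigned
    }
}
```

Signed halves keep equality lookups and grouping exact, but addresses at or above 8000:: get a negative high half, so range comparisons on the raw columns need unsigned comparison (for example `Long.compareUnsigned` on the client side).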

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
might be failing? > > > > -- > > Jacques Nadeau > > CTO and Co-Founder, Dremio > > > > On Fri, Jul 24, 2015 at 10:45 AM, Stefán Baxter < > ste...@activitystream.com > > > > > wrote: > > > > > Hi, > > > > > > t

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
o verify > first as it will most likely have a larger impact on query performance. > > Obviously the indeterminate results are very concerning. Please share any > additional details you can about reproducing this if you figure out > anything more specific. > > On Fri, Jul 24, 20

Re: Best Performance of drill

2015-07-25 Thread Stefán Baxter
Hi, I'm pretty new around here but let me attempt to answer you. - Parquet will always be (a lot) faster than CSV, especially if you're querying for only a part of the columns in the CSV - Parquet has various compression techniques and is more "scan friendly" (optimized for scanning

Re: Continued experiments

2015-07-25 Thread Stefán Baxter
#3 is incorrectly labelled and should be called "3. since it seems unclear what type UDFs return then Drill will, under some circumstances, assume arithmetic comparison (=, <>, etc.) rather than string comparison even though the UDF returns VarChar On Sat, Jul 25, 2015 at 4:18 PM, St

Re: Best Performance of drill

2015-07-25 Thread Stefán Baxter
ces is fairly cheap if you're only talking about a few TB. > > -- David > > On Jul 25, 2015, at 7:16 AM, Hafiz Mujadid > wrote: > > > Thanks alot Stefan :) > > > > On Sat, Jul 25, 2015 at 2:58 PM, Stefán Baxter < > ste...@activitystream.com> >

Running Drill with Tachyon+S3

2015-07-26 Thread Stefán Baxter
Hi, I'm trying to run Drill with Tachyon on top of S3. My Drill-Source config looks like this: { "type": "file", "enabled": true, "connection": "tachyon://localhost:19998/", "workspaces": { "root": { "location": "/", "writable": true, "defaultInputFormat": null }

Re: Running Drill with Tachyon+S3

2015-07-26 Thread Stefán Baxter
The file $DRILL_HOME/bin/hadoop-excludes.txt has a list of jars that ARE > NOT loaded during the bootstrap of Drill … and jets3t is one of them. > Commenting out the jets3t line in that file and restarting the drill bits > will at least get you past the first java dependency problem. > > — D

Re: Running Drill with Tachyon+S3

2015-07-26 Thread Stefán Baxter
omplaining about missing s3 config. I would think it enough for the Tachyon client to talk to the Tachyon worker that is already handling all S3 communications just fine. Anyways; if anyone here has been down this hole then please share. Regards, -Stefan On Sun, Jul 26, 2015 at 3:22 PM, Ste

Type confusion and number formatting exceptions

2015-07-27 Thread Stefán Baxter
Hi, It seems that null values can trigger a column to be treated as a numeric one, in expressions evaluation, regardless of content or other indicators and that fields in substructures can affect same-named-fields in parent structure. (1.2-SNAPSHOT, parquet files) I have JSON data that can be red

Re: Type confusion and number formatting exceptions

2015-07-29 Thread Stefán Baxter
'plan.item.added','plan.item.removed') group by p.type, > > coalesce(cast(p.dimensions.dim_type as varchar(20)), > cast(p.dimensions.type > > as varchar(20))); > > > > > > > > > > > > On Mon, Jul 27, 2015 at 4:59 AM, Stefán Baxter < > s

The incomplete saga of Drill, Tachyon and S3 (Three Amigos, - the analytics edition)

2015-07-29 Thread Stefán Baxter
Hi, I have been trying to get Drill to work with Tachyon (http://tachyon-project.org/index.html) using S3 as deep storage (Tachyon: Under File System). The whole idea is that each Drillbit (node) has its own multi-tiered local storage (MEM, SSD + HDD) and uses that to cache Parquet files which

Re: The incomplete saga of Drill, Tachyon and S3 (Three Amigos, - the analytics edition)

2015-07-29 Thread Stefán Baxter
Hi Calvin, This actually did the trick and we have it up and running now :). One thing took me by complete surprise and that is the fact that the directory structure is not reflected in S3 and instead Tachyon puts everything into one folder and in numbered files. That makes me question two design dec

Re: Count where or having clause does not work as expected

2015-07-30 Thread Stefán Baxter
Hi, That last case works as expected, sorry, this test data does have null values for country_code. That means that I have a working solution but that it would be nice if v1 (above) would work. Thank you, -Stefán On Thu, Jul 30, 2015 at 2:15 PM, Stefán Baxter wrote: > make that "

Re: Count where or having clause does not work as expected

2015-07-30 Thread Stefán Baxter
make that "Count in where or having clause does not work as expected" On Thu, Jul 30, 2015 at 2:14 PM, Stefán Baxter wrote: > Hi, > > I have data that can be reduced to this: > >- {"client_ip":{"country_code":"US"}} >

Count where or having clause does not work as expected

2015-07-30 Thread Stefán Baxter
Hi, I have data that can be reduced to this: - {"client_ip":{"country_code":"US"}} - {"client_ip":{"country_code":"US"}} - {"client_ip":{"country_code":"US"}} - {"client_ip":{"country_code":"GB"}} - {"client_ip":{"country_code":"US"}} This works fine: select p.client_ip.country_c

Re: Count where or having clause does not work as expected

2015-07-30 Thread Stefán Baxter
to apply a filter > after an aggregate is with the HAVING clause. > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Thu, Jul 30, 2015 at 8:20 AM, Andries Engelbrecht < > aengelbre...@maprtech.com> wrote: > > > Last I checked group by and having

Re: Count where or having clause does not work as expected

2015-07-30 Thread Stefán Baxter
code | event_count | > +---+--+ > | US| 4| > | null | 2| > +---+--+ > 2 rows selected (0.285 seconds) > > Looks like the query gets the correct answer. > > > [1] > > http
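The distinction behind the HAVING advice above, sketched in Java with the five example rows from the original message: a count only exists after grouping, so a predicate on it must run on the grouped result (HAVING), while WHERE filters input rows before any count exists. `countsHaving` is a hypothetical helper; like SQL GROUP BY, it keeps null codes as their own group.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountThenFilter {
    /** GROUP BY code, then keep groups whose count(*) >= minCount (the HAVING step). */
    static Map<String, Long> countsHaving(List<String> codes, long minCount) {
        Map<String, Long> counts = new HashMap<>(); // HashMap permits a null key (the NULL group)
        for (String code : codes) {
            counts.merge(code, 1L, Long::sum);       // GROUP BY code + count(*)
        }
        counts.values().removeIf(n -> n < minCount); // HAVING count(*) >= minCount
        return counts;
    }

    public static void main(String[] args) {
        // The rows from the original message: four US, one GB.
        System.out.println(countsHaving(List.of("US", "US", "US", "GB", "US"), 2));
    }
}
```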

Amazon EFS and EC2 based Drillbits

2015-07-30 Thread Stefán Baxter
Hi, Amazon's Elastic File System, which promises fast access to "thousands of concurrent EC2 nodes" based on the NFSv4 protocol, will soon be available. I know some people here have been using S3 as storage and, the way I understand it, it's working so-so and has the following drawbacks: -

Re: Integration of Apache_drill with mysql

2015-07-31 Thread Stefán Baxter
Hi Fabian and thank you for sharing, Do you know if this would work, without much effort, with Drill 1.1 ? Regards, -Stefán On Fri, Jul 31, 2015 at 9:54 AM, Fabian Wilckens wrote: > Hi Pankaj, > > have a look: https://github.com/mapr-emea/apache-drill-jdbc-plugin > > > On Fri, Jul 31, 2015 at

pending queries jamming the system

2015-08-03 Thread Stefán Baxter
Hi, I have a small cluster of 3 drillbits running. It's been working just fine until it stopped working altogether. I notice a few "pending" queries and when I try to cancel them, via the admin, they either report that they don't know where they are running or the cancelling process freezes. What

Re: pending queries jamming the system

2015-08-03 Thread Stefán Baxter
umps of the three Drillbits (you can get this using jstack), > and > (2) json query profiles of the PENDING queries (you can get this from the > "Full JSON Profile" at the bottom of the profile page). > > Thank you, > Sudheesh > > > On Aug 3, 2015, at 9:00 AM, Stefá

A incomplete map to a maze - or - the jurney down the rabbit hole

2015-08-03 Thread Stefán Baxter
Hi, I have been meaning to write a blog post regarding our Drill experiments but I thought I might share some thoughts here first. Hopefully some newbie can benefit from this and perhaps it sheds some light on what drives newcomers in this community (or at least some part of them). A bit of a back

Re: A incomplete map to a maze - or - the jurney down the rabbit hole

2015-08-04 Thread Stefán Baxter
partition pruning based > on > > directory. If it does not work the way you want, probably there is a bug > > in the code. We would appreciate if you can provide more detail, so that > > we could re-produce the problem and get it fix asap. > > > > Regards, >

data type differences and 2 incompatible parquet files

2015-09-01 Thread Stefán Baxter
Hi, I'm battling minor inconsistencies in 2 Parquet files generated from the same(ish) json structure. (product of 2 separate CTAS processes but the json was compatible before conversion) I cannot create a query that reads from them both and this is the error I get: [Error Id: 4ee4c131-31fc-4252-a

directory pruning and UDFs

2015-09-17 Thread Stefán Baxter
Hi, I have been writing a few simple utility functions for Drill and staring at the cumbersome dirN conditions required to take advantage of directory pruning. Would it be possible to allow UDFs to throw fileOutOfScope and directoryOutOfScope exceptions that would allow me to a) write a fairly cl

Re: directory pruning and UDFs

2015-09-18 Thread Stefán Baxter
lso be done if you're using Parquet files without all the dirN > syntax. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Thu, Sep 17, 2015 at 10:42 AM, Stefán Baxter > > wrote: > > > Hi, > > > > I have been writing a few simple uti

CTAS exception

2015-09-18 Thread Stefán Baxter
Hi, I have some json files that I want to transform to parquet. We have been doing this without any issues but this time around I get this exception: Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableVarCharVec

Re: CTAS exception

2015-09-18 Thread Stefán Baxter
statement. Can you share both queries you tried (the > failing CTAS and the successful SELECT *) ? > > Thanks > > On Fri, Sep 18, 2015 at 5:38 AM, Stefán Baxter > wrote: > > > Hi, > > > > I have some json files that I want to transform to parquet. > >

Re: CTAS exception

2015-09-18 Thread Stefán Baxter
The failing select query: select * from dfs.* where occurred_at < '2015-09-18' order by occurred_at; -ste On Fri, Sep 18, 2015 at 4:02 PM, Stefán Baxter wrote: > Hi, > > Both statements select everything but the CTAS statement included a date > filter + date ord

Re: CTAS exception

2015-09-18 Thread Stefán Baxter
This fails: - select * from dfs.asa.* where occurred_at < '2015-09-18' order by occurred_at; This, oddly enough, does not fail: - select occurred_at from dfs.* where occurred_at < '2015-09-18' order by occurred_at; -ste On Fri, Sep 18, 2015 at 4:0

Re: CTAS exception

2015-09-18 Thread Stefán Baxter
ce the only > column being read is *occurred_at, *you may not be hitting the issue. First > query being a select * would read all columns and may hit this schema > change error. > > > On Fri, Sep 18, 2015 at 9:16 AM, Stefán Baxter > wrote: > > > This fails: >
