Re: Stop Drill querying .tmp files

2015-10-15 Thread Daniel Barclay
Instead of defining a hard-coded set of prefixes, suffixes, and/or patterns, can we give users some kind of configuration parameter somewhere? Perhaps the file-system plug-in should have a configuration parameter that is a list of "glob" or regular-expression patterns specifying names to ignore,
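A minimal sketch of the configurable ignore list proposed above, assuming glob patterns. The pattern list and the function names are hypothetical illustrations, not an existing Drill option or API.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list; these patterns are illustrative, not Drill defaults.
IGNORE_PATTERNS = ["*.tmp", ".*", "_*"]


def is_ignored(filename, patterns=IGNORE_PATTERNS):
    """Return True if the file name matches any glob pattern in the list."""
    # fnmatchcase compares case-sensitively on every platform.
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


def visible_files(filenames, patterns=IGNORE_PATTERNS):
    """Keep only the file names that no ignore pattern matches."""
    return [name for name in filenames if not is_ignored(name, patterns)]
```

Keeping the patterns in data rather than in code is the point of the proposal: a user could add or remove entries without a new release.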

DATE_TRUNC units

2015-10-15 Thread Jason Jho
Is there a reason why 'week' isn't supported? If not, what's the most sensible way to group timestamped data by week?
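One way to picture grouping by week is to truncate each timestamp to the start of its week and count per bucket. The sketch below assumes weeks start on Monday, which is an assumption of this example and not a statement about how Drill's DATE_TRUNC behaves; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(ts):
    """Truncate a datetime to midnight on the Monday of its week (assumed week start)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday, so subtracting it lands on that week's Monday.
    return midnight - timedelta(days=ts.weekday())


def count_by_week(timestamps):
    """Count how many timestamps fall into each Monday-based week."""
    return Counter(week_start(ts) for ts in timestamps)
```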

Re: REST API Documentation?

2015-10-15 Thread Kristine Hahn
Well, there's this minimal info with a pointer to the doc below: http://drill.apache.org/docs/plugin-configuration-basics/#storage-plugin-rest-api This link was mentioned on the user list a few days ago and the Apache Drill docs now include a pointer to it: https://docs.google.com/document/d/1mRs

Re: REST API Documentation?

2015-10-15 Thread Bob Rumsby
Here is a Google doc: https://docs.google.com/document/d/1mRsuWk4Dpt6ts-jQ6ke3bB30PIwanRiCPfGxRwZEQME/edit It may be time for us to add this to the Drill docs. Bob On Thu, Oct 15, 2015 at 4:03 PM, Christopher Matta wrote: > What about general REST API reference documentation? What endpoints ar

Re: REST API Documentation?

2015-10-15 Thread Christopher Matta
What about general REST API reference documentation? What endpoints are available? On Thursday, October 15, 2015, Kristine Hahn wrote: > I haven't seen any info about REST and impersonation, but the REST API > documentation pertaining to authentication for the Web Console is on Web > Console and

Re: Assigning Different Extensions for Storage Plugins

2015-10-15 Thread Abhishek Girish
I don't think any such thing is required. I'm not sure why you still see the issue. After you updated the storage plug-in, can you confirm if the changes did take effect? Also what version of Drill are you using? On Thu, Oct 15, 2015 at 12:23 PM, John Omernik wrote: > No I added, the bin extensi

Re: REST API Documentation?

2015-10-15 Thread Kristine Hahn
I haven't seen any info about REST and impersonation, but the REST API documentation pertaining to authentication for the Web Console is on Web Console and REST API Privileges on http://doc.mapr.com/display/MapR/Configuring+Web+Console+and+REST+API+Security and not in the Apache Drill docs because

REST API Documentation?

2015-10-15 Thread Christopher Matta
I can't seem to find any reference documentation around the REST API on https://drill.apache.org/docs/, am I missing where it is? I'd like to know how you can pass a user along with the query when impersonation is enabled, also how would you authenticate when authentication is enabled? Chris Matt

Re: Issue with Mongo Drill

2015-10-15 Thread Tomer Shiran
You can download the binary release of 1.2 here: http://people.apache.org/~adeneche/apache-drill-1.2.0-rc3/ On Thu, Oct 15, 2015 at 10:37 AM, Ted Dunning wrote: > MapR has pushed an independent advanced release of Drill 1.2-ish that > likely has this fix. > > > > On Thu, Oct 15, 2015 at 7:56 AM

Re: Assigning Different Extensions for Storage Plugins

2015-10-15 Thread John Omernik
No, I added the bin extension, updated the storage plugin, then tried the query... do I need to relogin to sqlline for things to take effect? On Thu, Oct 15, 2015 at 1:50 PM, Abhishek Girish wrote: > You'll get a "file not found" error if Drill cannot recognize an extension > (**). So if you tri

Re: Assigning Different Extensions for Storage Plugins

2015-10-15 Thread Abhishek Girish
You'll get a "file not found" error if Drill cannot recognize an extension (**). So if you tried querying a file with say .bin extension before you added "bin" as an extension to the json format plugin (and did not specify the default input format), you'd hit that issue. Can you try once more, aft

Re: Assigning Different Extensions for Storage Plugins

2015-10-15 Thread John Omernik
That's on me, I thought I had typed good json, but apparently I did not. I got an invalid json format and I assumed that specifying extensions there was not valid. That said, when I tried to select a file or a directory, I get "file not found" with the .bin extension, yet I know it to be there

Re: S3 data source - invalid URI

2015-10-15 Thread scott cote
I have had problems with hyphens in the S3 names in the past when used with Drill. Try removing the hyphen. SCott > On Oct 15, 2015, at 11:03 AM, Jason Jho wrote: > > I've enabled JetS3t and configured a storage plugin for S3, but I'm having > trouble connecting to an S3 bucket URI that the pa

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Ryan Blue
I'm attaching the patch that I put together for decimal. This includes: * Decimal schema translation from Avro to Parquet - Need to add date, time, timestamp - Need to add Parquet to Avro support * Read-side support for any Avro logical type * Special write-side support for decimal - This w

Re: Issue with Mongo Drill

2015-10-15 Thread Ted Dunning
MapR has pushed an independent advanced release of Drill 1.2-ish that likely has this fix. On Thu, Oct 15, 2015 at 7:56 AM, Kamesh wrote: > Hi James, > Which version of Drill are you using?. Also whether any of the documents > contain field of data type timestamp or date?. > If so, recently

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Chris Mathews
Hi Ryan Thanks for this - it sounds just what we need. How do we go about doing a trial of the local copies with our code ? It would be good to check this all out now if 1.8.0 is delayed for a while ? contact me by https://drillers.slack.com/messages/dev/team/cmathews/ to discuss. Cheers — Chri

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Julien Le Dem
thanks Ryan! (cc parquet dev list as well) On Thu, Oct 15, 2015 at 9:46 AM, Ryan Blue wrote: > Hi Chris, > > Avro does have support for dates, but it hasn't been released yet because > 1.8.0 was blocked on license issues (AVRO-1722). I have a branch with > preliminary parquet-avro support for De

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Julien Le Dem
Hi Chris, You could probably contribute some sort of type annotation to parquet-avro so that it produces the data type in the Parquet schema. This class generates a Parquet schema from the Avro schema: https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/av

How to format paths in select: MapR Audit Logs

2015-10-15 Thread John Omernik
Hey all - I am trying to demonstrate a neat use case. Using audit logs in MapR, I'd like to be able to point Drill at the directory, and just go, no loading of data, just go. The problem I am having is how to describe the path. First how logs are stored. From the base of MapRFS /var/mapr/loca

S3 data source - invalid URI

2015-10-15 Thread Jason Jho
I've enabled JetS3t and configured a storage plugin for S3, but I'm having trouble connecting to an S3 bucket URI that the parser doesn't seem to like. I know it's a valid S3 bucket since it doesn't violate any of the naming conventions and can access the data via s3cmd. The error I am seeing is:

Re: Issue with Mongo Drill

2015-10-15 Thread Jim Bates
I hit that same issue many times but the problem resolved after I upgraded to a pre release of Drill 1.2 On Thu, Oct 15, 2015 at 9:21 AM, Jacques Nadeau wrote: > I believe that this error is due to an incompatibility between Mongo's > Extended JSON support and Drill's extended JSON support that

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Chris Mathews
Thank you Jacques - yes this is exactly the issue I am having. We are currently using Avro to define schemas for our Parquet files, and as you correctly point out there is no way of defining date types in Avro. Due to the volumes of data we are dealing with, using CTAS is not an option for us a

Re: Assigning Different Extensions for Storage Plugins

2015-10-15 Thread Abhishek Girish
That should have worked! Also, I did try it out now: *Data:* # cat abc.bin {"abc":"123", "pqr":"789"} *Format Plugin:* "json": { "type": "json", "extensions": [ "json", "bin" ] } *Query:* > select * from dfs.tmp.`abc.bin`; +--+--+ | abc | pqr |
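The effect of the "extensions" list in the format plugin above can be pictured as a lookup from file extension to format name. This is only an illustration built on the configuration shown in the message; the function name is hypothetical and the logic is not Drill's actual resolution code.

```python
# Mirrors the format plugin configuration quoted in the message above.
FORMATS = {
    "json": {"type": "json", "extensions": ["json", "bin"]},
}


def format_for(filename, formats=FORMATS):
    """Return the name of the first format that lists the file's extension, else None."""
    if "." not in filename:
        return None
    extension = filename.rsplit(".", 1)[1]
    for name, config in formats.items():
        if extension in config.get("extensions", []):
            return name
    return None
```

With "bin" listed, `abc.bin` resolves to the json format; without it, the lookup finds nothing, which is consistent with the "file not found" behavior described elsewhere in the thread.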

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Stefán Baxter
Hi Jacques, You describe my small dilemma much better than me, thank you. Regards, -Stefan On Thu, Oct 15, 2015 at 3:19 PM, Jacques Nadeau wrote: > A little clarification here: > > Parquet has native support for date types. Drill does too. However, since > Avro does not, there is no way that

Re: Issue with Mongo Drill

2015-10-15 Thread Jacques Nadeau
I believe that this error is due to an incompatibility between Mongo's Extended JSON support and Drill's extended JSON support that is fixed in 1.2 (release imminent e.g. next 24-48 hours). If you want to try out the fix immediately, you would need to grab the master branch of Drill and build it yo

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Jacques Nadeau
A little clarification here: Parquet has native support for date types. Drill does too. However, since Avro does not, there is no way that I know of to write a Parquet file via the Avro adapter that will not require a cast. If you did a CTAS in Drill and cast the data types correctly in the CTAS,

Re: Issue with Mongo Drill

2015-10-15 Thread Kamesh
Hi James, Which version of Drill are you using? Also, do any of the documents contain a field of data type timestamp or date? If so, we recently fixed issues in handling timestamp/date and binary date type. These fixes are targeted for *1.2.0* release, which will happen very soon. On T

Issue with Mongo Drill

2015-10-15 Thread Mangold, James
Hi, I am trying to query a collection in mongo directly, using this query: select * from mongo.omegatestbed.testshard4 where ENTITY_ID = 1216515 limit 1; ENTITY_ID is indexed, ascending. The columns in the collection may have differing data types for the same column name. I get this error: E

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Stefán Baxter
Hi Chris, I understand now, thank you. What threw me off was that, in our standard use-case, we are not using cast for our TIMESTAMP_MILLIS fields and I thought we were getting them directly formatted from Parquet but then I overlooked our UDF that is handling the casting... sorry :). Thank you

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Chris Mathews
Hi Stefan I am not sure I fully understand your question 'why you don't seem to be storing your dates in Parquet Date files.' As far as I am aware all date types in Parquet (ie: DATE, TIME_MILLIS, TIMESTAMP_MILLIS) are stored as either int32 or int64 annotated types. The only other opti
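To make the int32 encodings mentioned above concrete: Parquet's DATE annotation stores the number of days since the Unix epoch as an int32, and TIME_MILLIS stores milliseconds after midnight as an int32. The sketch below computes those integer values in plain Python; the function names are hypothetical and no Parquet library is involved.

```python
from datetime import date, time

UNIX_EPOCH_DATE = date(1970, 1, 1)


def date_to_days(d):
    """Days since 1970-01-01, the integer a DATE-annotated int32 would hold."""
    return (d - UNIX_EPOCH_DATE).days


def time_to_millis(t):
    """Milliseconds after midnight, the integer a TIME_MILLIS-annotated int32 would hold."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1000 + t.microsecond // 1000
```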

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Stefán Baxter
Thank you Chris, this clarifies a whole lot :). I wanted to try to avoid the cast in the CTAS on the way from Avro to Parquet (not possible) and then avoid casting as much as possible when selecting from the Parquet files. What is still unclear to me is why you don't seem to be storing your dates

Assigning Different Extensions for Storage Plugins

2015-10-15 Thread John Omernik
Hey all, I have some json files that are written out with a .bin extension. (Process not under my control). In Drill I am able to create a workspace that uses a default input type of json, and this is able to read with no issues, but I'd like to be able to specify that .bin should also be read as

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Chris Mathews
Hello Stefan We use Avro to define our schemas for Parquet files, and we find that using long for dates and converting the dates to long using milliseconds works. We then CAST the long to a TIMESTAMP on the way out during the SELECT statement (or by using a VIEW). example java snippet: // v
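The long-as-milliseconds approach described above can be sketched as a round trip between a datetime and an integer. This is a Python illustration of the idea, not the Java snippet from the original message; it assumes UTC and uses hypothetical function names.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    # Integer division by a one-millisecond timedelta avoids float rounding.
    return (dt - UNIX_EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=millis)
```

The stored long carries no type information of its own, which is why the message goes on to CAST it to a TIMESTAMP in the SELECT or in a VIEW.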