Benchmark numbers using Drill

2017-10-18 Thread PROJJWAL SAHA
Hi, are there any public performance benchmarks that users have achieved with Drill in production scenarios? It would be useful if someone could pass me links to customer user stories. Regards

Re: S3 Connection Issues

2017-10-18 Thread Charles Givre
Hi Padma, The bucket is in us-west-2. I also discovered that some of the variable names in the documentation on the main Drill site are incorrect. Do I need to specify the region in the configuration somewhere? As an update, after discovering that the variable names are incorrect and that

Re: S3 Connection Issues

2017-10-18 Thread Padma Penumarthy
Which AWS region are you trying to connect to? We have a problem connecting to regions that support only the v4 signature, since the version of Hadoop we include in Drill is old. Last time I tried, using Hadoop 2.8.1 worked for me. Thanks Padma > On Oct 18, 2017, at 8:14 PM, Charles Givre

S3 with mixed files

2017-10-18 Thread Daniel McQuillen
Hi, Attempting to use Apache Drill to parse Open edX tracking log files I have stored on S3. I've successfully set up an S3 connection and I can see my different directories in the target S3 bucket when I type `show files;` in embedded drill. Hooray! However, I can't seem to do a query. I keep

Re: describe query support? (catalog metadata, etc)

2017-10-18 Thread Charles Givre
I’d like to second Alfredo’s request. I’ve been trying to get Drill to work with some open source visualization tools such as SqlPad and Metabase and the issue I keep running into is that Drill doesn’t have a convenient way to describe how it interprets flat files. This is really frustrating

Re: describe query support? (catalog metadata, etc)

2017-10-18 Thread Chun Chang
There were discussions on the need to build a catalog for Drill. But I don't think that's the focus right now. And I am not sure the community will ever decide to go in that direction. For now, your best bet is to create views on top of your JSON/CSV data.

Re: Date Conversion Question

2017-10-18 Thread Charles Givre
Hi Julian, Alas, this doesn’t work in Drill since Drill uses Joda time formats. However, you got me thinking about this and I actually got it to work w/o using the substring or other weird string manipulation functions. SELECT to_timestamp ('2017-08-10T09:12:26.000Z',
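The preview above is cut off before the format string, so the actual pattern used in the reply is not shown here. As a rough sketch of the general idea only, the example below assumes a pattern in which the `T` and `Z` characters are quoted as literals. Java's `java.time` accepts the same pattern letters as Joda for this particular case; the class name, method name, and pattern are illustrative and are not taken from the thread.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampPatternSketch {
    // Assumed pattern (not from the thread): single quotes mark T and Z as
    // literal characters rather than pattern letters.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

    // Parses a timestamp string such as the one quoted in the thread.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(PATTERN));
    }

    public static void main(String[] args) {
        LocalDateTime ts = parse("2017-08-10T09:12:26.000Z");
        // Both the date and the hour are available after parsing.
        System.out.println(ts.toLocalDate() + " hour=" + ts.getHour());
    }
}
```

Because the trailing `Z` is matched as a plain character, the parsed value carries no time-zone information; that is a simplification in this sketch.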

describe query support? (catalog metadata, etc)

2017-10-18 Thread Alfredo Serafini
Hi, I'm experimenting with Drill as a data virtualization component via JDBC, and it generally works great for my needs. However, some of the components connected via JDBC need basic metadata/catalog information, which seems to be missing for JSON / CSV sources. For example the simple query

Re: Date Conversion Question

2017-10-18 Thread Bob Rudis
FWIW I was doing very similar substring (etc) machinations until we started converting output from back-end data-generation tools directly into parquet (using other tools). IMO it's a common enough format (at least in the types of data you and I likely have to work with :-) that it'd be great if

Date Conversion Question

2017-10-18 Thread Charles Givre
Hello Drillers, I have a silly question which I’m a little stuck with. I have some data in CSV format with dates in the following format: 2017-08-10T09:12:26.000Z. I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into

Re: Issue in executing query on Drill Cluster

2017-10-18 Thread Khurram Faraaz
Can you please see if you can access that file as the default user in both cases, because you mention that the default user is different in both cases. Try to do a hadoop fs -ls on that file, from both the different users and verify. Also, can you please share the JDBC connection string that

RE: Issue in executing query on Drill Cluster

2017-10-18 Thread Chetan Kothari
Hi Khurram, I have given permission 777 to the file /datalake/replicator/testdemo2. I am connecting to Drill with the default user, but the default user is different in both cases. Is that causing the issue? How do we fix it? Regards Chetan -Original Message- From: Khurram Faraaz

Re: Issue in executing query on Drill Cluster

2017-10-18 Thread Khurram Faraaz
Hi Chetan, 1. What are the permissions to the file /datalake/replicator/testdemo2 ? 2. Are you connecting as the same user to Drill in both cases, (i) in embedded mode, and (ii) in the 4 node Drillbit cluster ? Thanks, Khurram From: Chetan Kothari

Re: Queries getting CANCELED

2017-10-18 Thread Rahul Raj
I think I found the issue - I was not reading the result set back. Just reading the number of results written fixes the problem. try(Connection connection = ctx.getConnection()){ try(Statement st = connection.createStatement()){ st.executeQuery("alter session set `store.format` ='csv'")